Data Flow

Last Updated: 2021-06-14

The Grouparoo Application's main job is to collect data from Sources, store that data to build a robust Records, build Groups of Records, and finally syndicate this data with Destinations, your other marketing tools.

This document is place to centrally describe how all of this works!

Core Data Models

Grouparoo Data Models

Records and Record Properties

The main object in Grouparoo is a Record. A Record by itself is simply an id and the time it was created and last updated. It is a "join table" to collect many Record Properties, which act as a key-value store for the data about Records. For example, a common Record Property would be key='email', value='evan@grouparoo.com', type='string'. We also store when each Record Property was created and updated.

Groups

Groups are a collection of Records.

There are 2 types of Groups: "Manual" and "Calculated".

  • Manual Groups contain records that a Grouparoo user has added to the Group. The membership of Manual Groups does not change.
  • Calculated Groups are constantly recalculated against Record Properties. These dynamic lists of Records are how you build up cohorts that change over time. You may have a Group of Abandoned Carts Last Week that checks a Record Property of "last_abandoned_card_date" and makes sure it was within the last 7 days. You might also choose to segment your customers by the language they speak with a "Locale" Record Property.

Records can be in many groups at once.

Apps

Apps are how you define connections to your tools like your databases, email vendors, CRMs, etc. With each App you tell Grouparoo how to connect so we can import and export your data.

Properties

Properties describe each Record Property and how they work. It's a bit like a "Schema" for the Records:

  • What Record Properties exist? You cannot just add any Record Property to a Record. You need to define it first.
  • Is this Record Property unique? If it is, Grouparoo will make sure that no other Record can have the same value for that Record Property. Common examples would be email or user_id.
  • What is the type of this Record Property? A number, a string? We actually stringify all of our Record Property data for searching, but wen we render it back out again, we convert it. This also helps us know how to build search queries for Calculated Groups.
  • How is this Record Property to be retrieved? By which App? We talk more about this below.

Every Record Property also includes a reference to an App (perhaps "MySQL-Datawarehouse") and a statement/query/option of how to get it for each Record. Grouparoo will always be keeping your data in sync so we always need to know how to retrieve a Record's Record Properties.

  • For an App "MySQL-Datawarehouse" which is linked to the Property for "First Name (string)" you might provide a SQL statement like select first_name from users where users.id={{ user_id }}.
  • You can also have more complex queries, like you could connect the App "MySQL-Datawarehouse" to the Property for "ltv (number)" which might do select (sum(purchases.total) - sum(refunds.total)) as ltv from users join purchases on purchases.user_id = users.id join refunds.user_id = users.id where users.id={{ user_id }}
  • For a an App CSV responsible for "VIP (boolean)" you might choose "column" from a dropdown indicating that any column labeled "VIP" should be allowed to set this Record Property

Data into Grouparoo

Sources, Schedules, and Runs

When you add an App to Grouparoo, you can also configure if you want Grouparoo to periodically check that App for new data. This might be accomplished by checking a database table for new/updated rows or asking an API for recent changes. This is called a Schedule. You can create many Schedules from a single App, against each Source. An App can have many Sources, often correlating to a logical collection in the App, e.g. Tables in a Database.

Perhaps the App "MySQL-Datawarehouse" has 4 Schedules:

  1. "MySQL-Datawarehouse:users". This Schedule checks the "users" table every 10 minutes for new or updated records
  2. "MySQL-Datawarehouse:carts". This Schedule checks the "carts" table every hour for only new carts
  3. "MySQL-Datawarehouse:purchase". This Schedule checks the "purchase" table every hour for only new purchases.
  4. "MySQL-Datawarehouse:refunds". This Schedule checks the "refunds" table every hour for only new refunds

Each App works a little differently, but you will be asked what to check for and how often. Each instance of a periodic check is called a Run. When new data is found, the Run will produce Imports.

Imports

Imports are created when a Schedules' Run finds new or updated data. Either way, an Import tells Grouparoo that something has changed that we should take note of.

Most Imports trigger the update of a Record. An Import can be as simple as {user_id: 3, firstName: 'Evan'}. This tells us that we should update the Record we have for user #3, and if we don't have a Record for User #3, we should create one.

We know that the Import's data we get should be saved to the related Record if:

  1. The Import's payload contains a unique Record Property as defined by the Properties
  2. The other data key-value pairs are also defined by the Properties.

Imports are stored in Grouparoo for later use. You can choose to delete old Imports after a period of time.

Record Hydration

Any time an Import is recorded by Grouparoo, and that Import can be linked to a Record, Grouparoo will fully update that Record, checking all the Apps that are connected to it via Record Properties and asking for new data. You can see which Record Properties should be updated by checking if they are in the pending state or not. In this way, you can be sure that your Records are always up to date, and we can recover quickly from any new or missed data. We call this step "Record Hydration".

This means that if an Import comes in from the "MySQL-Datawarehouse:purchase" Import Schedules, Grouparoo will automatically check all Record Properties, including "ltv", "first_name" and everything else.

Data out of Grouparoo

Destinations

When configuring your Destinations, you'll be asked which Groups and Record Properties you want to send (or all of them). You may want to sync "email" and "first_name" to your email tool for everyone in the Group "lapsed customers" and "USA Customers" so you can send them a coupon as part of your re-engagement campaign. Oh, and the Group "lapsed customers" is Dynamic, based on LTV>0, and last_purchase at least 2 months ago, while "USA Customers" is Dynamic where locale=en-us... all of which Grouparoo is keeping up-to-date for you.

To accomplish this you, would create an Destination for the App "Mailchimp", toggle on the "lapsed-customers" and "USA Customers" groups, and finally the requested Record Properties of "email" and "first_name".

When Grouparoo sends data via a Destination, Exports are created.

Exports

The Grouparoo App for Mailchimp will handle using the Mailchimp API to send the Export for you. Exports keep track of the exact data which was sent to the Destination. Grouparoo will try to send data in bulk when possible, handle retries, and skip any updates that won't result in any new information being sent to the Destination.

You can choose to delete old Exports after a period of time.

Plugins and the Grouparoo Marketplace

Plugins are Grouparoo's way of providing more Apps to Grouparoo, allowing you to import and export data from more and more sources and destinations.