Data Sync Apps

A data sync app connects customer data to one or more external systems, keeping them up-to-date.

By the time a company reaches the scale to focus on the efficiency of the teams that require this data, it has usually built at least one data sync app. For example, your Marketing team can do their job better if they know who has made a purchase. Sales, Customer Success, and Operations have similar needs.

We have researched the approaches and distilled the best practices here.

Data

Query the source of truth
Be incremental
Be idempotent
Batch when possible
Groups matter
Minimize data out
Destinations have nuances
Record the data flow

Organization

Iteration is key
Normalize the workflow
Create a shared language

Data

The first set of tenets is about the data. They focus on ease, reliability, observability, and smart integration to build confidence in the toolset.

Query the source of truth

Every piece of data has a source of truth. For example, a customer’s lifetime value lives in your product database or data warehouse in the purchases table. You should use that table when calculating the value.

Specifically, do not rely on querying artifacts that are merely side effects. You might also have an event stream of clicks on the Purchase button. Don’t use it to calculate the value: a click is only a side effect of buying, not the record of the purchase itself.

Querying the source of truth gives you more flexibility to change the definition. Next month, you might want to subtract returns from that lifetime value number. The source of truth will evolve with the product because it’s the primary artifact. You will also be able to use the history when you get a new idea that wasn’t part of those early events.
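
As a minimal sketch of this idea, the following uses Python's built-in sqlite3 module with a hypothetical purchases table. The column names and the "returned" flag are assumptions made for illustration, not a real product schema. It shows that changing the definition of lifetime value only means changing the query.

```python
import sqlite3

# Hypothetical schema: one row per purchase, with a flag for returned items.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (userId INTEGER, price REAL, returned INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, 30.0, 0), (1, 20.0, 1), (2, 15.0, 0)],
)


def lifetime_value(conn, user_id, subtract_returns=False):
    """Compute lifetime value directly from the purchases table."""
    sql = "SELECT COALESCE(SUM(price), 0) FROM purchases WHERE userId = ?"
    if subtract_returns:
        # New definition: ignore purchases that were later returned.
        sql += " AND returned = 0"
    return conn.execute(sql, (user_id,)).fetchone()[0]
```

Because the full history lives in the table, the new definition applies to every past purchase as soon as the query changes.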

Be incremental

It’s untenable to process every user every time. Intelligently monitoring the source of truth for changes will allow your App to grow with the company while maintaining near real-time service level agreements (SLAs).

The most tried-and-true approach is for each critical table in your database to have a timestamp column indicating when each row last changed. You can use these values to maintain a high-water mark, which you use on the next run to calculate a change set.

There are also event-driven approaches, like subscribing to a message bus or using change data capture (CDC).

In all cases, the goal is to reliably know which records have changed (or may have changed), so you can query the source of truth to get the new values.
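
The timestamp approach can be sketched roughly as follows. This is an illustration under assumptions: the users table, its updated_at column, and the changed_since helper are hypothetical names, and timestamps are stored as ISO-8601 strings so that they sort chronologically.

```python
import sqlite3

# Hypothetical table with an updated_at column maintained on every write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2021-01-01T10:00:00Z"),
        (2, "b@example.com", "2021-01-01T11:00:00Z"),
        (3, "c@example.com", "2021-01-01T12:00:00Z"),
    ],
)


def changed_since(conn, high_water_mark):
    """Return (ids of possibly changed users, new high-water mark)."""
    # ">=" re-reads rows at the boundary timestamp, so a row written in the
    # same instant as the previous run is not missed. Some rows are seen
    # twice, which is safe as long as downstream processing is idempotent.
    rows = conn.execute(
        "SELECT id, updated_at FROM users WHERE updated_at >= ? ORDER BY updated_at",
        (high_water_mark,),
    ).fetchall()
    ids = [row[0] for row in rows]
    new_mark = rows[-1][1] if rows else high_water_mark
    return ids, new_mark
```

Each run stores the returned mark and passes it to the next run, so only records that may have changed are queried again.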

Be idempotent

It is tempting to take the incremental approach when calculating Property values. For example, you might receive an event that the user has bought a $30 item. You could simply add $30 to the previous lifetime value. Resist this temptation.

The source of truth can answer this question idempotently: running the query again gives the same correct answer, so you don’t need to rely on previously calculated values. Always running that query gives you reliable data. The benefits start with having the full history from the beginning and continue through to the flexibility to make changes later.

You will also be able to handle adjacent triggers from your incremental change sets. For example, maybe you have discovered a new purchase was made. You’ll want to recalculate that lifetime value. But a new purchase means you might also need to recalculate the user’s current address, their number of purchases, and their favorite products. Don’t be scared to make those queries. Reliable and consistent data is worth it.
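
To make the difference concrete, here is a small sketch in plain Python. The purchase records and both helper functions are hypothetical. It simulates one purchase event being delivered twice, which is a common failure mode in event pipelines.

```python
# Hypothetical source of truth: the list of purchases on record.
purchases = [
    {"id": "p1", "userId": 1, "price": 30},
    {"id": "p2", "userId": 1, "price": 20},
]


def incremental_ltv(previous_value, event):
    """Tempting but fragile: depends on the previous value being right."""
    return previous_value + event["price"]


def idempotent_ltv(purchases, user_id):
    """Recompute from the source of truth; safe to run any number of times."""
    return sum(p["price"] for p in purchases if p["userId"] == user_id)


# The event for purchase p2 arrives twice.
events = [purchases[0], purchases[1], purchases[1]]

drifted = 0
recomputed = 0
for event in events:
    drifted = incremental_ltv(drifted, event)
    recomputed = idempotent_ltv(purchases, event["userId"])
```

After the loop, the incremental value has counted the duplicate event, while the recomputed value still matches the purchases on record.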

Batch when possible

You should be willing to query a lot in order to get correct data, but batching multiple users together allows you to offset that cost and use resources efficiently.

Not all data needs to be synced in real-time. For that matter, “real-time” might mean something different for each piece of data. For example, it is likely acceptable to update a customer’s lifetime value in Salesforce within 30 minutes of the purchase.

In this case, you can discover new purchases at that frequency to determine the users that need to be updated. Then, you can fetch all of those users’ lifetime values at once with a query like this:

SELECT userId, SUM(price) FROM purchases WHERE userId IN (long, list, of, user, ids) GROUP BY userId;
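
A runnable sketch of that batched query, again using sqlite3 and a hypothetical purchases table, might look like the following. The lifetime_values helper is an assumed name. It builds one placeholder per user id so the ids are passed as parameters rather than concatenated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (userId INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 30.0), (1, 20.0), (2, 15.0), (3, 5.0)],
)


def lifetime_values(conn, user_ids):
    """Fetch lifetime value for many users with a single query."""
    if not user_ids:
        return {}
    # One "?" per id keeps the query parameterized.
    placeholders = ", ".join("?" for _ in user_ids)
    sql = (
        "SELECT userId, SUM(price) FROM purchases "
        f"WHERE userId IN ({placeholders}) GROUP BY userId"
    )
    return dict(conn.execute(sql, list(user_ids)).fetchall())
```

Users with no purchases are simply absent from the result, so the caller decides on a default. Databases cap the number of parameters in one statement, so a very long list may need to be split into chunks.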

Groups matter

Cohorts (or audiences, or groups, or segments) are a key outcome of data sync, particularly for marketing tools. A customer’s membership in a Group like “High Value” will determine how the company interacts with them. It is valuable to share this definition, even as it changes, across multiple apps to create a consistent customer experience.

As it is calculating the Property values, a data sync App should also read from shared definitions and calculate these Group memberships. It can then synchronize the memberships to multiple tools. For example, a “High Value” user might be on a special marketing mailing list and also have their tickets routed more quickly in customer support.
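
One way to picture shared definitions is a single table of rules that every Destination reads from. The sketch below is only an illustration: the rule thresholds, property names, and destination labels are all assumptions.

```python
# Hypothetical shared definitions: Group name -> rule over a user's Properties.
GROUP_DEFINITIONS = {
    "High Value": lambda props: props["lifetime_value"] >= 100,
    "California": lambda props: props["state"] == "CA",
}


def group_memberships(props):
    """Evaluate every shared definition against one user's Properties."""
    return sorted(name for name, rule in GROUP_DEFINITIONS.items() if rule(props))


def sync_memberships(props, destinations):
    """Give every Destination the same memberships for this user."""
    groups = group_memberships(props)
    return {destination: groups for destination in destinations}
```

Because the mailing list and the support desk both receive memberships computed from the same definition, a change to the "High Value" threshold reaches both tools at once.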

Minimize data out

In many cases, the specific Property value for a user is not relevant; however, that value is a factor in cohort creation. In the lifetime value example, it is unlikely that an email tool literally needs the number of dollars spent; however, bucketing users into groups is critically important.

At the same time, personally identifiable information (PII) can be dangerous. Regulations like GDPR, CCPA, and others make it important to limit the PII flowing out of your internal system.

The opportunity for your data sync App is to minimize what is sent to external tools in the first place. If you are calculating a Group called “High Value California Customers” for use in a marketing campaign, only sync over a user’s membership in that Group. Keep their lifetime value and their address within your environment.
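
A simple way to enforce this is an allowlist of outbound fields. The sketch below assumes a hypothetical profile shape and a build_export helper. It keeps only the identifier the external tool needs to match the user, plus Group membership.

```python
# Hypothetical allowlist: only these profile fields may leave the system.
OUTBOUND_FIELDS = {"email"}


def build_export(profile, groups):
    """Build the payload for an external tool: identifier plus Groups only."""
    payload = {key: value for key, value in profile.items() if key in OUTBOUND_FIELDS}
    payload["groups"] = sorted(groups)
    return payload


profile = {
    "email": "a@example.com",
    "lifetime_value": 250.0,
    "address": "1 Main St, Los Angeles, CA",
}
export = build_export(profile, {"High Value California Customers"})
```

The lifetime value and address were used internally to calculate the Group, but neither appears in the payload.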

Destinations have nuances

Every Destination service is unique. For sanity, you will want to standardize how you interact with each one, but there will always be deviations.

The first difference is the Mapping: how to project your customers and Groups into that system. For Mailchimp, this means contacts that have merge vars and tags. Marketo has users with fields on lists. Salesforce has many customized options. Research the available APIs and decide on how the data gets mapped.

APIs have differences around data formatting. For example, datetimes are often formatted in different ways. Intercom sends seconds since the epoch. Pipedrive uses the full ISO value. HubSpot requires epoch milliseconds at midnight UTC.

Finally, each service has its own approach towards rate limiting. Understand the limits in order to make best use of your allotment. Iterable has 500 calls per second. Pardot gives you 25,000 per day. Sailthru varies depending on the endpoint.
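
The datetime differences can be isolated in one small formatting layer per Destination. The sketch below follows the formats as described in this article rather than any verified API reference, and the dictionary keys are just labels, not real client libraries. It expects timezone-aware datetimes.

```python
from datetime import datetime, timezone


def epoch_seconds(dt):
    """Seconds since the epoch."""
    return int(dt.timestamp())


def iso_8601(dt):
    """Full ISO 8601 value, normalized to UTC."""
    return dt.astimezone(timezone.utc).isoformat()


def epoch_millis_midnight_utc(dt):
    """Epoch milliseconds, truncated to midnight UTC of that day."""
    utc = dt.astimezone(timezone.utc)
    midnight = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp() * 1000)


# Assumed mapping from a Destination label to its datetime format.
FORMATTERS = {
    "intercom": epoch_seconds,
    "pipedrive": iso_8601,
    "hubspot": epoch_millis_midnight_utc,
}


def format_for(destination, dt):
    """Format one datetime the way the given Destination expects it."""
    return FORMATTERS[destination](dt)
```

Keeping these deviations in one place lets the rest of the app pass around ordinary datetime values and stay standardized.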

Record the data flow

Log the values, the time, and the Destination of all data leaving your system. This serves many purposes across observability, compliance, and efficiency.

It is inevitable that you will be asked something along the lines of, "Why did I receive this push notification yesterday at noon?" Everyone will be perplexed because only customers without a purchase should have been sent that notification and this user clearly has a purchase. A Record of the data flow will show you that they moved out of that cohort just after noon. Similar questions are often asked around PII compliance and the log will help.

You will also be able to use the Record to be more efficient with your Destination systems. For example, if the data a Destination would receive has not changed since the last sync, you can skip the API call and use your rate limit more wisely.
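
Both uses of the Record can be sketched together. In this illustration the export log is an in-memory list and the send callback stands in for a real API call. Both are assumptions, and a real log would live in durable storage.

```python
from datetime import datetime, timezone

# Hypothetical Record of everything sent; in practice a durable table.
export_log = []


def last_sent(user_id, destination):
    """Return the most recent values sent for this user and Destination."""
    for entry in reversed(export_log):
        if entry["user_id"] == user_id and entry["destination"] == destination:
            return entry["values"]
    return None


def export(user_id, destination, values, send):
    """Send only when values changed, and record what was sent and when."""
    if last_sent(user_id, destination) == values:
        return False  # Nothing changed: skip the call, save the rate limit.
    send(user_id, values)
    export_log.append(
        {
            "user_id": user_id,
            "destination": destination,
            "values": dict(values),
            "sent_at": datetime.now(timezone.utc),
        }
    )
    return True
```

Answering "what did this tool know about this user at noon?" then becomes a lookup over the log entries for that user and Destination.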

Organization

No app stands alone. The goal is to create value for the whole organization. These last few tenets help achieve the promise of a well-built data sync app.

Iteration is key

Growth engineers like to run tests. Accordingly, optimizing the product is about how many times you can go around the build-learn-iterate loop. The same is true in Marketing, other teams, and the company as a whole.

Because fast iteration leads to business success, the organizational value of a data sync app is a function of how quickly it enables you to respond to new data requests. Change is inevitable, so make changes zero-point stories.

The best way to make this a reality is to treat your data model as configuration for a sync engine. Have a layered approach where you define the “what” and keep it separate from the “how.” In the day-to-day, this enables you to focus on the data definitions and not be constantly debugging the pipes that move things around.
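
A toy version of this layering is sketched below. The Property names, the queries, and the compute_properties engine are hypothetical. The point is only that adding a Property means adding one configuration entry, not touching the engine.

```python
import sqlite3

# The "what": hypothetical Property definitions, kept as plain configuration.
PROPERTIES = {
    "lifetime_value": "SELECT COALESCE(SUM(price), 0) FROM purchases WHERE userId = ?",
    "purchase_count": "SELECT COUNT(*) FROM purchases WHERE userId = ?",
}


# The "how": a tiny engine that knows nothing about any specific Property.
def compute_properties(conn, user_id, properties=PROPERTIES):
    """Run every configured query for one user and collect the results."""
    return {
        name: conn.execute(sql, (user_id,)).fetchone()[0]
        for name, sql in properties.items()
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (userId INTEGER, price REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [(1, 30.0), (1, 20.0)])
```

A new data request, such as a user's largest purchase, would be one more entry in PROPERTIES with a MAX(price) query.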

Normalize the workflow

This kind of app is different from your normal product in many ways, right? Look at all these rules!

The best results come from treating this app just like all the other product-focused codebases. Manage data sync just like you would any other part of your stack. You test the configuration, check it into git, run it on CI, review, merge, and deploy.
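
Testing the configuration can be as ordinary as a validation function that runs on CI. The sketch below assumes a hypothetical configuration shape, with the CONFIG layout, ALLOWED_OPS, and validate_config all invented for illustration. It checks that every Group rule refers to a known Property and a supported operator.

```python
# Hypothetical configuration, as it might be checked into git.
CONFIG = {
    "properties": ["lifetime_value", "state"],
    "groups": {
        "High Value": {"property": "lifetime_value", "op": "gte", "value": 100},
        "California": {"property": "state", "op": "eq", "value": "CA"},
    },
}

ALLOWED_OPS = {"eq", "gte", "lte"}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    known = set(config["properties"])
    for name, rule in config["groups"].items():
        if rule["property"] not in known:
            errors.append(f"{name}: unknown property {rule['property']!r}")
        if rule["op"] not in ALLOWED_OPS:
            errors.append(f"{name}: unsupported op {rule['op']!r}")
    return errors
```

A CI step can fail the build whenever the list is not empty, so a mistyped Property name is caught in review rather than in a marketing campaign.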

When you make the workflow familiar, you make it approachable to everyone on the team. Success looks like an engineer responding with curiosity to a Marketing request instead of avoiding the conversation. When they say, “Sure, no problem” to data enablement, the whole company wins.

Create a shared language

Often, struggles around data sync arise because the stakeholders are far apart in how they talk about it. Marketers speak about their email automation use cases. Engineers discuss the database and have never actually seen the email tool.

Creating a robust data sync app is the best way to bridge the gap. It’s a stop-over on this data journey that we can use to discuss requirements concretely. Marketers understand how the data properties in the App end up in their tool. Engineers understand how queries create these properties. Now, both can focus on the correct data definition and not always be worrying about the mechanism.

The data rules are designed to get you to this point. They focus on ease, reliability, observability, and smart integration in order to build this confidence across the whole organization.
