The Grouparoo Blog

Development workflow for Reverse ETL

Tagged in Product Engineering
By Brian Leonard on 2021-07-19

Update (January 2022)

The Grouparoo community is continually working to improve the developer experience for Reverse ETL. Here's our guide to Getting Started with Grouparoo to lead you through installation, configuration, running, and deploying projects. Grouparoo's recommend way to configure the application is through UI Config.

An important enhancement to the workflow is the addition of Models. Models fit in the process between the steps of setting up Apps and choosing Sources. They provide a way for developers to define the shape of the data to be extracted from the Source, depending on the type of record you are working with. Using Models allows developers to use only the properties needed for the data transfer in order to keep Records clean.

Additionally, we are now proud to share our cloud offering! Grouparoo Cloud helps make deployments quick and easy, optimizes Infrastructure for your Grouparoo instance, and provides Github integration for Continuous Deployment and Testing. Learn more about running Grouparoo in the cloud here.

Workflow Overview

An important trend in data engineering is the move towards the software development workflow that is the norm for more product-focused teams. This includes aspects such as config-as-code checked into git, tests and continuous integration, code reviews, staging environments, and production deployments.

Building on our config-as-code approach, the Grouparoo 0.5 release now makes this workflow even easier by enabling the UI to generate this config. The UI adds much more confidence to the process by:

Introspecting your source tables/columns and your destination fields
Verifying credentials and previewing data
Working with sample profiles to iterate on nuances
Automatically writing configuration files to check into source control

Take a look at how it works.

Evolution

Over the last few years, data teams have invested a lot of resources into making their data warehouse the source of truth for their customer data. They've generated reports and insights that has helped their business make better decisions.

In this environment, it was not always necessary to go through the whole software development workflow. Many things were ad hoc anyway. Analysts were not always collaborating. The worst-case scenario was an incorrect report. That's not great, but it's also not the end of the world.

As this data got more and more useful, other teams wanted to be able to put it into action. In response to that need, we work with companies that are looking to operationalize their data warehouse. Data teams are calling this Reverse ETL because it is writing data back to the tools the business uses.

These integrations create more leverage, but the worst case scenario has escalated. A Marketo data integration could send a million emails to the wrong people. A more rigorous workflow is needed.

Grouparoo Process

In Grouparoo, our JSON-based config files map the data from your source data warehouse to destinations like our Salesforce data integrations, Zendesk integration, and others.

The new UI developer tool makes it easy to create and iterate on this pipeline.

Enter your credentials or use environment variables to connect to your warehouse. Select the tables and columns where your customer data lives. If you need it, you can use aggregations from many tables or write your own custom SQL.

When mapping to a destination, the UI pre-fills tool-specific details. For example, a Mailchimp integration will prompt you to pick the Audience List from your account. It will then know all the Merge Vars so you can map to them. A Salesforce integration will know all about your Objects.

Many companies have millions of customers they want to sync. We have designed the developer tooling, though, to work on just a few sample profiles at a time. This allows you to gain confidence while configuring the pipeline. Use them to preview how those million will work when it goes to production.

To gain more confidence, you can also write tests to lock in the behaviors. Run these tests on CI to make sure everything is working as expected. Don't worry, it won't actually change your destinations. We have seen people use either seeded, staging, or production data in these tests.

All of this results in a changes to your configuration that is reflected in the JSON files and tests. Let's make a Pull Request and get it reviewed. When deployed to staging or production, the configuration goes into effect and Grouparoo automatically syncs the right data.

These screenshots are from the Snowflake to Salesforce example app. You can also check out another example of our developer tooling with less requirements in our local data example.

Written by Brian Leonard on 2021-07-19
Tagged in Product Engineering
See all of Brian Leonard's posts.

Brian is the CEO and co-founder of Grouparoo, an open source data framework that easily connects your data to business tools. Brian is a leader and technologist who enjoys hanging out with his family, traveling, learning new things, and building software that makes people's lives easier.

Learn more about Brian @ https://www.linkedin.com/in/brianl429

Share this post

Get Started with Grouparoo

Start syncing your data with Grouparoo Cloud

Start Free Trial

Or download and try our open source Community edition.