The Grouparoo Blog


Reverse ETL with dbt and Grouparoo

Tagged in Engineering Product 
By Brian Leonard on 2021-03-31

Teams are centralizing their data in their data warehouse by loading data in and transforming it as necessary. Increasingly, we are seeing teams turn to dbt to do this transforming. The idea is to write *.sql files that, when run in the right order, create useful rollup tables or materialized views of the data.

We've been asked by teams using dbt how Grouparoo can then sync their data to their cloud-based apps. This is what some are calling Reverse ETL. Reverse ETL is taking data from the warehouse and writing it back to line-of-business tools.

Data stack workflow with dbt doing transform and Grouparoo doing reverse ETL sync.

I've created an example project to show how the tools work together.

Approach

A beautiful thing about dbt is its simplicity. All it takes to apply the SQL transforms is to execute dbt run. This can be scheduled to run every so often as a cron job.

While Grouparoo can run as an application to continuously sync and serve its web UI, you can also execute grouparoo run. This will execute everything that needs to sync and exit.

By putting these two commands together, we can transform the data and sync it to the configured tools. In the example project, this means putting ./transform_and_sync in your cron instead of dbt run. The right data ends up in the destionation(s) of our choosing.

That's it!

Example project

The example project has instructions to seed users and purchases into an test warehouse like Postgres. Then it has a few transforms to roll up customer data into a few materialized views. One of these views is who we should sync to a destination, Mailchimp.

The configuration in the Grouparoo project sets up the sync pipeline:

Demo users are in sync with Mailchimp by using Grouparoo.

After running the cron job, all 1000 users are in Mailchimp CRM based on the mappings defined. In this case, they have their first names filled out and are tagged with they are a high value Spanish speaker.

dbt plugin

In order to make this work smoothly, we created a dbt plugin for Grouparoo.

Grouparoo configuration files can be JSON and completely declarative, but they can also be any Javascript code. With this in mind, we created dbt helpers to make Grouparoo smarter about how to connect to the data warehouse.

Instead of repeating the credentials, the app knows how to read them from the various ways that dbt can be set up.

const { dbtProfile } = require("@grouparoo/dbt");

exports.default = async function buildConfig() {
  // fetch warehouse connection details from parent dbt profile
  const { type, options } = await dbtProfile();
  return [
    {
      class: "app",
      id: "warehouse",
      name: "Warehouse",
      type,
      options,
    },
  ];
};

This also includes the ability to automatically change environments based on the dbt setup, so if you have different dev and prod databases or warehouses, Grouparoo will use the right one using the current dbt profile.

We look forward to expanding on this plugin based on how people use it. Let us know in Github or Slack how else you want to integrate.




Get Started with Grouparoo

Start syncing your data with Grouparoo Cloud

Start Free Trial

Or download and try our open source Community edition.