The Grouparoo Blog


BigQuery Source

Brian Leonard on 2020-05-28 in Connections Engineering 

Grouparoo can now connect to your BigQuery warehouse so you can use that data for profile properties. This enables creating groups and automatically syncing that information to your marketing tools.

Google setup

The setup is similar to the one described here in the Google Sheets source. You make a "Service Account" and give it access to BigQuery. In this example, I made a "[email protected]" account. Yours will be different. As part of this, you will download a JSON file with credentials.

You then give this account access to your BigQuery database from the console.

Share Dataset

Share Dataset

Create a Grouparoo app

In Grouparoo, you now need to teach the system about your BigQuery setup. This is done through creating an app.

Creating Grouparoo App

For BigQuery, this takes in the project and dataset. You paste in the client_email and the private_key straight from the JSON credentials file. The private_key is a bit long, but just go with it. It's ok to include those \n characters as-is.

Create a Grouparoo source

With the app created, you will now have two new new possible sources.

Two new types of sources: query and table

There are two possible ways to bring in data from BigQuery: Table and Query. Let's talk about each one.

Table Source

With a BigQuery Table source, the user (often a marketer) can note which data they want to pull in and not have to know how to write the correct SQL.

Pick a table to use

You pick which table contains the information you want to make profiles properties from. In this case, I've chosen the purchases table, which contains information about which fruits a customer has bought. You can see a preview of the data available.

Make mapping to existing properties

Here we have taught Grouparoo how to map this to profiles that are currently in our system. In this case, the profile_id in the table maps to the existing userId profiles property.

Now, we can make a profile property from the data in this table. Let's say we want to know their lifetime value (LTV) in fruit purchases. That would look like this:

Create a sum of the purchases made

We can also add filters here. Maybe we only want their LTV for apples and not all the fruits.

Create another property that filters to specific purchases

Table schedule

An important part of Grouparoo is that these properties we just created will always be up to date. When they are up to date, this changes what groups they are in and what is sent to your destinations.

Grouparoo accomplishes this through creating a schedule. In our Table source case, you only have to teach it which column to use.

Running a schedule to import purchases

In this case, by picking the stamp column, when that updates to a newer value (or a new row is added), the respective profiles LTV fields will be updated. Now everything is always up to date.

Query Source

Not every use case can be solved without writing SQL, so Grouparoo also has a way to put whatever query you want to create properties. This can be useful when there are transformations or multiple JOINs needed to get the right data.

While exploring how BigQuery worked, I stumbled across many public datasets that they have. One example is information about names in the United States.

Query public names data

With this information and the Query source, we can make a new property called guessedGender that will use the user firstName property value. Here's the query we will use:

SELECT gender FROM`bigquery-public-data.usa_names.usa_1910_current`WHERE LOWER(name) = LOWER('{{ firstName }}') GROUP BY gender ORDER BY SUM(number) DESC`

Profile property rule

Results

At this point, we can see the profile properties that have been created in each profile.

Profile has the data from BigQuery

And you can also use it to make groups.

Groups can use the data from BigQuery

This data can be sent to destinations like Mailchimp.

Implementation Details

The largest challenge with BigQuery was around the syntax for the Table source. The SQL engines we have implemented (Postgres, MySQL) handle typing a bit more gracefully. Specifically, you can send a quoted integer to a INT Postgres column and it will still work. For example: SELECT * FROM users WHERE age = '22'. This same query one BigQuery would cause a type error because age is a numerical column.

What this means for the implementation is that the code has to be much more sure about all the types involved. In the end this makes the implementation much smarter, but it took lots of testing to handle all the combinatorics involved. It also made it slower because we are checking the types a lot. I will likely go back and add some caching in next.

There are also 4 different kinds of date-ish column types involved:

  • TIMESTAMP: an exact moment in a specific timezone which can be represented by an epoch time.
  • DATETIME: a date and time, but it doesn't not have a timezone. 2020-05-01 12:00 could be noon on that day in any timezone.
  • DATE: just a date, again in relative timezones.
  • TIME: just a time like 12:00 without a date, maybe for a property like "preferred contact time"

How to represent these in Grouparoo and Javascript is an interesting question. The TIMESTAMP one maps fairly well to the Date Javascript object, but the others are up for discussion. I'd be happy for input.

Overall, BigQuery maps quite well to the Grouparoo model and I'm excited to start getting some of its data flowing through the pipes.

Stay up to date

We will let you know about our launch and new content.

Share this post