The Grouparoo Blog
Grouparoo can now connect to your BigQuery warehouse so you can use that data for profile properties. This enables creating groups and automatically syncing that information to your marketing tools.
Google setup
The setup is similar to the one described here in the Google Sheets source. You make a "Service Account" and give it access to BigQuery. In this example, I made a "grouparoo-db@sample-sources.iam.gserviceaccount.com" account. Yours will be different. As part of this, you will download a JSON file with credentials.
You then give this account access to your BigQuery database from the console.


Create a Grouparoo app
In Grouparoo, you now need to teach the system about your BigQuery setup. This is done through creating an app.

For BigQuery, this takes in the project and dataset. You paste in the client_email
and the private_key
straight from the JSON credentials file. The private_key
is a bit long, but just go with it. It's ok to include those \n
characters as-is.
Create a Grouparoo source
With the app created, you will now have two new new possible sources.

There are two possible ways to bring in data from BigQuery: Table and Query. Let's talk about each one.
Table Source
With a BigQuery Table source, the user (often a marketer) can note which data they want to pull in and not have to know how to write the correct SQL.

You pick which table contains the information you want to make profiles properties from. In this case, I've chosen the purchases
table, which contains information about which fruits a customer has bought. You can see a preview of the data available.

Here we have taught Grouparoo how to map this to profiles that are currently in our system. In this case, the profile_id
in the table maps to the existing userId
profiles property.
Now, we can make a profile property from the data in this table. Let's say we want to know their lifetime value (LTV) in fruit purchases. That would look like this:

We can also add filters here. Maybe we only want their LTV for apples and not all the fruits.

Table schedule
An important part of Grouparoo is that these properties we just created will always be up to date. When they are up to date, this changes what groups they are in and what is sent to your destinations.
Grouparoo accomplishes this through creating a schedule. In our Table source case, you only have to teach it which column to use.

In this case, by picking the stamp
column, when that updates to a newer value (or a new row is added), the respective profiles LTV
fields will be updated. Now everything is always up to date.
Query Source
Not every use case can be solved without writing SQL, so Grouparoo also has a way to put whatever query you want to create properties. This can be useful when there are transformations or multiple JOIN
s needed to get the right data.
While exploring how BigQuery worked, I stumbled across many public datasets that they have. One example is information about names in the United States.

With this information and the Query source, we can make a new property called guessedGender
that will use the user firstName
property value. Here's the query we will use:
SELECT gender FROM`bigquery-public-data.usa_names.usa_1910_current`WHERE LOWER(name) = LOWER('{{ firstName }}') GROUP BY gender ORDER BY SUM(number) DESC`

Results
At this point, we can see the profile properties that have been created in each profile.

And you can also use it to make groups.

This data can be sent to destinations like Mailchimp.
Implementation Details
The largest challenge with BigQuery was around the syntax for the Table source. The SQL engines we have implemented (Postgres, MySQL) handle typing a bit more gracefully. Specifically, you can send a quoted integer to a INT
Postgres column and it will still work. For example: SELECT * FROM users WHERE age = '22'
. This same query one BigQuery would cause a type error because age
is a numerical column.
What this means for the implementation is that the code has to be much more sure about all the types involved. In the end this makes the implementation much smarter, but it took lots of testing to handle all the combinatorics involved. It also made it slower because we are checking the types a lot. I will likely go back and add some caching in next.
There are also 4 different kinds of date-ish column types involved:
TIMESTAMP
: an exact moment in a specific timezone which can be represented by an epoch time.DATETIME
: a date and time, but it doesn't not have a timezone.2020-05-01 12:00
could be noon on that day in any timezone.DATE
: just a date, again in relative timezones.TIME
: just a time like12:00
without a date, maybe for a property like "preferred contact time"
How to represent these in Grouparoo and Javascript is an interesting question. The TIMESTAMP
one maps fairly well to the Date
Javascript object, but the others are up for discussion. I'd be happy for input.
Overall, BigQuery maps quite well to the Grouparoo model and I'm excited to start getting some of its data flowing through the pipes.
Tagged in Connections Engineering
See all of Brian Leonard's posts.
Brian is the CEO and co-founder of Grouparoo, an open source data framework that easily connects your data to business tools. Brian is a leader and technologist who enjoys hanging out with his family, traveling, learning new things, and building software that makes people's lives easier.
Learn more about Brian @ https://www.linkedin.com/in/brianl429