Blending data sources in Tableau has a steep, although quick, learning curve. It can be easily frustrating to get everything to work, so much so that a user may opt to push all data joins to the database.
Also, there are cases (many) where data sources simply cannot exist in the same database:
- Leveraging external or public data within Tableau requires blending.
- Your data exists in multiple machines, and you haven’t yet an ETL process or application in place.
Notwithstanding the above examples, most Tableau gurus would agree to do all or most of the data modelling in the database or ETL layer before connecting to Tableau.
However, here are two cases where blending in Tableau may be the best choice for combining your data:
1. Comparing Aggregated Measures in a One-to-Many Relationship
Translation: The Sales vs. Quota problem
Sales figures exist in one table. Quotas exist in another table. I’d like to see how the sum of all of my sales are performing against my monthly quotas.
Easy, right? Just join up the tables on the fields [Salesperson] and the date fields (Month and Year only).
To keep the flexibility in my Tableau worksheets and any drill-down capabilities, start the join from the Sales table in order to keep all of the Order IDs intact. The join (left in this case) will return a quota amount for every Order ID that is the total quota for the month. No big deal, until Tableau wants to look at it:
Tableau’s default aggregation level is SUM. Simply put, Tableau will want to add all of the rows’ quotas up, resulting in completely bloated numbers.
It is worth noting that we could change the aggregation on Quota to MIN or ATTR in order to retain the correct monthly quota per Salesperson. However, if we want to look at quotas aggregated up to Salesperson instead of Month, or aggregate quotas up to a Year, we’ll have to employ some table calculations that would make our views far less flexible and chart types less varied.
We need to join our tables but not duplicate the monthly quota for every Order ID. BLENDING gives us the solution.
If we connect to the Sales and Quotas tables separately, and then BLEND on the Salesperson, Month(Date) and Year(Date) fields, we can retain the correct values from both tables and still retain the flexibility of Tableau’s drag-and-drop viz.
Expanded out to include more Salespeople, see it in viz action:
Notice that both green pills are aggregated at the default: SUM. This makes everything SO much easier for any further chart making.
For clarification: Why is it necessary to blend on Salesperson, Month, and Year?
In the quotas table, notice that each amount represents a Salesperson's quota per a single Month of a single Year. Blending on the three fields essentially makes sure we are aggregating the quotas table only to the level of Salesperson by Month by Year. If we left out the Month and Year relationship, the Blend would result in the quota being a sum of all quota amounts per Salesperson and won't partition it by Month or Year.
2. Creating Fast and Easy Look-Up or Dimensional Tables for One-to-One Relationships
Translation: I can’t access the database itself to make changes, but I need to make changes.
Regardless of what data sources to which our Tableau worksheets are connected, we can leverage small and simple Excel worksheets to help further dimensional-ize our data.
To expand on the previous example, I need to assign each of my Salespeople to a Sales Region. However, I have no access to add dimensional tables to the database (or I don’t have three weeks to wait for a submitted IT ticket). Also, I need the logic to be more flexible and easy to change than a Tableau Group.
The solution is simple: BLEND.
We can create and save a very simple Excel worksheet with the names of our Salespeople and then assign them to Regions there. Then, we simply connect to this sheet with Tableau and quickly blend to our other data sources.
Since we are creating a new one-to-one relationship, none of our aggregated measures are affected by the blend. As long as we keep the Lookup table simple, and we use it as a secondary source, everything is easy and quick.
No database access necessary!
Also, since the aggregated measures aren’t affected, we can blend our Regions table into our previous Sales/Quotas example easily.
Usually a buzz-list posted on the internet would have at least five items. There aren’t a ton of reasons where blending in Tableau would be preferred over moving data modelling and logic to the database or ETL layer. However, the two examples discussed are the two definitive (in my experience) cases where blending is actually the preferred method of solution.
For the second example, in its current simplicity, a Group in Tableau could have the same result. However, due to the static nature (as of version 8.3) and difficulty in managing them from the UI, I avoid using them in most, if not all, cases.
For a walkthrough on how to blend data sources in Tableau, view this and the other great videos on Tableau’s website: http://www.tableau.com/learn/tutorials/on-demand/data-blending-0
To play along with the workbook, and to view the data relationships and the blend, download the attached workbook.