To Extract or Not to Extract? - 7 Guidelines for Tableau Data Extracts
Fine-tuning the performance of your Tableau Server environment can at times seem more alchemy than science, particularly when the needs of your user base keep changing. That is little consolation for an administrator tasked with keeping the system running efficiently at peak usage.
There are things you can do that can help. One of those is determining how you can best access your data. Should you connect live? Or, should you use a Tableau Data Extract (*.tde)?
As always, Tableau is only as fast as your data source.
Obviously, there are no hard and fast rules on exactly how to set up an enterprise deployment. Requirements vary considerably from system to system. Is the data large and/or wide? How many users are there? How many of those users will need top performance at the same time?
Even so, there are valuable questions that you can ask to help determine whether using a data extract is the most efficient solution for your requirements and how you can get the most from an extract.
1. Hardware Requirements
There is a balancing act in your hardware requirements, depending on how you access your data. A live connection does not require much storage space within a Tableau deployment, but you’re relying on the speed of that data source not to become a performance bottleneck.
On the other hand, data extracts require storage, potentially a great deal of it depending on the specific characteristics of your data. You’ll also need faster disks for processing within your Tableau Server environment.
You can load test several different scenarios to determine which initial configuration best suits your performance requirements and cost parameters.
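As a rough illustration of comparing scenarios under load, the sketch below times two stand-in queries with several simulated concurrent users. It does not talk to Tableau Server at all: `live_query` and `extract_query` are hypothetical placeholders (here they only sleep), and the user and request counts are arbitrary assumptions. In a real test you would replace them with calls that actually load a view in each configuration.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed(query):
    """Run one query and return how long it took, in seconds."""
    start = time.perf_counter()
    query()
    return time.perf_counter() - start


def load_test(query, users, requests_per_user):
    """Issue users * requests_per_user calls using `users` worker threads."""
    total = users * requests_per_user
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(lambda _: timed(query), range(total)))
    return {
        "requests": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


# Hypothetical stand-ins for the two scenarios; replace with real requests.
def live_query():
    time.sleep(0.02)


def extract_query():
    time.sleep(0.005)


results = {
    name: load_test(fn, users=4, requests_per_user=5)
    for name, fn in [("live", live_query), ("extract", extract_query)]
}

for name, stats in results.items():
    print(name, stats)
```

Comparing the median and the worst-case latency side by side gives a first impression of which configuration holds up when several users hit the system at once.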
2. Optimizing Your Extract
When you create a data extract, one of the options you can select is to optimize it, which can make the extract perform faster. Optimizing means that Tableau looks at your data for any calculated fields and pre-calculates them in the data extract. This improves performance because the calculation does not need to be re-computed locally each time that field is accessed.
If your data source has calculated fields, particularly ones that are complex or widely used, you should test whether an optimized data extract improves performance over a live connection.
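The idea behind pre-calculating a field can be sketched in a few lines. This is only a conceptual illustration, not how the Tableau data engine is implemented: the `Row` type and the `net_revenue` calculation are hypothetical examples of a calculated field.

```python
from dataclasses import dataclass


@dataclass
class Row:
    units: int
    unit_price: float
    discount: float


def net_revenue(row):
    """Hypothetical calculated field: revenue after discount."""
    return row.units * row.unit_price * (1.0 - row.discount)


def materialize(rows):
    """Compute the calculated field once and store it next to each row."""
    return [(row, net_revenue(row)) for row in rows]


rows = [Row(10, 2.5, 0.1), Row(4, 10.0, 0.0), Row(1, 100.0, 0.5)]

# Done once, when the extract is created or refreshed.
extract = materialize(rows)

# Every later query reads the stored value instead of re-running the formula.
total = sum(value for _, value in extract)
print(total)
```

The cost of the calculation is paid once per refresh rather than once per query, which is why the benefit grows with the complexity of the calculation and with how often the field is used.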
3. Data Source Type
If your data source is a large text file or Excel file, you will see immediate benefits from creating a data extract, because queries against those files are slow. No amount of optimization in your deployment can fix the performance limitations of these file types.
The reason you’ll find that performance has improved is that text and Excel data are extracted from the source files and loaded into the Tableau high-performance columnar data engine, specifically designed for visual analytics.
4. Filtering Your Data
As you extract data into a Tableau Data Extract, you can actually begin the process of preparing your data for analysis. In the Extract options under Data > Extract, you’ll be prompted to add Filters to your data extract.
While the data source may be very large, you may not need the entire dataset to complete the analysis required. Filters can pare down a very large data source to only the essential records, creating a streamlined subset of the data.
Smaller data extracts require less computing power.
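To make the effect of an extract filter concrete, here is a minimal sketch that keeps only the rows matching a condition while reading a delimited source. It is an analogy for what a filter does, not Tableau's own mechanism; the column names (`region`, `year`, `sales`) and the sample rows are made up for the example.

```python
import csv
import io

# Hypothetical source data; in practice this would be a large text file.
SOURCE = io.StringIO(
    "region,year,sales\n"
    "East,2012,100\n"
    "West,2012,250\n"
    "East,2013,175\n"
    "West,2013,300\n"
)


def filtered_rows(source, keep):
    """Yield only the rows for which keep(row) is true."""
    for row in csv.DictReader(source):
        if keep(row):
            yield row


# Keep only the East region from 2013 onward.
subset = list(
    filtered_rows(
        SOURCE,
        lambda r: r["region"] == "East" and int(r["year"]) >= 2013,
    )
)
print(len(subset), "of 4 rows kept")
```

Every row excluded here is a row that never has to be stored, refreshed, or scanned by later queries.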
5. Portable Data
Using a Tableau Data Extract also has the added benefit of portability. Data extracts can provide offline access to your data if they reside on your computer. By combining your extract with Tableau Desktop, you have a self-contained analytics workhorse.
Even if there is no performance improvement in using a data extract, it still may present value in allowing your end users offline access by creating ad hoc extracts to facilitate their travel or special circumstances.
6. Incremental Extracts
If you have decided to use a data extract, you have two options for refreshing the data. The first is a full extract, which rewrites the existing data extract in the Tableau Data Engine with a new file from the data source. The other option, though, could be more efficient depending on your configuration.
Incremental extracts append new records that have been added since the last extract was created. This can be particularly useful if your data extract must be refreshed daily, for instance. You can do this by selecting the Incremental Refresh checkbox in the Data Extract dialog box.
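The append-only behavior can be sketched as follows. This is a simplified illustration under one assumption: new records are recognized by a column whose value only increases (the hypothetical `order_id` below). It is not a description of Tableau's internals.

```python
def incremental_refresh(extract, source, key):
    """Append source rows whose key exceeds the largest key in the extract.

    Returns the number of rows appended.
    """
    high_water = max((row[key] for row in extract), default=None)
    new_rows = [
        row for row in source
        if high_water is None or row[key] > high_water
    ]
    extract.extend(new_rows)
    return len(new_rows)


# Hypothetical data: the extract holds orders 1-2, the source now has 1-4.
extract = [
    {"order_id": 1, "amount": 20},
    {"order_id": 2, "amount": 35},
]
source = [
    {"order_id": 1, "amount": 20},
    {"order_id": 2, "amount": 35},
    {"order_id": 3, "amount": 15},
    {"order_id": 4, "amount": 50},
]

added = incremental_refresh(extract, source, key="order_id")
print(added, "rows appended;", len(extract), "rows in extract")
```

Note the trade-off this sketch makes visible: rows that already exist in the extract are never revisited, so changes to older records are only picked up by a full refresh.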
7. Adding a File to an Extract
The last tip on leveraging a data extract for efficiency and performance is adding rows from a separate file. This allows you to append data from a different data source to your data extract.
For instance, you might have a legacy data source with historical information on sales or inventory, while the more recent data lives in a different data source. Tableau handles this situation with the Add Data From File command. You’ll need to make sure that the fields match across the two files.
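The requirement that fields line up can be illustrated with a small check run before appending. This is a conceptual sketch only; the function `append_rows` and the field names are hypothetical and do not correspond to a Tableau command or API.

```python
def append_rows(extract_rows, new_rows):
    """Return extract_rows plus new_rows, refusing rows whose fields differ."""
    new_rows = list(new_rows)
    if not extract_rows:
        return new_rows
    expected = set(extract_rows[0])
    for index, row in enumerate(new_rows):
        if set(row) != expected:
            raise ValueError(
                f"row {index} has fields {sorted(row)}, "
                f"expected {sorted(expected)}"
            )
    return extract_rows + new_rows


# Hypothetical legacy data already in the extract, and a newer file to add.
legacy = [{"product": "A", "year": 2010, "sales": 120}]
current = [{"product": "A", "year": 2014, "sales": 180}]

combined = append_rows(legacy, current)
print(len(combined), "rows after append")

# A file with a differently named field is rejected rather than merged.
mismatched = [{"item": "A", "year": 2014, "sales": 180}]
try:
    append_rows(legacy, mismatched)
    rejected = False
except ValueError:
    rejected = True
```

Checking the field names up front is cheaper than discovering later that part of the combined data landed in the wrong column or was dropped.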
The real skill in building a Tableau environment that runs efficiently and smoothly is knowing the types of questions to ask. With the right questions in mind, you can begin to test and prepare your deployment for optimal performance.
If you have questions for us, we’d love to help.