TL;DR: Hyper is much faster at extracting data. Like, so fast.
Not many Tableau releases have had the size of 10.5 or the expectations surrounding it. The only other release I can think of that had similar expectations was Tableau 10. Version 10 brought goodies with it such as clustering, the widely anticipated cross-database join, new colour palettes and device designer capabilities among other gems. It’s hard to believe Tableau 10 was launched just 18 months ago since most of those features feel like they have been available for a lot longer.
And now, 10.5 is finally here! Last week, Tableau launched the widely anticipated 10.5. The release meant Tableau’s CEO, Adam Selipsky, was making the rounds on TV and online news platforms to promote 10.5. New features like Viz in Tooltip are lighting up Twitter. Linux support is also helping Tableau make friends with the many IT departments that have asked for years for the ability to install Tableau Server on Linux machines. Some of our consultants who are well-versed in all things Tableau Server have already written a comprehensive post on what you can expect from Linux.
Hyper, Hyper, Hyper
Today, I’d like to talk about Hyper. Yes, the patent-pending, in-memory data engine technology designed for faster data ingest and analytical query processing. In other words, the replacement for your TDEs (Tableau Data Extracts).
To show you how it works, we need a dataset to work with. I picked an airline data CSV containing:
- 2.5 million rows
- 49 columns wide
- Mildly complex information, including URLs and non-standard characters, as you can see below:
Hyper Versus TDE
While testing 10.5 last week with the same dataset, I noticed the following:
- 10.4 extract time = 4 Minutes 40 Seconds
- 10.5 extract time = 1 Minute 40 Seconds
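To put those two stopwatch readings in proportion, here is a minimal Python sketch of the arithmetic. The helper name `to_seconds` is my own and purely illustrative; only the two timings come from the test above.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds stopwatch reading into total seconds."""
    return minutes * 60 + seconds


tde_seconds = to_seconds(4, 40)    # 10.4 (TDE) extract time
hyper_seconds = to_seconds(1, 40)  # 10.5 (Hyper) extract time

# How many times faster the Hyper extract was, and the share of time saved.
speedup = tde_seconds / hyper_seconds
time_saved_pct = (1 - hyper_seconds / tde_seconds) * 100

print(f"Hyper was {speedup:.1f}x faster, saving {time_saved_pct:.0f}% of the extract time")
# prints: Hyper was 2.8x faster, saving 64% of the extract time
```

This is one run on one dataset, so treat the ratio as an indication rather than a benchmark.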
If all you care about are extracts, you can stop reading now and there will be no hard feelings between us. Anyone interested in extracts these days will see a huge improvement when upgrading to Hyper. Organisations running clusters for extraction may be able to allocate some of those resources elsewhere, all because of the amount of extract time saved.
Above: My airline data in 10.5.
But what happens to disk? Does this mean that Hyper will use a lot more disk space? Not at all! The TDE in the example above took up 667MB compared to Hyper’s 672MB. So, rest assured that disk space requirements will remain virtually unchanged.
What happens to migration in 10.5? Head here for Tableau’s advice, but these are the main points:
- When you refresh or append a TDE using Tableau Desktop 10.5, the extract is automatically upgraded to Hyper’s format.
- A scheduled refresh on Tableau Server 10.5 will upgrade the extract to Hyper.
- Tableau Desktop 10.5 and Tableau Server 10.5 can read TDE and Hyper extracts. You can open and view workbooks with extracts in either format.
- Tableau Desktop 10.4 cannot open and read a 10.5 workbook or use a Hyper extract.
- As always, upgraded workbooks cannot be opened in previous versions of Tableau Desktop.
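Since a refresh quietly upgrades an extract, it can help to know how many TDEs you still have before you migrate. The following is a rough sketch, not a Tableau tool: the function name `count_extracts` and the folder you point it at are my own assumptions, and it only relies on the `.tde` and `.hyper` file extensions.

```python
from collections import Counter
from pathlib import Path


def count_extracts(root: str) -> Counter:
    """Count extract files under `root` by extension (.tde vs .hyper)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in {".tde", ".hyper"}:
            counts[suffix] += 1
    return counts


if __name__ == "__main__":
    # "." is a placeholder; point it at wherever your extracts are stored.
    found = count_extracts(".")
    print(f"TDE extracts still to upgrade: {found['.tde']}")
    print(f"Hyper extracts: {found['.hyper']}")
```

Note that this only sees standalone extract files. Extracts bundled inside packaged workbooks or packaged data sources would not be counted.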
Digging Deeper into 10.5
Using Tableau’s built-in performance recording tool, I extracted the airline data mentioned above, which took 316 seconds, or just over five minutes, to run. Take a look at 10.4 below:
For some reason, the performance recorder in 10.5 splits the generated extract query into thirds, resulting in 135 seconds, or two and a quarter minutes. I think it’s fair to say that if most customers see their extract times cut in half, they will be very happy indeed. You can see this tool in action in 10.5 below:
I also wanted to know whether Hyper had improved Tableau’s query performance. The gain is smaller here, with a query time of 63 seconds in 10.4 versus 53 seconds in 10.5, but it is a gain nonetheless.
Above: Performance Summary in 10.4.
Above: Performance Summary in 10.5.
Further improvements can be found in the Performance Recording tool. Just look at the query being sent below. Because the query comes out written like that, all we have to do is test it against our database, without having to rewrite it.
I also noticed that 10.4 didn’t record the time it took to generate those views, even though it took quite some time to display over two million rows, as expected. In 10.5, it also took some time to display those records, but in this case computing the layout is captured by the performance recorder. This allows us to dig deeper when analysing workbook performance.
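As a rough illustration of testing a copied query outside Tableau, here is a minimal Python sketch that runs a query and times it. Everything in it is a stand-in: the in-memory SQLite database, the `flights` table, its columns and the query text are invented for the example, and `time_query` is my own helper name. In practice you would connect to your own database and paste in the query from the recording.

```python
import sqlite3
import time


def time_query(connection, sql: str):
    """Run `sql` and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = connection.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed


# Stand-in database and table; swap in a connection to your own database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (carrier TEXT, delay_minutes INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("AA", 12), ("AA", 3), ("DL", 7)],
)

# Placeholder for the query text copied from the performance recording.
copied_sql = (
    "SELECT carrier, SUM(delay_minutes) "
    "FROM flights GROUP BY carrier ORDER BY carrier"
)

rows, elapsed = time_query(conn, copied_sql)
print(rows)  # [('AA', 15), ('DL', 7)]
print(f"Query took {elapsed:.4f} seconds")
```

Timing the same query a few times, rather than once, gives a fairer picture, since the first run often includes caching effects.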
Above: Performance Recording tool in 10.4.
Above: Performance Recording tool in 10.5.
Of course, we shouldn’t be displaying over two million items in our view, because that doesn’t make sense. This does show, however, Tableau’s capabilities and the improvements made to the performance tool.
Circling back to Hyper, I can’t wait to hear stories from our customers on how much time they have saved since upgrading to 10.5. Tell us below which feature you are most excited about in Tableau 10.5. Thank you for reading!