It seems like only yesterday that Tableau 10.2 came out with a basket of new features that are proving really useful, shapefiles being my favourite. See it in action here. Earlier this week, however, Tableau announced the release of the Tableau 10.3 beta and, as always, I wanted to see what new things await us. One of the features that drew the biggest response in Austin last year was the PDF converter, and it’s now time to fully test it out.
I downloaded the 10.3 beta from this website after signing up to take part in the beta program. The first thing I wanted to test was how well it would cope with a large PDF table. I have this PDF with two pages of data for 2016 property taxes, which would take me forever to type in, and I’d probably make a ton of mistakes. Even copying and pasting into Excel doesn’t work that well because the table formatting doesn’t carry across cleanly. So, this seemed like the perfect use case to road test the new Tableau PDF Converter in 10.3.
Here’s a sample of the PDF table below. You can see some merged cells and titles spilling over into three rows that I thought would be a nuisance to make sense of:
As it turned out, that was not the case. Follow along below. The first thing I did was to connect to the PDF using the new connector:
The next screen allows us to scan a specific page or a range of pages, for instance pages 8 to 10 only. In my case, I wanted all of my pages to be scanned:
Once connected, I had a look at the data structure, but it didn’t look too promising. I had quite a few NULLs and my headers were all over the place:
Ah! But as most of you will know, Tableau has a great Data Interpreter that can understand headers pretty well, so I switched that on and voilà ...
As if by magic, my headers are now all tidied up and I can’t see any NULLs either. But there is still data on Page 2 that I need to bring in. Tableau splits the data by page, in the same way it would split tabs in Excel. Time to use another Tableau 10 feature: UNIONS. I drag the Page 2 table underneath the Page 1 table in the connection area and UNION the two tables together.
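For anyone who prefers to see the idea in code, here is a minimal Python sketch of what a union does conceptually: rows from each page are stacked, and columns are matched by name. This is only an illustration and not how Tableau implements it. The function name `union_tables`, the `District` column and all of the values are made up for the example.

```python
def union_tables(*tables):
    """Stack tables (lists of row dicts) on top of each other.

    Columns are matched by name; a value missing from a row becomes None.
    """
    # Collect column names in first-seen order across all tables.
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Re-emit every row with the full set of columns.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Hypothetical rows standing in for the tables found on each PDF page.
page_1 = [
    {"District": "A", "Effective Rate": 1.25},
    {"District": "B", "Effective Rate": 1.40},
]
page_2 = [
    {"District": "C", "Effective Rate": 0.95},
]

combined = union_tables(page_1, page_2)
for row in combined:
    print(row)
```

Matching on column names is why it pays to get the headers tidy first: if the same column is named differently on the two pages, the rows end up in separate columns padded with empty values.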
My data is now imported from a PDF and looking good. I have checked all the headers and the majority are correct, with the exception of the merged cells for Reduction Factor and Effective Rate. However, I can easily rename those and off we go.
I can only imagine the work that the Tableau devs had to go through to make it this simple. It’s an incredibly useful feature for open data or old documents that we thought we could never get the data out of. With a few simple clicks, Tableau is able to extract the data and make it ready for analysis.
Thank you for reading. I’m off to check out what else is available in this new 10.3 beta version.
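To give a feel for why titles that spill over several rows are awkward, here is a small Python sketch that collapses three header rows into one name per column by joining the non-empty pieces from top to bottom. It is my own illustration of the general idea, not the Data Interpreter’s actual logic, which I have not seen. The function name `flatten_headers` and the `Taxing District` column are assumptions for the example.

```python
def flatten_headers(header_rows):
    """Collapse several header rows into a single name per column."""
    names = []
    # zip(*rows) walks the header column by column instead of row by row.
    for parts in zip(*header_rows):
        # Keep only the cells that actually contain text.
        pieces = [p.strip() for p in parts if p and p.strip()]
        names.append(" ".join(pieces))
    return names


# Hypothetical header spread over three rows, with empty cells as None.
header_rows = [
    ["Taxing", "Reduction", "Effective"],
    ["District", "Factor", None],
    [None, None, "Rate"],
]

print(flatten_headers(header_rows))
# ['Taxing District', 'Reduction Factor', 'Effective Rate']
```

A cell that is merged across several columns is harder, because the text only appears once and has to be attributed to each column it covers, which may be why those two headers still needed a manual rename.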