Introducing the Python Tableau Data Extract API with a CSV Extract Example

Data

Introducing the Python Tableau Data Extract API with a CSV Extract Example

by Brian Bickell
//

After the amazing show Tableau Software put on for us at this year’s Tableau Customer Conference, the BI community was abuzz with excitement about the upcoming Tableau 8. Fortunately, we weren’t kept waiting long. Beta access is widely available at this point. My colleagues have already written about some of the great features in the version 8 elsewhere on our blog. In this post I’m going to focus on a powerful but lesser publicized feature in Tableau 8: the Tableau Data Extract API.

I’d be remiss if I didn’t admit that I’m a little bit late to the game here. I didn’t even realize the Tableau Data Extract API was available to Tableau 8 beta users until I saw Ryan Robitaille’s examples here. So, clearly I have some catching up to do.

Anyway, on with it.

The Tableau Data Extract API does exactly what it sounds like it does – it allows users to create extracts from arbitrary data sources outside of Tableau Desktop. As an interesting aside, it doesn’t even require that Tableau Desktop or Server be installed on the same machine. Everything you need to get to work building extracts is contained in a zip file on the Tableau beta download site. The TDE API is currently available in Python, C, C++ and Java flavors. Each has a 32bit and a 64bit version. I’ve elected to use Python while working with the TDE API because it’s the darling of the data analysis arena and pretty good at getting things done. I’m sure things look quite similar in the other languages if you want to go that route.

I didn’t manage to find any documentation for the API so I’m working from what I could figure out from the comments in the Base.py file in the API package and the examples that Ryan posted.

Generally speaking the usage of the API looks like this:

  1. Import API modules (as tde in this example)
  2. Create an Extract object with tdeFile = tde.Extract(tdeFileName)
  3. Create a Table Definition object with tdeTableDef = tde.TableDefinition()
  4. Add column definitions to the Table Definition object with tdeTableDef.addColumn()
  5. Add a Table to the Extract with tdeTable = tdeFile.addTable(“Extract”,tdeTableDef)
  6. Create rows with tdeRow = tde.Row(tdeTableDef)
  7. Add column values to row with row.set(colnum, value) ex: row.setInteger(c, v)
  8. Insert row into tableu with tdeTable.insert(tdeRow)
  9. Close row with tdeRow.close()            
  10. After all rows are inserted close the new TDE with tdeFile.close()

Let’s look at an example putting this together to tackle the task of importing from a CSV.

###############################################################################
## 
## csvload.py
##
## Example import from csv to tde using Tableau v8 tde api
## Schema is SuperStore Orders sample data.
##
## USAGE: python csvload.py SuperStoreOrders.csv
##
## Brian Bickell / @brianbickell / InterWorks / 12.06.2012
##
###############################################################################

from sys import argv
import os, csv, datetime, time
import dataextract as tde 

## Functions

# This function makes adding the columns to each row in the extract a bit easier.
def add_tde_col(colnum, row, val, t):
    # Date format used below
    dateformat = '%m/%d/%Y'    

    if t == tdeTypes['INTEGER']:
        try: 
            convert = int(val)
            row.setInteger(colnum, convert)
        except ValueError:
            #if we bomb the cast then we just add a null 
            row.setNull(colnum)
            
    elif t == tdeTypes['DOUBLE']:
        try: 
            convert = float(val)
            row.setDouble(colnum, convert)
        except ValueError:                        
            row.setNull(colnum)

    elif t == tdeTypes['BOOLEAN']:
        try: 
            convert = int(val)
            if convert > -1 and convert ' % (index, k, tdeTypes.keys()[tdeTypes.values().index(v)])
        tdeTableDef.addColumn(k, v)

# Add table to extract
tdeTable = tdeFile.addTable("Extract",tdeTableDef)

print 'Writing records to %s' % (tdeFileName)

# iterate through rows and columns of csv -> add to tde
rownum = 0
for row in reader:
    if rownum == 0:
        header = row
    else:
        colnum = 0
        tdeRow = tde.Row(tdeTableDef)
        for col in row:
            if colnum+1 > len(csvSchema): # don't blow up if you don't take every column
                break
            add_tde_col(colnum, tdeRow, row[colnum], csvSchema[colnum].values()[0]) # the magic
            colnum += 1
        tdeTable.insert(tdeRow)
        tdeRow.close()        
    rownum += 1

print '%i rows added in total in %f seconds' % (rownum-1, time.clock()-startTime)

tdeFile.close()

csvFile.close()

In the above example we’re taking the name of a CSV as an argument and loading into a CSV reader object, setting up our TDE bits, iterating over the rows and columns of that CSV, and finally storing into the new TDE. If you look carefully you can spot all 10 of the steps that I listed above. The CSV handling here isn’t as robust as I’d like it to be, but it’s passable. You probably noticed that I’ve defined the schema of the CSV as a list of dictionary objects. Making this dynamic is an exercise I’m leaving for a later time. As configured the file will load the Superstore Orders dataset provided in the Tableau Desktop install.

Additionally, I’ve added some extra bits to make working with the API a little bit more friendly.

First, I added a dictionary called tdeTypes that maps the integer keys the API uses for it’s data types to more friendly strings. Second, I added a function to help with the set step that has to be performed for every column in every row. Something like this was necessary for any shot at making this dynamic down the road. The add_tde_col(colnum, row, val, t) function takes a column number, a TDE Row object, a value, and a type and tries to perform the correct row.set call. Additionally, it has some limited error handling. If you’re observant you’ll notice that I actually bailed on finishing this function. Smart developers only implement things when they need them.

Despite having to sharpen up my Python skills to get this implemented, working with the Tableau Data Extract API was mostly painless. If I had to ask for any additional features I’d probably ask for some kind of handy row.setX() method that would determine the strongest data type and handle type casting for the user. Python being the weakly typed language that it is doesn’t really jive with the current row.set() notion. Really though, I feel greedy for even asking for that.

The ability to programmatically generate extracts has been an often-requested feature by many of my clients and I’m happy to see that Tableau Software has finally implemented it and that it works well in my initial tests. I look forward to all of the shenanigans interesting uses that the community will come up with in the coming months.

More About the Author

Brian Bickell

Global Data Practice Director
Introducing the Snowflake Data Cloud: How We Got Here Five years ago, in the distant past that still included large, boozy, in-person technology conferences, I met two guys from an upstart ...
InterWorks’ Influential Role at the KeyBanc Capital Markets Emerging Technology Summit Two weeks ago, KeyBanc Capital Markets and the Mosaic network held their 15th Annual Emerging Technology Summit in San Francisco, ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!