Using Tableau Prep to Clean Your Address Data
Trying to Geocode with Messy Address Data? Tableau Prep to the Rescue
In my previous post (Batch Geocoding On the Cheap), I write that geocoding your addresses is key to being able to fully use them in your analyses. However, in order to geocode your data, you need clean address data to work with. While Excel and direct SQL are good solutions for relatively clean data, for truly messy address data, I recommend the tool, Tableau Prep.
Tableau Prep is part of the Tableau Desktop product suite. Tableau Prep is not free. It does, however, come free with Tableau Desktop. For a variety of reasons which I won’t go into entirely here, in my honest (and definitely not sponsored) opinion, Tableau is entirely worth the money. A single Tableau Creator license (~$840 annually, with discounts available for non-profits and government and FREE licenses for educators and students) is all you really need to get started with data cleansing and manipulation.
With Tableau Prep, using a more or less drag and drop interface, you are able to create the functional equivalent of a data cleansing script than you can then use to automate your data cleansing in the future.
And here I will walk you through my process and the magic I was able to accomplish with Tableau Prep.
To recap from my previous post, I had 600k+ of customer addresses to geocode. I had found a Data Key that contained all addresses in my regional area, along with latitudes and longitudes. I wanted to match the addresses in my dataset to the addresses in my Data Key. However, my address data was very messy and riddled with typos. In order to be able to match it to my data key, it would need significant cleaning.
Enter Tableau Prep.
This is what my complete ‘flow’ ended up looking like in Tableau Prep. A flow is basically a series of data cleansing and data manipulation steps that transforms your data into your desired format.

In the complete Flow above, you can see the higher level changes made to the data sources. On the very left, various data sources are connected to Tableau Prep. In this case, a ‘cleaning’ step (represented by the solid horizontal colored bars) is applied to each data source to reformat the addresses. This reformatted data is then joined and unioned together in various ways until I arrive at my final output.
Cleaning My Address Data
Address data comes in different shapes and forms — and thus also with different cleansing challenges. Here are some of the steps I took in cleaning my address data.
Applying Close Matches and Loose Matches
You may notice that there are a lot of data sources in the complete flow — and a lot of steps. There are in fact only two data sources — the customer data and the Data Key. However, I had to duplicate the data sources several times. I ended up doing this because I applied first a close match to the data — matching addresses if they matched exactly — with the same street endings, Lane, Court, Drive, etc.
But my customer data is very flawed and many of the addresses are missing street endings. Rather than discard those addresses, I decided to then apply a second loose match that ignored those street endings. If an address (minus street endings) was the same and the postal codes and cities were the same, I allowed them to match. But because I wanted the close match to take precedence, I had to apply the loose match only to data that didn’t match in the close match — which I obtained through an outer join.
Adding Numerical Suffixes
My Data Key had addresses with numerical suffixes. For example, NW 23rd St and SW 122nd St. However, much of customer data didn’t have those suffixes. The same addresses were stored as NW 23 St and SW 122 St. I added suffixes to my data by adding conditional logic that looked for numerical street names without suffixes (a numerical street name ending in a number instead of a letter). Then, the correct suffix was added based on the final number (0, 4, 5, 6, 7, 8, 9 = TH, 1 = ST, 2 = ND, 3 = RD).
Finding and Correcting Typos
In the loose matches, I matched on address, post code, and city. However, in my customer data, I noticed a lot of typos in city names. These are manually entered and a combination of abbreviations, misspellings, and fat fingers resulted in a significant number of misspelled cities. Tableau Prep has a great feature where you can Group different spellings of your field into one grouped value. For example, looking at the city, ‘Oklahoma City,’ I can group ‘OKC’, ‘Okla. City,’ ‘Okla City,’ ‘Olkahoma City,’ ‘OKLAHOMA CITY,’ and any other variations.
Tableau Prep allows you to visually see the different variations in spellings instead of having to guess what possible variations might exist in the data.

Checking for Patterns in Non-Matches -> Adding New Cleaning Steps & Repeat
With Tableau Prep, with each join you can see the matched and unmatched values. After each change, I would review the data that still wasn’t matched, and based on the patterns that I saw, I would go back and add additional steps.
And then, once you are done….
Only do the cleaning once! Reuse your Tableau Flow with your new data.
One of the things I LOVE about Tableau Prep is that once you’ve created your flow ONCE, you are set. In that, when you get new customer data — as long as it’s in the same format, just update your data source and then let your saved flow run. All of the data reformatting and manipulation steps that you worked so hard on the first time round are applied to your updated data. Geocoding new addresses then becomes extremely easy.