Tableau Prep is new tool from Tableau which helps to help people transform, clean and massage their data for analysis quickly. With simple drag and drop features to simplify the complicated tasks of joins, unions, pivots, aggregate or create calculate fields the data quickly.

Let’s quickly see how Tableau prep works and for that one needs to download Tableau prep and install it from here. For this exercise, we are going to use superstore data which can be download from here.

Objective of this exercise is to get output something like this:

Final prep file will look like this:

 

Open Tableau Prep after installing and you can see below screen:

Step 1: Connect to Data source:

Tableau Prep provides flexibility to connect more than 25 data sources right from csv to Amazon redshift to greenplum database.

Click on Connections –> Select Microsoft Excel –> select the file downloaded earlier from path

Once data file is loaded into Prep, you will get data like below. Here, on left hand side we see various sheets in the excel file (source) and on right side we get to see plain white area where we can drag one or multiple source files or tables. Next is we will drag orders into white area and we will see bottom half of right side list of columns in the file.

Here we can select the columns required for further processing or deselect the unwanted fields. For the purpose of this post, I selected Customer id, order date, product id, category, sales field from the input section:

Step 2:  Finding the customer level first purchase date with its sales, number of distinct product ids and categories

Tableau prep gives various options to perform next step in data preparation:

 

Select aggregate option, then drag customer id into group fields and order date into Aggregated fields. Change the aggregation level to minimum for order date (shown in below image)

Add another aggregate from the source orders, then drag customer id, order date into group fields while drag sales, product id and category into aggregation levels. Ensure Aggregation level for sales is sum and for product id & category should be distinct counts. It will look something like below screenshot.

We can directly set the type aggregation on field by clicking on field and selecting the type as shown in above image.

Now, its time to use Join operation to join both the aggregate results to get first purchase date, sales, count of products and categories:

Add Join from aggregate 2 then drag Aggregate 1 into join as shown in image:

Then add the join condition on customer id and order date as shown below:

Finally, we have all required fields but we also got duplicate fields for customer id and order date. We will remove this field using cleaning step:

Along with this we will rearrange the data and rename fields for better clarification (order date -> 1st purchase date, sales -> 1st purchase sales, Product id -> 1st purchase products and category -> 1st purchase categories)

We can anytime look at the changes made on the left side under changes section.

With this we have customer ids with their first purchase date, sales, number of distinct products and distinct categories

Step 3: Customer ids with second purchase date, sales, number of distinct products and distinct categories:

Add New join operation on Aggregate 1 and Aggregate 2 with slightly different join criteria where matching customer id but order dates are not equal. (highlighted in Yellow in below image)

Add step to remove duplicate order date-1 and customer id-1 field:

Add Aggregate (Aggregate 3) to find the minimum order date from this data (as we had removed first order date when we applied second join (Join 2))

Add Join (Join 3) from aggregate 3 and apply join with Aggregate 2 on customer id and order date as shown in below images:

Now we have got desired output along with duplicate fields of customer id-1 and order date-1. We will remove this fields using add step operation for cleaning of our dataset.

Along with this we will rearrange the data and rename fields for better clarification (order date -> 2nd purchase date, sales -> 2nd purchase sales, Product id -> 2nd purchase products and category -> 2nd purchase categories)

Now we have both the dataset ready with required information.

Step 4: Combine the dataset (output from clean 1 & clean 3) using join on Customer id

Here we have duplicate field of customer id which we will remove using step operation and rearrange the data to see proper arrangement:

Final step is to export the data into either .hyper or CSV or .tde file using output operation as shown below:

 

Click on Run now and generate the file. Now, this file can be directly imported into Tableau for data visualisation:

Overall, I really liked the easy nature of Tableau Prep which allows non-technical users to create their own data flow or kind-off ETL to generate the output dataset with necessary transformations.

The final version of Tableau prep file is available Here:

From the output file generated from above process, I had designed the following Tableau data visualisation:

 

Happy Visualisation!!!

Thanks for visiting blog. Please do let me know your feedback and if any particular topic you would like me to write on.

Do subscribe to Tabvizexplorer.com to keep receive regular updates.

 

 

 

 

For #MakeoverMonday week 25, Dataset was about influenza cases in the US which was visualized in line chart:

Here is the original viz by CDC:

Data is available on data.world week 25:

Here is what I did:

  • Filtered the data to use from season 2006-07 to 2017-18 (Season 2009-10 had few duplicate weeks in data shared)
  • Used highlight table to show number of patients for each week in season
  • Added tooltip to show number of patients compared to total patients reported

Click on the viz to use the interactive version.

Thanks for visiting blog. Please do let me know your feedback and if any particular topic you would like me to write on.

Do subscribe to Tabvizexplorer.com to keep receive regular updates.

This week’s #workoutwednesday by Ann Jackson was to prepare dataset using Tableau Prep then to design visualisation to show customers spend more on their first or second purchase.

The list of full requirements can be found here but on a high level:

  • Create a dataset using Tableau prep which will return customer level first and second order information with regards to sales, no. of categories and no. of products sold in the order
  • Dataset will have 9 columns i.e. Customer id, 1st Purchase Date, 2nd Purchase Date, 1st Purchase Sum Sales, 2nd Purchase Sum Sales, 1st Purchase # Categories, 2nd Purchase # Categories, 1st Purchase # Products, 2nd Purchase # Products
  • Create scatter plot with First purchase sales vs second purchase sales with 2 strip plots
  • Sheets aligned in dashboard using float method

It was good reason to try out Tableau prep for first time and create the dataset for the challenge. Here is how my workflow looked like in Tableau prep:

developing the viz was pretty straight forward with scatter and strip plot but the tricky part was to arrange all the sheets into float order with pixel perfect output. Took a bit of time to arrange but was able to make it at last. Overall, enjoyed working on Tableau Prep. I will be writing on Tableau prep in coming days to explain how it works.

Here is my output for the challenge (Click on Image for interactive version)

Thanks for visiting blog. Please do let me know your feedback and if any particular topic you would like me to write on.

Do subscribe to Tabvizexplorer.com to keep receive regular updates.

 

For Makeovermonday week 24, it was about Tourism density index i.e. how many tourists come to country compare to population.

Here is the original viz by Intrepid Travel:

Data is available on data.world week 24:

Here is what I did:

  • Got the idea of ratio from Eva’s Blog and tried to show number of tourists per 100 local people
  • Used rows and columns method using groups in dataset (inspired by Andy’s output)
  • Make it easier to see for each country given in dataset

Click on the viz to use the interactive version.

Thanks for visiting blog. Please do let me know your feedback and if any particular topic you would like me to write on.

Do subscribe to Tabvizexplorer.com to keep receive regular updates.

This week’s #workoutwednesday by Rody Zakovich was about playing with dates. challenge was about comparing current period with previous period of same range of days.

Requirements:

  • User can select a Start and End date for the Current Period
  • The “Previous Period” contains the same number of days as the Current Period, and ends the Day before the Start Date
  • Both the “Current” and “Previous” Periods must be on the same Date Axis (No dual Axis!)
  • The Current Period must be distinguishable from the previous period
  • Must show the Current Period Range, as well as, the Prior Period Range

This week data uses the superstore dataset

Here is my solution with the requirements given:

Thanks for visiting blog. Please do let me know your feedback and if any particular topic you would like me to write on.

Do subscribe to Tabvizexplorer.com to keep receive regular updates.