This week's #WorkoutWednesday challenge was set by Ann Jackson. It was a slightly difficult one, as it was about finding signals or patterns in data with a control chart. The idea is to use the mean or median together with standard deviations to flag statistical signals: an outlier in the dataset, a trend, or a change in pattern. It is a good way to analyse data and get statistical insight into its patterns and behaviour.

Requirements: 

Available here

Data for the workout can be downloaded from here

Here is my output for the challenge (Click on Image for interactive version):

It took me some time to understand the requirement and work out how to design the visualisation. The LOOKUP() and WINDOW functions helped a lot in building it.

Here are the steps to create this visualisation once the data is imported into Tableau:

Step 1: Create 2 parameters for the requirement

Select a Middle Line: this parameter provides a drop-down to choose either median or mean

Select a Test: this parameter selects the type of test to perform (Outliers, Trend or Change)

Step 2: Create the calculated fields required for the visualisation

Middle Line: this field is driven by the “Select a Middle Line” parameter and returns either the median or the mean of sales, depending on the selection

 

+3SD: this field calculates the middle line plus 3 standard deviations

-3SD: this field calculates the middle line minus 3 standard deviations

TEST – Outliers: Boolean field to highlight the outliers in the dataset, i.e. sales values that are either above +3SD or below -3SD
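To make the logic of these three fields concrete outside Tableau, here is a minimal Python sketch. The function names are hypothetical, and it assumes the sample standard deviation; the workbook may use the population version instead.

```python
from statistics import mean, median, stdev

def control_limits(sales, middle="mean", k=3):
    """Return (middle_line, upper, lower) for a list of monthly sales."""
    mid = mean(sales) if middle == "mean" else median(sales)
    sd = stdev(sales)  # assumption: sample SD; swap in pstdev for population SD
    return mid, mid + k * sd, mid - k * sd

def outlier_flags(sales, middle="mean", k=3):
    """True for every value above +3SD or below -3SD."""
    _, upper, lower = control_limits(sales, middle, k)
    return [x > upper or x < lower for x in sales]

# 19 ordinary months and one extreme month
sales = [100] * 19 + [1000]
print(outlier_flags(sales))
```

Only the last value in the example falls outside the limits, whichever middle line is chosen.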

 

TEST – TREND: Boolean field that highlights a trend, i.e. current sales > previous month > the month before that, or current sales < previous month < the month before that.

TEST – CHANGE: Boolean field that highlights whether 3 consecutive values are all above or all below the middle line.
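The two tests above compare each month with the two months before it, which is what LOOKUP() does in Tableau. A rough Python equivalent, with hypothetical function names and the assumption that the flag sits on the third point of each run, could look like this:

```python
def trend_flags(sales):
    """True where a value continues a strictly rising or falling run of 3."""
    flags = []
    for i, cur in enumerate(sales):
        if i < 2:
            flags.append(False)  # not enough history yet
            continue
        prev, prev2 = sales[i - 1], sales[i - 2]
        flags.append(cur > prev > prev2 or cur < prev < prev2)
    return flags

def change_flags(sales, middle_line):
    """True where a value and the two before it are all on one side of the middle line."""
    flags = []
    for i in range(len(sales)):
        if i < 2:
            flags.append(False)
            continue
        window = sales[i - 2:i + 1]
        above = all(x > middle_line for x in window)
        below = all(x < middle_line for x in window)
        flags.append(above or below)
    return flags

print(trend_flags([1, 2, 3, 2, 1, 5]))
print(change_flags([1, 2, 1, 4, 5, 6], middle_line=3))
```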

TEST – SELECTION: this field is based on the “Select a Test” parameter and uses the 3 calculated fields above to show the required value.

Tooltip – Signal: tooltip text that shows a signal when the sales value matches the criteria of the selected test

Show Text: this field shows the necessary text based on the parameter selection

Step 3: There are 3-4 more fields to create, but let's start by building the first 2 sheets

Drag Order Date to the Columns shelf (convert to month)

Drag SUM(Sales) and TEST – SELECTION to the Rows shelf, then make it a dual axis and synchronise the axes

Drag Middle Line, +3SD and -3SD to Detail on the Marks card

Add 3 reference lines based on Middle Line, +3SD and -3SD

Add parameter controls for the parameters “Select a Middle Line” and “Select a Test”

Then format the sheet as per the requirement, including tooltips, and it will look like the following:

Create a new sheet for the text and drag the Show Text field to Text

Step 4: Final sheet for Monthly strip chart

For the strip chart, we need to add the following fields to the dataset:

TREND: Boolean field used to highlight the pattern for Trend

CHANGE: Boolean field used to highlight the pattern for Change

SymbolCOLOR: based on the test selected in the parameter, this field returns the value for that test. We also use it to segregate the data into 3 colour buckets (blue, amber and orange) to show the following:

  • Meets the test criteria
  • Part of test pattern
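The bucketing behind SymbolCOLOR can be sketched in Python as follows. This assumes a pattern spans 3 points (the flagged month and the two before it) and that the third bucket is simply "everything else"; the bucket names are hypothetical, and the colour assigned to each bucket is left to the workbook.

```python
def symbol_category(flags, span=3):
    """Classify each point as 'meets criteria', 'part of pattern' or 'none'.

    flags: booleans from the selected test.
    span: how many points make up one pattern (1 for a single-point test).
    """
    cats = ["none"] * len(flags)
    # points leading up to a flagged point belong to its pattern
    for i, hit in enumerate(flags):
        if hit:
            for j in range(max(0, i - span + 1), i):
                cats[j] = "part of pattern"
    # flagged points themselves take priority
    for i, hit in enumerate(flags):
        if hit:
            cats[i] = "meets criteria"
    return cats

print(symbol_category([False, False, False, True]))
```

For a single-point test such as outliers, calling it with span=1 leaves only the "meets criteria" and "none" buckets.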

Finally, we will create the 3rd sheet with the strip chart:

Drag Order Date to the Columns shelf (convert to month)

Then drag SymbolCOLOR to Color and Shape on the Marks card, and create the tooltip to show the values

Step 5: Create the Dashboard

Add all 3 sheets to the dashboard as per the image below:

Now the visualisation is ready, with the user experience we wanted to show.

Click here for Tableau file

Thanks to Ann Jackson for this workout.

Happy Data Visualisation!!!!

Thanks for visiting this post. Please let me know your feedback, and if you have any questions about the blog, do not hesitate to contact me on Twitter (@Desaimithun).

Do subscribe to Tabvizexplorer.com to keep receiving regular updates.

In today’s blog we will learn how to run R code from Tableau. R is one of the most widely used statistical tools among data scientists and researchers across the globe. Together, R and Tableau can be extremely useful in data science, as they help organisations cover their end-to-end discovery needs.

In this blog we will go through the step-by-step process of integrating R and Tableau.

Prerequisite: RStudio and Tableau already installed on the machine

Step 1

Install the Rserve package in R

install.packages("Rserve", repos = "https://cran.rstudio.com")

Once the package is installed, run the commands below to load it and start the server

library(Rserve)
Rserve()
## Starting Rserve...
##  "C:\Users\HP-LAP~1\DOCUME~1\R\WIN-LI~1\3.4\Rserve\libs\x64\Rserve.exe"

This starts the server in the background so that Tableau can execute commands in R

Step 2

Now we will connect Tableau to the R server via the following menu path:

Help → Settings and Performance → Manage External Service Connection

This opens the window shown below, with the values for Server (“localhost”) and Port (“6311”) auto-populated.

Click Test Connection to check the connection; a pop-up will confirm that Tableau connected to Rserve successfully.

Click OK.

This lets Tableau execute script functions such as:
SCRIPT_BOOL
SCRIPT_INT
SCRIPT_STR
SCRIPT_REAL
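If the connection test fails, it can help to check from outside Tableau whether anything is listening on the Rserve port at all. Here is a small Python sketch for that; the function name is hypothetical, and it assumes the default localhost and port 6311 mentioned above. It only checks that the port accepts a TCP connection, not that the listener is really Rserve.

```python
import socket

def rserve_reachable(host="localhost", port=6311, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, or host not found
        return False
```

Calling rserve_reachable() after running Rserve() in R should return True; if it returns False, the server is probably not running or is using a different port.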

Step 3

We will take the FMCG example below as our data source and import it into Tableau

Step 4

Now we will create a calculated field “Outliers”, using the following code to find outliers in the FMCG data:

IF SCRIPT_REAL("library(pracma); a <- rep(1, length(.arg1)); a[findpeaks(.arg1, threshold=quantile(.arg1, .99), sortstr=FALSE)[,2]] = 0; a;", SUM([Sales])) == 0
THEN "Outlier"
ELSE "OK"
END
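The calculation above follows a simple contract: Tableau sends the whole SUM([Sales]) column to R as the vector .arg1, R returns a vector of the same length (1 everywhere, 0 at the flagged positions), and the IF turns that into a label. The Python sketch below illustrates only that contract. It does not reproduce findpeaks from pracma; it swaps in a plain "above the 99th percentile" rule, and the function name is hypothetical.

```python
from statistics import quantiles

def outlier_labels(values, pct=99):
    """Return 'Outlier' or 'OK' for each value, one label per input row."""
    # 99 cut points; index pct - 1 is the pct-th percentile
    cut = quantiles(values, n=100, method="inclusive")[pct - 1]
    # same shape as the R vector `a`: 1 by default, 0 at flagged positions
    mask = [0 if v > cut else 1 for v in values]
    return ["Outlier" if m == 0 else "OK" for m in mask]

labels = outlier_labels(list(range(1, 102)))
print(labels.count("Outlier"))
```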

Create one more calculated field, “Cluster”, with the following code:

SCRIPT_INT(
'set.seed(42); result <- kmeans(data.frame(.arg1, .arg2, .arg3, .arg4), .arg5[1]); result$cluster;',
MAX([Sales]), SUM([Profit]), MIN([Freight Expenses]), MIN([Discount offered]), 3)
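In this calculation Tableau passes every argument to R as a vector, which is why the number of clusters is read as .arg5[1]. For readers who want to see what the clustering step does, here is a bare-bones k-means sketch in Python. It uses the basic Lloyd iteration with a fixed seed, which is not the same algorithm as R's default for kmeans, so cluster numbers and edge cases can differ; the function names are hypothetical.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=42, max_iter=100):
    """Return a 1-based cluster label for each point (like result$cluster in R)."""
    rng = random.Random(seed)          # fixed seed, like set.seed(42)
    centroids = rng.sample(points, k)  # start from k random data points
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return [l + 1 for l in labels]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(points, 2))
```

The two groups of nearby points end up in different clusters; which group gets label 1 depends on the random start.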

Step 5

Drag Order Date to the Columns shelf and Sales to the Rows shelf, then drag “Outliers” to Color

As you can see, all the outlier values are highlighted in orange while the other values are in blue. This way we can find outliers in our data very easily by using the dream team of Tableau and R

Step 6

Let's go one step further and see if we can do clustering in Tableau with R.

Drag Sales to the Columns shelf and Profit to the Rows shelf (untick Analysis → Aggregate Measures), then drag the calculated field “Cluster” to Color on the Marks card and add a trend line to the chart. It creates 3 clusters, as shown below:

I hope this blog helps you understand how to run Rserve in the background and execute R scripts from Tableau.

You can download the Tableau file from here

Thanks for reading the blog

Do subscribe to my blog to keep receiving new posts 🙂