In today’s post we set aside the data analysis topics we have covered so far and focus on statistics. Statistics is an important part of the data analytics process because it lets us evaluate data and measure how variables relate to each other. We need statistics in many situations, whether we are talking about market analysis, population analysis or business in general. Regardless of the field, data correlation is a good starting point, and for that we need a coefficient that quantifies the relationship between two variables. In the following lines we discuss how to calculate the Pearson correlation coefficient in Tableau, to identify the level of association between two continuous variables.
The Pearson correlation coefficient is widely used to measure the strength and direction of the linear relationship between two variables. Its value ranges from -1 to 1. A positive value means that as one of the analyzed variables increases, the other tends to increase as well. A negative value means that as one variable increases, the other tends to decrease. A value close to 0 indicates little or no linear relationship. The Pearson correlation coefficient can be calculated in Tableau, and below we will show you how.
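To make the sign convention concrete outside Tableau, here is a minimal Python sketch that computes the coefficient directly from its textbook definition (covariance divided by the product of the standard deviations). The function name pearson_r and the sales and profit figures are invented for illustration; they are not taken from the Superstore data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations (proportional to the covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical figures, made up for this example.
sales = [100.0, 200.0, 300.0, 400.0, 500.0]
profit_rising = [10.0, 25.0, 28.0, 45.0, 50.0]
profit_falling = [50.0, 45.0, 28.0, 25.0, 10.0]

print(pearson_r(sales, profit_rising))   # positive: profit tends to rise with sales
print(pearson_r(sales, profit_falling))  # negative: profit tends to fall as sales rise
```

The two results have the same magnitude and opposite signs, because the second profit series is simply the first one reversed.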
First, let’s see how this coefficient can help us and why it is useful to include it in our analyses. In any business, the Pearson correlation coefficient provides valuable information about the indicators we usually follow. For example, we can understand the relationship between sales and profit and whether the two move together. Specifically, we can see whether sales growth is associated with profit growth, and how strong that association is.
The value of the coefficient also tells us the direction of the relationship between the two variables, profit and sales in our case. If sales growth is not correlated with profit growth, that is a signal that our sales activities may deserve a closer look. Calculating the Pearson correlation coefficient in Tableau helps end users and decision makers in an organization make informed decisions about improving those activities.
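Below you will find all the steps you need to go through to calculate the Pearson correlation coefficient in Tableau. They cover three approaches: deriving it from a trend line, using the CORR function, and using the WINDOW_CORR table calculation.
→ In Tableau Desktop, connect to the Superstore sample data provided by Tableau.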
→ Create a scatterplot.
→ Drag Profit to Columns and Sales to Rows.
→ In the Analysis menu, uncheck Aggregate Measures.
→ Right-click the view and choose Trend Lines -> Show Trend Lines.
→ Right-click the view again and select Trend Lines -> Describe Trend Model.
→ Locate the R-Squared value in the Describe Trend Model dialog box.
→ In this example, the R-Squared value is 0.229503.
→ Calculate the Pearson correlation from this value by using a calculator or another program.
→ Take the square root of the R-Squared value to get the magnitude of the correlation (r): √0.229503 ≈ 0.4791. The sign of r matches the slope of the trend line, which is positive in this example.
→ Rounded to two digits, the value in this example is 0.48.
→ Alternatively, create a calculated field named “R using CORR” that uses the CORR function, with the formula:
CORR([Profit], [Sales])
→ Drag the “R using CORR” field to Text on the Marks card.
→ As a third option, create a calculated field named “R using CORR Table Calc” that uses the WINDOW_CORR function, with the formula:
WINDOW_CORR(SUM([Profit]), SUM([Sales]))
→ Drag the Row ID field to Rows, then drag Profit, Sales and the “R using CORR Table Calc” field to Text on the Marks card.
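The trend line approach above works because, for a straight-line fit with a single predictor, R-Squared equals the square of the Pearson coefficient. The square root alone only gives the magnitude, so the sign has to come from the slope of the line. Here is a minimal Python sketch of that relationship. The function names linear_fit and r_from_trend_line are our own, the small data set is invented, and the positive slope passed in the last line is an assumption that reflects the upward trend line in this example.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line through the points; returns (slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, 1.0 - ss_res / ss_tot


def r_from_trend_line(slope, r_squared):
    """Recover r from R-squared, taking the sign from the trend line slope."""
    # max() guards against a tiny negative R-squared caused by rounding.
    return math.copysign(math.sqrt(max(r_squared, 0.0)), slope)


# Hypothetical figures, made up for this example.
sales = [100.0, 200.0, 300.0, 400.0, 500.0]
profit = [10.0, 25.0, 28.0, 45.0, 50.0]
slope, r_squared = linear_fit(sales, profit)
print(r_from_trend_line(slope, r_squared))

# The R-Squared value read from the Describe Trend Model dialog box,
# with an assumed positive slope.
print(round(r_from_trend_line(1.0, 0.229503), 2))  # 0.48
```

When the trend line slopes downward, the same R-Squared value corresponds to a negative r, which is why checking the slope matters before reporting the result.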
By Adelina Popescu