Scatter Plot: An Effective Tool to Represent the Relationship Between Two Variables

Studies often involve describing the relationship between two numeric variables. A scatter plot is a statistical tool used to observe and represent that relationship.

A scatter plot, also known as a scatter diagram, is a two-dimensional plot that uses dots to show the values of two numeric variables. The independent variable goes on the X-axis and the dependent variable on the Y-axis. The main purpose of the plot is to show the correlation between the variables.

Correlation in scatter plots 

Put simply, correlation is the relationship between two variables. For example, how much an individual weighs is correlated with the amount of food they consume.

Correlation has two properties: direction and strength. The direction is given by whether the correlation is positive or negative. The strength, which indicates how strong the relationship is, is given by the magnitude of the correlation's numerical value.
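
The article does not say which measure produces the numerical value. A common choice is the Pearson correlation coefficient, and the sketch below assumes it. The function name and the sample data are illustrative only. The sign of the result gives the direction and its magnitude gives the strength.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

r = pearson_r(hours, scores)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
strength = abs(r)  # closer to 1 means a stronger relationship
```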

Correlation is of three types:

  1. Positive correlation – Here, both variables (X & Y) move in the same direction. In other words, as one variable increases, the other variable also increases. When one variable decreases, the other variable also decreases. E.g. as the number of cars on the road increases, the traffic increases.
  2. Negative correlation – In this type of correlation, the variables move in opposite directions. As one variable increases, the other variable decreases and vice versa. E.g. as the number of absences increases, the score decreases.
  3. No correlation – There is no relationship between the variables. That is, the dots are scattered across the plot area with no visible pattern. E.g. there is no relationship between shoe size and salary.

In a perfect correlation (negative or positive), all the data points lie on the line of fit. In weaker correlations, the points are spread around the line of fit rather than on it. When comparing the strength of correlations, look only at the magnitude of the numerical value and ignore the sign. The correlation with the largest magnitude is the strongest. If the magnitudes are equal, the correlations have the same strength, whether they are positive or negative.
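
The comparison rule above can be sketched in a few lines. The correlation values here are made up for illustration and are not measurements. The point is only that ranking uses the absolute value, so a correlation of -0.92 counts as stronger than one of 0.85.

```python
# Hypothetical correlation values for three pairs of variables.
correlations = {
    "cars vs. traffic": 0.85,
    "absences vs. score": -0.92,
    "shoe size vs. salary": 0.03,
}

# Rank by magnitude only; the sign gives direction, not strength.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
strongest = ranked[0]

def same_strength(r1, r2, tol=1e-9):
    """True when two correlations are equally strong, whatever their signs."""
    return abs(abs(r1) - abs(r2)) <= tol
```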

A scatter plot can be created manually or in Excel. Let's look at both approaches.

Creating a scatter plot manually is the simplest approach. The steps are:

  1. Draw the X and Y axes. Choose a range for each axis that includes the minimum and maximum values in the data set.
  2. For each data point, mark a dot where its X-axis value meets its corresponding Y-axis value.
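
The two manual steps map directly onto code. The sketch below is a rough text-only illustration, not a replacement for a real chart. It sets each axis range from the minimum and maximum of the data, then marks one `*` per point on a character grid. The function name, grid size and sample data are assumptions made for the example.

```python
def ascii_scatter(points, width=20, height=10):
    """Render (x, y) pairs as a text grid whose axes span the data's min and max."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Step 1: pick axis ranges that include the minimum and maximum values.
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values on an axis are identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1

    # Step 2: mark each X value against its corresponding Y value.
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # The first grid row is printed at the top, so flip vertically.
        grid[height - 1 - row][col] = "*"

    lines = ["|" + "".join(row) for row in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)

# Hypothetical data set.
data = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 7)]
print(ascii_scatter(data))
```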

The other approach is to create the scatter plot in Excel.

  1. Creating the plot in Excel starts with arranging the data. Prepare a table with the independent variable (X-axis) in the left column and the dependent variable (Y-axis) in the right column.
  2. Once the data is organised, create the scatter plot.
  • Select the two columns, including the column headers and the numerical values
  • Click the Insert tab -> Charts group -> choose the preferred type of scatter plot. The plot will be displayed on the worksheet.
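
If the data already lives in a script, the table layout described above can be produced as CSV text and then opened in Excel. This is only a sketch of the layout, with headers in the first row, the independent variable on the left and the dependent variable on the right. The function name, headers and values are hypothetical.

```python
import csv
import io

def write_scatter_table(xs, ys, x_header, y_header, out):
    """Write a two-column table: independent variable left, dependent variable right."""
    if len(xs) != len(ys):
        raise ValueError("both columns need the same number of rows")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([x_header, y_header])  # column headers
    writer.writerows(zip(xs, ys))          # one row per data point

# Hypothetical data and column names, written to an in-memory buffer.
buffer = io.StringIO()
write_scatter_table([1, 2, 3], [10, 12, 15], "Hours", "Score", buffer)
table_text = buffer.getvalue()
```

To save a real file instead, pass a file opened with `open("data.csv", "w", newline="")` as the `out` argument.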

If required, the scatter diagram can be customised to suit individual requirements.

Reducing the white space 

If the data is clustered in one part of the plot, the surrounding white space can be reduced by tightening the axis bounds.

  • Click X-axis -> format axis
  • Choose the maximum and minimum bounds 
  • Click Y-axis -> format axis
  • Choose the maximum and minimum bounds
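
One way to decide which bounds to enter is to take the minimum and maximum of each variable and add a small margin. The sketch below assumes a 5% margin on each side. That figure is an arbitrary choice for illustration, not an Excel default, and the data is hypothetical.

```python
def axis_bounds(values, padding=0.05):
    """Return (lower, upper) bounds that hug the data with a small margin."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # If every value is identical, fall back to a margin of 1 unit
    # so that the axis range is not empty.
    margin = span * padding if span else 1
    return lo - margin, hi + margin

# Hypothetical clustered data: the values sit far from zero,
# so axes starting at zero would leave a lot of empty space.
x_values = [102, 105, 110, 108, 120]
y_values = [40, 42, 41, 45, 50]

x_lower, x_upper = axis_bounds(x_values)
y_lower, y_upper = axis_bounds(y_values)
```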

Adding chart labels 

When creating a plot with a small number of data points, any labels that are missing can be added manually.

  • Choose the plot and click chart elements
  • Click data labels -> more options
  • Click format data labels -> value from cells and choose the range of data labels
  • Clear the X Value and Y Value boxes to remove the numerical values
  • Specify label position

Adding a trendline

A trendline, also known as a line of best fit, lets you visualise the relationship between the variables more clearly.

  • Right-click a data point and select “Add Trendline”
  • To add the equation to the trendline, tick the “Display Equation on Chart” box in the Format Trendline pane
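
For a linear trendline, the equation shown on the chart is the ordinary least-squares line through the points. The sketch below computes the slope and intercept by hand, assuming a linear fit. The function name and data are illustrative, and the sample points were chosen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
equation = f"y = {slope:.2f}x + {intercept:.2f}"
```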

A scatter diagram shows the independent variable on the X-axis and the dependent variable on the Y-axis. However, if the study demands it, the X and Y variables can be swapped.

  • Right-click any axis and choose “Select Data”
  • In the “select data source” dialogue box click on the edit button
  • Copy Y-axis values to X-axis and vice versa
  • Click OK

Why use scatter plot? 

A scatter diagram is useful for identifying patterns in data. It also reveals gaps in the data and the presence of outliers. This information plays a crucial role in segmenting the data into various parts.
