What happens if you find a correlational relationship between two variables?

Distinguishing between what does or does not provide causal evidence is a key piece of data literacy. Determining causality is never perfect in the real world. However, there are a variety of experimental, statistical and research design techniques for finding evidence toward causal relationships: e.g., randomization, controlled experiments and predictive models with multiple variables. Beyond the intrinsic limitations of correlation tests (e.g., correlations cannot measure trivariate, potentially causal relationships), it's important to understand that evidence for causation typically comes not from individual statistical tests but from careful experimental design.

Example: Heart disease, diet and exercise

For example, imagine again that we are health researchers, this time looking at a large dataset of disease rates, diet and other health behaviors. Suppose that we find two correlations: increased heart disease is correlated with higher fat diets (a positive correlation), and increased exercise is correlated with less heart disease (a negative correlation). Both of these correlations are large, and we find them reliably. Surely this provides a clue to causation, right?

In the case of this health data, correlation might suggest an underlying causal relationship, but without further work it does not establish it. Imagine that after finding these correlations, as a next step, we design a biological study which examines the ways that the body absorbs fat, and how this impacts the heart. Perhaps we find a mechanism through which fat from a higher-fat diet is stored in a way that leads to a specific strain on the heart. We might also take a closer look at exercise, and design a randomized, controlled experiment which finds that exercise interrupts the storage of fat, thereby leading to less strain on the heart.

All of these pieces of evidence fit together into an explanation: higher fat diets can indeed cause heart disease. And the original correlations still stood as we dove deeper into the problem: high fat diets and heart disease are linked!

But in this example, notice that our causal evidence was not provided by the correlation test itself, which simply examines the relationship between observational data (such as rates of heart disease and reported diet and exercise). Instead, we used an empirical research investigation to find evidence for this association.

Correlation means association - more precisely it is a measure of the extent to which two variables are related. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation.

  • A positive correlation is a relationship between two variables in which both variables move in the same direction: one variable increases as the other increases, or one decreases as the other decreases. An example of positive correlation would be height and weight. Taller people tend to be heavier.
  • A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other. An example of negative correlation would be height above sea level and temperature. As you climb the mountain (increase in height) it gets colder (decrease in temperature).
  • A zero correlation exists when there is no relationship between two variables. For example there is no relationship between the amount of tea drunk and level of intelligence.
    Scattergrams

    A correlation can be expressed visually. This is done by drawing a scattergram (also known as a scatterplot, scatter graph, scatter chart, or scatter diagram).

    A scattergram is a graphical display that shows the relationships or associations between two numerical variables (or co-variables), which are represented as points (or dots), one for each pair of scores.

    A scattergram indicates the strength and direction of the correlation between the co-variables.

    Types of Correlations: Positive, Negative, and Zero

    When you draw a scattergram it doesn't matter which variable goes on the x-axis and which goes on the y-axis.

    Remember, in correlations we are always dealing with paired scores, so the values of the 2 variables taken together will be used to make the diagram.

    Decide which variable goes on each axis and then simply put a cross at the point where the 2 values coincide.
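The plotting steps above can be sketched in a few lines of Python. This is a minimal text-only illustration rather than a substitute for a charting library; the function name `ascii_scattergram`, the grid size, and the height and weight figures are all made up for the example.

```python
def ascii_scattergram(xs, ys, width=20, height=8):
    """Return a text scattergram with one 'x' per pair of scores."""
    if len(xs) != len(ys):
        raise ValueError("scores must come in pairs")
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1 so a constant variable does not divide by zero.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each score onto the grid, then mark where the pair meets.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "x"  # grid row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


# Hypothetical paired scores: height in cm and weight in kg for six people.
heights = [150, 158, 165, 172, 180, 188]
weights = [52, 58, 63, 70, 79, 85]
plot = ascii_scattergram(heights, weights)
print(plot)
```

With these made-up scores the crosses run from the bottom left to the top right, which is the pattern a positive correlation produces.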

    Some uses of Correlations

    Prediction

    • If there is a relationship between two variables, we can make predictions about one from another.

    Validity

    • Concurrent validity (correlation between a new measure and an established measure).

    Reliability

    • Test-retest reliability (are measures consistent).
    • Inter-rater reliability (are observers consistent).

    Theory verification


    Correlation Coefficients: Determining Correlation Strength

    Instead of drawing a scattergram a correlation can be expressed numerically as a coefficient, ranging from -1 to +1. When working with continuous variables, the correlation coefficient to use is Pearson’s r.

    Correlation Coefficient Interpretation

    The correlation coefficient (r) indicates the extent to which the pairs of numbers for these two variables lie on a straight line. Values over zero indicate a positive correlation, while values under zero indicate a negative correlation.

    A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up.
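As a rough sketch of how the coefficient is computed, the function below implements the standard Pearson formula (the co-variation of the two variables divided by the product of their spreads) using only the Python standard library. The function name `pearson_r` and the sample figures are illustrative and do not come from a real dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of paired scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable never changes")
    return co_variation / math.sqrt(spread_x * spread_y)


# Hypothetical data: temperature falls by a fixed amount per 500 m climbed,
# so the points lie exactly on a downward-sloping straight line.
altitude_m = [0, 500, 1000, 1500, 2000]
temperature_c = [20.0, 16.75, 13.5, 10.25, 7.0]
r_altitude = pearson_r(altitude_m, temperature_c)

# Hypothetical heights (cm) and weights (kg): close to a line, not exactly on it.
heights_cm = [150, 158, 165, 172, 180, 188]
weights_kg = [52, 58, 63, 70, 79, 85]
r_height = pearson_r(heights_cm, weights_kg)

print(round(r_altitude, 3), round(r_height, 3))
```

Swapping the two arguments gives the same value, which matches the earlier point that it does not matter which variable goes on which axis.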

    There is no rule for determining what size of correlation is considered strong, moderate or weak. The interpretation of the coefficient depends on the topic of study.

    When studying things that are difficult to measure, we should expect the correlation coefficients to be lower. In these kinds of studies, we rarely see correlations above 0.6. For this kind of data, we generally consider correlations above 0.4 to be relatively strong; correlations between 0.2 and 0.4 are moderate, and those below 0.2 are considered weak.

    When we are studying things that are easier to measure or count, such as socioeconomic status, we expect higher correlations. For example, with demographic data, we generally consider correlations above 0.75 to be relatively strong; correlations between 0.45 and 0.75 are moderate, and those below 0.45 are considered weak.
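The rules of thumb above can be written as a small lookup. This is only a sketch of those guidelines: the function name `describe_strength` and its `easily_countable` flag are invented here, and two details are assumptions, namely that the size of a negative coefficient is judged the same way as a positive one and that values exactly on a cutoff count as moderate.

```python
def describe_strength(r, easily_countable=False):
    """Label a correlation coefficient using the rough cutoffs from the text."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient lies between -1 and +1")
    # Higher cutoffs for easily counted data (e.g. demographics),
    # lower cutoffs for things that are difficult to measure.
    weak_cutoff, strong_cutoff = (0.45, 0.75) if easily_countable else (0.2, 0.4)
    size = abs(r)  # assumption: the sign gives direction, not strength
    if size > strong_cutoff:
        return "relatively strong"
    if size >= weak_cutoff:
        return "moderate"
    return "weak"


print(describe_strength(0.5))                         # difficult-to-measure data
print(describe_strength(0.5, easily_countable=True))  # demographic-style data
```

The same coefficient of 0.5 is labelled differently in the two calls, which is the point of the passage: the interpretation depends on the topic of study.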


    Correlation vs Causation

    Causation means that one variable (often called the predictor variable or independent variable) causes the other (often called the outcome variable or dependent variable).

    Experiments can be conducted to establish causation. An experiment isolates and manipulates the independent variable to observe its effect on the dependent variable, and controls the environment in order that extraneous variables may be eliminated.
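One way an experiment keeps extraneous variables in check is random assignment. The simulation below is a hedged sketch of that idea, not a description of any real study: the hidden "health" trait, the group sizes and the seed are all invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical hidden trait (say, general health) for 2000 participants.
health = [random.gauss(0, 1) for _ in range(2000)]


def mean(values):
    return sum(values) / len(values)


# Self-selection: suppose healthier people are the ones who choose to exercise.
chose_exercise = [h for h in health if h > 0]
chose_rest = [h for h in health if h <= 0]
self_selected_gap = mean(chose_exercise) - mean(chose_rest)

# Random assignment: shuffle the participants, then split them down the middle.
shuffled = health[:]
random.shuffle(shuffled)
assigned_exercise, assigned_rest = shuffled[:1000], shuffled[1000:]
randomized_gap = mean(assigned_exercise) - mean(assigned_rest)

print(round(self_selected_gap, 2), round(randomized_gap, 2))
```

When people sort themselves into groups, the groups already differ on the hidden trait before any treatment is applied. Under random assignment the gap shrinks toward zero, so a later difference in outcomes is harder to blame on that trait.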

    A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.

    While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable, is actually causing the systematic movement in our variables of interest.

    Correlation does not prove causation, as a third variable may be involved. For example, being a patient in hospital is correlated with dying, but this does not mean that one event causes the other, as a third variable might be involved (such as diet, level of exercise).
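A small simulation can make the third-variable problem concrete. In this sketch, which uses invented data and the made-up names `confounder`, `var_a` and `var_b`, neither variable influences the other; both simply depend on the same hidden factor.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


random.seed(1)  # fixed seed so the sketch is repeatable
n = 5000

# The hidden third variable drives both observed variables.
confounder = [random.gauss(0, 1) for _ in range(n)]
var_a = [c + random.gauss(0, 1) for c in confounder]
var_b = [c + random.gauss(0, 1) for c in confounder]
r_raw = pearson_r(var_a, var_b)

# Remove the confounder's contribution. This is possible here only because
# we simulated the data and know exactly how it was built.
rest_a = [a - c for a, c in zip(var_a, confounder)]
rest_b = [b - c for b, c in zip(var_b, confounder)]
r_adjusted = pearson_r(rest_a, rest_b)

print(round(r_raw, 2), round(r_adjusted, 2))
```

The raw correlation is clearly positive even though there is no causal link between the two variables, and it largely disappears once the shared factor is taken out.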

    Summary

    "Correlation is not causation" means that just because two variables are related it does not necessarily mean that one causes the other.

    A correlation identifies variables and looks for a relationship between them, whereas an experiment tests the effect that an independent variable has upon a dependent variable.

    This means that an experiment can establish cause and effect (causation), but a correlation can only indicate a relationship, as another extraneous variable that is not known about may be involved.


    Strengths of Correlations

      1. Correlation allows the researcher to investigate naturally occurring variables that may be unethical or impractical to test experimentally. For example, it would be unethical to conduct an experiment on whether smoking causes lung cancer.

      2. Correlation allows the researcher to clearly and easily see if there is a relationship between variables. This can then be displayed in a graphical form.

    Limitations of Correlations

      1. Correlation is not and cannot be taken to imply causation. Even if there is a very strong association between two variables we cannot assume that one causes the other.

      For example suppose we found a positive correlation between watching violence on T.V. and violent behavior in adolescence. It could be that the cause of both these is a third (extraneous) variable - say for example, growing up in a violent home - and that both the watching of T.V. and the violent behavior are the outcome of this.

      2. Correlation does not allow us to go beyond the data that is given. For example suppose it was found that there was an association between time spent on homework (1/2 hour to 3 hours) and number of G.C.S.E. passes (1 to 6). It would not be legitimate to infer from this that spending 6 hours on homework would be likely to generate 12 G.C.S.E. passes.
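The homework example can be sketched in code. The six data points below are hypothetical, chosen only to span the ranges mentioned above (half an hour to 3 hours, 1 to 6 passes), and the helper names `fit_line` and `predict_passes` are invented for this illustration. The line is fitted by ordinary least squares, and the prediction function refuses to go beyond the observed range.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical observations within the ranges given in the text.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
passes = [1, 2, 3, 4, 5, 6]
slope, intercept = fit_line(hours, passes)


def predict_passes(homework_hours):
    """Predict passes, but only inside the range the data actually covers."""
    if not min(hours) <= homework_hours <= max(hours):
        raise ValueError("outside the observed range; the data cannot support this")
    return slope * homework_hours + intercept


print(predict_passes(2.0))
# The fitted line would happily give a number for 6 hours,
# but nothing in the data tells us the pattern continues that far.
print(slope * 6 + intercept)
```

Calling `predict_passes(6)` raises an error instead of returning the unsupported figure, which mirrors the warning in the text.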

    How to reference this article:

    McLeod, S. A. (2018, January 14). Correlation definitions, examples & interpretation. Simply Psychology. www.simplypsychology.org/correlation.html

    What happens when two variables are correlated?

    What are correlation and causation and how are they different? Two or more variables are considered to be related, in a statistical context, if their values change so that as the value of one variable increases or decreases so does the value of the other variable (although it may be in the opposite direction).

    What happens if there is a strong correlation between two variables?

    If one says that there is a strong correlation between two variables, it means that a change in one variable is consistently accompanied by a change in the other. For example, a change in the expense of advertising goes with a change in the number of sales, or a change in the outside temperature goes with a change in the sales of ice cream.

    What does it mean when I find a significant correlation between 2 variables?

    On a scatterplot, the points fall close to a straight line, which indicates that there is a strong linear relationship between the variables. If the correlation is positive, then as one variable increases, the other variable also increases.

    What happens to the relationship between two variables if the value of the correlation goes closer to zero?

    A correlation coefficient of zero, or close to zero, shows no meaningful linear relationship between variables. A coefficient of -1.0 or +1.0 indicates a perfect correlation, where a change in one variable perfectly predicts the changes in the other.
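One caveat worth illustrating: Pearson's r measures straight-line association, so a coefficient near zero does not by itself rule out a curved relationship. The sketch below uses invented numbers in which one variable is completely determined by the other, yet r comes out as zero; the helper `pearson_r` is written out here so the example stands alone.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


# A perfect U-shaped relationship: each y is exactly the square of its x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r_curved = pearson_r(xs, ys)

print(r_curved)
```

Because the downward half of the U cancels the upward half, the coefficient is zero even though the two variables are tightly linked, which is one reason to look at a scattergram as well as the number.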