Calculate Pearson’s Correlation Using Microsoft Excel Correlation Analysis using Excel The correlation coefficient allows researchers to determine if there is a possible linear relationship between two variables measured on the same subject (or entity). When these two variables are of a continuous nature (they are measurements such as weight, height, length, etc. ) the measure of association most often used is Pearson’s correlation coefficient. This association may be expressed as a number (the correlation coefficient) that ranges from –1 to +1.
Need essay sample on Quantitative Methods Using Excel ?We will write a custom essay sample specifically for you for only $13.90/pageorder now
The population correlation is usually expressed as the Greek letter rho (r) and the sample statistic (correlation coefficient) is r. The correlation measures how well a straight line fits through a scatter of points when plotted on an x – y axis. If the correlation is positive, it means that when one variable increases, the other tends to increase. If the correlation is negative, it means that when one variable increases, the other tends to decrease. When a correlation coefficient is close to +1 (or –1), it means that there is a strong correlation – the points are scattered along a straight line. For example, a correlation r = 0. may be considered strong. However, the closer a correlation coefficient gets to 0, the weaker the relationship, where the cloud (scatter) of points is not close to a straight line. For example, a correlation r = 0. 1 might be considered weak. For scientific purposes, a t-test is utilized to determine if the correlation coefficient is “strong” or “significant” or not. This will be discussed later. Assumptions: Before using the Pearson correlation coefficient as a measure of association, you should be aware of its assumptions and limitations. As mentioned earlier, this correlation coefficient measures a linear relationship.
That is, the relationship between the two variables measures how close the two measurements form a straight line when plotted on an x-y chart. Therefore, it is important that data be graphed before the correlation is interpreted. For example, it is possible that data, when plotted, may show a curved relationship instead of a straight line. When this is the case, a Pearson correlation may not be the best measure of association. There are other conditions when a correlation coefficient may appear important, but when considered in light of a graph, is not a good measure of relationship.
In the following graphs, all of them have a correlation coefficient of about 0. 72, yet most do not fit the assumption of a linear relationship. To avoid misinterpreting a correlation, always accompany the calculation with a graph. Another assumption of correlation is that the both of the variables (the measurements) be of continuous data measured on an interval/ratio scale. Data that are not continuous, such as categorical (i. e. hair color) or binomial (i. e. , gender) data would not be acceptable. Also, each variable should be approximately normally distributed. For this example, we’ll look at the data set called EXAMPLE.
XLS. The first few records are shown here: GROUP| AGE| TIME1| TIME2| TIME3| TIME4| STATUS| A| 12| 22. 3| 25. 3| 28. 2| 30. 6| 5| A| 11| 22. 8| 27. 5| 33. 3| 35. 8| 5| B| 12| 22. 8| 30. 0| 32. 8| 31. 0| 4| A| 12| 18. 5| 26. 0| 29. 0| 27. 9| 5| B| 9| 19. 5| 25. 0| 25. 3| 26. 6| 5| B| 11| 23. 5| 28. 8| 34. 2| 35. 6| 5| C| 8| 22. 6| 26. 7| 28. 0| 33. 4| 3| B| 8| 21. 0| 26. 7| 27. 5| 29. 5| 5| Suppose you want to want to find the Pearson’s correlation between the variables TIME1 and TIME2. Note: Example 1 requires that you’ve installed the Analysis Toolpak (Tools/Addins/Analysis Toolpak) Example 1
Step1: To make this calculation select Tools/Data Analysis/Correlation… The following dialog box is displayed: Step 2: In the input range textbox enter the range of the data (include the first row containing the variable name) or click on the data selection icon and mark the range to use. Step 3: Notice that the “Labels in First Row” checkbox is checked. Step 4: Click on OK and the following information will appear in a new worksheet: | | A| B| 1| | TIME1| TIME2| 2| TIME1| 1| | 3| TIME2| 0. 763957| 1| The Pearson’s correlation for these two variables is 0. 764 (rounded. ) Example 2
A second way to calculate the correlation is with a function. Step1: In the Example worksheet, enter some labels in column I to indicate that you are calculating a correlation. Step 2: In the J3 (or wherever you want it) cell, you will enter an Excel function that will calculate the desired correlation. Step 3: Enter the formula =CORREL(C2:C51,D2:D51) Note that it is of the form, =CORREL(array1,array2) Where the first array and second array contain the paired numbers to correlate. It is IMPORTANT that the numbers be paired correctly. ) The answer will appear in the cell. In this case, the Pearson’s correlation is 0. 764 (rounded. Example 3: Visualize the relationship between the two variable by displaying a scatterplot. Step 1: Select Insert/Chart (or click chart icon) Step 2: From the Chart Wizard step 1, select XY(Scatter) and select the top chart sub-type. Click Next. Step 3: Indicate the data range as =EXAMPLE! $C$1:$D$51 (either type it in or use the data selector to highlight the values in the spreadsheet – include the column names). Indicate that the data are in columns as shown here. Click Next. Step 4: Indicate an appropriate chart title and axis labels. Click Next. Step 5: Indicate to either place the chart in the same worksheet or in another.
The initial scatterplot looks like this: Step 6: Click on the axis labels for both the x and y axis and adjust the scale to display the data better. For example, by adjusting the axes to begin at 15 instead of 0, the plot becomes: The scatterplot helps you understand the correlation and to determine if it is indeed a linear relationship or something else. Two Sample (independent group) t-test Using Microsoft Excel Independent Groups t-test When the means of two groups are to be compared (where each group consists of subjects that are not related) then the Excel two-sample t-test procedure is used to perform these calculations. NOTE: If your observations are related across “group” as paired or repeated measurements, this in an INCORRECT version of the t-test. For that case, see the tutorial on the Paired t-test. ) Assumptions: Subjects are randomly assigned to one of two groups. The distribution of the means by group is normal with equal variances. Sample sizes between groups do not have to be equal. Test: The hypotheses for the comparison of means from two independent groups are: Ho: ? 1 = ? 2 (means of the two groups are equal) Ha: ? 1 ? ?2 (means are not equal)
The test statistic is a student’s t-test with N2 degrees of freedom, where N is the total number of subjects. A low pvalue indicates evidence to reject the null hypothesis in favor of the alternative. In other words, there is evidence that the means are not equal. For example, suppose we are interested in comparing SCORES across GROUPS, where there are two groups. The purpose is to determine if the mean SCORE on a test is different for the two groups tested (i. e. , control and treatment groups). The example data is shown here: Group| Scores| 1| 20| 1| 23| 1| 32| 1| 24| 1| 25| 1| 28| | 27. 5| 2| 25| 2| 46| 2| 56| 2| 45| 2| 46| 2| 51| 2| 34| 2| 47. 5| In this example, GROUP contains two values, 1 or 2, indicating which group each subject was in. The t-test will be performed on the values in the variable (column) named SCORE. An independent group t-test is done in two steps: Step 1: Decide if the variances are equal in both groups, which determines the type of t-test to perform (one that assumes equal variances or one that doesn’t make that assumption. ) A conservative approach suggested in some texts is to always assume unequal variances. Another approach is to do a tatistical test to determine equality. Step 2: Depending on you decision about the equality of variances you either perform the version of the t-test that assumes equality of variances or other one that doesn’t make that assumption. Determine Equality of Variance If you take the conservative approach, skip this test and proceed to the version of the t-test that does not assume equality of variance. To do a statistical test to determine equality of variance, follow these instructions. (The test for equality of variances is an F-test. ) 1. In Excel, select Tools/ Data Analysis / F-Test Two Sample for Variance. 2.
In the F-Test Two Sample for Variance dialog box: For the Input Range for Variable 1, highlight the seven values of Score in group 1 (values from 20 to 27. 5). For the input range for Variable 2, highlight the eight values of Score in group 2 (values from 25 to 47. 5). Leave the other items at their default selections. This dialog box is shown below. Click OK. 3. The following results are produced by Excel: F-Test Two-Sample for Variances| | | | | | | Variable 1| Variable 2| Mean| 25. 64285714| 43. 8125| Variance| 15. 22619048| 96. 42410714| Observations| 7| 8| df| 6| 7| F| 0. 157908545| | P(F;=f) one-tail| 0. 019378053| |
F Critical one-tail| 0. 23771837| | Notice the highlighted probability p=0. 01937. This is a one-tail p-value associated with the test for equality of variance. Generally, if this value is less than 0. 05 you assume that the variances are NOT equal. a. If the variances are assumed to NOT be equal, proceed with the t-test that assumes non-equal variances. b. If the variances are assumed to be equal, proceed with the t-test that assumes equal variances. Perform the t-test The process of doing the t-test in Excel is similar for both the equal and unequal variances case – the main difference is which version you select from the menu.
Suppose you select the unequal version of the two-sample t-test – this is how you proceed: 1. Select Tools/ Data Analysis/ t-Test: Two Sample assuming Unequal Variances 2. For the Input Range for Variable 1, highlight the seven values of Score in group 1 (values from 20 to 27. 5). For the input range for Variable 2, highlight the eight values of Score in group 2 (values from 25 to 47. 5). Leave the other items at their default selections. This dialog box is shown below. Click OK. 3. The following output is created: t-Test: Two-Sample Assuming Unequal Variances| | | | | | Variable 1| Variable 2| Mean| 25. 64285714| 43. 8125| Variance| 15. 22619048| 96. 42410714| Observations| 7| 8| Hypothesized Mean Difference| 0| | Df| 9| | t Stat| -4. 816944724| | P(T;=t) one-tail| 0. 000475506| | t Critical one-tail| 1. 833112923| | P(T;=t) two-tail| 0. 000951012| | t Critical two-tail| 2. 262157158| | Notice that the two sample mean values (variance) are 25. 64(15. 23) and 43. 81(96. 42). The two tailed calculated t-statistic is 4. 82 and the highlighted p-value for this test is p=0. 001. (0. 000951012) Since the p-value is less than 0. 5, this provides evidence to reject the null hypothesis of equal means. As an example of how this might be reported in a journal article: Methods: A preliminary test for the equality of variances indicates that the variances of the two groups were significantly difference F=. 157, p=. 02. Therefore, a two-sample t-test was performed that does not assume equal variances. Results: The mean score for group 1 (M=25. 64 SD= 3. 9021, N= 7) was significantly smaller than the scores for group 2 (M=42. 81, SD=9. 82, N= 8. ) using the two-sample t-test for unequal variances, t(9) = -4. 82, p ;= 0. 01. (Technically, the degrees of freedom for this unequal variances t-test should be 9. 4 instead of 9, but Excel unfortunately rounds off the DF, so it is reported incorrectly. Years ago, it used to be conventional to round down if you were consulting a table for a probability level, but most statistical programs now calculate the correct p-value using a fractional DF though interpolation. ) Notice that the standard deviation is reported rather than the variances as shown in the Excel results table. You can calculate the standard deviation using Tools/ Data Analysis / Descriptive statistics.
When the variances are assumed equal, the analysis is similar, select Tools/ Data Analysis/ t-Test: Two Sample assuming Equal Variances Paired t-test Using Microsoft Excel Performing a Paired t-test in Excel To compare two paired values (such as in a before-after situation) where both observations are taken from the same or matched subjects, you can perform a paired t- For example, suppose your data contained the variables BEFORE and AFTER, (before and after weight on a diet), for 8 subjects. The hypotheses for this test are: Ho: mLoss = 0 (The average weight loss was 0) Ha: ? Loss ? 0 (The weight loss was different than 0)
For example, the following weight loss data is used in this example (DIET. XLS) Before| After| 162. 00| 168. 00| 170. 00| 136. 00| 184. 00| 147. 00| 164. 00| 159. 00| 172. 00| 143. 00| 176. 00| 161. 00| 159. 00| 143. 00| 170. 00| 145. 00| 1. To perform a paired t-test, select Tools/ Data Analysis / t-test: Paired two sample for means. 2. In the t-test: Paired two sample for means dialog box: For the Input Range for Variable 1, highlight the 8 values of Score in group “Before” (values from 162 to 170). For the input range for Variable 2, highlight the eight values of Score in group “After” (values from 168 to 145).
For now, leave the other items at their default selections. This dialog box is shown below. Click OK. This dialog box is shown below: 3. The results are shown in the output below: t-Test: Paired Two Sample for Means| | | | | | | Variable 1| Variable 2| Mean| 169. 625| 150. 25| Variance| 65. 125| 121. 9285714| Observations| 8| 8| Pearson Correlation| -0. 176747772| | Hypothesized Mean Difference| 0| | df| 7| | t Stat| 3. 706873373| | P(T;=t) one-tail| 0. 003792994| | t Critical one-tail| 1. 894578604| | P(T;=t) two-tail| 0. 007585988| | t Critical two-tail| 2. 364624251| | Thus, the two-tail p-value for this t-test is p=0. 08 (. 007585988) and t=3. 71. Excel actually does a poor job providing what you need to report the results of this test – for a more complete understanding, you need to realize that the paired t-test is actually a test on the DIFFERENCE between the two values. Thus, to make this a better analysis, first calculate the difference between BEFORE and AFTER, creating the following new column called “DIFF” using a formula such as =A2-B2 in cell C2 and copying the formula for the appropriate remaining cells in the worksheet. Notice also that the average difference is calculated (19. 38) Before| After| DIFF| 62. 00| 168. 00| -6. 00| 170. 00| 136. 00| 34. 00| 184. 00| 147. 00| 37. 00| 164. 00| 159. 00| 5. 00| 172. 00| 143. 00| 29. 00| 176. 00| 161. 00| 15. 00| 159. 00| 143. 00| 16. 00| 170. 00| 145. 00| 25. 00| Average Diff=| 19. 38| Look back up at the original hypotheses – what you are testing is that the average loss is different than zero (0). Thus, the t-test is actually testing to determine if the value 19. 38 is sufficiently different from 0 to claim significance. Thus, the number you are interested in most is the average difference (loss) and not as much as the individual means of Before and After.
Therefore to report these results properly, you need the mean difference and standard deviation. You can get this be calculating descriptive statistics on the difference values. (Tools/Data Analysis/ Descriptive Statistics) – choose the Summary Statistics and 95% confidence interval options. The results in the following output: Column1| | | Mean| 19. 375| Standard Error| 5. 226776868| Median| 20. 5| Mode| #N/A| Standard Deviation| 14. 78355747| Sample Variance| 218. 5535714| Kurtosis| -0. 52419581| Skewness| -0. 575291944| Range| 43| Minimum| -6| Maximum| 37| Sum| 155| Count| 8| Confidence Level(95. %)| 12. 35936334| Notice that the mean divided by the standard error (19. 375/5. 227 = 3. 71) is same as the value of the “t Stat” in the previous table. Another piece of information that is usually reported is a 95% confidence interval. Using the Confidence Level (95%) value of 12. 359 in the table, the confidence interval is the mean plus or minus this value. Thus, a 95% C. I. about Mean Difference is (7. 01, 31. 74). To report these results in a journal article, you could use something like this: “A paired t-test was performed to determine if the diet was effective. The mean weight loss (M=19. 8, SD =14. 784, N= 8) was significantly greater than zero, t(7)=3. 71, two-tail p = 0. 008, providing evidence that the diet is effective in producing weight loss. A 95% C. I. about mean weight loss is (7. 01, 31. 74). ” NOTE: The researcher should interpret the results using his or her knowledge of the subject matter – thus giving the variability of the sample, the weight loss could have been as low as an average of 7 pound to a high of 32 pounds (see the 95% confidence interval). If it is as low as 7 pounds, would this still mean that the diet was effective (in terms of the researcher’s experience. ? NOTE: The test could have also been performed as a one-tail test. If so, use the appropriate t-statistic and p-value from the Excel table. NOTE: Also, you could do this test using an hypothesized value of the difference other than zero – although zero is almost always used. Excel provides the opportunity to enter another hypothesized value to test in the paired t-test dialog box. ANOVA Using Microsoft Excel One-Way Analysis of Variance Performing an ANOVA in Excel Definition: An Independent Group ANOVA is an extension of the independent group t-test where you have more than two groups.
This test is used to compare the means of more than two independent groups and is also called a One Way Analysis of Variance. Assumptions: Subjects are randomly assigned to one of n groups. The distribution of the means by group are normal with equal variances. Sample sizes between groups do not have to be equal, but large differences in sample sizes by group may effect the outcome of the multiple comparisons tests. Test: The hypotheses for the comparison of independent groups are: (k is the number of groups) Ho: u1 = u2 … uk (means of the all groups are equal) Ha: ui ;; uj (means of the two or more groups are not equal) The test is performed in an Analysis of Variance (ANOVA) table. The test statistic is an F test with k-1 and N-k degrees of freedom, where N is the total number of subjects. A low p-value for this test indicates evidence to reject the null hypothesis in favor of the alternative. In other words, there is evidence that at least one pair of means are not equal. Example: Independent Group ANOVA (One-Way Analysis of variance) The FEED_ANOVA.
XLS file contains information on four different feeds and weight gain of animals after they had been fed one of the feeds for a period of time. You want to know if any feed is better for producing weight gain. A| B| C| D| 60. 8| 78. 7| 92. 6| 86. 9| 67| 77. 7| 84. 1| 82. 2| 54. 6| 76. 3| 90. 5| 83. 7| 61. 7| 79. 8| | 90. 3| Step 1: Open the file FEED_ANOVA or enter thedata into an Excel datasheet. Step 2: In Excel 2003 or earlier, pull down “Tools” to “Data Analysis” In Excel 2007 click on Data then Data Analysis. Step 3: Select Anova: Single Factor.
Step 4: In the following Dialog box, enter the input range that corresponds to the data columns ($A$1:$D$5) and click OK. Check the option “Labels in First Row”. The tesults appear in a new worksheet, as shown here: In this output, the test statistic, F, is reported in the analysis of variance table, F(3,11) = 39. 82 . The p-value for this statistics is p; 0. 001 (reported in the table as 3. 36E-E06). This means that there is evidence that there are differences in the means across groups. Unfortunately, Excel does not include a standard multiple comparison test you can use to determine which means are different from the others.
Step 5: One way to determine specific difference is to perform paired analyses of the group, two at a time. For example, compare the mean for group A vs the mean for group B, then A vs C then A vs D and so on. In Excel, your option is to do this using multiple two-sample t-tests. If you do these pairwise comparisons, you should modify the resulting p-value for each t-test, since performing multiple t-tests increases the probablity of finding an incorrect significance. To correct for this problem you should multiple the p-values for each of the pair-wise comparisons by the number of comparisons. This is called a Bonferonni adjustment.
For example, in this case your comparisons are A vs B, A vs C, A vs D, B vs C, B vs D, and C vs D — 6 pairwise comparisons in all. Thus, you’d correct each t-test p-value by multiplying it by 6. For example, a t-test comparison of Mean A vs Mean C (61. 025 vs 89. 067) yields an unadjusted two-tail p-value p=0. 0006. The adjusted p-value (the one you should report) would be 0. 0006*6 = 0. 0036 Determine Equality of Variance for two groups Equality of Variance using Excel In several statistical analyses, in particular an independent group t-test, and assumption is that the groups have equal variances.
This example illustrates how you can test that assumption. The hypotheses for this test are Ho: ? 1 = ? 2 (null hypothesis, variances are equal) Ho: ? 1 ? ?2 (alternative hypothesis, variances are not equal) For example, suppose we are interested in comparing SCORES across GROUPS, where there are two groups. The purpose is to determine if the mean SCORE on a test is different for the two groups tested (i. e. , control and treatment groups). The example data is shown here: Group | Scores | 1 | 20 | 1 | 23 | 1 | 32 | 1 | 24 | 1 | 25 | 1 | 28 | 1 | 27. 5 | 2 | 25 | | 46 | 2 | 56 | 2 | 45 | 2 | 46 | 2 | 51 | 2 | 34 | 2 | 47. 5 | In this example, the GROUP variable contains two group indicators, group “1” or “2”, indicating which group each subject was in. To decide if the variances are equal in both groups, (which can determine the type of t-test to perform) you can perform a statistical test to determine equality. To do a statistical test to determine equality of variance, follow these instructions. (The test for equality of variances is an F-test. ) 1. In Excel, select Tools/ Data Analysis / F-Test Two Sample for Variance. 2.
In the F-Test Two Sample for Variance dialog box: For the Input Range for Variable 1, highlight the seven values of Score in group 1 (values from 20 to 27. 5). For the input range for Variable 2, highlight the eight values of Score in group 2 (values from 25 to 47. 5). Leave the other items at their default selections. This dialog box is shown below. Click OK. The following results are produced by Excel: F-Test Two-Sample for Variances | | | | | | | Variable 1 | Variable 2 | Mean | 25. 64285714 | 43. 8125 | Variance | 15. 22619048 | 96. 42410714 | Observations | 7 | 8 | df | 6 | 7 |
F | 0. 157908545 | | P(F;=f) one-tail | 0. 019378053 | | F Critical one-tail | 0. 23771837 | | Notice the highlighted probability p=0. 01937. This is a one-tail p-value associated with the test for equality of variance. Generally, if this value is less than 0. 05 you assume that the variances are NOT equal. How to use this information when doing an independent group t-test: 1. If the variances are assumed to NOT be equal, proceed with the t-test that assumes non-equal variances. 2. If the variances are assumed to be equal, proceed with the t-test that assumes equal variances.