The CORRELATION function returns the correlation coefficients for the pairs of data values in two expressions. A correlation coefficient indicates the strength of relationship between the data values. The closer the correlation coefficient is to positive or negative 1, the stronger the relationship is between the data values in the expressions. A correlation coefficient of 0 (zero) means no correlation and a +1 (plus one) or -1 (minus one) means a perfect correlation. A positive correlation coefficient indicates that as the data values in one expression increase (or decrease), the data values in the other expression also increase (or decrease). A negative correlation coefficient indicates that as the data values in one expression increase, the data values in other expression decrease.
A dimensioned numeric expression with at least one dimension in common with expression2.
A dimensioned numeric expression with at least one dimension in common with expression1.
Calculates the Pearson product-moment correlation coefficient. Use this method when the data is interval-level or ratios, such as units sold and price for each unit, and the data values in the expressions have a linear relationship and are distributed normally.
Calculates Spearman's rho correlation coefficient. Use this nonparametric method when the expressions do not have a linear relationship or a normal distribution. In computing the correlation coefficient, this method ranks the data values in expression1 and in expression2 and then compares the rank of each element in expression1 to the corresponding element in expression2. This method assumes that most of the values in the expressions are unique.
Calculates Kendall's tau correlation coefficient. This nonparametric method is similar to the SPEARMAN method in that it also first ranks the data values in expression1 and in expression2. The KENDALL method, however, compares the ranks of each pair to the successive pairs. Use this method when few of the data values in expression1 and in expression2 are unique.
An optional list of dimensions along which CORRELATION computes the correlation coefficient. Both expression1 and expression2 must be dimensioned by all of the dimension-list dimensions. CORRELATION correlates the data values of expression1 to those of expression2 along all of the dimension-list dimensions. CORRELATION returns an array that contains one correlation coefficient for each cell that is dimensioned by all of the dimensions of expression1 and expression2 except those in dimension-list.
When you do not specify a dimension-list argument, then CORRELATION computes the correlation coefficient over all of the common dimensions of expression1 and expression2. When all of the dimensions of the two expressions are the same, then CORRELATION returns a single correlation coefficient. When either expression contains dimensions that are not shared by the other expression, then CORRELATION returns an array that contains one correlation coefficient for each cell that is dimensioned by the dimensions of the expressions that are not shared.
The Effect of NASKIP on CORRELATION
CORRELATION is affected by the NASKIP option. When NASKIP is set to YES (the default), then CORRELATION ignores NA values. When NASKIP is set to NO, then an NA value in the expressions results in a correlation coefficient of NA.
Example 7-51 Correlating with the PEARSON Method
Assume that your analytic workspace contains two variables named units and price. The two dimensions of the price variable, month and product, are shared by the units variable, which has a third dimension, district.
The following CORRELATION statement does not specify a dimension-list argument. The output of the CORRELATION function in the statement is one correlation coefficient for each of the dimension values in the dimension that the variables do not have in common.
REPORT CORRELATION(units price pearson)
The preceding statement produces the following output.
               CORRELATION
                 (UNITS
                  PRICE
DISTRICT        PEARSON)
-------------- -----------
Boston               -0.75
Atlanta              -0.85
Chicago              -0.83
Dallas               -0.66
Denver               -0.83
Seattle              -0.69
The following statements limit the month and product dimensions.
LIMIT month to 'Jan96' TO 'Mar96' LIMIT product TO 'Tents' TO 'Racquets'
The following statement reports the correlation coefficient based on the product dimension for the limited dimension values that are in status.
REPORT CORRELATION(units price pearson basedon product)
               CORRELATION(UNITS PRICE PEARSON-
               --------BASEDON PRODUCT)--------
               -------------MONTH--------------
DISTRICT         Jan96      Feb96      Mar96
-------------- ---------- ---------- ----------
Boston              -0.96      -0.90      -0.89
Atlanta             -0.97      -0.97      -0.97
Chicago             -0.96      -0.95      -0.95
Dallas              -0.98      -0.98      -0.99
Denver              -0.97      -0.97      -0.97
Seattle             -0.89      -0.83      -0.83
The following statement reports the correlation coefficient based on the month dimension for the limited dimension values.
REPORT CORRELATION(units price pearson basedon month)
               CORRELATION(UNITS PRICE PEARSON-
               ---------BASEDON MONTH)---------
               ------------PRODUCT-------------
DISTRICT         Tents      Canoes    Racquets
-------------- ---------- ---------- ----------
Boston              -0.59      -0.92      -0.55
Atlanta             -0.73      -0.83       0.03
Chicago             -0.91      -0.84      -0.68
Dallas              -0.86      -0.92       0.31
Denver              -0.98      -0.94      -0.67
Seattle             -0.98      -0.89      -0.70