# Correlation

Economics 225
Excel Assignment 4: Correlation and Linear Regression (Office 2016 Version)
30 Points Total, Submit via eCampus by 8:00 AM on Tuesday, May 1, 2018
Important Note for PC Users:
Data Analysis/Analysis ToolPak is needed for this assignment. If Data Analysis is not shown on your computer, use the instructions that follow:
(1) If you are using the 2010 or 2013 version of Excel, click on the File tab in the upper left corner of the screen (for the 2007 version of Excel, click on the Office button in the top left corner of the screen).
(2) At the bottom of the box that appears, click on Excel Options, then click on Add-Ins in the left
margin.
(3)At the bottom of the Add-In box, make sure that Excel Add-Ins shows in the Manage box, then click the Go button.
(4)In the new Add-In box, check mark (select) Analysis ToolPak (at the top of the add-in list), then click OK.
(5) The Analysis group should now show at the top right of your screen. In this group, select Data Analysis.
Note: The Analysis ToolPak is available in the B&E computer lab, but may not be available in other campus computer labs. Also, if you open the Excel file within your Internet browser, such as Internet Explorer, the Analysis ToolPak is not likely to be available.

Important Note for Mac Users:
Mac users with the 2016 version of Excel will use the link for add-ins and then will follow instructions similar to those above. The attached document on how to find analysis ToolPak for Mac users might help.
Mac users with an earlier version of Excel will need to use the StatPlus for Mac add-in:
http://www.analystsoft.com/en/products/statplusmacle. (Download the free version.) There are plenty of tutorials on the Internet on how to do things like regression analysis. I have copied the url for what seem to be two decent tutorial websites for regression using StatPlus for Mac:

http://faculty.chicagobooth.edu/drew.creal/teaching/basicCourseMaterial/tutorials/MAC%20Tutorial%2
04%20-%20How%20to%20run%20a%20Regression.pdf

Assignment Instructions Begin on Next Page
For this assignment, you will use “Excel 4_Correlation and Linear Regression.xlsx”. This file contains data on the 55 counties of West Virginia. The definitions of the variables are listed below the table of data.
Save your work early and often. Make sure your name is part of the document name.
Part I: Linear Correlation
We will investigate the linear correlation between a county’s per capita income (Income) and the percent of the county’s population with at least a college diploma (College). One should probably expect a positive correlation between Income and College: on average, counties with a greater percent of the population with at least a college diploma should have relatively more workers in higher-paying jobs and thus should have a higher average income.
Click Data, then look for the Analysis group and click on Data Analysis. After selecting Data
Analysis, select Correlation, then click OK.
Click to put the cursor inside the Input Range box. Highlight the Income and College data, including the column headings.
Mark the other input information as appropriate. For example, check the “labels” box to let Excel know that you highlighted the column heading along with the data. Tell Excel where to put the output usually I recommend a separate worksheet for our statistical results, but the results here are so few, that you should go ahead and place them on the same worksheet, just to the right of your data. When done,
click OK.
Your results should be in the form of what is called a triangular matrix. Each number shows the simple correlation between the pairs of variables listed in the row and column headings. The bottom left number is the estimated (sample) correlation coefficient between Income and College. The numbers on the diagonal are all equal to 1 because they show the correlation between a variable and itself, which is a perfect positive correlation.
Analysis
Basic Question:
I.1) Very briefly, but precisely, interpret the estimated correlation coefficient (tell me the direction of the correlation and the strength of the correlation).
Go to Next Page
Part II: Simple Linear Regression (one explanatory variable)
Click Data, then look for the Analysis group and click on Data Analysis. After selecting Data
Analysis, select Regression, then click OK.
Now you have to tell Excel where to find the data for the dependent variable (Y Range) and the
independent variable (X Range). Click to put the cursor inside the Input Y Range box. Highlight the column of Income data (including the column heading).
Repeat the above procedure for the Input X Range box, using the data in the College column.
Mark the other input information as appropriate. For example, check the “labels” box to let Excel know that you highlighted the column heading along with the data.
Tell Excel where to put the output (the default is to place it in a separate worksheet – this is what I would recommend for this output).
Excel always provides the basic results from your regression. This is all that I want you to turn in. If you want some additional information on residuals (prediction errors), select one or more items in the Residuals section. Often the Line Fit Plots are interesting. Do not turn in this type of additional information. When you have made all your selections, click OK.
Your regression results table may be scrunched. If so, while the regression results are still highlighted, click on the Home tab. Find the Cells group, then select Format, then AutoFit Column Width. Excel will automatically widen the columns so that you can read everything.
If you selected one of the plots, you can expand the size of the chart using the graphing methods discussed in your first Excel assignment.
Analysis
Basic Questions:
II.1) List (and label) the dependent variable and the independent variable.
II. 2) Based on the regression results, if a county’s percent of the population with at least a college diploma increases by 1, by how much and in what direction does a county’s Per Capita Income change?
II.3) What is the predicted Per Capita Income for a county with 26 percent population having at least a College’s diploma?
II.4) What proportion of the variation in Income is explained by our regression model (explained by variation in College)?
Part III: Multiple Regression (more than one explanatory variable)
Begin the same way as for Part II, by telling Excel where to find the data for the dependent variable (Input Y Range).
For the Input X Range, highlight all four of the other data columns (columns C-F), including the
column headings. Mark the other input information as appropriate.
Tell Excel where to put the output (I recommend a separate worksheet), and then click OK.
As you did in Part II, use Format/AutoFit Column Width to make your regression results readable.
Analysis of the Regression Results
Basic Questions:
III.1) List (and label) the dependent variable and the independent variables.
III.2) Based on the regression results, if a county’s percent of Foreign Born Population increases by one, by how much (and in what direction) does Per Capita Income change (all else equal)?
III.3) What is the predicted Per Capita Income for a county, where College =30, Foreign= 3, Nature = 2 and Urban= 3? (Hint: This is all done in one calculation to get a single prediction; you do not get a separate prediction for each explanatory variable.)
III.4) What proportion of the variation in Income is explained by our regression model (variations in all of our explanatory variables)?
Statistical Inference Questions:
III.5) Provide the upper and lower bounds for a 95 percent confidence interval estimate for the effect a county’s college population percentage on the county’s per capita income.
III.6) Provide the upper and lower bounds for a 90 percent confidence interval estimate for the effect a county’s college population percentage on the county’s per capita income.
III.7) At the ten percent significance level, does a higher percent of population born in a foreign country, lead to lower per capita income for the county? Use the standard decision rule approach and show all steps.
III.8) At the five percent significance level, is there evidence that a county’s urban rank affect a county’s per capita income? Use the standard decision rule approach and show all steps.

What to submit via eCampus by 8:00 AM on Tuesday, May 1:
An Excel file that contains at least the following:
1) Your correlation coefficient matrix from Part I;
2) Your regression outputs (tables) for Parts II and III – just the basic output, not all of the extraneous graphs or charts);
3) Answers to the analysis questions from Parts I, II, and III. You can do your answers on the same sheets as the Excel output or put them on separate pages.
Submit your saved assignment through eCampus.
1) Click on the “Excel Assignments” link (on the left side of the Course Content page)
2) Click on the “Excel 4_Correlation and Linear Regression” folder
3) Click on the “Excel 4_Correlation and Linear Regression” assignment.
4) The submission box is the second part of this screen. Browse to find your file and then be sure to click on the Submit button at the bottom of the screen. You should get an acknowledgement of your submission.
5) Make sure the file that you submit is an Excel file and not something like a .lnk file. Please make sure that you have submitted a readable file.
6) If you realize that you made a mistake or submitted an incomplete assignment, you can go back in and do an additional submission.
7) If, for some reason, you are unable to submit a readable file via e-campus, please email me a readable excel copy of your work by the due date.