# Correlation, Regression and Normal Distribution Practice Questions

a. The headline of a January 31, 2005 USA Today article read, “‘January Barometer’ predicts a pretty lousy year.” Referring to the stock market (and in particular the S&P 500), the article goes on to say that the month of January was “turning out to be a loser and chances are 2005 will be, too.” To support the claim, the article presents S&P 500 performance data for the past 10 years. Besides the year, the columns are the S&P 500 returns for the month of January and for the entire year.

As you can see, increases (shown in green) in January are typically associated with increases for the full year, while decreases (shown in red) in January are typically coupled with decreases for the full year. There is, however, a possibility that the association that is seen in the data is due to random chance. What could we do to see if the asserted relationship between January returns and the Full Year’s returns is real (i.e., not due only to sampling error)? Using cell Q5, indicate your answer using the number associated with the best choice below.

1. Use Excel’s regression analysis tool to estimate the relationship between the full year’s returns (the dependent variable) and January’s returns (the independent variable). If the Significance F value is large (greater than 0.05, say), the relationship is real.

2. Use Excel’s regression analysis tool to estimate the relationship between the full year’s returns (the dependent variable) and January’s returns (the independent variable). If the Lower 95% value associated with January’s returns is negative and the Upper 95% value is positive, the relationship is real.

3. Draw a scatter diagram with January’s returns on the x-axis and the full year’s returns on the y-axis and use Excel’s Add Trendline feature to estimate the relationship between the two variables. If the slope of the line is not zero, then the relationship is real.

4. Use Excel’s regression analysis tool to estimate the relationship between the full year’s returns (the dependent variable) and January’s returns (the independent variable). If the coefficient associated with January returns is not zero, then the relationship is real.

5. Use Excel’s regression analysis tool to estimate the relationship between the full year’s returns (the dependent variable) and January’s returns (the independent variable). If the p-value associated with January returns is less than our level of significance, the relationship is real.

b. When analyzing the slope in a regression analysis (i.e., the relationship between the dependent variable and one of the independent variables), which of the following would be a Type II error? Indicate your answer in cell Q11.

1. To conclude that the slope (relationship) is not significant when it really is.

2. To conclude that the slope (relationship) is significant when it really is.

3. To conclude that the slope (relationship) is not significant when it really is not.

4. To conclude that the slope (relationship) is significant when it really is not.

5. There cannot be a Type II error in this situation.

c. Which of the following is **not** something we can learn from a scatter plot? Give your answer in cell Q17.

1. Whether or not there are outliers in the data.

2. Whether or not there is any relationship between the two variables.

3. Whether or not there is a curved relationship between the two variables.

4. Whether or not there is a causal relationship between the two variables.

5. All of the above can be learned from a scatter plot.

d. The weekly demand for a particular automobile manufacturer follows a normal distribution with a mean of 50,000 cars and a standard deviation of 10,000. There is a 2% chance that this company will sell more than what number of cars during the next week? Report your answer as an integer. Put your answer in cell S23.
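A question of this type asks for a cutoff given an upper-tail probability, so it is an inverse normal lookup rather than a forward probability. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and the same lookup can be done in Excel with `NORM.INV`.

```python
from statistics import NormalDist

# Weekly demand model taken from the question: mean 50,000 cars, sd 10,000.
demand = NormalDist(mu=50_000, sigma=10_000)

# A 2% chance of selling MORE than x means 98% of the distribution lies below x.
upper_tail = 0.02
threshold = demand.inv_cdf(1 - upper_tail)

# Sanity check: the area above the threshold should come back as about 0.02.
tail_check = 1 - demand.cdf(threshold)

print(round(threshold), round(tail_check, 4))
```

The key step is converting the upper-tail probability to a cumulative probability before calling the inverse function.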

e. An automotive repair shop has determined that the average service time on an automobile is 1.5 hours with a standard deviation of 35 minutes. A random sample of 70 services is selected. What is the probability of finding a sample mean of 96 minutes or larger if the population mean is still 1.5 hours? Give your answer to 4 decimal places. Put your answer in cell S24.
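One way to set this up is to treat the sample mean as approximately normal, with standard deviation equal to the population standard deviation divided by the square root of the sample size. The sketch below assumes that approximation is reasonable for a sample of 70 and works in minutes throughout; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the question, with the mean converted from hours to minutes.
mu_minutes = 1.5 * 60
sigma_minutes = 35
n = 70

# Standard error of the sample mean.
standard_error = sigma_minutes / sqrt(n)

# Approximate sampling distribution of the mean (assumes normality holds).
sample_mean_dist = NormalDist(mu=mu_minutes, sigma=standard_error)

# Probability that the sample mean is 96 minutes or larger.
prob_at_least_96 = 1 - sample_mean_dist.cdf(96)

print(round(standard_error, 4), round(prob_at_least_96, 4))
```

Mixing hours and minutes is the most common slip here, which is why every quantity is converted to minutes before any arithmetic.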

f. A news account of a nationwide survey taken by Lou Harris (a well-known and reputable opinion polling organization) says 25% of the 1,604 persons responding named the Democrats as the best able to handle the nation’s problems. The news report does not give a margin of error. Based on the information available, compute the margin of error (for 95% confidence). If you believe that there is not enough information to compute the margin of error, enter 0 as your answer. Otherwise, give your answer to 4 decimal places. Put your answer in cell S25.
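If one is willing to assume a simple random sample and the usual normal approximation for a sample proportion, the margin of error is the critical value times the standard error of the proportion. The sketch below rests on those assumptions, which the news account itself does not state; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values reported in the news account.
p_hat = 0.25
n = 1604
confidence = 0.95

# Two-sided critical value for the chosen confidence level.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error of a sample proportion (assumes simple random sampling).
standard_error = sqrt(p_hat * (1 - p_hat) / n)

margin_of_error = z_star * standard_error

print(round(z_star, 4), round(margin_of_error, 4))
```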

g. Carpetland salespersons have averaged $8,000 per week in sales. Steve Conois, the firm’s vice president, proposes a compensation plan with new selling incentives. Steve hopes that the results of a trial selling period will enable him to conclude (prove) that the compensation plan increases the average sales per salesperson. Which of the following would be a Type I error? Enter your answer in cell S26.

1. To conclude that the average sales per salesperson have not increased when they really have.

2. To conclude that the average sales per salesperson have not increased when they really have not.

3. To conclude that the average sales per salesperson have increased when they really have.

4. To conclude that the average sales per salesperson have increased when they really have not.

5. There cannot be a Type I error in this situation.

h. Consider two experiments. First, from a population that is normally distributed with mean 10, we select one item and find its weight. Let D1 be the distribution of possible outcomes from experiment 1. Second, we take a sample of 5 items from the same population and calculate the average weight of the 5 items. Let D2 be the distribution of possible outcomes (sample averages) resulting from experiment 2. Indicate which of the following statements are true and which are not. Use cells Q34:Q38 to make your selections.

1. D1 and D2 have the same mean.

2. D1 and D2 are both normally distributed.

3. D1 is wider than D2.

4. D1 is narrower than D2.

5. D1 and D2 have the same spread (standard deviation).
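A quick simulation can make the comparison between D1 and D2 concrete. The sketch below is illustrative only: the question gives the population mean (10) but no standard deviation, so `SIGMA = 2` is an assumed value, and the seed and number of trials are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable

MU = 10                  # population mean from the question
SIGMA = 2                # ASSUMED: the question does not give a standard deviation
TRIALS = 20_000          # arbitrary number of simulated experiments
SAMPLE_SIZE = 5

# Experiment 1: draw a single item each time.
d1 = [rng.gauss(MU, SIGMA) for _ in range(TRIALS)]

# Experiment 2: draw 5 items each time and record their average.
d2 = [
    mean([rng.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE)])
    for _ in range(TRIALS)
]

# Ratio of the two spreads; theory suggests roughly sqrt(SAMPLE_SIZE).
spread_ratio = stdev(d1) / stdev(d2)

print(round(mean(d1), 2), round(mean(d2), 2))
print(round(stdev(d1), 2), round(stdev(d2), 2), round(spread_ratio, 2))
```

Comparing the two printed means and the two printed standard deviations speaks to statements 1 and 3 through 5. A histogram of `d1` and `d2` would be the natural way to examine statement 2.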

i. Suppose that we have sampled n observations from a normal distribution and found a 99% confidence interval for the population mean. If the sample size decreases and the confidence level decreases from 99% to 95%, indicate whether the interval will definitely get wider, will definitely get narrower, or whether the direction of the change cannot be determined. Assume that the sample standard deviation remains constant with the new sample size. Use cell Q40.
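The two changes pull in opposite directions: a smaller sample raises the standard error, while a lower confidence level lowers the critical value. The sketch below compares half-widths for a few hypothetical sample sizes. It uses z critical values as a stand-in for the t values that a sample standard deviation would strictly call for, and the sample sizes and `s = 1.0` are assumptions chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def half_width(confidence, s, n):
    """Half-width of a two-sided interval using a z critical value (an approximation)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * s / sqrt(n)


s = 1.0  # hypothetical sample standard deviation, held constant

original = half_width(0.99, s, 100)    # 99% interval, n = 100
small_drop = half_width(0.95, s, 90)   # 95% interval, n falls slightly to 90
large_drop = half_width(0.95, s, 40)   # 95% interval, n falls sharply to 40

print(round(original, 4), round(small_drop, 4), round(large_drop, 4))
```

Trying other sample sizes shows how the comparison depends on how far n falls relative to the change in the critical value.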