Wednesday, July 17, 2019

Sta302 – Assignment 2

1) From the scatter plot of Revenue vs. Circulation, it can be seen that the variance of the dependent variable, Revenue, increases with Circulation. This is a violation of the Gauss-Markov condition of constant variance in the error terms. Also, since a linear model seems appropriate, a transformation of both the dependent and independent variables is necessary.

2) Fitting polynomial functions to the data may be better than fitting a straight-line model to the untransformed data because this allows for curvature and can fit the data more closely. However, this might not be sufficient because it does not account for non-constant variance.

3) The natural log transformation of both variables provides the best model of the three. From the plot of the Regression Line for lnRevenue vs. lnCirculation, it can be seen that the points are relatively evenly scattered about the regression line. Also, the non-constant variance seems to be fixed. This is evident in the plot of the residuals vs. predicted values, as the points are randomly scattered about the center line.

The square root transformation of both variables improves linearity, as indicated in the plot of the Regression Line for sqrtRevenue vs. sqrtCirculation, but does not fix the problem of non-constant variance. This can be clearly seen in the plot of the residuals vs. predicted values. The points are not randomly scattered around the center line, but are bunched up on the left side and fan outwards, indicating increasing variance.

The inverse transformation of both variables does not improve linearity, as curvature can be seen in the plot of the Regression Line for invRevenue vs. invCirculation. Although non-constant variance is somewhat improved over the square root transformation, as can be seen in the plot of the residuals vs. predicted values, the improvement is still insufficient. Therefore, the model with both variables natural log transformed seems to be the best of the three choices.

4) The model used is ln(Revenue) = β0 + β1 ln(Circulation) + e. This implies that Revenue = exp(β0) × Circulation^β1 × exp(e). From this result, it can be seen that a k-fold change in circulation (in millions) results in a k^β1-fold change in revenue (in thousands of dollars). From the regression, the estimate of β1 is 0.5334. This means that if circulation changes by a factor of k, revenue will change by a factor of k^0.5334.

5) From SAS, a 95% prediction interval for the natural log of revenue at a circulation of 1 million is (4.3005, 5.0202), with a predicted value of 4.6604. This translates to a prediction interval of ($73 736.65, $151 441.59) with a predicted revenue of $105 678.35.

6) Since the threshold for Cook's D is 4/(n-2), where n = 70, the threshold is 0.059. There are five values with Cook's D greater than 0.059, which indicates that they are influential points. From the normal Q-Q plot of the residuals, these 5 points can be seen to be outliers at the ends of the graph. Therefore, they can greatly affect the fit of the model. Also from the normal Q-Q plot, it can be seen that the residuals are not exactly normally distributed. The curvature at the ends of the plot indicates heavy tails in the distribution. By the Central Limit Theorem, confidence intervals and inference for β0, β1, and E(Y) are still approximately valid. However, a prediction interval concerns only a single observation, so the Central Limit Theorem does not help there. Due to the heavy tails in the distribution of the error terms, the prediction interval calculated in 5) may not be accurate.
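
The arithmetic in 4) and 5) can be checked with a few lines of Python. This is only a sketch: the constants are the values quoted in the answers (slope 0.5334, log-scale prediction and interval, revenue in thousands of dollars), and the function names `revenue_multiplier` and `to_dollars` are made up for illustration, not taken from the SAS output.

```python
import math

# Values quoted in the answers (log scale; revenue is in thousands of dollars).
SLOPE = 0.5334              # estimated coefficient on ln(Circulation)
LOG_PRED = 4.6604           # predicted ln(Revenue) at a circulation of 1 million
LOG_PI = (4.3005, 5.0202)   # 95% prediction interval for ln(Revenue)


def revenue_multiplier(k: float, slope: float = SLOPE) -> float:
    """Factor by which revenue changes when circulation changes by a factor of k."""
    return k ** slope


def to_dollars(log_revenue: float) -> float:
    """Back-transform ln(revenue in $ thousands) to revenue in dollars."""
    return math.exp(log_revenue) * 1000.0


if __name__ == "__main__":
    print(f"Doubling circulation multiplies revenue by {revenue_multiplier(2):.3f}")
    lower, upper = (to_dollars(v) for v in LOG_PI)
    print(f"Predicted revenue: ${to_dollars(LOG_PRED):,.2f}")
    print(f"95% prediction interval: (${lower:,.2f}, ${upper:,.2f})")
```

Because the exponential function is monotone, exponentiating the endpoints of the log-scale interval gives a valid interval on the dollar scale. Note that the back-transformed interval is not symmetric about the predicted revenue.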
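
The Cook's D cutoff in 6) can be sketched the same way. The helper names `cooks_d_threshold` and `flag_influential` are hypothetical, and the individual Cook's D values from the SAS output are not reproduced here, so the flagging function simply takes them as input.

```python
def cooks_d_threshold(n: int, n_params: int = 2) -> float:
    """Rule-of-thumb cutoff 4 / (n - n_params), as used in the answer to 6)."""
    if n <= n_params:
        raise ValueError("n must exceed the number of estimated parameters")
    return 4.0 / (n - n_params)


def flag_influential(cooks_d, n: int):
    """Return the indices of observations whose Cook's D exceeds the cutoff."""
    cutoff = cooks_d_threshold(n)
    return [i for i, d in enumerate(cooks_d) if d > cutoff]


if __name__ == "__main__":
    # n = 70 observations, two estimated parameters (intercept and slope).
    print(f"Cutoff for n = 70: {cooks_d_threshold(70):.3f}")
```

Passing the 70 Cook's D values from the regression output to `flag_influential` should return the positions of the five points identified in 6).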
