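Chapter 12: Simple Linear Regression
Business Statistics
Copyright 2013 Pearson Education, Inc. publishing as Prentice Hall

Learning Objectives
1. The simple linear regression model
2. How reliable the predictions are
3. The size of the error between predicted and actual values
4. The assumptions of regression
   - Residual analysis; autocorrelation in time series and the Durbin-Watson statistic
5. Inference about the slope and intercept
   - Standard error of the slope, t test and F test for the slope, confidence interval for the slope
6. t test for the correlation coefficient
7. Estimating a mean value and predicting an individual value
8. Pitfalls of regression analysis and how to address them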

Correlation vs. Regression
- A scatter plot can be used to show the relationship between two variables.
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
  - Correlation is only concerned with the strength of the relationship.
  - No causal effect is implied by correlation.
- Scatter plots were first presented in Ch. 2.
- Correlation was first presented in Ch. 3.

Introduction to Regression Analysis
- Regression analysis is used to:
  - Predict the value of a dependent variable based on the value of at least one independent variable
  - Explain the impact of changes in an independent variable on the dependent variable
- Dependent variable: the variable we wish to predict or explain
- Independent variable: the variable used to predict or explain the dependent variable

Simple Linear Regression Model
- Only one independent variable, X.
- The relationship between X and Y is described by a linear function.
- Changes in Y are assumed to be related to changes in X.

Types of Relationships
[Figure: scatter plots of Y against X illustrating linear relationships and curvilinear relationships.]
[Figure: scatter plots of Y against X illustrating strong relationships and weak relationships.]
[Figure: scatter plots of Y against X illustrating no relationship.]

Simple Linear Regression Model
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where
- \( Y_i \) is the dependent variable
- \( X_i \) is the independent variable
- \( \beta_0 \) is the population Y intercept
- \( \beta_1 \) is the population slope coefficient
- \( \varepsilon_i \) is the random error term
Here \( \beta_0 + \beta_1 X_i \) is the linear component and \( \varepsilon_i \) is the random error component.
[Figure: the population regression line with intercept \( \beta_0 \) and slope \( \beta_1 \), showing for a given \( X_i \) the observed value of Y, the predicted value of Y, and the random error \( \varepsilon_i \) between them.]

Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
\[ \hat{Y}_i = b_0 + b_1 X_i \]
where
- \( \hat{Y}_i \) is the estimated (or predicted) Y value for observation i
- \( X_i \) is the value of X for observation i
- \( b_0 \) is the estimate of the regression intercept
- \( b_1 \) is the estimate of the regression slope

The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between \( Y \) and \( \hat{Y} \):
\[ \min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \bigl( Y_i - (b_0 + b_1 X_i) \bigr)^2 \]

Finding the Least Squares Equation
- The coefficients b0 and b1, and other regression results in this chapter, will be found using SPSS.

Interpretation of the Slope and the Intercept
- b0 is the estimated mean value of Y when the value of X is zero.
- b1 is the estimated change in the mean value of Y as a result of a one-unit increase in X.

Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
- A random sample of 10 houses is selected.
- Dependent variable (Y) = house price
- Independent variable (X) = square feet

Simple Linear Regression Example: Data

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Simple Linear Regression Example: Scatter Plot
[Figure: house price model, scatter plot of House Price ($1000s) against Square Feet.]

Simple Linear Regression Example: Graphical Representation
[Figure: house price model, scatter plot with the prediction line; slope = 0.10977, intercept = 98.248.]
\[ \text{house price} = 98.24833 + 0.10977\,(\text{square feet}) \]

Simple Linear Regression Example: Interpretation of b0
- b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values).
- Because a house cannot have a square footage of 0, b0 has no practical application in this example.

Simple Linear Regression Example: Interpreting b1
- b1 estimates the change in the mean value of Y as a result of a one-unit increase in X.
- Here, b1 = 0.10977 tells us that the mean value of a house increases by 0.10977 ($1000) = $109.77, on average, for each additional square foot of size.
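
As a rough check on the fitted line above, the least squares criterion for a single X has a standard closed-form solution: b1 is the sum of cross-deviations divided by the sum of squared X deviations, and b0 = (mean of Y) - b1 (mean of X). The slides obtain the coefficients from SPSS, so this Python sketch is only an illustration; the function name `fit_line` is made up for this example.

```python
# House data from the example: price in $1000s (Y) and size in square feet (X).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def fit_line(x, y):
    """Return (b0, b1) that minimize the sum of squared errors (closed form)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared X deviations.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(sqft, prices)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
```

Rounded to five decimals, the result should agree with the intercept 98.24833 and slope 0.10977 quoted in the example.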

Simple Linear Regression Example: Making Predictions
Predict the price for a house with 2000 square feet:
\[ \text{house price} = 98.24833 + 0.10977\,(\text{sq. ft.}) = 98.24833 + 0.10977\,(2000) = 317.78 \]
The predicted price for a house with 2000 square feet is 317.78 ($1,000s) = $317,780.

Simple Linear Regression Example: Making Predictions (continued)
- When using a regression model for prediction, only predict within the relevant range of data.
- Do not try to extrapolate beyond the range of observed X values.
[Figure: scatter plot of House Price ($1000s) against Square Feet with the prediction line, marking the relevant range for interpolation.]

- Are the Y values predicted by this model exactly right?
- How reliable are the predictions?
- How can the reliability of the predictions be assessed?

Measures of Variation
- Total variation is made up of two parts:
\[ SST = SSR + SSE \]
(total sum of squares = regression sum of squares + error sum of squares)
\[ SST = \sum (Y_i - \bar{Y})^2, \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2, \qquad SSE = \sum (Y_i - \hat{Y}_i)^2 \]
where:
- \( \bar{Y} \) = mean value of the dependent variable
- \( Y_i \) = observed value of the dependent variable
- \( \hat{Y}_i \) = predicted value of Y for the given \( X_i \) value

Measures of Variation (continued)
- SST = total sum of squares (total variation)
  - Measures the variation of the observed Y values around their mean
- SSR = regression sum of squares (explained variation)
  - Variation attributable to the relationship between X and Y: the variation of the predicted Y values around the mean of Y
- SSE = error sum of squares (unexplained variation)
  - Variation in Y attributable to factors other than X
[Figure: for a given \( X_i \), the distances \( Y_i - \bar{Y} \) (SST), \( Y_i - \hat{Y}_i \) (SSE), and \( \hat{Y}_i - \bar{Y} \) (SSR) shown relative to the regression line and the mean of Y.]
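
The prediction and the decomposition SST = SSR + SSE above can be checked numerically on the ten houses. This is a minimal sketch in plain Python, not the SPSS procedure the slides rely on; the helper name `predict` and its range check are assumptions added for illustration.

```python
# House data from the example: price in $1000s (Y), size in square feet (X).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n

# Closed-form least squares estimates of slope and intercept.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
ss_x = sum((x - x_bar) ** 2 for x in sqft)
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar


def predict(x):
    """Predict price ($1000s), refusing to extrapolate beyond the observed X range."""
    if not min(sqft) <= x <= max(sqft):
        raise ValueError("X is outside the relevant range of the data")
    return b0 + b1 * x


# Predicted value for each observed X, then the three sums of squares.
y_hat = [b0 + b1 * x for x in sqft]
sst = sum((y - y_bar) ** 2 for y in prices)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(prices, y_hat))

print(f"predicted price at 2000 sq. ft.: {predict(2000):.2f} ($1000s)")
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
```

Raising an error outside the observed range is one way to enforce the "do not extrapolate" advice; a warning would serve equally well.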

Coefficient of Determination, r^2
- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
- The coefficient of determination is also called r-squared and is denoted as r^2.
\[ r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]
note: \( 0 \le r^2 \le 1 \)

Examples of Approximate r^2 Values
- r^2 = 1: perfect linear relationship between X and Y. 100% of the variation in Y is explained by variation in X.
  [Figure: two scatter plots of Y against X with all points on a straight line, r^2 = 1.]
- 0 < r^2 < 1: weaker linear relationships between X and Y. Some but not all of the variation in Y is explained by variation in X.
  [Figure: two scatter plots of Y against X with points scattered around a line.]
- r^2 = 0: no linear relationship between X and Y. The value of Y does not depend on X (none of the variation in Y is explained by variation in X).
  [Figure: scatter plot of Y against X with no trend, r^2 = 0.]

- How large will the prediction error be?
- How far do the observed values deviate from the predicted values?

Standard Error of Estimate
- The standard deviation of the variation of observations around the regression line is estimated by
\[ S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}} \]
where SSE = error sum of squares and n = sample size.

Comparing Standard Errors
- \( S_{YX} \) is a measure of the variation of observed Y values from the regression line.
[Figure: two scatter plots of Y against X, one with small \( S_{YX} \) and one with large \( S_{YX} \).]
- The magnitude of \( S_{YX} \) should always be judged relative to the size of the Y values in the sample data. For example, \( S_{YX} \) = $41.33K is moderately small relative to house prices in the $200K-$400K range.

Assumptions of Regression: L.I.N.E
- Linearity
  - The relationship between X and Y is linear
- Independence of Errors
  - Error values are statistically independent (for example, \( \varepsilon_i \) is not affected by \( \varepsilon_{i-1} \))
- Normality of Error
  - Error values are normally distributed for any given value of X
- Equal Variance (also called homoscedasticity)
  - The probability distribution of the errors has constant variance

Residual Analysis
\[ e_i = Y_i - \hat{Y}_i \]
- The residual for observation i, \( e_i \), is the difference between its observed and predicted value.
- Check the assumptions of regression by examining the residuals:
  - Examine for linearity assumption
  - Evaluate independence assumption
  - Evaluate normal distribution assumption
  - Examine for constant variance for all levels of X (homoscedasticity)
- Graphical Analysis of Residuals
  - Can plot residuals vs. X

Residual Analysis for Linearity
[Figure: plots of Y against x with the corresponding residual plots against x, contrasting a "Not Linear" pattern with a "Linear" one.]

Residual Analysis for Independence
[Figure: residual plots against X, contrasting a "Not Independent" pattern with "Independent" ones.]

Checking for Normality
- Examine the Stem-and-Leaf Display of the Residuals
- Examine the Boxplot of the Residuals
- Examine the Histogram of the Residuals
- Construct a Normal Probability Plot of the Residuals

Residual Analysis for Normality
- When using a normal probability plot (Q-Q plot), normal errors will approximately display in a straight line.
[Figure: normal probability plot of Percent (0 to 100) against Residual (-3 to 3).]

Residual Analysis for Equal Variance
[Figure: plots of Y against x with the corresponding residual plots against x, contrasting non-constant variance with constant variance.]
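
To make the normal probability plot above concrete, the sketch below pairs the sorted residuals from the house-price fit with standard normal quantiles. The plotting position (i - 0.5)/n is one common convention and is an assumption here, as is the function name `normal_plot_points`; statistical packages may use slightly different positions.

```python
from statistics import NormalDist

# House data from the example: price in $1000s (Y), size in square feet (X).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
ss_x = sum((x - x_bar) ** 2 for x in sqft)
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(sqft, prices)]


def normal_plot_points(values):
    """Pair each sorted value with a standard normal quantile.

    Uses the plotting position (i - 0.5) / n, an assumed convention.
    """
    m = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]
    return list(zip(quantiles, ordered))


points = normal_plot_points(residuals)
for z, e in points:
    print(f"normal quantile = {z:6.3f}  residual = {e:8.2f}")
```

If the errors are roughly normal, these points should fall near a straight line when plotted; with only ten observations the picture is necessarily rough.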

Measuring Autocorrelation: The Durbin-Watson Statistic
- Used when data are collected over time to detect if autocorrelation is present.
- Autocorrelation exists if residuals in one time period are related to residuals in another period (that is, the residuals are not independent: the residual at one time point is influenced by earlier residuals).

Autocorrelation
- Autocorrelation is correlation of the errors (residuals) over time.
- It violates the regression assumption that residuals are random and independent.
[Figure: Time (t) Residual Plot, residuals plotted against time (t).]
- Here, residuals show a cyclic pattern (not random). Cyclical patterns are a sign of positive autocorrelation.

The Durbin-Watson Statistic
- The Durbin-Watson statistic is used to test for autocorrelation.
  H0: residuals are not correlated
  H1: positive autocorrelation is present
\[ D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \]
- The possible range is \( 0 \le D \le 4 \).
- D should be close to 2 if H0 is true.
- D less than 2 may signal positive autocorrelation; D greater than 2 may signal negative autocorrelation.
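
The Durbin-Watson formula above translates directly into a few lines of Python. This is a minimal sketch; the function name `durbin_watson` and the two toy residual series are invented for illustration and do not come from the slides.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    # Numerator: squared differences between successive residuals (i = 2..n).
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals (i = 1..n).
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Toy series (illustrative only): strictly alternating residuals push D above 2,
# while residuals that never change sign or size push D down to 0.
alternating = [1, -1, 1, -1]
constant = [1, 1, 1, 1]
print(durbin_watson(alternating))
print(durbin_watson(constant))
```

The two toy series sit on opposite sides of 2, matching the reading of D as a signal of negative or positive autocorrelation.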

Testing for Positive Autocorrelation
H0: positive autocorrelation does not exist
H1: positive autocorrelation is present
1. Calculate the Durbin-Watson test statistic D. (The Durbin-Watson statistic can be found using Excel or Minitab.)
2. Find the values dL and dU from the Durbin-Watson table (for sample size n and number of independent variables k).
3. Decision rule: reject H0 if D < dL.

Decision regions on the scale from 0 to 2:
- D < dL: reject H0
- dL <= D <= dU: inconclusive
- D > dU: do not reject H0
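
The three decision regions above can be written as a small helper. This is a sketch only: the function name is hypothetical, and the critical values in the usage lines are illustrative inputs that would normally be read from a Durbin-Watson table for the given n and k.

```python
def positive_autocorrelation_decision(d, d_lower, d_upper):
    """Apply the Durbin-Watson decision rule for positive autocorrelation."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    # dL <= D <= dU: the test cannot decide either way.
    return "inconclusive"


# Illustrative inputs: dL and dU must come from a Durbin-Watson table.
print(positive_autocorrelation_decision(1.00, d_lower=1.29, d_upper=1.45))
print(positive_autocorrelation_decision(1.35, d_lower=1.29, d_upper=1.45))
print(positive_autocorrelation_decision(1.90, d_lower=1.29, d_upper=1.45))
```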

Testing for Positive Autocorrelation (continued)
- Suppose we have the following time series data:
[Figure: Sales plotted against Time with fitted line y = 30.65 + 4.7038x, R^2 = 0.8976.]
- Is there autocorrelation?

Testing for Positive Autocorrelation (continued)
- Example with n = 25:
[Figure: Sales plotted against Time with fitted line y = 30.65 + 4.7038x, R^2 = 0.8976.]

Excel/PHStat output:

Durbin-Watson Calculations
Sum of Squared Difference of Residuals    3296.18
Sum of Squared Residuals                  3279.98
Durbin-Watson Statistic                   1.00494

\[ D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{3296.18}{3279.98} = 1.00494 \]

- Here, n = 25 and there is k = 1 independent variable.
- Using the Durbin-Watson table, dL = 1.29 and dU = 1.45.
- D = 1.00494 < dL = 1.29, so reject H0 and conclude that significant po