Chapter Two
Graphical and Tabular Descriptive Techniques

Introduction & Re-cap
Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information is produced. Its methods make use of graphical techniques and numerical descriptive measures (such as averages) to summarize and present the data.
Data → Statistics → Information

Populations & Samples
The graphical & tabular methods presented here apply both to entire populations and to samples drawn from populations.
[Figure: a sample is a subset of the population]

Definitions
A variable is some characteristic of a population or sample, e.g., student grades. Variables are typically denoted with a capital letter: X, Y, Z.
The values of the variable are the range of possible values for a variable, e.g., student marks (0 to 100).
Data are the observed values of a variable, e.g., student marks: 67, 74, 71, 83, 93, 55, 48.

Types of Data & Information
Data (at least for the purposes of statistics) fall into three main groups:
Interval data
Nominal data
Ordinal data

Interval Data
Interval data are real numbers, e.g., heights, weights, prices, etc. They are also referred to as quantitative or numerical data.
Arithmetic operations can be performed on interval data, so it is meaningful to talk about 2 × Height, or Price + $1, and so on.

Nominal Data
The values of nominal data are categories, e.g., responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
These data are categorical in nature; arithmetic operations don't make any sense (e.g., does Widowed ÷ 2 = Married?!).
Nominal data are also called qualitative or categorical data.

Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order, a ranking, to them, e.g., a college course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it's still not meaningful to do arithmetic on these data (e.g., does 2 × fair = very good?!), we can say things like:
excellent > poor, or fair < very good
That is, the order is maintained no matter what numeric values are assigned to each category.

Calculations for Types of Data
As mentioned above:
All calculations are permitted on interval data.
Only calculations involving a ranking process are allowed for ordinal data.
No calculations are allowed for nominal data, save counting the number of observations in each category.
This leads to the following "hierarchy of data".

Hierarchy of Data
Interval: Values are real numbers. All calculations are valid. Data may be treated as ordinal or nominal.
Ordinal: Values must represent the ranked order of the data. Calculations based on an ordering process are valid. Data may be treated as nominal but not as interval.
Nominal: Values are arbitrary numbers that represent categories. Only calculations based on the frequencies of occurrence are valid. Data may not be treated as ordinal or interval.

Graphical & Tabular Techniques for Nominal Data
The only allowable calculation on nominal data is to count the frequency of each value of the variable. We can summarize the data in a table that presents the categories and their counts, called a frequency distribution. A relative frequency distribution lists the categories and the proportion with which each occurs.

Example 2.1 Light Beer Preference Survey
In 2006, total light beer sales in the United States were approximately 3 million gallons. With a market this large, breweries often need to know more about who is buying their product. The marketing manager of a major brewery wanted to analyze light beer sales among college and university students who drink light beer. A random sample of 285 graduating students was asked to report which of the following is their favorite light beer:
1. Budweiser Light
2. Busch Light
3. Coors Light
4. Michelob Light
5. Miller Lite
6. Natural Light
7. Other brand
The responses were recorded using these codes. Construct a frequency and relative frequency distribution for these data, and summarize the data graphically by producing a bar chart and a pie chart.
Data file: Xm02-01*

Frequency and Relative Frequency Distributions
[Figure: frequency and relative frequency distributions of the Example 2.1 responses]

Nominal Data (Frequency)
Bar charts are often used to display frequencies.

Nominal Data (Relative Frequency)
Pie charts show relative frequencies.
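The counting step described for Example 2.1 can be sketched in Python. This is a minimal illustration, not the textbook's own procedure: the function name `frequency_table` and the ten coded responses are hypothetical (the actual 285 responses are in Xm02-01), and the pie-chart column simply assumes each slice spans relative frequency × 360 degrees.

```python
from collections import Counter

# Codes and labels from Example 2.1.
BRANDS = {
    1: "Budweiser Light",
    2: "Busch Light",
    3: "Coors Light",
    4: "Michelob Light",
    5: "Miller Lite",
    6: "Natural Light",
    7: "Other brand",
}


def frequency_table(codes, labels):
    """Return (label, frequency, relative frequency) for every category.

    The codes are only counted, never added or averaged, and categories
    that never occur are listed with a frequency of 0.
    """
    counts = Counter(codes)
    n = len(codes)
    return [(label, counts[code], counts[code] / n) for code, label in labels.items()]


# Hypothetical responses for illustration (not the Xm02-01 data).
sample_codes = [3, 5, 1, 3, 6, 2, 3, 5, 7, 1]

for label, freq, rel in frequency_table(sample_codes, BRANDS):
    # In a pie chart each category's slice spans rel * 360 degrees.
    print(f"{label:<16}{freq:>4}{rel:>7.2f}{rel * 360:>8.1f}")
```

Because the codes are only labels, the function treats them purely as dictionary keys; counting is the one calculation the hierarchy of data allows for nominal data.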
Nominal Data
It's all the same information (based on the same data), just a different presentation.

Example 2.2
Table 2.3 lists the total energy consumption of the United States from all sources in 2005. To make it easier to see the details, the table measures the heat content in metric tons (1,000 kilograms) of oil equivalent. For example, the United States burned an amount of coal and coal products equivalent to 545,258 metric tons of oil. Use an appropriate graphical technique to depict these figures.

Table 2.3 Xm02-02*
Heat content in metric tons of oil equivalent

Energy source                         Heat content
Non-renewable energy sources
  Coal & coal products                     545,258
  Oil                                      903,440
  Natural gas                              517,881
  Nuclear                                  209,890
Renewable energy sources
  Hydroelectric                             18,251
  Solid biomass                             52,473
  Other (liquid biomass, geothermal,        20,533
    solar, wind, and tide, wave & ocean)
Total                                    2,267,726

[Figure: chart depicting the Table 2.3 energy data]

Graphical Techniques for Interval Data
There are several graphical methods that are used when the data are interval (i.e., numeric, non-categorical). The most important of these graphical methods is the histogram. The histogram is not only a powerful graphical technique for summarizing interval data; it is also used to help explain probabilities.

Example 2.4
Following deregulation of telephone service, several new companies were created to compete in the business of providing long-distance telephone service. In almost all cases these companies competed on price, since the service each offered is similar. Pricing a service or product in the face of stiff competition is very difficult. Factors to be considered include supply, demand, price elasticity, and the actions of competitors. Long-distance packages may employ per-minute charges, a flat monthly rate, or some combination of the two. Determining the appropriate rate structure is facilitated by acquiring information about the behaviors of customers, and in particular the size of monthly long-distance bills.

As part of a larger study, a long-distance company wanted to acquire information about the monthly bills of new subscribers in the first month after signing with the company. The company's marketing manager conducted a survey of 200 new residential subscribers, wherein the first month's bills were recorded. These data are stored in file Xm02-04. The general manager planned to present his findings to senior executives. What information can be extracted from these data?

In Example 2.1 we created a frequency distribution of the 7 categories. In this example we also create a frequency distribution, by counting the number of observations that fall into a series of intervals, called classes. I'll explain later why I chose the classes used below.

We have chosen eight classes, defined in such a way that each observation falls into one and only one class. These classes are defined as follows:
Classes
Amounts that are less than or equal to 15
Amounts that are more than 15 but less than or equal to 30
Amounts that are more than 30 but less than or equal to 45
Amounts that are more than 45 but less than or equal to 60
Amounts that are more than 60 but less than or equal to 75
Amounts that are more than 75 but less than or equal to 90
Amounts that are more than 90 but less than or equal to 105
Amounts that are more than 105 but less than or equal to 120

[Figure: frequency distribution of the 200 bills over the eight classes]

Interpret
About half (71 + 37 = 108) of the bills are "small", i.e., $30 or less.
There are only a few telephone bills in the middle range.
(18 + 28 + 14) / 200 = 60 / 200 = 30%, i.e., nearly a third of the phone bills are more than $75.

Building a Histogram
1. Collect the data.
2. Create a frequency distribution for the data. How?
a) Determine the number of classes to use. How? Refer to Table 2.6: with 200 observations, we should have between 7 and 10 classes. Alternatively, we could use Sturges' formula:
Number of class intervals = 1 + 3.3 log10(n)
(for n = 200 this gives about 8.6). Here we use 8 classes.
b) Determine how large to make each class. How? Look at the range of the data:
Range = Largest observation − Smallest observation = $119.63 − $0 = $119.63
Then each class width becomes:
Class width = Range / (number of classes) = 119.63 / 8 ≈ 14.95, which we round to 15.

[Figure: frequency distribution and histogram of the Example 2.4 bills]

Shapes of Histograms
Symmetry: A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size.
[Figure: three symmetric histograms (frequency vs. variable)]

Skewness: A skewed histogram is one with a long tail extending to either the right or the left. A positively skewed histogram has its long tail on the right; a negatively skewed histogram has its long tail on the left.
[Figure: positively skewed and negatively skewed histograms (frequency vs. variable)]

Modality: A unimodal histogram is one with a single peak, while a bimodal histogram is one with two peaks.
[Figure: unimodal and bimodal histograms (frequency vs. variable)]
A modal class is the class with the largest number of observations.

Bell shape: A special type of symmetric unimodal histogram is one that is bell shaped.
[Figure: bell-shaped histogram (frequency vs. variable)]
Many statistical techniques require that the population be bell shaped. Drawing the histogram helps verify the shape of the population in question.

Histogram Comparison
Compare & contrast the following histograms based on data from Examples 2.6 & 2.7:
[Figure: histograms of the marks in the two courses]
The two courses, Business Statistics and Mathematical Statistics, have very different histograms: unimodal vs. bimodal, and a narrower vs. wider spread of the marks.
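The steps for building the Example 2.4 frequency distribution, plus the modal class defined above, can be sketched in Python. This is a minimal sketch under stated assumptions: the function names and the ten bill amounts are hypothetical (they are not the Xm02-04 data), Sturges' result is rounded to the nearest whole number, and the class width is rounded up to a whole dollar. Classes include their upper limit, matching the "more than 15 but less than or equal to 30" definitions.

```python
import bisect
import math


def sturges_classes(n):
    """Suggested number of classes: 1 + 3.3 * log10(n), rounded (assumption)."""
    return round(1 + 3.3 * math.log10(n))


def class_width(largest, smallest, num_classes):
    """Range / number of classes, rounded up to a whole unit (assumption)."""
    return math.ceil((largest - smallest) / num_classes)


def frequency_distribution(data, start, width, num_classes):
    """Count observations in the classes (start, start + width], ...

    The first class also contains `start` itself, so every value in
    [start, start + num_classes * width] lands in exactly one class.
    """
    uppers = [start + width * (i + 1) for i in range(num_classes)]
    counts = [0] * num_classes
    for x in data:
        if x < start or x > uppers[-1]:
            raise ValueError(f"{x} lies outside the classes")
        # bisect_left finds the first upper limit >= x, so a value equal
        # to an upper limit (e.g. 15) is counted in the lower class.
        counts[bisect.bisect_left(uppers, x)] += 1
    return uppers, counts


def modal_class(uppers, counts, width):
    """(lower, upper) limits of the class with the most observations."""
    i = max(range(len(counts)), key=lambda j: counts[j])
    return uppers[i] - width, uppers[i]


# Hypothetical monthly bills in dollars (not the Xm02-04 data).
bills = [0.0, 8.5, 14.2, 15.0, 22.9, 31.4, 77.1, 88.0, 95.6, 119.63]

print("Sturges suggests", sturges_classes(200), "classes for n = 200")
k = 8  # the number of classes chosen in the slides
width = class_width(max(bills), min(bills), k)  # 119.63 / 8 rounds up to 15
uppers, counts = frequency_distribution(bills, 0.0, width, k)
for upper, count in zip(uppers, counts):
    print(f"({upper - width:.0f}, {upper:.0f}]: {count}")
print("Modal class:", modal_class(uppers, counts, width))

# Share of bills above $75, the kind of figure used in the interpretation.
above_75 = sum(c for u, c in zip(uppers, counts) if u > 75) / len(bills)
print(f"Bills over $75: {above_75:.0%}")
```

Using `bisect_left` on the upper limits is one simple way to get upper-inclusive classes; a plain loop over the limits would work equally well for eight classes.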
Stem & Leaf Display
A stem-and-leaf display retains information about individual observations that would normally be lost in the creation of a histogram.
Split each observation into two parts, a stem and a leaf, e.g., the observation value 42.19.
There are several ways to split it up. We could split it at the decimal point (stem 42, leaf 19), or split it at the "tens" position, while rounding to the nearest integer in the "ones" position (stem 4, leaf 2).

Continue this process for all the observations. Then use the "stems" for the classes; each leaf becomes part of the histogram (based on the Example 2.4 data) as follows:
[Stem-and-leaf display of the Example 2.4 bills: stems 0 through 11 (tens of dollars); each leaf is the ones digit of a bill rounded to the nearest dollar]
Thus, we still have access to each original data point's value (to the nearest dollar)!

Histogram and Stem & Leaf
Compare the overall shapes of the figures.
[Figure: histogram and stem-and-leaf display of the Example 2.4 bills]

Ogive
An ogive (pronounced "Oh-jive") is a graph of a cumulative frequency distribution. We create an ogive in three steps. First, from the frequency distribution created earlier, calculate relative frequencies:
Relative frequency = (# of observations in a class) / (total # of observations)

Relative Frequencies
For example, we had 71 observations in our first class (telephone bills from $0.00 to $15.00). Thus, the relative frequency for this class is 71 / 200 (the total # of phone bills) = 0.355 (or 35.5%).

Ogive
We create an ogive in three steps:
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies by adding the current class's relative frequency to the previous class's cumulative relative frequency. (For the first class, its cumulative relative frequency is just its relative frequency.)
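Two calculations from this part, splitting observations at the tens position for a stem-and-leaf display and the first two ogive steps, can be sketched in Python. This is a minimal sketch, not the textbook's method: the function names and the sample values are hypothetical, and rounding halves up is an assumption, since the slides only say "rounding to the nearest integer".

```python
import math
from collections import defaultdict
from itertools import accumulate


def stem_and_leaf(data):
    """Map each stem (tens) to its sorted leaves (ones).

    Each value is first rounded to the nearest whole number, rounding
    halves up; this assumes non-negative data such as telephone bills.
    """
    display = defaultdict(list)
    for value in data:
        whole = math.floor(value + 0.5)
        stem, leaf = divmod(whole, 10)
        display[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}


def cumulative_relative_frequencies(counts):
    """Ogive steps 1 and 2: relative, then cumulative relative frequencies."""
    total = sum(counts)
    relative = [c / total for c in counts]
    # Each cumulative value adds the current class's relative frequency
    # to the previous class's cumulative value; the first is unchanged.
    return relative, list(accumulate(relative))


# Hypothetical observations (not the Xm02-04 bills).
bills = [42.19, 8.5, 14.2, 41.7, 3.0, 119.63]
for stem, leaves in stem_and_leaf(bills).items():
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")

# Hypothetical class counts for three classes.
relative, cumulative = cumulative_relative_frequencies([5, 3, 2])
print(relative, cumulative)
```

Note that rounding can push a value into a new stem: 119.63 rounds to 120 and so appears under stem 12 in this sketch.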