Designation: D7915-22
An American National Standard

Standard Practice for
Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set¹

This standard is issued under the fixed designation D7915; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope*
1.1 This practice provides a step-by-step procedure for the application of the Generalized Extreme Studentized Deviate (GESD) Many-Outlier Procedure to simultaneously identify multiple outliers in a data set. (See Bibliography.)
1.2 This practice is applicable to a data set comprising observations that are represented on a continuous numerical scale.
1.3 This practice is applicable to a data set comprising a minimum of six observations.
1.4 This practice is applicable to a data set where the normal (Gaussian) model is reasonably adequate for the distributional representation of the observations in the data set.
1.5 The probability of false identification of outliers associated with the decision criteria set by this practice is 0.01.
1.6 It is recommended that the execution of this practice be conducted under the guidance of personnel familiar with the statistical principles and assumptions associated with the GESD technique.
1.7 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.8 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Terminology
2.1 Definitions of Terms Specific to This Standard:
2.1.1 outlier, n: an observation (or a subset of observations) which appears to be inconsistent with the remainder of the data set.

3. Significance and Use
3.1 The GESD procedure can be used to simultaneously identify up to a pre-determined number of outliers (r) in a data set, without having to pre-examine the data set and make a priori decisions as to the location and number of potential outliers.
3.2 The GESD procedure is robust to masking. Masking describes the phenomenon where the existence of multiple outliers can prevent an outlier identification procedure from declaring any of the observations in a data set to be outliers.
3.3 The GESD procedure is automation-friendly, and hence can easily be programmed as automated computer algorithms.

4. Procedure
4.1 Specify the maximum number of outliers (r) in a data set to be identified. This is the number of cycles
required to be executed (see 4.2) for the identification of up to r outliers.
4.1.1 The recommended maximum number of outliers (r) by this practice is two (2) for data sets with six to twelve observations.
4.1.2 For data sets with more than twelve observations, the recommended maximum number of outliers (r) is the lesser of ten (10) or 20 %.
4.1.3 The recommended values for r in 4.1.1 and 4.1.2 are not intended to be mandatory. Users can specify other values based on their specific needs.
4.2 Set the current cycle number c to 1 (c = 1).
4.2.1 Assign the original data set to be assessed (in 4.1) as the data set for the current cycle 1 and label it as DTS1.
4.3 Compute test statistic T for each observation in the data set assigned to the current cycle (DTSc) as follows:

$$T = \frac{|x - \bar{x}|}{s} \tag{1}$$

where:
$x$ = an observation in the data set,
$\bar{x}$ = average calculated using all observations in the data set, and
$s$ = sample standard deviation calculated using all observations in the data set.

¹ This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics. Current edition approved May 1, 2022. Published May 2022. Originally approved in 1988. Last previous edition approved in 2018 as D7915-18. DOI: 10.1520/D7915-22.
* A Summary of Changes section appears at the end of this standard.
Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.

4.4 Identify the observation associated with the largest absolute magnitude of the test statistic T in the data set of the current cycle.
4.5 If current cycle c is less than r, execute 4.5.