Designation: F2889-11

Standard Practice for
Assessing Language Proficiency¹

This standard is issued under the fixed designation F2889; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope
1.1 Purpose: This practice describes best practices for the development and use of language tests in the modalities of speaking, listening, reading, and writing for assessing ability according to the Interagency Language Roundtable (ILR)² scale. This practice focuses on testing language proficiency in use of language for communicative purposes.
1.2 Limitations: This practice is not intended to address testing and test development in the following specialized areas: Translation, Interpretation, Audio Translation, Transcription, other job-specific language performance tests, or Diagnostic Assessment.
1.2.1 Tests developed under this practice should not be used to address any of the above excluded purposes (for example, diagnostics).

2. Referenced Documents
2.1 ASTM Standards:³
F1562 Guide for Use-Oriented Foreign Language Instruction
F2089 Guide for Language Interpretation Services
F2575 Guide for Quality Assurance in Translation

3. Terminology
3.1 Definitions:
3.1.1 achievement test, n: an instrument designed to measure what a person has learned within or up to a given time based on a sampling
of what has been covered in the syllabus.
3.1.2 adaptive test, n: form of individually tailored testing in which test items are selected from an item bank where test items are stored in rank order with respect to their item difficulty and presented to test takers during the test on the basis of their responses to previous items, until it is determined that sufficient information regarding test takers' abilities has been collected. The opposite of a fixed-form test.
3.1.3 authentic texts, n: texts not created for language learning purposes that are taken from newspapers, magazines, etc., and tapes of natural speech
taken from ordinary radio or television programs, etc.
3.1.4 calibration, n: the process of determining the scale of a test or tests.
3.1.4.1 Discussion: Calibration may involve anchoring items from different tests to a common difficulty scale (the theta scale). When a test is constructed from calibrated items then scores on the test indicate the candidate's ability, i.e., their location on the theta scale.
3.1.5 cognitive lab, n: a method for eliciting feedback from examinees with regard to test items.
3.1.5.1 Discussion: Small numbers of examinees take the test, or subsets of the items on the test, and provide extensive feedback on the items by speaking their thought processes aloud as they take the test, answering questionnaires about the items, being interviewed by researchers, or other methods intended to obtain in-depth information about items. These examinees should be similar to the examinees for whom the test is intended. For tests scored by raters, similar techniques are used with raters to obtain information on rubric functioning.
3.1.6 computer adaptive test, n: a test administered by a computer in which the difficulty level of the next item to be presented to test takers is estimated on the basis of their responses to previous items and adapted to match their abilities.
3.1.7 construct, n: the knowledge, skill or ability that is being tested.
3.1.7.1 Discussion: The construct provides the basis for a given test or test task and for interpreting scores derived from this task.
3.1.8 constructed response, adj: a type of item or test task that requires test takers to respond to a series of open-ended questions by writing, speaking, or doing something rather than choose answers from a ready-made list.
3.1.8.1 Discussion: The most commonly used types of constructed-response items include fill-in, short-answer, and performance assessment.

¹ This practice is under the jurisdiction of ASTM Committee F43 on Language Services and Products and is the direct responsibility of Subcommittee F43.04 on Language Testing.
Current edition approved May 1, 2011. Published June 2011. DOI: 10.1520/F2889-11.
² Interagency Language Roundtable, Language Skill Level Descriptors (http://www.govtilr.org/Skills/ILRscale1.htm).
³ For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.

Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

3.1.9 content validity, n: a conceptual or non-statistical validity based on a systematic analysis of the test content to determine whether it includes an adequate sample of the target domain to be measured.
3.1.9.1 Discussion: In order to achieve content validity, an adequate sample involves ensuring that all major aspects are covered and in suitable proportions.
3.1.10 criterion-referenced scale, n: a graduate