Chapter 5: How to Select Tests - Standards for Evaluating Tests

Previous chapters described several types of personnel tests and procedures and the use of assessment tools to identify good workers and improve organizational performance. They also discussed the technical and legal issues to consider when using tests. This chapter presents information and procedures for evaluating tests.

Chapter Highlights

Sources of information about tests
Standards for evaluating a test: information to consider to determine suitability of a test for your use
Checklist for evaluating a test

Principle of Assessment: Use assessment instruments for which understandable and comprehensive documentation is available.

Sources of information about tests

Many assessment instruments are available for use in employment contexts. Sources that can help you determine which tests are appropriate for your situation are described below.

Test manual. A test manual should provide clear and complete information about how the test was developed; its recommended uses and possible misuses; and evidence of reliability, validity, and fairness. The manual also should contain full instructions for test administration, scoring, and interpretation. In summary, a test manual should provide sufficient administrative and technical information to allow you to make an informed judgment as to whether the test is suitable for your use. You can order specimen test sets and test manuals from most test publishers. Test publishers and distributors vary in the amount and quality of information they provide in test manuals. The quality and comprehensiveness of the manual often reflect the adequacy of the research base behind the test. Do not mistake catalogs or pamphlets provided by test publishers and distributors for test manuals. Catalogs and pamphlets are marketing tools aimed at selling products. To get a balanced picture of the test, it is important to consult independently published critical test reviews in addition to test manuals.

Mental Measurements Yearbook (MMY). The MMY is a major source of information about assessment tools. It consists of a continuing series of volumes. Each volume contains reviews of tests that are new or significantly revised since the publication of the previous volume. New volumes do not replace old ones; rather, they supplement them.

The MMY series covers nearly all commercially available psychological, educational, and vocational tests published for use with English-speaking people. There is a detailed review of each test by an expert in the field. A brief description of the test covering areas such as purpose, scoring, prices, and publisher is also provided.

The MMY is published by the Buros Institute of Mental Measurements. The Buros Institute also makes test reviews available through a computer database. This database is updated monthly via an on-line computer service. This service is administered by the Bibliographic Retrieval Services (BRS).

Tests in Print (TIP). TIP is another Buros Institute publication. It is published every few years and lists virtually every test published in English that is available for purchase at that time. It includes the same basic information about a test that is included in the MMY, but it does not contain reviews. This publication is a good starting place for determining what tests are currently available.

Test Critiques. This publication provides practical and straightforward test reviews. It consists of several volumes, published over a period of years. Each volume reviews a different selection of tests. The subject index at the back of the most recent volume directs the reader to the correct volume for each test review.

Professional consultants. There are many employment testing experts who can help you evaluate and select tests for your intended use. They can help you design personnel assessment programs that are effective and comply with relevant laws.

If you are considering hiring a consultant, it is important to evaluate his or her qualifications and experience beforehand. Professionals working in this field generally have a Ph.D. in industrial/organizational psychology or a related field. Look for an individual with hands-on experience in the areas in which you need assistance. Consultants may be found in psychology or business departments at universities and colleges. Others serve as full-time consultants, working either independently or as members of consulting organizations. Typically, professional consultants will hold memberships in the American Psychological Association (APA), the Society for Industrial and Organizational Psychology (SIOP), or other professional organizations.

Reference libraries should contain the publications discussed above as well as others that will provide information about personnel tests and procedures. The Standards for Educational and Psychological Testing and the Principles for the Validation and Use of Personnel Selection Procedures can also help you evaluate a test in terms of its development and use. In addition, these publications indicate the kinds of information a good test manual should contain.

Carefully evaluate the quality and the suitability of a test before deciding to use it. Avoid using tests for which only unclear or incomplete documentation is available, and tests that you are unable to thoroughly evaluate. This is the principle of assessment stated at the beginning of this chapter.

Standards for evaluating a test: information to consider to determine suitability of a test for your use

The following basic descriptive and technical information should be evaluated before you select a test for your use. In order to evaluate a test, you should obtain a copy of the test and test manual. Consult independent reviews of the test for professional opinions on the technical adequacy of the test and the suitability of the test for your purposes.

General information
- Test description. As a starting point, obtain a full description of the test. You will need specific identifying information to order your specimen set and to look up independent reviews. The description of the test is the starting point for evaluating whether the test is suitable for your needs.
  - Name of test. Make sure you have the accurate name of the test. (There are tests with similar names, and you want to look up reviews of the correct instrument.)
  - Publication date. What is the date of publication? Is it the latest version? If the test is old, it is possible that the test content and norms for scoring and interpretation have become outdated.
  - Publisher. Who is the test publisher? Sometimes test copyrights are transferred from one publisher to another. You may need to call the publisher for information or for help in determining the suitability of the test for your needs. Is the publisher cooperative in this regard? Does the publisher have staff available to assist you?
  - Authors. Who developed the test? Try to determine the background of the authors. Typically, test developers hold a doctorate in industrial/organizational psychology, psychometrics, or a related field and are associated with professional organizations such as APA. Another desirable qualification is proven expertise in test research and construction.
  - Forms. Is there more than one version of the test? Are they interchangeable? Are forms available for use with special groups, such as non-English speakers or persons with limited reading skills?
  - Format. Is the test available in paper-and-pencil and/or computer format? Is it meant to be administered to one person at a time, or can it be administered in a group setting?
  - Administration time. How long does it take to administer?
- Costs. What are the costs to administer and score the test? This may vary depending on the version used, and whether scoring is by hand, computer, or by the test publisher.
- Staff requirements. What training and background do staff need to administer, score, and interpret the test? Do you have suitable staff available now or do you need to train and/or hire staff?

Purpose, nature, and applicability of the test
- Test purpose. What aspects of job performance do you need to measure? What characteristics does the test measure? Does the manual contain a coherent description of these characteristics? Is there a match between what the developer says the test measures and what you intend to measure? The test you select for your assessment should relate directly to one or more important aspects of the job. A job analysis will help you identify the tasks involved in the job, and the knowledge, skills, abilities, and other characteristics required for successful performance.
- Similarity of reference group to target group. The test manual will describe the characteristics of the reference group that was used to develop the test. How similar are your test takers, the target group, to the reference group? Consider such factors as age, gender, racial and ethnic composition, education, occupation, and cultural background. Do any factors suggest that the test may not be appropriate for your group? In general, the more closely your group matches the characteristics of the reference group, the more confidence you will have that the test will yield meaningful scores for your group.
- Similarity of norm group to target group. In some cases, the test manual will refer to a norm group. A norm group is the sample of the relevant population on whom the scoring procedures and score interpretation guidelines are based. In such cases, the norm group is the same as the reference group. If your target group differs from the norm group in important ways, then the test cannot be meaningfully used in your situation. For further discussion of norm groups, see Chapter 7.

Technical information
- Test reliability. Examine the test manual to determine whether the test has an acceptable level of reliability before deciding to use it. See Chapter 3 for a discussion of how to interpret reliability information. A good test manual should provide detailed information on the types of reliabilities reported, how reliability studies were conducted, and the size and nature of the sample used to develop the reliability coefficients. Independent reviews also should be consulted.
- Test validity. Determine whether the test may be validly used in the way you intended. Check the validity coefficients in the relevant validity studies. Usually the higher the validity coefficient, the more useful the test will be in predicting job success. See Chapter 3 for a discussion of how to interpret validity information. A good test manual will contain clear and complete information on the valid uses of the test, including how validation studies were conducted, and the size and characteristics of the validation samples. Independent test reviews will let you know whether the sample size was sufficient, whether statistical procedures were appropriate, and whether the test meets professional standards.
- Test fairness. Select tests developed to be as fair as possible to test takers of different racial, ethnic, gender, and age groups. See Chapter 7 for a discussion of test fairness. Read the manual and independent reviews of the test to evaluate its fairness to these groups. To secure acceptance by all test takers, the test should also appear to be fair. The test items should not reflect racial, cultural, or gender stereotypes, or overemphasize one culture over another. The rules for test administration and scoring should be clear and uniform. Does the manual indicate any modifications that are possible and may be needed to test individuals with disabilities?
- Potential for adverse impact. The manual and independent reviews should help you to evaluate whether the test you are considering has the potential for causing adverse impact. As discussed earlier, mental and physical ability tests have the potential for causing substantial adverse impact. However, they can be an important part of your assessment program. If these tests are used in combination with other employment tests and procedures, you will be able to obtain a better picture of an individual's job potential and reduce the effect of average score differences between groups on one test.
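
The validity coefficient mentioned under test validity is typically a correlation between test scores and a measure of job performance. The following is a minimal sketch of that arithmetic, using a Pearson correlation in Python. The scores and ratings are made-up numbers, and the names `pearson_r`, `test_scores`, and `performance_ratings` are chosen for this illustration only. It is not the procedure of any particular test publisher, and a real validation study needs a much larger sample and professional review.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one test score and one supervisor rating per employee.
test_scores = [52, 61, 47, 70, 66, 58, 75, 49]
performance_ratings = [3.1, 2.9, 3.3, 3.8, 3.0, 3.6, 3.9, 2.7]

validity = pearson_r(test_scores, performance_ratings)
print(f"validity coefficient r = {validity:.2f}")
```

A value near 0 would mean the test scores say little about the performance measure; values closer to 1 indicate a stronger relationship. See Chapter 3 for how to interpret the size of the coefficient.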

Practical evaluation
- Test tryout. It is often useful to try the test in your own organizational setting by asking employees of your organization to take the test and by taking the test yourself. Do not compute test scores for these employees unless you take steps to ensure that results are anonymous. By trying the test out, you will gain a better appreciation of the administration procedures, including the suitability of the administration manual, test booklet, answer sheets and scoring procedures, the actual time needed, and the adequacy of the planned staffing arrangements. The reactions of your employees to the test may give you additional insight into the effect the test will have on candidates.
- Cost-effectiveness. Are there less costly tests or assessment procedures that can help you achieve your assessment goals? If possible, weigh the potential gain in job performance against the cost of using the test. Some test publishers and test reviews include an expectancy chart or table that you can consult to predict the expected level of performance of an individual based on his or her test score. However, make sure your target group is comparable to the reference group on which the expectancy chart was developed.
- Independent reviews. Is the information provided by the test manual consistent with independent reviews of the test? If there is more than one review, do they agree or disagree with each other? Information from independent reviews will prove most useful in evaluating a test.
- Overall practical evaluation. This involves evaluating the overall suitability of the test for your specific circumstances. Does the test appear easy to use or is it unsettling? Does it appear fair and appropriate for your target groups? How clear are instructions for administration, scoring, and interpretation? Are special equipment or facilities needed? Is the staff qualified to administer the test and interpret results or would extensive training be required?
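
The expectancy table mentioned under cost-effectiveness can be pictured as a lookup from a test-score band to the share of people in that band who went on to perform successfully. The sketch below builds such a table in Python from hypothetical records. The function name `expectancy_table`, the band boundaries, and all data values are assumptions made for illustration; published expectancy charts are built from the publisher's own reference group and may be constructed differently.

```python
def expectancy_table(records, band_edges):
    """Percent of people rated successful within each test-score band.

    records: iterable of (test_score, was_successful) pairs.
    band_edges: lower bounds of the score bands.
    Returns a dict mapping each lower bound to a percentage,
    or to None when no one scored in that band.
    """
    counts = {edge: [0, 0] for edge in band_edges}  # edge -> [successes, total]
    for score, successful in records:
        # Pick the highest band whose lower bound the score reaches.
        eligible = [edge for edge in band_edges if score >= edge]
        if not eligible:
            continue  # score is below the lowest band; leave it out
        band = max(eligible)
        counts[band][1] += 1
        if successful:
            counts[band][0] += 1
    return {
        edge: (100.0 * hits / total if total else None)
        for edge, (hits, total) in counts.items()
    }


# Hypothetical records: (test score, rated successful on the job?)
records = [
    (35, False), (42, False), (48, True),
    (51, False), (55, True), (58, True), (63, True), (67, False),
    (72, True), (78, True), (81, True), (88, True),
]
table = expectancy_table(records, band_edges=[30, 50, 70])
for lower_bound, percent in sorted(table.items()):
    print(f"score {lower_bound}+ : {percent:.0f}% rated successful")
```

As the text cautions, a table like this is only informative for your candidates if they are comparable to the group whose records produced it.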

Checklist for evaluating a test

Characteristic to be measured by test (skill, ability, personality trait):
Job/training characteristic to be assessed:
Candidate population (education or experience level, other background):

Test characteristics
- Test name:
- Version:
- Type (paper-and-pencil, computer):
- Alternate forms available:
- Scoring method (hand-scored, machine-scored):

Technical considerations
- Reliability: r =
- Validity: r =
- Reference/norm group:
- Test fairness evidence:
- Adverse impact evidence:
- Applicability (indicate any special group):

Administration considerations
- Administration time:
- Materials needed (include start-up, operational, and scoring costs):
- Costs:
- Facilities needed:
- Staffing requirements:
- Training requirements:

Other considerations (consider clarity, comprehensiveness, utility)
- Test manual:
- Supporting documents from the publisher:
- Publisher assistance:
- Independent reviews:

Overall evaluation:

A document by the:

U.S. Department of Labor
Employment and Training Administration
1999