Multiple Choice Questions

1. As the degree of reliability increases, the proportion of
a. total variance attributed to true variance decreases.
b. total variance attributed to true variance increases.
c. total variance attributed to error variance increases.
d. none of the above
2. A source of error variance may take the form of
a. item sampling.
b. testtakers’ reactions to environment-related variables such as room temperature and lighting.
c. testtaker variables such as amount of sleep the night before a test, amount of anxiety, or drug effects.
d. all of the above
3. Which type of reliability estimate is obtained by correlating pairs of scores from the same person (or people) on two different administrations of the same test?
a. parallel-forms
b. split-half
c. test-retest
d. none of the above
4. Which type of reliability estimate would be appropriate only when evaluating the reliability of a test that measures a trait that is relatively stable over time?
a. parallel-forms
b. alternate-forms
c. test-retest
d. split-half
5. An estimate of test-retest reliability is often referred to as a coefficient of stability when the time interval between the test and retest is more than
a. 30 days.
b. 60 days.
c. 3 months.
d. 6 months.
6. What term refers to the degree of correlation between all the items on a scale?
a. inter-item homogeneity
b. inter-item consistency
c. inter-item heterogeneity
d. parallel-form reliability
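Inter-item consistency is commonly quantified with coefficient alpha (Cronbach's alpha). As a minimal sketch, using hypothetical 0/1 item scores (the data and variable names below are illustrative, not drawn from the text):

```python
from statistics import pvariance

# Hypothetical item scores: 5 testtakers x 6 dichotomous (0/1) items.
scores = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]

k = len(scores[0])                                  # number of items
item_vars = [pvariance(col) for col in zip(*scores)]  # variance of each item
total_var = pvariance([sum(row) for row in scores])   # variance of total scores

# Coefficient alpha: k/(k-1) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Higher inter-item correlations shrink the ratio of summed item variances to total-score variance, which pushes alpha toward 1.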
7. Test-retest estimates of reliability are referred to as measures of ________, and split-half reliability estimates are referred to as measures of ________.
a. true scores; error scores
b. internal consistency; stability
c. inter-scorer reliability; consistency
d. stability; internal consistency
8. Which of the following factors may influence a split-half reliability estimate?
a. fatigue
b. anxiety
c. item difficulty
d. all of the above

9. Typically, adding items to a test will have what effect on the test’s reliability?
a. Reliability will decrease.
b. Reliability will increase.
c. Reliability will stay the same.
d. b or c
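The effect of test length on reliability is predicted by the Spearman-Brown prophecy formula. A minimal sketch, using an illustrative starting reliability of .70 (the numbers here are examples, not values from the text):

```python
def spearman_brown(r_xx: float, n: float) -> float:
    """Predict the reliability of a test lengthened (or shortened)
    by factor n, given the original reliability r_xx."""
    return (n * r_xx) / (1 + (n - 1) * r_xx)

# Doubling a test whose reliability is .70:
print(round(spearman_brown(0.70, 2), 2))  # 0.82
```

Adding comparable items raises the predicted reliability (answer b), though in practice gains taper off and poorly chosen items can work against them.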
10. Which of the following is NOT an acceptable way to divide a test when using the split-half reliability method?
a. Randomly assign items to each half of the test.
b. Assign odd-numbered items to one half and even-numbered items to the other half of the test.
c. Assign the first half of the items to one half of the test and the second half of the items to the other half of the test.
d. Assign easy items to one half of the test and difficult items to the other half.
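A split-half estimate itself can be sketched as follows: correlate the two halves (here, an odd-even split of hypothetical 0/1 responses), then step the half-test correlation up to full length with the Spearman-Brown correction. All data and names below are illustrative:

```python
from math import sqrt

# Hypothetical item responses: 5 testtakers x 6 dichotomous (0/1) items.
responses = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]

odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

r_half = pearson(odd, even)           # reliability of a half-length test
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown corrected to full length
```

The odd-even split (option b) tends to balance fatigue and item difficulty across halves, which is why option d, which deliberately unbalances difficulty, is unacceptable.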
11. Which best conveys the meaning of an inter-scorer reliability estimate of .90?
a. Ninety percent of the scores obtained are reliable.
b. Ninety percent of the variance in the scores assigned by the scorers was attributed to true differences and 10% to error.
c. Ten percent of the variance in the scores assigned by the scorers was attributed to true differences and 90% to error.
d. The test is stable.
12. Classical reliability theory estimates the portion of a test score that is attributed to ________, and domain sampling theory estimates ________.
a. specific sources of variation; error
b. error; specific sources of variation
c. the skills being measured; variation
d. the skills being measured; content knowledge
13. The standard error of measurement of a particular test of anxiety is 8. A student earns a score of 60. What is the confidence interval for this test score at the 95% level?
a. 52–68
b. 40–68
c. 44–76
d. 36–84
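The arithmetic behind this item can be checked directly: a 95% confidence interval is the obtained score plus or minus about 1.96 (roughly 2) standard errors of measurement. A quick sketch:

```python
score = 60   # obtained test score
sem = 8      # standard error of measurement
z = 1.96     # two-tailed critical value for a 95% interval

lower, upper = score - z * sem, score + z * sem
print(round(lower), round(upper))  # 44 76
```

Rounding z to 2 gives 60 ± 16, i.e., the interval 44–76.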
14. The Bayley Scales of Infant Development, Second Edition (BSID-II), contains Mental, Motor, and Behavior Rating Scales. Because these three scales are designed to measure different characteristics (that is, they are not homogeneous), it is inappropriate to combine the three scales in computing estimates of
a. alternate-forms reliability.
b. internal-consistency reliability.
c. test-retest reliability.
d. inter-rater reliability.

15. The fact that young children develop rapidly and in “growth spurts” is a problem primarily in estimating which type of reliability for the Bayley Scales?
a. internal-consistency reliability
b. alternate-forms reliability
c. test-retest reliability
d. inter-rater reliability
16. A test is considered valid when
a. the test measures what it purports to measure.
b. test results are consistent.
c. the test can be administered efficiently.
d. all of the above
17. Relating scores obtained on a test to other test scores or data from other assessment procedures is typically the kind of evidence collected in an effort to establish the __________ validity of a test.
a. content
b. criterion-related
c. face
d. none of the above
18. Face validity refers to
a. the most preferred method for determining validity.
b. another name for content validity.
c. the appearance of relevancy of the test items.
d. validity determined by means of face-to-face interviews.
19. Which assessment technique has the highest degree of face validity?
a. asking examinees to tell what they see in inkblots for the purpose of personality analysis
b. administering a word processing test to a person applying to be a word processor
c. asking examinees to draw a picture of their family to assess family relationships
d. measuring the height of applicants to a police academy
20. Lawshe’s method for determining agreement among raters or judges who rate items on how essential they are provides a quantitative measure of what type of validity?
a. content
b. construct
c. criterion-related
d. predictive
21. Which may best be viewed as varieties of criterion-related validity?
a. concurrent validity and face validity
b. content validity and predictive validity
c. concurrent validity and predictive validity
d. concurrent validity and content validity
22. The form of criterion-related validity that reflects the degree to which a test score relates to a criterion measure obtained at the same time is known as
a. predictive validity.
b. construct validity.
c. concurrent validity.
d. content validity.

23. The form of criterion-related validity that reflects the degree to which a test score relates to a criterion measure obtained subsequently is known as
a. predictive validity.
b. construct validity.
c. concurrent validity.
d. content validity.

24. What type of validity evidence best sheds light on whether a college admissions test is valid for selecting students who will complete the program within four years?
a. predictive criterion-related
b. concurrent criterion-related
c. content
d. construct
25. A “construct” may best be characterized as
a. unobservable.
b. something that describes behavior.
c. something that is assumed to exist.
d. all of the above
26. A significant, positive relationship exists between scores on a new test of intelligence and scores on the fourth edition of the Stanford-Binet Intelligence Scale. These data may be viewed as supportive of which type of validity evidence for the new test?
a. criterion-related
b. content validity
c. convergent evidence of construct validity
d. discriminant evidence of construct validity
27. If new predictors explain something about a predicted score that was not already explained by existing predictors, the new predictors possess
a. test-retest reliability.
b. incremental validity.
c. construct validity.
d. face validity.
28. If a student’s performance on a newly developed math achievement test is compared with his or her recent performance on another achievement test known to measure math skills, this would be an example of ________ validity.
a. content
b. concurrent criterion-related
c. predictive criterion-related
d. construct
29. Comparing a college freshman’s SAT scores obtained in high school with his or her first-semester college GPA may provide an example of ________ validity evidence.
a. content
b. concurrent criterion-related
c. predictive criterion-related
d. construct

30. When a test developer uses multiple predictors to predict a criterion (such as academic success) from a test score (such as the SAT)
a. all available predictors should be used.
b. only those predictors that add new information should be used.
c. only predictors highly correlated with each other should be used.
d. people who have had coaching in the SAT should be excluded from study.
31. What conclusion concerning intelligence could be drawn based on the 1921 symposium published in the Journal of Educational Psychology?
a. The experts tended to agree on the basic elements of intelligence.
b. Intelligence should be measured by group rather than individual tests.
c. Intelligence consists of a general factor and a number of specific factors.
d. There was little agreement among experts regarding what intelligence is.
32. Galton’s conception of intelligence focused on
a. sensory abilities.
b. environmental factors.
c. behavioral assets and deficits.
d. all of the above
33. On which statement would Binet, Wechsler, and Piaget agree?
a. Heredity, not environment, determines the development of intelligence.
b. Environment, not heredity, influences the development of intelligence.
c. Heredity and environment interact to influence the development of intelligence, although a person has an unlimited genetic potential.
d. Heredity and environment interact to influence the development of intelligence, but a person may not exceed his or her genetic potential.
34. A major thread running through the theories of Binet, Wechsler, and Piaget is the concept of interactionism. In this context, interactionism refers to
a. interaction between mind and body.
b. members of different professions working together.
c. interaction between heredity and environment.
d. interaction between different psychological approaches to intelligence.
35. Crystallized intelligence includes
a. application of general knowledge.
b. nonverbal abilities.
c. sensory abilities.
d. all of the above
36. Spearman’s g factor refers to
a. what different intelligence tests have in common.
b. the specific factors assessed by different intelligence tests.
c. the fact that Galton was Spearman’s inspiration.
d. a recently discovered erogenous zone.

37. In the Kaufman Assessment Battery for Children, subtests are organized by which two abilities?
a. successive and simultaneous processing
b. general and specific abilities
c. reflective and impulsive processing
d. auditory and visual processing
38. Measuring intelligence in infancy entails the direct assessment of
a. developmental history.
b. verbal development.
c. physical development.
d. sensorimotor skills.

39. One of your friends asks you the unlikely question “What is intelligence?” Based on your reading of the text, what would be your best response?
a. A multifaceted construct that is primarily determined by the environment and, in general, includes a person’s ability to appropriately and effectively care for himself or herself and interact with others.
b. An unobservable trait whose meaning researchers have failed to agree upon and that therefore has no real relevance in understanding human behavior.
c. A multifaceted construct influenced by heredity and environment that, in general, includes abilities related to problem solving and to verbal and social competence.
d. We cannot define intelligence because there is no consensus on the human abilities that reflect intelligence.
40. According to the “Flynn effect,” which of the following is TRUE?
a. Average measured intelligence rises each year from the year a test was normed.
b. Average measured intelligence rises and this is accompanied by an increase in academic achievement.
c. The “Flynn effect” has only been observable in the United States.
d. All of the above are true.