Standardized testing is arguably one of the most consequential aspects of a
school-aged child’s life in the United States. Based on the results, students
are assigned to remediation, special education services, or gifted programs.
Unfortunately for culturally and linguistically diverse students, most
standardized test questions used in the United States today are biased toward
white, middle-class students: they rely on background information and
references to American life that students from other cultures cannot easily
understand. Poor scores on these assessments may cause a student to be
incorrectly classified as a special education student when, in fact, the
student is very bright and is simply unable to answer questions grounded in
American culture rather than in his or her own, very different, background.
Today, schools in the United States are more racially and culturally diverse
than ever before. As of 2005, almost 50% of students in the U.S. could be
classified as “culturally different” (Ford, 2010, p. 51). According to a 2010
report published by the Institute of Education Sciences, the Hispanic
population is predicted to grow at a faster rate than most other groups in the
United States in the coming years (Utley, Obiakor, & Bakken, 2011, p. 5). In
addition, the Caucasian population is expected to grow more slowly than that of
any other racial group.
In the years to come, classrooms in the United States will only continue to
grow more diverse. Unfortunately, culturally diverse students experience much
less school success than the dominant population (Ford, 2010, p. 50). One major
problem accompanying the ever-changing cultural climate of classrooms in the
United States is that culturally and linguistically diverse (CLD) students
typically score lower than Caucasian students on educational assessments. While
the student population continues to change, teachers and assessment tools in
the United States have not. Around 85% of educators are female, and the large
majority of teachers as a whole, 83%, are white (Ford, 2010, p. 51). In
addition, the tests used to assess children for special education and gifted
education programs are created for white, middle-class students. The cultural
makeup of the United States is changing dramatically, while intelligence tests
and other high-stakes assessment practices remain focused primarily on
Caucasian students.
Many times, in order to answer certain questions correctly, an individual must
have specific culturally based information or knowledge. Unfortunately, many
bilingual students do not have this cultural information, which makes it hard
for them to answer test questions (Baca & Cervantes, 1989, p. 165). Culturally
and linguistically diverse students are at a great disadvantage on traditional
assessments, because those assessments are heavily biased against students who
are culturally and linguistically diverse or who come from different economic
groups (p. 165). Also, students with limited English proficiency are often
misplaced in special education services (p. 165). Hispanic students are
overrepresented in the learning disabled population. Educators seem to have
trouble determining whether a student’s problems stem from an underlying
disability or simply from a lack of English proficiency (p. 165). In addition,
the characteristics of acculturation can be confused with signs of a
disability (p. 165).
There are four types of bias related to the differential performance between
and among members of cultural groups (Whiting & Ford). The first is bias in
construct validity, which exists when a test is shown to measure different
hypothetical constructs for members of one group than for members of another
(Whiting & Ford). The second is bias in content validity, which exists when an
item or subscale is more difficult for members of one group than for members of
another, even though the ability level of both groups is generally equal. Three
examples of content bias are items that ask for information that minority
persons have not had an equal opportunity to learn, items whose scoring is
inappropriate because the test maker decided on only one correct answer, and
items that penalize minorities for giving an answer that would have been
correct in their own culture (Whiting & Ford). The third is bias in item
selection, which occurs when the items and tasks selected are based on the
experiences and language of the dominant group. It resembles bias in content
validity but is more concerned with the appropriateness of individual test
items. The fourth is bias in predictive or criterion-related validity, which
occurs when the answers to items and tasks require prior cultural knowledge of
the dominant group (Whiting & Ford).
Intelligence tests contain examples of all four types of bias. Unfortunately,
many people who are not familiar with the purpose and limitations of testing
think of IQ tests as tests of innate ability (Ford, 2004, p. v). Therefore,
when certain groups of people score lower than others, people unfamiliar with
intelligence tests may consider those groups to be genetically inferior or to
have lesser intelligence due to heredity (p. v). This school of thought ignores
many factors that affect the results of intelligence tests, including the
environment, the level and quality of education, and the opportunity to learn
(p. v).
Another school of thought holds that intelligence tests measure unlearned
abilities; therefore, a person who scores low on an intelligence test is
assumed to have inferior cognitive ability and potential (Ford, 2004, p. v).
This belief is common among people who are not trained in testing and
assessment, people who believe “intelligence is fixed, innate and
unchangeable,” and people who believe that intelligence tests are
“comprehensive, exact, and precise measures of intelligence” (p. v). Fagan
(2000) hypothesized that the lack of intelligence tests that are fair across
different cultural groups originates from a theoretical bias to associate the
IQ score with intelligence rather than with knowledge (Fagan, 2008, p. vi).
There is no consensus in education regarding why diverse students score lower
on intelligence tests than white students (Ford, 2004, p. vii). There are two
major sides in the debate surrounding the performance of minority students
(Ford, 2004, p. vi). One group of scholars believes that the low intelligence
test performance of minorities is due to the cultural deprivation and economic
disadvantage that minorities experience (p. vi). These scholars view
culturally diverse students as inferior to the norm (p. vi). On the other side
of the debate, scholars believe that minority students are culturally
different but not disadvantaged. These scholars believe that culture affects
test performance but do not see low scores as evidence of inferiority
(p. vi).
Examiners who assess culturally and linguistically diverse children are often
uncertain about which tests provide the most reliable, valid, and unbiased
results (Edwards, 2006, p. 246). Historical data show a significant discrepancy
in the intelligence test performance of different cultural groups (p. 246).
Edwards (2006) reports that “on average, when adjusted for differences in
socio-economic status, individuals of Asian descent scored higher than those of
European descent, individuals of African descent scored lower than those of
European descent, and individuals of Hispanic descent scored somewhere in
between the latter two groups on tests of intelligence” (p. 246).
Because of the great debate over cultural bias in standardized tests in the
1970s, test developers have tried to decrease or eliminate cultural bias in
assessments (Ford, 2004, p. vi). Some scholars argue that test bias no longer
exists because of the changes made by test developers (p. vi). Others contend
that tests can never be free of cultural bias because they are developed by
people and reflect the developers’ culture or cultures (p. vi). Absolute
fairness is impossible to attain because tests are not perfectly reliable or
valid in any particular context (p. vi).
Currently, most intelligence test makers do not release statistics about the
differences in performance among ethnic groups (Edwards, 2006, p. 246). By
withholding them, test developers are attempting to avoid controversy about
the differences between ethnic groups and to appear socially sensitive
(p. 246). The lack of data makes it hard for educators and other test users to
make informed and fair decisions about which test would be most effective for
a given student (p. 246). The data that are available suggest that the results
of biased intelligence tests lead to a disproportionate representation of
culturally and linguistically diverse students in special education programs
(p. 247). Limiting or avoiding this overrepresentation would require data
showing which intelligence tests assess minority groups most fairly and
reliably (p. 247).
In 1998, Jensen reported that his meta-analysis showed an average IQ “range of
10-20 points between ethnic groups on different IQ tests” (Edwards, 2006,
p. 247). Tests vary in the size of the discrepancy between ethnic groups: one
test may have an average difference of 10 points, while another may be closer
to 20 (p. 247). If test producers provided data about these IQ discrepancies,
test users could make informed decisions about which intelligence test will
provide the most accurate results. Such fair and reliable results would most
likely lessen the disproportionate representation of minority groups in
special education programs (p. 247).
Wasserman and Becker (2000) reviewed studies on the WISC-III, the
Stanford-Binet IV, and the Woodcock-Johnson Tests of Cognitive Ability that
used samples corresponding to key demographic variables. They found that the
mean discrepancies, in favor of whites, between standard scores for matched
samples of African American and Caucasian groups were as follows: WISC-III =
11.0; Stanford-Binet IV = 8.1; and Woodcock-Johnson Tests of Cognitive Ability
= 11.7. These considerable average score differences imply that when these
tests are used to refer students to gifted programs, fewer culturally and
linguistically diverse children may be identified as meeting the criteria for
giftedness (Ford, 2004, p. 8).
High-stakes testing is widespread in the United States; Lamb (1993) observed
that test scores in student files form the basis for high-stakes decisions
(Ford, 2004, p. 5). Hilliard (1991), Korchin (1980), Olmedo (1981), and others
argue that standardized tests have helped perpetuate the political, social, and
economic barriers that diverse groups face (Ford, 2004, p. 5). Donna Ford
(2004), in her paper “Intelligence Testing and Cultural Diversity: Concerns,
Cautions, and Considerations,” notes that “when tests are used for selecting
and screening, the potential for denying diverse groups access to educational
opportunities, such as gifted education programs, due to bias is great” (p. 5).
Many scholars believe that intelligence tests contain cultural bias in favor of
middle-class Caucasian groups because the tests assess knowledge and content
normed for white, middle-class students and use language and situations that
are often unfamiliar to culturally and linguistically diverse students (p. 6).
The debates and arguments regarding intelligence and intelligence testing are
common in education, particularly in special education and gifted education
programs. These programs rely heavily on assessments, such as intelligence
tests, to decide whether to place students in one of the programs and which
services they require to be successful (Ford, 2004, p. 2).
Oliver Edwards (2006) writes, in his article “Special education
disproportionality and the influence of intelligence test selection” in the
Journal of Intellectual & Developmental Disability, that “for a test to have
equitable effects, examiners need to interpret them not only in light of their
statistical properties, but also in light of the consequences of test score
use” (p. 247). Biased intelligence test scores can be detrimental to minority
students. The interpretation of these scores is a double-edged sword, affecting
entrance into gifted education programs and special education programs alike.
Low test scores can prevent minority students from being identified as gifted
and entering gifted education programs (Ford, 2004, p. vi). Depending on the
student, low scores can also result in the student being identified as learning
disabled, mentally retarded, and so on, a label that will likely follow the
student for the duration of his or her education (p. vi).
Being mistakenly assigned to special education services has many consequences,
both present and future. A stigma accompanies the label “special education
student.” This stigma can follow students throughout their lives, influencing
how teachers and classmates treat them, giving them a poor self-image, and
lowering their self-esteem (Gay, 2002, p. 615).
Intelligence tests reveal information regarding the test taker’s educational
attainment, social judgment, reasoning, and comprehension. Because the scores
of intelligence tests are considered to accurately reveal differences in these
four areas and are used to make important educational decisions, it is
important to turn to research on the cultural accuracy of each test to help
guide decisions about its use (Edwards, 2006, p. 247). The intelligence test
most likely to decrease the overrepresentation of culturally and linguistically
diverse students in special education programs is the one that produces the
smallest average score difference between ethnic groups.
Disproportionality in special education is not a new problem. Despite the
efforts of teachers and school districts, and despite legislation such as the
Individuals with Disabilities Education Act (IDEA) and the Education for All
Handicapped Children Act, minorities are unfortunately still overrepresented in
special education and underrepresented in gifted and talented programs.
Educators and other test users must prioritize intelligence tests that have the
lowest discrepancy between cultural groups, so that the scores of culturally
and linguistically diverse students are accurate and fairly reflect those
students’ needs regarding special education and gifted education programs.
Educators must never assume that everyone shares the same opportunities and
experiences, yet intelligence tests are rooted in this assumption.
Overall, assessments used in schools can be made more culturally fair and
valid in the following ways: administering tests in the primary language of
the person taking the exam; having interpreters translate test questions;
reviewing tests and eliminating questions on which groups perform very
differently; eliminating items that may be offensive to certain groups; keeping
in mind the background of the test taker when examining his or her answers;
never assuming that everyone shares the same opportunities and experiences;
never basing decisions about a person on one test score, but instead collecting
multiple sources of data; and, when a whole group scores low on a test,
considering that the test may be the problem.
I believe that intelligence cannot be tested independently of the culture that
gives rise to the test. I believe that many tests are geared toward a certain
group of people or culture and are therefore biased. Depending on a student’s
strengths and what he or she has been exposed to, that student may score much
higher on the test than a person from a different background. I think that
students need to be evaluated for gifted and special education programs based
on the results of multiple assessments, rather than solely on an intelligence
test that may yield different results for different cultural groups. I also
believe that teacher input should be valued when students are considered for
these programs. A student’s teacher is a great source of information about that
student’s individual strengths and weaknesses.
In addition, I think that it would be advantageous not only to use multiple
assessment measures but also to reevaluate students in special education
programs multiple times throughout the year, to see whether they still require
special education services or whether there may have been an error in the
original testing. Culturally and linguistically diverse students, in
particular, must be reevaluated to determine whether they have been placed
correctly. Some diverse students who should have tested into the gifted and
talented program at their school may not have; others may be receiving special
education services that they do not require. Human and assessment error is an
unfortunate part of life, and it is an educator’s responsibility to make sure
that any error that has occurred is rectified.
I agree that cultural bias in intelligence tests may be lessened but cannot be
fully removed. A test will always reflect the culture of the person who created
it. Ford and Gilman (n.d.), for example, state that even with the best
intentions to create tests that have little to no bias, “human error,
stereotypes and prejudice undermine test administrations, interpretation, and
use.” Culturally diverse students are, more often than not, affected by this
bias.
In the future, I hope that test developers realize that withholding the
statistical information related to the mean scores of different cultural groups
does more harm than good. I understand that the developers are trying to be
socially conscious. However, those data could spark future research on test
bias so that, in the future, fewer culturally and linguistically diverse
students are incorrectly placed in a program or denied the chance to
participate in one because of faulty test scores. Releasing the data would
also help test users choose an assessment that has been proven to produce
scores with fewer discrepancies between different cultural groups.
References
Baca, L., & Cervantes, H. T. (1989). The bilingual special education interface
(2nd ed.). Columbus, OH: Merrill.
Edwards, O. (2006). Special education disproportionality and the influence of
intelligence test selection. Journal of Intellectual & Developmental
Disability, 31(4), 246-248. Retrieved April 12, 2012, from the Ebscohost
database.
Ford, D. (2004). Intelligence testing and cultural diversity: Concerns,
cautions, and considerations. National Research Center on the Gifted and
Talented, 1, 1-71. Retrieved April 2, 2012, from the Educational Resources
Information Center database.
Ford, D. (2010). Culturally responsive classrooms: Affirming culturally
different gifted students. Gifted Child Today, 33, 50-53. Retrieved April 1,
2012, from the Ebscohost database.
Gay, G. (2002). Culturally responsive teaching in special education for
ethnically diverse students: Setting the stage. Qualitative Studies in
Education, 15(5), 613-629. Retrieved March 6, 2012, from
http://www.cehd.umn.edu/
Lopez, R. (1997). The practical impact of current research and issues in
intelligence test interpretation and use for multicultural populations. School
Psychology Review, 26(2), 249-254. Retrieved April 22, 2012, from the
Ebscohost database.
Skiba, R., Simmons, A., Ritter, S., Gibb, A., Rausch, K., Cuadrado, J., et al.
(2008). Achieving equity in special education: History, status, and current
challenges. Council for Exceptional Children, 74, 264-288. Retrieved April 16,
2012, from the Educational Resources Information Center database.