Overview of Teacher Evaluation
Baseball is known as the national pastime of the United States, but teacher evaluation beats it hands down. Everybody does it - some with a vengeance, others with the casual disregard that physical and emotional distance afford. Most enthusiasts grow up with the game, playing a sandlot version as they go through school. Indeed, familiarity with the job of teaching and the widespread practice of judging teachers have shaped the history of teacher evaluation.
History of Teacher Evaluation
Donald Medley, Homer Coker, and Robert Soar (1984) describe succinctly the modern history of formal teacher evaluation - that period from the turn of the twentieth century to about 1980. This history might be divided into three overlapping periods: (1) The Search for Great Teachers; (2) Inferring Teacher Quality from Student Learning; and (3) Examining Teaching Performance. At the beginning of the twenty-first century, teacher evaluation appears to be entering a new phase of disequilibrium; that is, a transition to a period of Evaluating Teaching as Professional Behavior.
The Search for Great Teachers began in earnest in 1896 with the report of a study conducted by H.E. Kratz. Kratz asked 2,411 students from the second through the eighth grades in Sioux City, Iowa, to describe the characteristics of their best teachers. Kratz thought that by making desirable characteristics explicit he could establish a benchmark against which all teachers might be judged. Some 87 percent of those young Iowans mentioned "helpfulness" as the most important teacher characteristic. But a stunning 58 percent mentioned "personal appearance" as the next most influential factor.
Arvil Barr's 1948 compendium of research on teaching competence noted that supervisors' ratings of teachers were the metric of choice. A few researchers, however, examined average gains in student achievement for the purpose of Inferring Teacher Quality from Student Learning. They assumed, for good reason, that supervisors' opinions of teachers revealed little or nothing about student learning. Indeed, according to Medley and his colleagues, these early findings were "most discouraging." The average correlation between teacher characteristics and student learning, as measured most often by achievement tests, was zero. Some characteristics related positively to student achievement gains in one study and negatively in another study. Most showed no relation at all. Simeon J. Domas and David Tiedeman (1950) reviewed more than 1,000 studies of teacher characteristics, defined in nearly every way imaginable, and found no clear direction for evaluators. Jacob Getzels and Philip Jackson (1963) called once and for all for an end to research and evaluation aimed at linking teacher characteristics to student learning, arguing it was an idea without merit.
Medley and his colleagues note several reasons for the failure of early efforts to judge teachers by student outcomes. First, student achievement varied, and relying on average measures of achievement masked differences. Second, researchers failed to control for the regression effect in student achievement - extreme high and low scores automatically regress toward the mean in second administrations of tests. Third, achievement tests were, for a variety of reasons, poor measures of student success. Perhaps most important, as the researchers who ushered in the period of Examining Teaching Performance were to suggest, these early approaches were conceptually inadequate, and even misleading. Student learning as measured by standardized achievement tests simply did not depend on a teacher's education, intelligence, gender, age, personality, attitudes, or any other personal attribute. What mattered was how teachers behaved when they were in classrooms.
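The regression effect named above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reconstruction of any study cited here: the score model (a stable ability plus independent test-day noise), the means and standard deviations, the sample size, and the function names are all hypothetical. No teaching effect is simulated, yet the lowest-scoring students appear to gain and the highest-scoring students appear to lose on the second administration.

```python
import random
import statistics


def simulate_retest(n_students=10_000, seed=0):
    """Simulate two test administrations with no teaching effect at all."""
    rng = random.Random(seed)
    # Assumed model: observed score = stable ability + independent test-day noise.
    ability = [rng.gauss(50, 10) for _ in range(n_students)]
    first = [a + rng.gauss(0, 10) for a in ability]
    second = [a + rng.gauss(0, 10) for a in ability]
    return first, second


def mean_change(first, second, selected):
    """Average second-minus-first change for the selected student indices."""
    return statistics.mean(second[i] - first[i] for i in selected)


first, second = simulate_retest()

# Rank students by their first score and pick the extreme tenths.
ranked = sorted(range(len(first)), key=lambda i: first[i])
tenth = len(ranked) // 10
lowest, highest = ranked[:tenth], ranked[-tenth:]

low_change = mean_change(first, second, lowest)
high_change = mean_change(first, second, highest)
print(f"lowest tenth on first test:  {low_change:+.1f} points on retest")
print(f"highest tenth on first test: {high_change:+.1f} points on retest")
```

An evaluator who credited or blamed teachers for these changes would be rewarding those assigned low scorers and penalizing those assigned high scorers for what is purely a measurement artifact.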
The period of Examining Teaching Performance abandoned efforts to identify desirable teacher characteristics and concentrated instead on identifying effective teaching behaviors; that is, those behaviors that were linked to student learning. The tack was to describe clearly and precisely teaching behaviors and relate them to student learning - as measured most often by standardized achievement test scores. In rare instances, researchers conducted experiments for the purpose of arguing that certain teaching behaviors actually caused student learning. Like Kratz a century earlier, these investigators assumed that "principles of effective teaching" would serve as new and improved benchmarks for guiding both the evaluation and education of teachers. Jere Brophy and Thomas Good produced the most conceptually elaborate and useful description of this work in 1986, while Marjorie Powell and Joseph Beard's 1984 extensive bibliography of research done from 1965 to 1980 is a useful reference.
Goals of Teacher Evaluation
Although there are multiple goals of teacher evaluation, they are perhaps most often described as either formative or summative in nature. Formative evaluation consists of evaluation practices meant to shape, form, or improve teachers' performances. Clinical supervisors observe teachers, collect data on teaching behavior, organize these data, and share the results in conferences with the teachers observed. The supervisors' intent is to help teachers improve their practice. In contrast, summative evaluation, as the term implies, has as its aim the development and use of data to inform summary judgments of teachers. A principal observes teachers in action, works with them on committees, examines their students' work, talks with parents, and the like. These actions, aimed at least in part at obtaining evaluative information about teachers' work, inform the principal's decision to recommend a teacher either for contract renewal or for termination of employment. Decisions about initial licensure, hiring, promoting, rewarding, and terminating are examples of the class of summative evaluation decisions.
The goals of summative and formative evaluation may not be so different as they appear at first glance. If an evaluator is examining teachers collectively in a school system, some summary judgments of individuals might be considered formative in terms of improving the teaching staff as a whole. For instance, the summative decision to add a single strong teacher to a group of other strong teachers results in improving the capacity and value of the whole staff.
In a slightly different way, individual performance and group performance affect discussions of merit and worth. Merit deals with the notion of how a single teacher measures up on some scale of desirable characteristics. Does the person exhibit motivating behavior in the classroom? Does she take advantage of opportunities to continue professional development? Do her students do well on standardized achievement tests? If the answers to these types of questions are "yes," then the teacher might be said to be "meritorious." Assume for a moment that the same teacher is one of six members of a high school social studies team in a rural school district. Assume also that one of the two physics teachers just quit, the special education population is growing rapidly, and the state education department recently replaced one social science requirement for graduation with a computer science requirement. Given these circumstances, the meritorious teacher might not add much value to the school system; that is, other teachers, even less meritorious ones, might be worth more to the system.
The example of the meritorious teacher suggests yet another important distinction in processes of evaluating teachers: the difference between domain-referenced and norm-referenced teacher evaluation. When individual teachers are compared to a set of externally derived, publicly expressed standards, as in the case of merit decisions, the process is one of domain-referenced evaluation. What counts is how the teacher compares to the benchmarks of success identified in a particular domain of professional behavior. In contrast, norm-referenced teacher evaluation consists of grouping teachers' scores on a given set of measures and describing these scores in relation to one another. What is the mean score of the group? What is the range or standard deviation of the scores? What is the shape of the distribution of the scores? These questions emanate from a norm-referenced perspective - one often adopted in initial certification or licensure decisions.
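The contrast between the two perspectives can be made concrete with a short sketch. Everything in it is an assumption made for illustration: the six teachers, their scores on a 0-100 scale, the benchmark of 75, and the function names are hypothetical and do not come from any evaluation system described here. The domain-referenced function compares each score to the fixed benchmark; the norm-referenced function describes each score relative to the group mean and standard deviation.

```python
import statistics

# Hypothetical observation scores for six teachers on a 0-100 scale.
scores = {"A": 62, "B": 71, "C": 75, "D": 80, "E": 88, "F": 92}
STANDARD = 75  # hypothetical externally derived benchmark


def domain_referenced(scores, standard):
    """Judge each teacher against the fixed standard, ignoring peers."""
    return {name: score >= standard for name, score in scores.items()}


def norm_referenced(scores):
    """Express each score as a z-score: distance from the group mean in SDs."""
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD of this group
    return {name: (score - mean) / sd for name, score in scores.items()}


meets_standard = domain_referenced(scores, STANDARD)
z_scores = norm_referenced(scores)
group_mean = statistics.mean(scores.values())
group_range = max(scores.values()) - min(scores.values())

print(f"group mean = {group_mean}, range = {group_range}")
for name in scores:
    print(f"{name}: meets standard = {meets_standard[name]}, z = {z_scores[name]:+.2f}")
```

In this made-up group, teacher C meets the benchmark and so passes a domain-referenced judgment, yet falls below the group mean and so looks weaker from a norm-referenced perspective. The same score supports different conclusions depending on the frame of reference.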
The work of John Meyer and Brian Rowan (1977) suggests that there are yet other goals driving the structure and function of teacher evaluation systems. If school leaders intend to maintain public confidence and support, they must behave in ways that assure their constituents and the public at large that they are legitimate. Schools must innovate to be healthy organizations, but if school leaders get too far ahead of the pack - look too different, behave too radically - they do so at their own peril. When they incorporate acceptable ideas, schools protect themselves. The idea that teachers must be held accountable, or in some way evaluated, is an easy one to sell to the public, and thus one that enhances a leader's or system's legitimacy.
Trends, Issues, and Controversies
With the standards movement of the late 1990s came increased expectations for student performance and renewed concerns about teacher practice. Driven by politicians, parents, and, notably, teacher unions, school districts began an analysis of teacher evaluation goals and procedures. The traditional model of teacher evaluation, based on scheduled observations of a handful of direct instruction lessons, came under fire. "Seventy years of empirical research on teacher evaluation shows that current practices do not improve teachers or accurately tell what happens in classrooms" (Peterson, p. 14). Not surprisingly, in this climate, numerous alternative evaluative practices have been developed or reborn.
In the early twenty-first century, the first line of teacher evaluation consists of state and national tests created as barriers for entry to the profession. Some forty states use basic skills and subject matter assessments provided by the Praxis Series examinations for this purpose. Creators of the examinations assume teachers should be masters of grammar, mathematics, and the content they intend to teach. Though many states use the same basic skills tests, each sets its own passing score. The movement to identify and hire quality teachers based on test scores has resulted in some notable legal cases. Teachers who graduate from approved teacher education programs yet fail to pass licensure tests have challenged the validity of such tests, as well as the assignment of culpability. If a person pays for teacher education and is awarded a degree, who is to blame when that person fails a licensure examination? This is not an insignificant concern. In 1998, for example, the state of Massachusetts implemented a new test that resulted in a 59 percent failure rate for prospective teachers. Once a teacher has assumed a job, however, that teacher is rarely, if ever, tested again. In-service teachers typically succeed at resisting pressure to submit to periodic examinations because of the power of their numbers and their political organization.
Despite the well-known difficulties of measuring links between teaching and learning, the practice of judging teachers by the performance of their students is enjoying a resurgence of interest. Polls indicate that a majority of the American public favors this idea. School leaders routinely praise or chastise schools, and by implication teachers, for students' test results. Despite researchers' inability to examine the complexity of life in schools and in classrooms, studies of relationships between teaching and learning often become political springboards for policy formulation. For example, William Sanders and June Rivers (1996) suggest that teacher effectiveness is the single greatest factor affecting academic growth. Their work has been seized upon by accountability proponents to argue that teachers must be held accountable for students' low test scores.
Although there may be much to be gained from focusing educators on common themes of accountability through the use of standards and accompanying tests, there may be much to lose as well. The upside can be measured over time in greater collective attention to common concerns. The downside results when people assume teachers can influence factors outside their control - factors that affect students' test scores, such as students' experiences, socioeconomic status, and parental involvement. A focus on scores as the sole, or even primary, indicator of accountability also creates the possibility for academic misconduct, such as ignoring important but untested material, teaching to the test, or cheating.
As researchers have demonstrated, those schools that need the most help are often least likely to get it. Daniel L. Duke, Pamela Tucker, and Walter Heinecke (2000) studied sixteen high schools involved in initial efforts to meet the challenges of new accountability standards that emphasize student test scores. These schools represented various combinations of need and ability. The researchers found that the schools with high need and low ability (those with poor test scores and low levels of financial resources) reported the highest concerns about staffing, morale, instruction, and students. Thus, the schools that needed the most help, the ones that were the primary targets of new accountability efforts, appeared in this study to be put at greater risk by the accountability movement.
Teachers' jobs involve far more than raising test scores. An evaluation strategy borrowed from institutions of higher education and business, sometimes referred to as 360-degree feedback, acknowledges the necessity of considering the bigger picture. The intent of this holistic approach is to gather information from everyone with knowledge of a teacher's performance to create a complete representation of a teacher's practice and to identify areas for improvement. Multiple data sources, including questionnaires and surveys, student achievement, observation notes, teacher-developed curricula and tests, parent reports, teacher participation on committees, and the like, assure a rich store of information on which to base evaluation decisions. Current models tend to place the responsibility with administrators to interpret and respond to the data. To be sure, there are risks involved. The strategy asks children to evaluate their teachers, and it gathers feedback from individuals who possess only a secondary knowledge of a teacher's practices, namely parents and fellow teachers. Nonetheless, different kinds of information collected from different vantage points encourage full and fair representation of teachers' professional lives.
Toward Evaluating Teaching As Professional Behavior
At the turn of the twenty-first century, people continue to debate whether teaching is a true profession. Questions persist about educators' lack of self-regulation, the nebulously defined knowledge base upon which teaching rests, the lack of rigid entrance requirements to teacher education programs (witness alternative licensure routes), the level of teachers' salaries, and the locus of control in matters of evaluation. Yet school districts, state governments, the federal government, and national professional and lay organizations appear intent as never before on building and strengthening teaching as a profession.
One simple example of a changing attitude toward teaching as a profession is that of the use of peer evaluation. Two decades ago, in Toledo, Ohio, educators advanced processes of peer review as a method of evaluation. At its most basic level, peer review consists of an accomplished teacher observing and assessing the pedagogy of a novice or struggling veteran teacher. School districts that use peer review, however, often link the practice with teacher intervention, mentoring programs, and, in some instances, hiring and firing decisions. Columbus, Ohio's Peer Assistance and Review Program, seemingly representative of many review systems, releases expert teachers from classroom responsibilities to act as teaching consultants. Driven by the National Education Association's 1997 decision to reverse its opposition to peer review, the idea has enjoyed a resurgence of popularity in recent years.
Founded in 1987, the National Board for Professional Teaching Standards (NBPTS) is yet another example of people from different constituencies working together to advance the concept of teaching as a profession. The NBPTS attempts to identify and reward the highest caliber teachers, those who represent the top end of the quality distribution. Based on the medical profession's concept of board-certified physicians, the NBPTS bestows certification only on those teachers who meet what board representatives perceive to be the highest performance standards. By the end of the year 2000, nearly 10,000 teachers had received board certification - though this amounts to a tiny fraction of the nation's 2.6 million teachers. Widespread political and financial support, from both political conservatives and liberals, suggests this idea may have staying power.
Teacher evaluation will grow and develop as the concept of teaching as a profession evolves. Computer technology is only beginning to suggest how new methods of formative and summative evaluation can alter the landscape. Perhaps most important is that as reformers confront the realities of life in schools, public knowledge of what it means to be a teacher increases. More people in more walks of life are recognizing how complex and demanding teaching can be, and how important teachers are to society as a whole. Teacher evaluators of the future will demonstrate much higher levels of knowledge and skill than their predecessors, leaving the teaching profession better than they found it.
Bibliography
Barr, Arvil. 1948. "The Measurement and Prediction of Teaching Efficiency: A Summary of Investigations." Journal of Experimental Education 16 (4):203-283.
Brophy, Jere, and Good, Thomas. 1986. "Teacher Behavior and Student Achievement." In Handbook of Research on Teaching, ed. Merlin C. Wittrock. New York: Macmillan.
Domas, Simeon J., and Tiedeman, David V. 1950. "Teacher Competence: An Annotated Bibliography." Journal of Experimental Education 19:99-218.
Duke, Daniel L.; Tucker, Pamela; and Heinecke, Walter. 2000. Initial Responses of Virginia High Schools to the Accountability Initiative. Charlottesville, VA: Thomas Jefferson Center for Educational Design, University of Virginia.
Gage, Nathaniel L., and Needels, Margaret C. 1989. "Process-Product Research on Teaching: A Review of Criticisms." The Elementary School Journal 89 (3):253-300.
Getzels, Jacob W., and Jackson, Philip W. 1963. "The Teacher's Personality and Characteristics." In Handbook of Research on Teaching: A Project of the American Educational Research Association, ed. Nathaniel L. Gage. New York: Macmillan.
Herbert, Joanne M. 1999. "An Online Learning Community: Technology Brings Teachers Together for Professional Development." American School Board Journal March:39-40.
McNergney, Robert F.; Herbert, Joanne M.; and Ford, R. E. 1993. "Anatomy of a Team Case Competition." Paper presented at the Annual Meeting of the American Educational Research Association, Atlanta, Georgia.
Medley, Donald M. 1979. "The Effectiveness of Teachers." In Research on Teaching: Concepts, Findings, and Implications, ed. Penelope L. Peterson and Herbert J. Walberg. Berkeley, CA: McCutchan.
Medley, Donald M.; Coker, Homer; and Soar, Robert S. 1984. Measurement-Based Evaluation of Teacher Performance: An Empirical Approach. New York: Longman.
Meyer, John W., and Rowan, Brian. 1977. "Institutionalized Organizations: Formal Structure as Myth and Ceremony." American Journal of Sociology 83 (2):340-363.
Peterson, Kenneth D. 2000. Teacher Evaluation: A Comprehensive Guide to New Directions and New Practices, 2nd edition. Thousand Oaks, CA: Corwin Press.
Powell, Marjorie, and Beard, Joseph W. 1984. Teacher Effectiveness: An Annotated Bibliography and Guide to Research. New York: Garland.
Sanders, William L., and Rivers, June C. 1996. Cumulative and Residual Effects of Teachers on Future Student Academic Achievement. Knoxville: University of Tennessee Value-Added Research and Assessment Center.
Scriven, Michael. 1967. "The Methodology of Evaluation." In Perspectives of Curriculum Evaluation, ed. Ralph W. Tyler, Robert M. Gagné, and Michael Scriven. Chicago: Rand McNally.
— MARI A. PEARLMAN





