At the University of Oregon, students are invited to complete student evaluations of teaching (SETs) for their classes at the end of each term. In principle, SETs serve two purposes. The first is formative: faculty can use SET results to identify areas of their teaching that need attention and improvement. The second is summative: SETs are used to inform evaluations of a faculty member’s teaching as part of decisions about tenure and promotion, contract renewal, and merit raises.
The latter purpose, especially, relies on the assumption that SETs are a valid measure of teaching effectiveness (assumed to be related to student learning). The research literature on SETs is extensive and stretches back nearly 100 years, but over that time little consensus has emerged about whether there is in fact a correlation between SET ratings and student learning, or even how one should measure student learning.
Many studies, though not all, show a modest positive correlation between SET results and student learning [1] [2]. However, recent work, including a careful meta-analysis of previous results [3], indicates that there is no correlation between SET ratings and student learning after controlling for sample size and publication bias.
Other problems arise as well. For example, there are indications that students often do not interpret questions and terminology on SETs in the same way faculty do [4], so care must be taken with the wording of questions and the interpretation of results. Persistent questions also remain (see, for example, [5]) regarding students’ ability to assess teaching effectiveness, the use of SETs to compare faculty in the absence of information about the spread of scores within a relevant group of faculty, and whether the responses collected on non-mandatory SETs, given their response rates, accurately reflect the true distribution of student opinion. In addition, there is evidence that SET scores vary with class size, the level of the class, the discipline, and the prior preparation of the students.
Most disturbing, though, are results indicating that SETs show bias with respect to gender [6] [7], race [8] [9], and ethnicity [10], with female, African-American, and Latino faculty receiving lower scores on SETs than their white male colleagues.
While there is debate about the validity, utility, and fairness of SETs, there is agreement in the research literature that if they are used at all, SETs should be only one of several tools used to assess teaching [2] [4] [11]. Peer reviews, self-evaluations, administrator reviews, student interviews, and alumni ratings are alternative strategies that can be combined to create a more representative picture of a faculty member’s teaching.