Prior System at UO
From 2007 to 2019, UO administered numerical student evaluations of teaching (SETs). These asked students to rate faculty on general criteria and invited open-ended descriptions of a course's and an instructor's "strengths and areas for possible improvement". In principle, these (and all SETs) serve a dual purpose.
- Faculty can use SET results to help them identify areas of their teaching that need attention and improvement; that is, they have a formative purpose.
- SETs are used to inform evaluations of a faculty member's teaching as part of decisions about tenure and promotion, contract renewal, and merit raises; that is, they have a summative purpose.
The latter purpose, especially, relies on the assumption that SETs are a valid measure of teaching effectiveness (assumed to be related to student learning). The research literature on SETs is extensive and stretches back nearly 100 years, but over that time little consensus has emerged about whether there is in fact a correlation between SET ratings and student learning, or even how one should measure student learning.
Research on SETs
Many—but not all—studies show a modest positive correlation between SET results and student learning (Spooren et al., 2013; Benton & Cashin, 2012). But recent work, including a careful meta-analysis of previous results (Uttl et al., 2016), indicates that there is no correlation between SET ratings and student learning after controlling for sample size and publication bias. Research specific to SET use at UO (Ancell & Wu, 2017) likewise concluded that "SET scores are not a valid measure of teaching quality at the UO" (p. 1).
Other problems arise as well. For example, there are indications that students often do not interpret questions and terminology on SETs in the same way faculty do (Lauer, 2012), so care must be taken with the wording of questions and the interpretation of results. Persistent questions also remain (see, for example, Stark & Freishtat (2014)) regarding students' ability to assess teaching effectiveness, the use of SETs to compare faculty in the absence of information about the spread of scores within a relevant group of faculty, and whether student response rates on non-mandatory SETs accurately reflect the true distribution of student opinion. In addition, there is evidence that SET scores vary depending on class size, the level of the class, the discipline, and the prior preparation of the students.
Most disturbing, though, are results indicating that SETs show bias. Research on SETs finds bias by race (Smith, 2007; Smith & Hawkins, 2011) and by ethnicity (Smith & Anderson, 2005), with Black and Latino faculty receiving lower scores on SETs than their white colleagues. Similar research finds gender bias, with women receiving lower scores than their male colleagues (Mengel et al., 2018; MacNell et al., 2015; Boring et al., 2016; Boring, 2017) and gendered language appearing in written comments (Mitchell, 2018; Ray, 2018; Boring, 2017). The research by Ancell and Wu (2017) into SETs at UO found evidence that "female instructors receive systematically lower course evaluation scores while their students achieve more than their peers taught by male instructors in future courses" (p. 38).
While there is debate about the validity, utility, and fairness of SETs, there is agreement in the research literature that if they are used at all, SETs should be only one of several tools used to assess teaching (Benton & Cashin, 2012; Lauer, 2012; Berk, 2005). Peer reviews, self-evaluations, administrator reviews, student interviews, and alumni ratings are alternative strategies that can be combined to create a more representative picture of a faculty member's teaching. Organizations such as the Association of American Universities (Dennin et al., 2018; The Association of American Universities, n.d.) and the Royal Academy of Engineering (n.d.) have argued that it is time for universities' ideals regarding teaching excellence to align with their policies.
Current System at UO
In light of these shortcomings, in 2017 the University Senate adopted a resolution to form a committee to overhaul UO's teaching evaluation system. The Senate Continuous Improvement and Evaluation of Teaching (CIET) committee (established in April 2019 legislation) now oversees implementation of Senate legislation related to teaching evaluation.
The University of Oregon has developed a holistic new teaching evaluation system that does more than simply replace problematic evaluation instruments. The new system provides a path to define, develop, evaluate, and reward teaching excellence. The goals of the new system are to ensure teaching evaluation is fair and transparent, is conducted against criteria aligned with the unit's definition of teaching excellence, and includes input from students through the Student Experience Survey (SES), peer reviews, and the faculty themselves.
In use since 2019, the SES has been shown to address problems seen in the old UO system. Pilot testing of the SES saw personal comments about the instructor drop roughly tenfold, from about 20% of all student comments under the old system to less than 2% with the SES. References to instructor personality traits (such as bossy, sweet, funny, or patient) and perceived intelligence (such as bright, intellect, genius, or smart) were also less frequent in the new student experience survey than under the prior system, as was gender-stereotyped language. For example, under the prior UO system male faculty were twice as likely as female faculty to be described as a "genius". By focusing student reflections in the SES on specific elements of teaching, the use of this unhelpful descriptor was reduced by about 60% overall, and its use was equalized between male and female instructors.
If discriminatory, obscene, or demeaning comments are still submitted by students, the CIET and the Office of the Provost have developed a protocol to redact these comments from the SES results made available for viewing by unit heads and personnel committees. After SES responses are available, faculty can flag individual comments as discriminatory, obscene, or demeaning, and the CIET committee will review the flagged comments for redaction. In the first five years of the SES, faculty flagged 118 comments as discriminatory, obscene, or demeaning; the CIET committee redacted 62 of those comments. You can read more about the redaction process on the SES Comment Redaction Page.
References
- Ancell, K. & Wu, E. (2017). Teaching, Learning, and Achievement: Are Course Evaluations Valid Measures of Instructional Quality at the University of Oregon? [Honors thesis, University of Oregon].
- The Association of American Universities. (n.d.). Undergraduate STEM Education Initiative.
- Benton, S. L. & Cashin, W. E. (2012). IDEA Paper No. 50: Student Ratings of Teaching: A Summary of Research and Literature, The IDEA Center.
- Berk, R. (2005). A Survey of 12 Strategies to Measure Teaching Effectiveness. International Journal of Teaching and Learning in Higher Education, 17, 48-62.
- Boring, A. (2017). Gender biases in student evaluations of teaching. Journal of Public Economics, 145, 27-41.
- Boring, A., Ottoboni, K., & Stark, P. B. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness. ScienceOpen Research.
- Dennin, M., Schultz, Z. D., Feig, A., Finkelstein, N., Greenhoot, A. F., Hildreth, M., Leibovich, A. K., Martin, J. D., Moldwin, M.B., O’Dowd, D. K., Posey, L. A., Smith, T. L., & Miller, E. R. (2018). Aligning Practice to Policies: Changing the Culture to Recognize and Reward Teaching at Research Universities. CBE—Life Sciences Education, 16(4).
- Lauer, C. (2012). A comparison of faculty and student perspectives on course evaluation terminology. In J. Groccia & L. Cruz (Eds.), To Improve the Academy: Resources for faculty, instructional, and organizational development (pp. 195-211). Wiley and Sons, Inc.
- MacNell, L., Driscoll, A., & Hunt, A. N. (2015). What's in a Name: Exposing Gender Bias in Student Ratings of Teaching. Innovative Higher Education, 40, 291-303.
- Mengel, F., Sauermann, J., & Zölitz, U. (2018). Gender Bias in Teaching Evaluations. Journal of the European Economic Association, 17(2), 535-566.
- Mitchell, K. (2018, March 19). Student Evaluations Can't Be Used to Assess Professors. Slate.
- Ray, V. (2018). Is Gender Bias an Intended Feature of Teaching Evaluations? Inside Higher Ed.
- Royal Academy of Engineering. (n.d.). The Career Framework for University Teaching.
- Smith, B. P., & Hawkins, B. (2011). Examining Student Evaluations of Black College Faculty: Does Race Matter? The Journal of Negro Education, 80(2), 149-162.
- Smith, B. P. (2007). Student Ratings of Teaching Effectiveness: An Analysis of End-of-Course Faculty Evaluations. College Student Journal, 41(4), 788-800.
- Smith, G., & Anderson, K. J. (2005). Students' Ratings of Professors: The Teaching Style Contingency for Latino/a Professors. Journal of Latinos & Education, 4(4), 115-136.
- Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the Validity of Student Evaluation of Teaching: The State of the Art. Review of Educational Research, 83(4), 598-642.
- Stark, P. B. & Freishtat, R. (2014). An Evaluation of Course Evaluations. ScienceOpen Research, 2014.
- Uttl, B., White, C. A., & Gonzalez, D. W. (2016). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42.