Wednesday, 23 August 2017

Student evaluations are biased against professors teaching quantitative courses – Author interview with Bob Uttl

BY ALUN JONES · PUBLISHED MAY 9, 2017 · UPDATED JUNE 6, 2017

Today we published “Student evaluations of teaching: Teaching quantitative courses can be hazardous to one’s career” by Bob Uttl and Dylan Smibert. The study used 14,872 publicly posted class evaluations, drawing on input from more than 325,000 students, to identify differences in student evaluations of teaching between quantitative and non-quantitative subject areas. Their analysis demonstrates a bias against professors teaching quantitative courses. The study also finds that, when performance is evaluated against common standards, professors teaching quantitative courses are far less likely than those teaching non-quantitative courses to receive tenure, promotion, and/or merit pay. These findings have substantial implications for professors teaching quantitative courses, especially at a time when performance metrics have such a strong influence on university decision-making. Here we interview the corresponding author, Bob Uttl, about the apparent bias against professors teaching quantitative courses and about what colleges and universities can do to be more transparent about the way Student Evaluations of Teaching are used.

PeerJ: Can you tell us a bit about yourself?

Bob Uttl: I am a professor of Psychology at Mount Royal University (MRU), a midsize undergraduate university in Calgary, Alberta, Canada. I am a cognitive psychologist whose main research interests are memory, ageing, assessment, and psychometrics. In any given year, I typically teach statistics, advanced research methods, psychometrics, and cognitive psychology courses. I have also accumulated a great deal of personal experience with Student Evaluation of Teaching (SET) ratings: as a faculty member whose “teaching effectiveness” was evaluated primarily, if not exclusively, by SET scores; as a member and later chair of promotion and tenure committees; and as co-chair and later chair of the Mount Royal Faculty Association’s Faculty Evaluation Committee.
As a result of these experiences and my work in research methods, assessment, and psychometrics, I have become interested in the uses and misuses of SET ratings.

PJ: Can you briefly explain the research you published in PeerJ?

BU: My former students and I (Uttl, White, & Wong Gonzalez, 2016) recently showed that the widely accepted evidence of SETs’ validity as a measure of professors’ teaching effectiveness – the meta-analyses of so-called multi-section studies by Cohen (1981), Feldman (1989), and Clayson (2009) – is in fact evidence of their lack of validity. We conducted re-analyses of these earlier meta-analyses and also completed a new, up-to-date meta-analysis of the multi-section studies. We found zero correlation between SETs and student achievement/learning once small-study effects and students’ prior ability and knowledge were taken into account.

Our research published in PeerJ examines the validity of SETs from a different angle: it examines whether SET ratings depend on the courses professors are assigned to teach (quantitative vs. non-quantitative) and quantifies the impact of teaching quantitative vs. non-quantitative courses on high-stakes personnel decisions. Our results show that, when their performance is evaluated against common standards, professors teaching quantitative courses are far less likely than those teaching non-quantitative courses to receive tenure, promotion, and/or merit pay. Although the lower SETs of professors teaching quantitative courses are not, by themselves, evidence that SETs are biased, other well-established findings suggest that these lower SETs are due to factors unrelated to professors’ teaching effectiveness, including students’ lack of basic numeracy, lack of interest in quantitative courses, and math anxiety.

PJ: Do you have any anecdotes about this research?
BU: The way SETs are sometimes used is nothing short of astonishing to anyone who understands key concepts such as precision, central tendency, and standards. To illustrate, one department’s personnel committee concluded that a professor was unsatisfactory and not worthy of promotion and tenure because the professor’s SET scores were 0.01 below the department’s mean SET rating of 4.25 on a 5-point Likert scale. Pushed to its logical conclusion, this rule would require the department to fire approximately 50% of its professors every year, and sooner or later it would run out of professors to hire, since its average would eventually climb to 5.00, a score no one could exceed.

PJ: What kind of lessons do you hope universities take away from the research?

BU: Today, nearly all colleges and universities in many countries ask students to evaluate the teaching effectiveness of their professors using SETs. The SET scores are then used to make high-stakes personnel decisions about faculty, including hiring, firing, re-appointment, promotion, and merit pay. Administrators, evaluation committees, the public, and educational policy makers need to realize that students’ perceptions of teaching, as measured by SETs, are not valid measures of professors’ contribution to students’ learning. Rather, SETs reflect students’ characteristics, including prior abilities, knowledge, interests, and motivation; situational characteristics, including course subject (e.g., quantitative vs. non-quantitative), class size, class level, class time, and the physical environment of the class (e.g., room layout, external noise); course events, including the number of students caught plagiarising course work (who then evaluate the professor who called them on it); and professor attributes that have nothing to do with teaching ability, such as hotness, accent, and perceived approachability.
Colleges and universities that continue to insist that SETs are valid measures of professors’ teaching effectiveness, despite all evidence to the contrary, ought to be clear and transparent in their hiring and evaluation policies. For example, they ought to state in their hiring ads that applicants must not have a foreign accent, must be “hot”, and must have facial features showing high approachability. To increase hiring efficiency and avoid the costs of later firings, applicants should be required to provide voice samples as well as professional full-body and head-shot color photos of themselves, so that hiring committees can assess each applicant’s accent, hotness, and facial approachability and screen out applicants who do not meet these job requirements. If that seems discriminatory and contrary to public policy, these colleges and universities should reconsider their use of SETs in evaluating professors’ teaching effectiveness.
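The arithmetic behind the personnel-committee anecdote can be sketched in a few lines of Python. This is a toy illustration, not part of Uttl and Smibert’s analysis: the nine department ratings, the spread of individual student ratings (SD = 0.9) and the class size (30 students) are all hypothetical, and the sketch simply dismisses everyone rated below the current department mean each year without hiring replacements. It also shows why a 0.01 gap says little about precision: a class mean has a standard error of SD/√n.

```python
from math import sqrt
from statistics import mean


def standard_error(sd, n_students):
    """Standard error of one class's mean SET rating: SD / sqrt(n)."""
    return sd / sqrt(n_students)


def cull_below_mean(scores, years):
    """Hypothetical rule from the anecdote: each year, dismiss everyone rated
    strictly below the current department mean (no replacements are hired).
    Returns (survivors, the pass mark used each year, headcount each year)."""
    remaining = list(scores)
    bars = []
    headcounts = [len(remaining)]
    for _ in range(years):
        bar = mean(remaining)  # the department mean serves as the pass mark
        bars.append(bar)
        remaining = [s for s in remaining if s >= bar]
        headcounts.append(len(remaining))
    return remaining, bars, headcounts


# Hypothetical department: nine professors' mean SET ratings on a 5-point
# scale, chosen so that the starting department mean is exactly 4.25.
ratings = [3.75, 3.875, 4.0, 4.125, 4.25, 4.375, 4.5, 4.625, 4.75]

# Assumed student-rating spread (SD = 0.9) and class size (30).
print(f"SE of one class mean: {standard_error(0.9, 30):.2f}")

survivors, bars, headcounts = cull_below_mean(ratings, years=4)
print("Pass mark each year:", bars)
print("Headcount each year:", headcounts)
```

With these made-up numbers, the standard error of a single class mean is about 0.16, roughly sixteen times the 0.01 gap in the anecdote. The “below the mean” rule removes four of nine professors in the first year, and the pass mark climbs from 4.25 to 4.6875 within four years as the department shrinks to a single professor.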