Guest post: “Breaking Bad” – Encouraging Two-Stage Exams in Science Classrooms

(This is a guest post by Chad Atkins, PhD candidate in the Department of Chemistry at UBC. Find him on Twitter as @chemchad.)

If asked how science is taught at the university level, a common response would likely start by describing details of the class: there would be a pale and grizzled professor with graying hair, a huge lecture hall with a massive chalkboard, and endless amounts of Greek symbols that [apparently] represent more than local fraternities.


Google image search of “science professor” provides stock photos of the common perception.

Realistically, this description wouldn’t be that far off – I completed an undergraduate degree in chemistry six years ago and the majority of my classes followed a similar theme: show up to class, take good notes, and study them meticulously for the final exam. Interactions with classmates never happened in class and only happened outside its walls when we were stuck on an assignment and/or ranting about the material. While not necessarily in the best interest of learning, this brand of science education has endured for many generations of students.

The neat thing about scientists is that they practice what they preach – everything can be studied. It shouldn’t be surprising that a growing number of initiatives are sprouting up in universities across North America to study the ways students learn and develop complex reasoning skills. There’s a prime example right in my own backyard, as the University of British Columbia has been home to the Carl Wieman Science Education Initiative (CWSEI) since 2007. A glance through the site’s menu tab on research demonstrates the quantity of cross-discipline studies being approved and producing significant results. Building this foundation takes considerable time, but it seems evident that applying these research-based educational methods to science classrooms can produce a more effective learning environment than is achieved through traditional means.

Although there are numerous alternative approaches (some of which are better suited for certain classrooms, based on content, class size, etc.), the common thread between them is that engagement between classmates is encouraged.[1] Such approaches might fall under the active learning umbrella, but for the purpose of this post I’d like to focus on the concept of peer-based learning – students interacting with each other to learn the material together. Research in this realm was pioneered by Eric Mazur, who introduced the practice of peer instruction in the early 1990s; it is now well documented in the scientific literature.[2][3] Given this wealth of evidence to support the approach, why hasn’t it been universally implemented in higher education facilities worldwide? The answer lies in the sheer volume of work required to create an effective peer-based concept activity for a class of >100 students. To be useful, a challenging concept must be identified, common student misconceptions need to be understood, and a series of questions has to be crafted to address the misconceptions while still promoting learning.[4] An easier gateway into this pedagogical realm takes advantage of the evaluation method students often dread most – examinations.

Under the mold of traditional science lecturing, the greatest effort a student injects into a course is during the two or three days before the midterm and final exams. The actual writing of the exam ends up being a complete brain splurge, but eventually the torture ends and students leave the lecture hall and lounge in the hallways to bravely compare their answers. The bright student in the class declares their solution to the last question was 4.00 kilowatts/hour, and everyone else hangs their head because they had 8.00 kilowatts/hour. In this situation, none of the students actually know what’s right or what’s wrong, and by the time the exams are marked and the results posted a week later, the majority of them will have forgotten where that extra factor of two came from in the first place – in other words, the window of opportunity for concrete learning has closed. To eliminate these periods of uncertainty, an alternative approach was conceived in the mid-1990s that would aim to maximize student learning during the exam while also taking advantage of proven peer-based learning approaches.

David Cohen, a professor of mathematics at Smith College, was the first to introduce a collaborative exam style known as ‘the pyramid exam’.[5] Over the last few decades, the style has evolved and been refined into a more concise structure broadly referenced as a two-stage exam.  Anecdotal evidence exists, both in the academic and blog communities, of professors and teachers applying such a strategy and having tremendously positive results and feedback; however, two-stage exams have still not found much footing across multiple disciplines inside the walls of post-secondary institutions.

As dictated by the name, a two-stage exam happens in, well… two stages. The first stage – which is typically weighted to be worth ≥75% of the final mark – is written individually like a traditional exam and takes up two-thirds of the scheduled exam timeslot. The questions on this exam are then closely replicated in a new exam handed out for completion during the second stage, except this second iteration is written in small groups of 4 or 5 in the remaining one-third of the time and accounts for the remaining percentage of marks. The group discusses the questions and is required to come to a consensus, as only one copy gets handed in per group, with all team members receiving the same mark. I won’t go into great detail about the benefits of a two-stage exam (as it has previously been covered here by @jossives), but this second stage gives students the immediate feedback that they tend to seek in the hallways after a traditional exam ends. The difference is that now they can collaborate directly with more of their peers and leech/share personalized solving approaches while having a grade incentive to do so. Furthermore, it’s believed that lower-achieving students profit from extra explanation and over-achieving students benefit from explaining their logic to others – a win-win scenario.
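For anyone who likes to see the arithmetic spelled out, here is a minimal Python sketch of how the two stages might combine into a single exam mark. The function names, the default 75/25 split, and the example scores are my own assumptions for illustration; the only things taken from the description above are that the individual stage is worth at least 75% and that the timeslot is split two-thirds to one-third.

```python
def two_stage_mark(individual: float, group: float,
                   individual_weight: float = 0.75) -> float:
    """Combine the two stage scores (both in percent) into one exam mark.

    individual_weight is the share given to the individual stage; the
    group stage receives whatever is left. The 0.75 default is an assumed
    example, not a prescribed value.
    """
    if not 0.0 <= individual_weight <= 1.0:
        raise ValueError("individual_weight must be between 0 and 1")
    return individual_weight * individual + (1.0 - individual_weight) * group


def split_timeslot(total_minutes: float) -> tuple[float, float]:
    """Split the scheduled slot: two-thirds individual, one-third group."""
    individual_minutes = total_minutes * 2 / 3
    return individual_minutes, total_minutes - individual_minutes


if __name__ == "__main__":
    # Hypothetical student: 68% alone, 92% with the group.
    print(two_stage_mark(individual=68, group=92))  # 74.0
    # Hypothetical 90-minute exam slot.
    print(split_timeslot(90))  # (60.0, 30.0)
```

With these made-up numbers, a strong group stage lifts the mark by six points, which is enough of an incentive to take the discussion seriously without letting the group score swamp individual performance.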

Supporting the notion that ‘seeing is believing’, I took some time in November to observe a two-stage exam in a UBC undergraduate Earth & Ocean Sciences course with >200 students. I wasn’t sure what to expect, but my gut instinct assumed there would be insurmountable chaos when the individual portion of the exam ended and groups needed to be formed. Despite the fact that groups weren’t created prior to the exam (although there is a tool available to assist with assigning balanced groups), it turned out that many students had formed their groups beforehand – chaos was very short-lived, with the total lag time between stages being somewhere between four and five minutes. Any students who didn’t have a full group were gathered at the front and quickly organized into groups of four. The most surprising part of the process, however, was how quickly students completed the second stage – discussions were heated and animated, with some groups finishing in less than twenty minutes of an allotted thirty-minute slot. When the final seconds ticked off the clock, only a handful of groups remained. Without having seen the process in action, I may have doubted its utility for learning, but recent research results from the instructor of this class demonstrate that the improvement in student performance from utilizing a two-stage exam is uniform across the class and not dependent on the caliber of student.[6]

What’s my interest in this? The six years since my undergraduate degree have been spent trying to attain all the chemistry degrees I can from graduate school. Like most science graduate programs in Canada, chemistry requires its students to complete a certain number of teaching hours in a semester to fulfil requirements for their stipend. The roles assigned are often in a laboratory course, but this semester I’m sneaking into a third-year analytical chemistry class focused on instrumental analysis. Having taught portions of the accompanying lab component of the class for the last two years, I know first-hand the struggles that can accompany learning the necessary material – the vast amount of information tends to be overwhelming and students often feel they need to resort to memorization to do well. The UBC chemistry department has recognized this and is attempting to introduce peer-based learning into the class curriculum. Under the guidance of a CWSEI Science Teaching and Learning Fellow (@ejanemaxwell), the first step in the process is the introduction of some in-class activities and a two-stage midterm in February – I’m excited to see how the class responds.

Assuming Jared gives me the OK, I’ll report back with some impressions at the end of the semester.


[1] “Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving” (1992) Heller, P. et al., American Journal of Physics: Volume 60, Pages 627-636.

[2] “Peer Instruction: Ten Years of experience and results” (2001) Crouch, C.H. and Mazur, E., American Journal of Physics: Volume 69, Pages 970-977.

[3] “Peer instruction enhanced meaningful learning: ability to solve novel problems” (2005) Cortright, R.N. et al., Advances in Physiology Education: Volume 29, Pages 107-11.

[4] “Teaching problem solving through cooperative grouping. Part 2: designing problems and structuring groups” (1992) Heller, P. et al., American Journal of Physics: Volume 60, Pages 637-644.

[5] “The Pyramid Exam” (1995) Cohen, D. and Henle, J., UME Trends, no. July, Pages 2, 15.

[6] “Research and Teaching – Collaborative Testing: Evidence of Learning in a Controlled In-Class Study of Undergraduate Students” (2014) Gilley, B.H. and Clarkston, B., Journal of College Science Teaching: Volume 43, Pages 83-91.


  1. […] site suggesting that we could adapt the format of a two-stage exam (beautifully summarized here by my brilliant lecture TA, Chad Atkins) to use as a review activity on the first day of class.  A […]

  2. Hey Chad, thanks for the great post! I see that you and Jane did a two-stage review activity in the first class. Have you any idea what the perception in the class is about the upcoming two-stage midterm, since they’ve already got a taste?

  3. Hi Chad. What role do you think that formal immediate feedback could or should play in these group exams? Although it is true that the grades on these are extremely high, not every group gets every question correct. There are ways to help make this feedback happen (IF-AT cards, online homework systems, CMS/LMS), but they all come with their challenges. Do you think that the next stage in the development of two-stage exams should be to move in the direction of providing that additional layer of immediate feedback to the students or is it sufficient that nearly all of the groups get nearly all of the questions correct?

  4. @Jared – to be honest, the perception seems to be mixed. Some of the class have run through this exercise before and are familiar with the two-stage process in an assessment environment. Others are definitely unaware of what to expect when grades enter the equation. A major difference between the review activity and the midterm will be how groups are selected – in the former, the students chose their own groups (it was just a review activity, after all). For the midterm, we’re planning on pre-arranging groups AND the location for each group within the lecture hall. We’re hoping that’ll reduce unnecessary chaos once the individual portion finishes.

    @Joss – Yeah, Jane used IF-AT cards in the two-stage review activity she developed, so they were able to get that immediate feedback. The limitation of IF-AT cards for us in our third-year chemistry class is that some concepts can’t really be addressed in a multiple-choice format. When we’re asking a question of whether component X should be included in instrument Y for function Z, showing five or six different versions is almost like unveiling the answer without challenging every student to give their unbiased input. I’m not expecting all of our groups to get all of the questions correct, but I’m hopeful that the process will still identify areas where student B is approaching a problem much differently than student A.
