REPOST: What You Want from Tests
what you carry around in your head vs. what you can accomplish
I originally posted this essay in July of 2023 — this is a new version with minor edits to update the language and improve clarity.
I used to be in favor of open-notes tests. But after seeing them in action for a while, I realized that I don’t think they’re a very good idea.
Traditional tests aren’t great, so it’s nice to see people exploring other directions. But the open-notes approach is a mistake, because it doesn’t fit very well with the strengths of test taking.
Tests have some natural strengths and some obvious weaknesses. If we understand these strengths and weaknesses, we can design tests that let us meet more of our goals. Settling for the open-notes approach keeps tests from becoming all they can be.
Knowledge
The traditional argument in favor of open-notes tests is that having access to your notes is true to life. In the real world, it’s rare to find yourself locked in a room without your books, forced to answer questions under a time limit. For most tasks in real life, you’ll have access to whatever resources you need, and can look things up as you go.
Einstein was famously unable to recall the speed of sound when given the Edison Test. Why memorize such facts, he remarked, when one could easily look them up in a textbook?
Einstein was right. Skill comes from more than just what you carry around in your head. Experts use all the tools they need and refer to whatever sources they want when they’re solving a problem. In many ways, skill in a domain is just skill at using the reference works of that domain. Hence the old joke that programming be renamed “Googling StackOverflow.”
Take this view too far, however, and you end up with absurdity. It’s clear that experts don’t carry everything around in their head. But it’s also not true that they carry nothing around in their head.
A physicist may not be able to tell you the speed of sound without looking it up. But every physicist will be able to tell you who Maxwell and Newton were, and a little bit about their contributions. If someone doesn’t know what F = ma means, they’re probably not a physicist.
A programmer won’t be able to recall from memory the exact workings of every function they’ve ever used. But every programmer will be able to tell you the syntax needed to write a for loop in their favorite languages. If someone can’t tell you the syntax of an if statement, they’re probably not a programmer.
An expert is someone who is able to do both. Some things they will know by heart, and some things they will be able to accomplish only given time and resources. You need both to have mastery of a skill. We might call these two forms of knowledge what you carry around in your head and what you can accomplish.
Mastery
We don’t expect students to leave a class as experts in their field, but we do expect them to have mastery of the material.
What does mastery mean? I think that mastery involves both of these skills.
Someone who can accomplish a task but doesn’t carry any of that knowledge around with them is following a guide, or a set of instructions, without any understanding. Someone who can tell you important facts about a field but can’t accomplish anything is a fan, not an expert.
Students shouldn’t be expected to memorize everything. But we should expect them to carry certain facts around in their head wherever they go. I don’t care if a student leaves my statistics class without memorizing the equation for a t-test. They can look that up. But if they can’t read a scatterplot, that’s a problem.
To evaluate a student’s mastery of a subject, we want to measure both kinds of knowledge. We should give them the chance to demonstrate real skill in the field, but we should also require them to show that they have internalized some of the most important facts and concepts.
We already have good ways of doing both.
Tests separate the student from their resources, and have the potential to measure the information that the student actually carries around in their head.
Class projects (and depending on the subject, papers) allow the student to use whatever they want in the solving of an actual (if usually artificial) problem, and have the potential to measure the student’s ability to accomplish practical work in the field.
When tests and projects are designed with this in mind, a class can run smoothly. When they are not, the result is disaster.
Tests
What are the important features of a test? Well, they happen in a controlled environment. You can’t choose what you’re working on; all questions have been decided for you. You have a limited amount of time. You’re not allowed to collaborate with other people. And you’re not allowed to look anything up.
Open-notes tests relax this last criterion. Some of them relax it in a small way, like giving students a formula sheet, or allowing them to bring a note card as a cheat sheet. Sometimes tests are truly open notes, and students are allowed to refer to whatever they like. Sometimes students can even bring their laptops, and make use of the entire internet.[1]
It’s good to evaluate a student’s skill at solving problems without restrictions. But tests aren’t a good way to evaluate this kind of knowledge, because they unnaturally restrict the student in other ways. The student isn’t given the kind of time they would have if they were solving a real problem. They don’t get any choice of what problem to work on. They can’t collaborate with others, or go to peers to discuss some aspect of the problem that’s troubling them, which is a huge part of solving problems in the real world. The format of a test hamstrings them.
This is tragic. Tests are naturally suited to evaluating the knowledge and skills that a student has internalized. We should use tests to see if the things you want your students to carry around in their heads have actually ended up in there.
When designing a test like this, you should figure out what you want your students to walk around with, and only include questions about those facts and skills. If it’s information they’d be better off just looking up (dates, exact values, trivia, etc.), that shouldn’t go on the test.
A simple way to evaluate this kind of test is to give it to other experts, and make sure that they can easily answer all the questions without looking up the answers. If experts in the field can’t casually ace your test, then it isn’t a good test of what experts should be expected to carry around in their heads.
This standard may even be slightly too harsh; you probably don’t need your students to walk out of the class on the same level as an expert. Another way to benchmark a test is to pick a student who you know reasonably well, who seems to have mastered the subject, and see how they do on your test.
A test made on these principles should be simple and easy, something that an expert would be able to breeze through. No cruft and no trick questions. Just an evaluation of how much knowledge they are carrying around.
Projects & Papers
For most subjects, class projects or papers are the right way to test the skill of what you can accomplish. Don’t shoehorn open-notes into a test format; it doesn’t fit. Just have them do a project. Projects are inherently open-notes; who ever heard of limiting the resources that can be brought to bear on a class project?
Projects provide a better environment for testing what you can accomplish because they don’t unrealistically hamper the student, as even the most liberal open-notes test will. Students have some level of control over what project they choose, how they approach it, what techniques they use, and who they call on for help. That’s a fair test of their abilities as a whole.
Exceptions
Does this advice apply to all subjects? I don’t think so. Foreign language courses are almost entirely about internalization. If you need to look anything up, you haven’t really learned the language. So testing makes a lot of sense in a language course. On the same note, I’m not sure if projects have any place in introductory language courses — though once you get to composition courses, projects start making more sense again.
There may be other reasons to have students do projects. Here I’ve mostly approached projects as a form of evaluation, but projects can also be an important teaching tool. Having students complete a project as an alternative to readings or lecture is a good idea, but that’s a different use case.
There are also some subjects where tests make no sense at all. For many hands-on skills, like writing or sculpture, you could conceivably make a test, but the real proof will be in creation.
Testing is a good way to examine internalized knowledge, but there are some kinds of internalized knowledge that aren’t easily measured by a test. Exactly how to hold your hammer and chisel, just what the dough looks like when it’s ready to go in the oven — these are things that an expert will have internalized, but which would be difficult to put on a test.
So there are some kinds of internalized knowledge that are better measured by projects. It seems like this is especially true for crafts, and for courses beyond the beginner level, as the student begins to pick up these hard-to-measure intuitions.
Generally, the more advanced the course, the less of a role there is for testing. While every subject has a core base of knowledge that all experts will know by heart, specialists will internalize knowledge that sets them apart even from other specialists. People already seem to understand this at some level, and most advanced courses tend to go light on the tests.
Sky Zhang points out that in certain cases, formula sheets can make a lot of sense. A programmer may not remember the syntax for all the basic operations of the language they’re learning, and the professor shouldn’t care. Giving them a sheet that provides that syntax won’t help them if they don’t understand the concepts, but it is forgiving towards students who have deep conceptual understanding but can’t be bothered to remember the exact notation for every operation. We can trust that if they choose to continue, they will eventually know the basics by heart. I think this is another case where professors should think about what they really want students to get out of the course (in this case, the concepts) and what they care less about (hopefully, the syntax).