Author: welcometotheoryland

A Tablespoon of Salt

I enjoy cooking – I find it really relaxing. There’s something nice about getting to switch to autopilot and follow a recipe after a long day of work.

A few years back I found a soup recipe (I can’t remember which one, unfortunately) that looked really, really good. I invited some friends over for dinner on the promise that I’d trade them food and shelter for good conversation. Nowadays, I think I have a pretty good handle on the basic steps in making soup, but at the time I was still learning the basics and found myself following recipes more or less verbatim.

One of the first steps in this particular recipe was to cook a bunch of aromatics over low heat, along with some salt to draw out the water and speed things up. The recipe recommended adding in a teaspoon of kosher salt. I misread the directions and instead added a tablespoon of table salt.

Needless to say, although everything smelled amazing, when it came time to serve it up we found that it was essentially inedible. I had to apologize for keeping people waiting, but we managed to salvage the evening by heading downtown to a restaurant where the food was made by trained professionals.

So what went wrong here?

There were two immediate errors. First, I had mixed up tablespoons and teaspoons, which threw my measurement off by a factor of three. Second, I used table salt instead of kosher salt; table salt has smaller crystals that pack more tightly, so spoon for spoon it delivers something like 25 – 40% more salt. Between the two mistakes, I had added roughly four times the amount of salt the recipe called for. Oops.

But let’s take a step back. Why didn’t I notice that I was about to make this mistake? Well, I was new to cooking, and I hadn’t made enough recipes to realize that a full tablespoon of table salt is a lot of salt. That’s three times the daily recommended salt intake for a person. (That point is particularly relevant, since I was making food for three people!) In a sense, I hadn’t built up an intuition that would cause me to pause for a minute, say that something didn’t feel right about adding that much salt, and then double-check the recipe.

It also didn’t help that I had never really stopped to think about the difference between kosher salt and table salt. I had just thought that salt was salt was salt and that was that. It had never occurred to me that different types of salt had different properties and different use cases, or that recipes would specify different types for a good reason. (Or, rather, that they often are different and often are there for good reason. In some cases, salt is salt is salt. Just not this case.)

It also didn’t help that I hadn’t yet built up enough cooking experience to realize just how powerful salt is. There are some ingredients you can accidentally triple without breaking things. If I had used too much oil in this recipe, I probably wouldn’t have noticed, because the last step was to immersion-blend everything, and chances are it would have (mostly) emulsified in. It just would have been super rich. Salt is not one of those ingredients, but I hadn’t learned that yet.

And finally, it really didn’t help that this was a recipe I’d never tried out before. At the time, I hadn’t yet developed enough experience to be able to stop and taste something partway through a recipe and figure out whether it needed some sort of adjustment. Even in this case, when I was thinking “wow, those aromatics by themselves are really unpleasantly salty!” I didn’t have the context to know “and that’s not what they’re supposed to taste like!”

The reason I’m mentioning this story is that it began with me making what, in some sense, is a pretty minor error: using the wrong amount of salt. But really, this error indicated that I still had a lot to learn about cooking, namely:

  • A tablespoon of salt is a lot of salt – probably way more than what you’d ever need in soup for three people.
  • Kosher salt is not the same as table salt, and the two aren’t one-for-one interchangeable.
  • Oversalting something can totally wreck it, and you have to be very careful about how much salt you add to something.
  • You should not invite people over and serve them an untested recipe!

Over the years, both taking and teaching classes, I’ve found that there are certain mistakes you can make that seem minor, but which totally break whatever it is that you’re working on. Oftentimes, those mistakes seem minor because they mask a lack of deeper understanding of some key concept or concepts, and the true error is something more fundamental. I call these errors “tablespoon of salt errors.” A tablespoon of salt error is one where

  • the error appears superficial to the person who made it;
  • the error is sufficiently serious that it’s difficult, if not impossible, to recover from it without starting over; and
  • the error reflects a fundamental process error or a lack of understanding of some key concept.

For example, when I was an undergrad, I was taking a class on advanced object-oriented programming techniques in C++. One of the assignments asked us to simulate objects moving around in a shipping network. I had a helper function that checked whether the weight of a packet was zero and then returned one of two different values based on this. The code essentially looked like this:

WeightClass weightClassFor(unsigned weight) {
    return weight = 0 ? ZeroWeightClass : NonzeroWeightClass;
}

There’s a big error here. Specifically, I’m using a single equals sign (assignment) rather than a double equals sign (comparison against zero). As a result, this code always returns the NonzeroWeightClass, which isn’t correct.

I spent many hours working on this project, submitted it, and earned a solid C for it. The error was localized specifically to the above function, and changing that one equals sign would have fixed the issue and passed all the functionality tests.

In the only time I can ever remember doing so as an undergrad, I went to the TAs to beg and plead for some partial credit back, since after all, it was a one-character error! It’s pretty clear what I meant to do, after all.

But the thing is, this wasn’t a one-character error. Somehow, I had written some code that didn’t work, and I hadn’t sufficiently tested things before I submitted it. In reality, the root causes of the error were more accurately things like these:

  • I wasn’t compiling with warnings on the maximum setting. A good compiler probably could have caught this and said something like “so… you’re sure you want to have a comparison statement that always does the same thing?”
  • I hadn’t tested things thoroughly enough. This code clearly doesn’t work, but I had written a whopping zero unit tests for my code and therefore didn’t have a way of noticing. (A sketch of the fix, and of a test that would have caught it, appears below.)
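
For reference, here’s roughly what the repaired function looks like, along with the sort of bare-minimum test that would have caught the bug. (The WeightClass definition and the test harness here are my reconstruction, not the original assignment’s code.)

#include <cassert>

enum WeightClass { ZeroWeightClass, NonzeroWeightClass };

WeightClass weightClassFor(unsigned weight) {
    return weight == 0 ? ZeroWeightClass : NonzeroWeightClass;  // ==, not =
}

int main() {
    // The buggy version returned NonzeroWeightClass for every input, so even
    // a single test of the zero case would have flagged the problem.
    assert(weightClassFor(0) == ZeroWeightClass);
    assert(weightClassFor(137) == NonzeroWeightClass);
    return 0;
}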

This is a classic tablespoon of salt error. It seems like a minor error (I missed an equals sign). It broke everything pretty seriously (a quarter of the tests failed, despite the rest of the code being essentially right). And it indicated a serious process error on my part (where were the unit tests? why wasn’t I compiling with warnings cranked up?)

I’m now completely convinced that the TAs were justified in taking off a bunch of points, since there was something fundamentally broken with how I approached the assignment. Plus, I now know not to make that error again!

Now that I’m teaching classes rather than taking them, I’m noticing that tablespoon of salt errors are extremely common, especially in discrete math and proofwriting, where very, very simple mistakes can totally derail a proof.

For example, consider the following proof about multiples of three. The relevant definition we give to students is that an integer n is a multiple of three if n = 3k for some integer k.

Theorem: If n is an integer that is not a multiple of three, then n^2 is not a multiple of three.

Proof: Consider an arbitrary natural number n where n is not a multiple of three. This means that n \ne 3k for any integer k. Consequently, n^2 \ne 3(3k^2) for any integer k. Therefore, if we let m = 3k^2, we see that n^2 \ne 3m, and therefore n^2 is not a multiple of three. $\qed$

There’s a nice rhythm to the proof, and if I were to read this, I’d think that the (hypothetical) student who wrote it made a good-faith effort to solve the problem. I’d also think that they made a pretty big logic error: just because you can’t write n^2 = 3(3k^2) for any integer k doesn’t necessarily mean that you can’t write n^2 = 3m for some integer m that isn’t of the form 3k^2. Using that same logic, I could prove the (incorrect) claim that if n is not a multiple of nine, then n^2 is not a multiple of nine either:

“Theorem:” If n is an integer that is not a multiple of nine, then n^2 is not a multiple of nine.

Proof: Consider an arbitrary natural number n where n is not a multiple of nine. This means that n \ne 9k for any integer k. Consequently, n^2 \ne 9(9k^2) for any integer k. Therefore, if we let m = 9k^2, we see that n^2 \ne 9m, and therefore n^2 is not a multiple of nine. $\qed$

The theorem here is not true (pick n = 3, for example), and the proof follows pretty much the same line of reasoning as the one from earlier, so the logic can’t possibly be correct.

Superficially, this error isn’t all that deep. “Oops, I just made a mistake in that last step.” But really, this goes a lot deeper and hits at the challenge of working with arbitrarily-chosen values. If you want to rule out all choices of m, you have to show that no choice of m works, not just exhibit an infinite family of numbers that don’t work. There’s also a separate issue with setting m = 3k^2 without actually saying what k is in the first place (is it a placeholder? You generally can’t do that to placeholders.)
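
For contrast, here’s one standard way to prove the original theorem correctly (this sketch is mine, added for comparison), which sidesteps the inequality trouble by splitting into the two possible remainders. Proof sketch: If n is not a multiple of three, then n = 3k + 1 or n = 3k + 2 for some integer k. In the first case, n^2 = 9k^2 + 6k + 1 = 3(3k^2 + 2k) + 1, and in the second, n^2 = 9k^2 + 12k + 4 = 3(3k^2 + 4k + 1) + 1. Either way, n^2 is one more than a multiple of three, so n^2 is not a multiple of three. $\qed$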

This is a tablespoon of salt error. The error seems superficial (either “you can’t use inequalities that way” or “not all integers m can be written as 3k^2 for some integer k.”) The error completely breaks the proof (the same reasoning can be used to prove patently false statements without much modification). And the error indicates a fundamental misunderstanding of some key concept (arbitrarily-chosen values, placeholder variables, etc.)

A question that I keep grappling with as an instructor, and which I’m not sure I’ll ever really figure out the best answer to, is how to assign grades to student work with a tablespoon of salt error in it. I remember my own frustration as a student getting slammed for mistakes that I thought were pretty minor. I also remember how valuable those experiences were, though, since they really forced me to admit that there were things I didn’t yet know and they helped me figure out where I needed to study up.

Figuring out how to handle this case in a way that pushes students to do better without making them feel overwhelmed by what they don’t yet know is a challenge. It’s exacerbated in the tablespoon of salt case because the grading feels crazily disconnected from reality.

There’s a case currently up at the US Supreme Court (District of Columbia v. Wesby) where someone hosted a party in a vacant house that they had no business being in. The police came, questioned everyone there, and arrested everyone on suspicion of trespassing. Under DC law, trespassing requires a mens rea on the part of the perpetrator – they have to know that they’re not supposed to be there. The question before the Court is about probable cause. The police report seeing a group of people in a house whose condition was so rough that the partygoers should have known they weren’t supposed to be there. The partygoers reported that they thought they’d been invited to a house they were allowed to be in. The disconnect between these narratives is currently being weighed by nine very intelligent legal minds, and the outcome will have a huge impact on law enforcement.

From the perspective of an instructor, a student who made a tablespoon of salt error has a serious gap in their understanding that needs to be corrected. From the perspective of the student, they made a minor error and the graders harshly took off way too many points for it. It’s a clash of narratives, which is never fun.

I’m going to keep tuning my approaches to handling cases like these. I figured I’d post this concept in the meantime in case the metaphor proves useful in other contexts.


What is a Hash Function? It Depends on Who’s Speaking

Hash functions are one of those wonderful mathematical objects that have a ton of use cases. You can use them to check whether a download failed due to a random error, or to build fast data structures (including the count-min sketch, the simplest data structure that most people haven’t heard of), or as a basis for cryptocurrencies.

Having taught a combination of courses, some more programming-oriented and some more theory-oriented, I’ve found that the term “hash function” and what’s expected of it varies a lot by who’s doing the talking. The idea of a hash code to a veteran Java programmer bears little resemblance to the families of hash functions that theoreticians would talk about, which in turn bear little resemblance to the cryptographic hash functions that computer security folks spend so much time working on. This can make it challenging for programming students to transition to learning about the theory behind hash tables or other data structures, as I learned the hard way the first time I taught algorithms.

This post is my attempt to talk about what’s expected of hash functions across different areas of computer science and software engineering. My hope is that if you’re a practicing programmer starting to study algorithms, data structures, or cryptography, this gives you a sense of the expectations you’ll run into when you dive into those fields.

Hash Functions: Programmer Edition

If you’re a practicing programmer, you probably first encountered hash functions in the context of hash tables, like the Java HashMap, the C++ std::unordered_map, or the Python dict type. You may understand how these data structures work, or you might just know that “they’re fast” and “they need a hash code.”

From a programming perspective, the most common times you’ll see hash functions are when you have to write them yourself. For example, if you’re a Java programmer, you probably learned to write a hash code like this:

@Override public int hashCode() {
    int result = myFirstField.hashCode();
    result = 31 * result + mySecondField.hashCode();
    result = 31 * result + myPenultimateField.hashCode();
    result = 31 * result + myUltimateField.hashCode();
    return result;
}

You’ve probably heard that hash codes are supposed to work so that different objects generally give different hash codes, and that equal objects have to have equal hash codes.

If you’ve implemented a hash table, perhaps as part of an intro class, or perhaps because you just thought it would be fun, you’ve had to be a client of a hash function. You maintained some kind of array of slots and determined which slot to look in for an entry by computing its hash code to get back some value (probably a 32-bit or 64-bit integer), modding that value by the table size, and then looking in that spot.
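
If you’ve never written that code, here’s a minimal C++ sketch of the “hash, then mod” step described above (the names here are illustrative, not from any particular library):

#include <functional>
#include <iostream>
#include <string>

// Map a key to one of numSlots buckets: compute a big opaque hash value,
// then mod it down into the range [0, numSlots).
std::size_t slotFor(const std::string& key, std::size_t numSlots) {
    std::size_t hashValue = std::hash<std::string>{}(key);
    return hashValue % numSlots;
}

int main() {
    std::cout << slotFor("example", 100) << std::endl;  // some slot in 0..99
    return 0;
}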

Then there’s the question of what you likely prioritize in a good hash function. You probably care about things like how well it spreads values out (more even spread means fewer collisions, which is good for hash tables) and how fast it runs (the faster, the better!).

To summarize, the key assumptions are that

  • each object has a hash code, which you usually implement by doing something involving multiplying by 31 and adding values together;
  • equal objects should have equal hash codes, and unequal objects should try to have unequal hash codes;
  • hash codes are usually 32-bit or 64-bit integers that get modded by some kind of table size;
  • there’s a bunch of “other” hash functions you can use to get a fingerprint that helps you check if you downloaded something properly; and
  • it’s good to write hash functions that spread things out and run quickly.

Hash Functions: The Data Structure Theory Perspective

If you’re a researcher studying algorithms or data structures, you have a pretty different point of view about what a hash function is and how it should behave.

Mathematically speaking, a hash function h is typically modeled as a mathematical function from some set of possible inputs (often called the key space or the universe) to a set of outputs, which is usually a range of integer values. For example, if you were building a hash table that had, say, 100 slots in it and you wanted to store integers in that hash table, you would want a hash function that took in an integer and then outputted an index between 0 and 99, inclusive. Mathematically, we’d model this as a function h : \mathbb{Z} \to \{0, 1, 2, ..., 99\}.

A few things might jump out here. First, we no longer speak of individual objects as having “a” hash code. That is, there’s no longer the idea that every object has exactly one hash code as some sort of intrinsic property. Instead, we think of the objects that we’re hashing as existing on their own, and then we come from the outside with some hash function h that suits our needs. In other words:

If you are a software engineer, you probably think of every object having a hash code. Hash codes are an intrinsic property of objects.

If you are a theoretician, you probably think of there being all sorts of possible hash functions that we could use for a group of objects. We can define lots of different possible hash functions for a given collection of objects. A hash function is an extrinsic object that gets applied to objects.

There’s another detail here. In The Real World, where there’s Object.hashCode(), or std::hash<T>, you have one hash function that needs to work for any size of hash table you might want to toss an object into. This means that hash functions typically output some sort of 32-bit integer or 64-bit integer, or more generally something big enough that no matter how large of a hash table you make, your hash function can provide outputs that can span the whole range.

But in Theoryland, it’s far more common to think of hash functions as being specialized for a particular size of table. If I want a hash function that I’ll use in a string hash table that has 137 slots, I’ll model it as a function that takes in a string and outputs a number between 0 and 136, inclusive. (If you’ve seen the notation \Sigma^* from formal language theory, that means I’d think of my hash function as a function h : \Sigma^* \to \{0, 1, 2, ..., 136\}.) If I then resize my hash table to have 237 slots, I would model it mathematically as though I’d picked a brand new hash function h' that takes in strings and outputs a number in the range \{0, 1, 2, ..., 236\} (in formal notation, h' : \Sigma^* \to \{0, 1, 2, ..., 236\}). For those of you with a discrete mathematics background, the terminology here is that the codomain of the hash function is a formal part of the specification of the hash function in the first place.

Because hash functions in Theoryland are external to objects, rather than intrinsic to them, it’s not a problem if we decide to completely switch the hash function we’re using. In an actual programming language like Java or C++, though, the idea of switching hash codes is hard to model, since it’s not like we can just swap out std::hash<T> or hashCode for some other implementation.

In other words:

In software engineering, hash functions usually output very large values that then get modded down to the size that’s needed. If your table grows in size, you use the same hash function and just change what you mod by.

If you’re a theoretician, you model a hash function producing one of only a few possible values by building the set of possible outputs into the definition of your hash function. If you have a change in a hash table size, you pretend you just grabbed a brand new hash function with a different codomain.
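
One way to render the Theoryland convention in code – purely as an illustrative sketch, since real libraries don’t work this way – is to bake the codomain into the hash function object itself, so that “resizing” means constructing a brand-new function:

#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

// A hash function whose codomain is part of its definition: it maps strings
// into {0, 1, ..., numSlots - 1} and nothing else.
class RangedHash {
public:
    explicit RangedHash(std::size_t numSlots) : numSlots_(numSlots) {}

    std::size_t operator()(const std::string& key) const {
        return std::hash<std::string>{}(key) % numSlots_;
    }

private:
    std::size_t numSlots_;
};

int main() {
    RangedHash h(137);       // h : Sigma* -> {0, 1, 2, ..., 136}
    RangedHash hPrime(237);  // after a "resize," a brand-new h' : Sigma* -> {0, ..., 236}
    std::cout << h("theoryland") << " " << hPrime("theoryland") << std::endl;
    return 0;
}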

These two differences – that hash functions are external to objects, and that you can just grab new hash functions with different codomains whenever you need to – can come out of nowhere if you’re first learning how to analyze algorithms or data structures, because a professor who’s been working in Theoryland might treat this as so fundamental that it’s not worth pausing to comment on.

But things get a bit weirder from here, especially if you go on to study algorithms or data structures more formally.

Let’s imagine you have a hash function that takes as input some kind of object (say, widgets) and outputs a natural number between 0 and n - 1, inclusive. Let’s have U be the set of all possible widgets. I’m going to assume that there are a lot of different widgets out there, say, way, way more than n of them. In that case, if you know in advance what hash function h : U \to \{0, 1, 2, ..., n - 1\} is going to be used in some hash table, you could envision what would happen if you hashed every single possible widget and looked at what hash code you got back. The handy pigeonhole principle tells you that there would always be some collection of at least \frac{|U|}{n} objects that have the same hash code.

If you’re a theoretician, this means that you have to worry about the case where your hash table happens to get \frac{|U|}{n} elements dumped into it that all have the same hash code. It doesn’t matter what hash function you pick – there are simply too many widgets to distribute them in a way that doesn’t have a lot of collisions.

If you’re a Nefarious Scoundrel, you might have this same idea and reach the conclusion that you could break hash tables by doing some preprocessing to figure out what sorts of elements collide with one another, then dumping them all into some web server’s internal hash tables at once, degrading performance. And congrats, you’ve just invented the HashDoS attack, and this is why we can’t have nice things.

From a Theoryland perspective, given that there’s this fundamental weakness in hash functions, it makes sense to search for some way to build hash tables that can’t get clobbered in this fashion. And this is where we can use the nice idea that hash functions are external to objects rather than an intrinsic property. What if, when we build a hash table, we choose the hash function to use out of a pool of potential candidates? From the perspective of our Nefarious Scoundrel, this makes attacking our hash tables a lot harder, since no matter how much up-front work they do to try to figure out a collection of bad inputs, there’s a chance that we’ll pick a function out of our pool that prevents this attack.

You often see terms like “universal family of hash functions” or “family of 2-independent hash functions” tossed around in Theoryland when discussing hash tables for this precise reason. When we’re reasoning about properties of hash tables, we often imagine that we have a pool of hash functions to pick from (which we’d more commonly call a family of hash functions, just because it sounds nicer) and we choose one such function at random to use in a hash table.
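To make the “pool” idea concrete, here’s a sketch of one classic family – the multiply-mod-prime construction h_{a,b}(x) = ((ax + b) \bmod p) \bmod m, which happens to be universal. This is my choice of example; nothing above commits to any particular family. Constructing the object with fresh random a and b corresponds to drawing one function at random from the family:

#include <cstdint>
#include <iostream>
#include <random>

// One randomly chosen member h_{a,b} of the multiply-mod-prime family,
// mapping integer keys (below kPrime) into {0, 1, ..., numSlots - 1}.
class FamilyHash {
public:
    FamilyHash(std::uint64_t numSlots, std::mt19937_64& gen)
        : m_(numSlots),
          a_(std::uniform_int_distribution<std::uint64_t>(1, kPrime - 1)(gen)),
          b_(std::uniform_int_distribution<std::uint64_t>(0, kPrime - 1)(gen)) {}

    std::uint64_t operator()(std::uint64_t key) const {
        return ((a_ * (key % kPrime) + b_) % kPrime) % m_;
    }

private:
    static constexpr std::uint64_t kPrime = 2147483647;  // 2^31 - 1, a prime
    std::uint64_t m_, a_, b_;
};

int main() {
    std::mt19937_64 gen(std::random_device{}());
    FamilyHash h(137, gen);            // draw a random function into {0, ..., 136}
    std::cout << h(106) << std::endl;  // its (unpredictable) slot for key 106
    return 0;
}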

This hits at a very large gap between the theory of hash functions and what software engineers likely encounter:

If you’re a software engineer, you think of each object as having “a” hash code. You might also think of switching between different hash functions like MurmurHash, SAX, etc., but only in specialized circumstances or for performance reasons.

If you’re a theoretician working on algorithms and data structures, you often think of having large families of hash functions to pick from, and design data structures on the assumption that you can choose from these hash functions at random if necessary.

That being said, there isn’t a complete disconnect between theory and practice here. After enough Nefarious Scoundrels started crafting HashDoS messages to web servers to slow them down, many language implementations (like the JVM) started adding some randomness to how they hash fundamental types like integers, strings, and arrays. This meant that primitive types would not produce the same hash codes from program run to program run, which was less than ideal if you liked precomputing hashes and storing them on disk, but makes it much harder to HashDoS a website.

Because the hash functions used in algorithm and data structure design typically look so different from the ones that are usually talked about at the language level, the qualities that get optimized look quite different too. Families of hash functions in Theoryland are usually categorized by the strength of the statistical guarantees you can make about them. If you choose a random hash function from a family and evaluate it on a single input, can you predict with better than random chance what it will do on another input? If not, you have a 2-independent family of hash functions. If you pick a random hash function, can you say that any two distinct objects, no matter how they’re chosen, have a low probability of colliding? If so, you have a universal family of hash functions.
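
To put rough symbols to those two guarantees (these are the standard definitions, with m denoting the number of possible outputs): a family \mathcal{H} is universal if for any distinct x and y we have \Pr_{h \in \mathcal{H}}[h(x) = h(y)] \le 1/m, and it’s 2-independent if for any distinct x and y and any outputs s and t we have \Pr_{h \in \mathcal{H}}[h(x) = s \wedge h(y) = t] = 1/m^2. (2-independence implies universality, but not the other way around.)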

To summarize, in the context of algorithms and data structures:

  • hash functions are external to the objects they work on and are not intrinsic properties;
  • hash functions typically have their codomain specified in a way that bakes in the size of the range that they’re mapping to;
  • hash functions are typically grouped into families, with the idea that a hash function will be picked out of the family at random; and
  • the “power” of a hash function is usually measured in terms of statistical guarantees and precise inequalities bounding collision probabilities, rather than in terms of speed or a more flexible idea of “well-distributed.”

Things Not Covered Here

This is, of course, a very quick and probably somewhat unfair characterization of what hash functions look like in programming practice and in the theory of algorithms and data structures. I didn’t talk about cryptographic hash functions, which get used all the time in both theory and practice. I didn’t address clever custom hash functions like locality-sensitive hash functions or the hash functions you might use in derandomization.

My intent is to make it easier for people used to writing code and worrying about performance to read about the theory behind why things work, and hopefully to learn a thing or two in the process. I think it would be great if more people knew about count-min sketches, or more clever hash tables like the FKS perfect hashing scheme, or using LSH to speed up finding related items in a data set.

Designing Assignments with Google Forms

Over the last academic year, I’ve gotten a lot of mileage designing online assignments using Google Forms. I’ve typically seen them used for event RSVP systems, but apparently if you’re creative about how you use sections and multiple-choice questions and data validation, you can use Google Forms to ensure that students work through a particular series of exercises, and can even offer targeted feedback along the way.

There are two particular techniques that I’ve gotten the most mileage out of. The first one is an approach I’m calling the multiple choice maze. This is a Google Form that’s a series of multiple choice questions. Each time you provide a wrong answer, the form takes you to a page with either a hint or an explanation about why that answer is incorrect, followed by a link back to the previous page. The form can only be submitted on the last page, which ensures that students have to answer everything correctly before they can submit the assignment.

You can try out an example multiple choice maze using this link.

There are a number of really nice advantages to using a multiple-choice maze for an assignment. First, it’s a highly scalable way to make sure that everyone in the class is on the same page about something. When I last taught CS106B, I used a multiple choice maze to make sure that the students were familiar with the Honor Code policies for the course; we asked a series of multiple choice questions about what actions were and were not permissible, and only allowed students to submit the assignment by getting all the answers right. Second, it’s an effective and lightweight way to make sure students are on top of lecture material. I sent the spring quarter CS103A students a multiple choice maze assignment every Wednesday evening covering material from the week’s two lecture meetings and specifically asked questions that required them to exercise the skills and concepts we’d taught. Because the feedback is automatic and customizable, students can quickly size up whether their understanding of the concepts is where it needs to be and can get refreshers and reminders on the topics that need them. Finally, since everything is handled through Google Forms, once all is said and done we have a reliable way to determine who’s completed the assignment, since we have timestamps for all of the submissions and can ask for names.

Once you’ve gotten the hang of it, it’s not too hard to assemble a multiple choice maze. For example, here’s the internal view of the first question from the form linked above:

[Screenshot: back-end view of the first multiple-choice question]

Here, I have a multiple-choice question. It’s set to required so that anyone working through the form has to answer it. In the dropdown menu in the bottom corner of the question, I’ve checked the Go to section based on answer option, which lets me have each question jump to a different section in the form. As you can see, the three wrong answers all link to Section 2 (titled “Oops! That wasn’t quite right.”) and the correct answer links to Section 3 (“Some Exposition!”). That way, if you answer correctly, you get taken to a section where we can move on to the next question, and if you answer incorrectly, you get taken to a different section where we can explain why that answer is wrong. Here’s what Section 2 looks like on the back-end:

[Screenshot: back-end view of Section 2, “Oops! That wasn’t quite right.”]

Notice that there are no questions in this section and that the section is purely text describing how to approach the problem. (I’ve never taught addition of fractions, so I suppose that this is probably not a very good explanation… sorry!) Importantly, the After section 2 option, which controls which section to visit after seeing this section, is set to Go to section 1, which is the section containing the initial multiple-choice question. That way, if you get the answer wrong, you’ll get this nice helpful text and then get warped back to the question so you can make another guess.

We can take this a step further by creating questions where each wrong answer goes to a separate screen with a follow-up. For example, here’s the back-end view of the second question from the list:

[Screenshot: back-end view of the second question]

Notice how each wrong answer goes to its own individual section, each of which is designed like the one above to warp back to this question.

The second approach I’ve found useful for making Google Form assignments is what I’m calling a find the hidden number assignment.

Last time I taught CS106B, I wanted students to get more familiarity using the debugger over the course of the quarter and figured a good way to do this would be to create a debugging assignment. For the very first assignment of the quarter, I took a standard program we’ve given out for a while that computes a rolling hash code, then designed an assignment that asked students to single-step through the hash code computation on a specific input. At some point in the single-stepping, I asked students to write down one of the numbers that was generated at an intermediate step. The Google Form for the assignment then required students to enter that number in order to move on to the next part of the assignment (and, therefore, the “Submit” button.)

Here’s a demo of what the form might look like. The secret number is 137.

The main advantage of setting up an assignment like this is that it functions as a lightweight “proof-of-work” system. Students need to show that they did some amount of work – which you can customize – in order to receive credit for the assignment. In my case, I was able to use this system to ensure that (in principle) everyone in the class had successfully gotten their IDE and debugger set up and was able to run programs and step through them in the debugger.

Setting something like this up isn’t too hard and just requires you to enable data validation and to choose the “Equal to” option:

[Screenshot: the data validation settings, with the “Equal to” option selected]

Are these tools perfect for the job? No. But do they work well? Evidence seems to suggest that they do! The multiple choice maze I set up for the Honor Code seemed to work well; we had way fewer copied assignments submitted the quarter we deployed it than what we usually see, and more students seemed to know how to use the debugger.

I’ll probably keep using this technique in the future (once the forms are set up, it’s not at all hard to copy them from quarter to quarter). Let me know if you’re using anything along these lines or find this approach useful or interesting!

Negafibonacci Representations

Today I found this amazing question about representing numbers in negafibonacci over on Stack Overflow. The negafibonacci number system represents each integer as a sum of negafibonacci numbers – terms of the form F_{-n} = (-1)^{n+1} F_n – using only the digits 0 and 1. I played around with the problem a bit and found a really cool algorithm that efficiently converts numbers into negafibonacci by subdividing the integers into different ranges bounded by Fibonacci numbers and using those ranges to decide which bits to include.
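
To make that concrete (this worked example is mine, not from the Stack Overflow question): the first few negafibonacci numbers are F_{-1} = 1, F_{-2} = -1, F_{-3} = 2, F_{-4} = -3, F_{-5} = 5, F_{-6} = -8, and F_{-7} = 13. The number 11 can then be written as 13 - 3 + 1 = F_{-7} + F_{-4} + F_{-1}, which corresponds to the bit pattern 1001001 (the coefficients of F_{-7} down through F_{-1}). The neat theorem underlying the system is that every integer – positive, negative, or zero – has exactly one such representation in which no two adjacent bits are 1.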

A Case for Undecidability via Self-Reference

One of the major results that’s covered in most intro CS theory courses is that there are problems that are truly undecidable – there’s no general algorithm that can tell you the correct answer in all possible cases.

The typical way that undecidability is taught goes something like this:

  • Students first see a concrete example of an undecidable language, like the halting problem. The proof that the halting problem is undecidable typically goes something like this:
    • Assume that there’s a decider D for the halting problem.
    • Using D as a subroutine, produce a machine H that takes as input a TM M, then runs D on \langle M, \langle M \rangle \rangle, looping if D says that M will halt on \langle M \rangle and halting if D says otherwise.
    • Run H on its own encoding \langle H \rangle. Contradiction!
  • Students then use mapping reducibility as a mechanism for showing that other related languages are undecidable. Those proofs typically look like this:
    • Given as input a TM M, construct a TM D that runs M on some input, exhibiting one complex behavior if M halts on that input and some other complex behavior if M loops on it.
    • Use the fact that a TM that could decide whether D has the property in question would decide the halting problem to show that the problem in question is undecidable.

When I first taught CS103, this is the approach I took to introducing undecidability, and I found it to be less than ideal for a number of reasons. First, the construction used to show that the halting problem (or, in my case, the language A_{TM} of the universal TM) is undecidable is extremely tricky. It involves starting with a decider for the language and making a series of highly non-obvious transformations on it – converting it so that instead of taking two inputs, it takes a single input; building a new TM that listens for the result from the decider and then does the opposite; and finally running that TM on itself.

Second, once students had figured out how the construction worked, they had little intuition for what it was exactly about the halting problem that made it undecidable. “I can follow the proof,” they’d say, “but that still doesn’t give me a high-level intuition for how you’d eyeball this problem and have a hunch that it’s undecidable.” This is partly because the standard proof that the halting problem is undecidable is fairly technical and requires a number of non-obvious steps.

Third, students often had little intuition for how reducibility had anything to do with the intrinsic difficulty of problems. Part of this might have been specific to CS103: since the course doesn’t have algorithms as a prerequisite, for most students these sorts of impossibility reductions were the first exposure they’d had to reducibility. Many students thought that “reduction” meant “put a TM inside of a TM so that you can put that TM inside of another TM” and didn’t realize that there was a bigger-picture idea at play. Even after mastering the technique of proving undecidability via self-reference, many students expressed frustration that they understood how to execute a proof of undecidability, but had no idea why the reduction they’d done had proved anything.

In Fall 2014, I decided to abandon the above route to undecidability and instead opted to use direct self-reference as a mechanism. After tuning and tweaking the presentation over the past three years, I’m very happy with how it’s worked.

This presentation of undecidability works in a number of stages. After first establishing the fundamentals (what decidability and recognizability mean, how languages connect back to the abstract notion of decision problems, how TMs work, etc.), I present a series of claims, in sequence, that build up to our first proof of undecidability:

  • Universality is an emergent property of computation. It’s possible to build TMs that compute on other TMs and for TMs to run other TMs. (This introduces A_{TM} and also is a great launching point for talking about why virtual machines, compilers, etc. are possible.)
  • Self-reference is an emergent property of computation. I introduce an informal version of Kleene’s recursion theorem by showing how to write a Quine (with the strong caveat that what I’m doing is a proof of concept rather than something students are expected to know how to do!), then show how to write programs that do things like ask how many characters long they are or that print out whether they’re an even-length or odd-length program. I conclude by mentioning that, essentially, any program in any programming language can be modified to include a new function mySource() that returns the source code of the program.
  • The combination of these two ideas leads directly to undecidability. Imagine you have a decider for A_{TM}, which we can model in software as a function willAccept(program, input) that takes as input the source code of a program and an input, then returns whether the program will accept the input. We then write a program that asks whether it’s going to accept its input and then does the opposite (sketched just after this list).
  • Generalize the above pattern. We can write all sorts of self-referential programs along the above lines to show that many different problems are undecidable without needing to appeal to reducibility.
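
In the C/C++-style pseudocode I use in class, that self-referential program looks something like this (willAccept is the hypothetical decider from the list above, and mySource is the self-reference primitive from the recursion theorem):

int main() {
    string me    = mySource();   // our own source code, via the recursion theorem
    string input = readInput();  // the input we were handed

    if (willAccept(me, input)) { // ask the decider what we're about to do...
        reject();                // ...and then do the opposite of its prediction
    } else {
        accept();
    }
}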

From what I’ve seen, this approach offers several advantages over the version I outlined above. First, while there’s certainly some creativity involved in coming up with a self-referential program like the one above, the amount of effort required is lower than in the traditional proof that A_{TM} is undecidable. There isn’t a complicated step of taking a two-input decider and turning it into a one-input decider, then flipping the accepting and rejecting states, then running it on itself. This program is more of a “one-step” trick: it asks what it’s going to do (is it going to accept its input?) and then proceeds to immediately do the opposite.

Second, this approach provides a powerful intuition for why many problems about TMs are undecidable. I often frame these sorts of self-referential constructions in terms of the classical trope of oracles and seers predicting the future. If you have a decider for a language about the observable behavior of TMs, you can think of it as an oracle (in the classical Greek sense rather than the complexity theory sense) that will predict the future by telling you what the program is or is not going to do. With self-reference, programs can then use such a decider to ask questions about what they themselves are going to do in the future, and therefore they can just decide to do the opposite of whatever is supposed to happen next. Students can point directly at the self-referential machine they construct as an explanation about why the problem they’re exploring is undecidable – they can say that this specific program will do the opposite of what it’s supposed to do if the underlying language is undecidable.

Finally, from a CS103 perspective, this approach is pedagogically useful for motivating reductions in a more practically-oriented setting. I touch on reducibility in the context of P and NP by showing it as a way of solving new problems using a solver for an original problem. For example, I show off how to solve the problem “given a box of connectors, can you find a way to plug your laptop’s VGA output into a projector’s HDMI input?” using a solver for the directed reachability problem, and the problem “given a crossword puzzle grid, how many dominoes can you place on it so that each domino covers two white squares and no dominoes overlap?” using a solver for maximum matching. As a first introduction to reducibility, I think these examples build a much better intuition for how reductions function than the computability “put a TM inside a TM inside another TM” reductions students would otherwise encounter first.


From experience, the sorts of questions I’ve gotten from students about this technique tend to be dramatically more focused and interesting than the questions I used to get about reducibility and the standard proof that the halting problem is undecidable. Students often ask about the limits of this technique – why can’t you use self-reference to prove that every language is undecidable, for example? – and about whether it generalizes to show that every language about TMs is undecidable. These are great questions and an excellent launching point for discussions of topics like Rice’s theorem.

There are still some bugs I need to work out of the presentation. I’ve been using C/C++-style pseudocode, which more students feel comfortable with than high-level TM descriptions à la Sipser. This works well, but getting the right syntax down to make the learning as easy as possible is tricky, and I think I still have some more work to do on that front.

There is also the shortcoming that this approach completely bypasses reducibility in the conventional sense. For the purposes of what I do in CS103 I think this is totally fine, though in other contexts this might not work out so well.

Going forward, I’m hoping to get this fully polished and to write a more in-depth explanation of the pedagogical decisions behind it and the supporting materials I’ve used to make everything click.

What Does it Mean to Solve a Problem?

At the tail end of CS103, we talk about decidable and recognizable (R and RE) languages as a way of talking about problems that are beyond the capacity of computers to solve. I’ve typically framed the discussion of these languages in terms of trying to answer this overarching question:

What problems can be solved by a computer?

Since CS103 stresses mathematical rigor, I usually introduce this question less as a concrete mathematical goal and more as a framework for thinking about computation. In order to rigorously answer this question, we need definitions of “problem,” “computer,” and “solve.” In formal language theory, problems are modeled by languages and computers are modeled by Turing machines. The idea of “solving” a problem can then be formulated in terms of decidability and recognizability, along with a ton of other solution strategies that we don’t explore in the course (e.g. co-recognizability).

To help students see why it’s so important to have a formal definition of what it means to solve a problem, I’ve added this question to the problem set, which connects the earlier material on implications to the later material on computability:

Let L be a language over \Sigma and M be a TM with input alphabet \Sigma. Below are three properties that may hold for M:

  1. M halts on all inputs.
  2. For any string w \in \Sigma^*, if M accepts w, then w \in L.
  3. For any string w \in \Sigma^*, if M rejects w, then w \notin L.

At some level, for a TM to claim to solve a problem, it should have at least some of these properties. Interestingly, though, just having two of these properties doesn’t say much.

  1. Prove that if L is any language over \Sigma, then there is a TM M that satisfies properties (1) and (2).
  2. Prove that if L is any language over \Sigma, then there is a TM M that satisfies properties (1) and (3).
  3. Prove that if L is any language over \Sigma, then there is a TM M that satisfies properties (2) and (3).
  4. Suppose that L is a language over \Sigma for which there is a TM M that satisfies properties (1), (2), and (3). What can you say about L? Prove it.

Many students, upon reading this problem, think that they must have misread something, since it looks like it suggests that all languages are decidable or something along those lines. Working out why exactly this isn’t the case is trickier and forces students to read the properties closely to determine what they mean.

These questions also segue nicely into questions about sound and complete approximations to problems. Many students ask whether, if a language is undecidable, there’s still a way to build a TM that gets the right answer “most of the time,” for some definition of “most of the time,” or that gets an “approximately” correct answer. This question shows that it’s always possible to find a decidable subset or superset of any given language (though not a particularly useful one), and is a great launching point for a further discussion of the topic.

A Set of Things Not Implied

The first midterm exam for CS103 focuses heavily on set theory and formal logic, since they form the backbone for so many of the later mathematical structures and concepts that come up in the course (and more broadly in discrete mathematics and theoretical computer science), and so each quarter I try to come up with at least one good question relating the two together.

A while back I came up with the following problem, which has had a lot of staying power and is a great way to test the two concepts:

Let A and B be arbitrary sets and consider the set S defined below:

S = \{ x | \lnot(x \in A \to x \in B) \}

Write an expression for S in terms of A and B using the standard set operators (union, intersection, etc.), but without using set-builder notation. Briefly justify why your answer is correct.

I highly recommend taking a minute to work through this problem if you haven’t worked with propositional logic or set-builder notation in a while.

One of the reasons I love this problem so much is that the set-builder notation for the set just plain doesn’t seem to make sense. What on earth is meant by a set of things where it isn’t implied that if the things are in one set, they’re in another? But if you proceed slowly, expand out the negation of the implication x \in A \to x \in B, then read the overall expression, you can see that it ends up working out really nicely to the set A - B. That in turn might cause you to wonder what the connection between a negated implication and the set A - B is, which might remind you that implications only fail to hold when the antecedent holds and the consequent doesn’t. That in turn might then cause you to wonder whether there are other connections between set theory and propositional logic, and now you’re starting to get a better sense for how everything more generally relates to one another.
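
(Spelling out the key step, in case you’d like to check your answer: \lnot(x \in A \to x \in B) is equivalent to x \in A \wedge \lnot(x \in B), since an implication fails exactly when its antecedent holds and its consequent doesn’t. That says precisely that x \in A and x \notin B, so S = A - B.)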