MCMC on a 7x7 Grid | Metric Geometry and Gerrymandering Group

Once more Gridlandia has gotten a little bigger, and now our task is to examine drawing seven connected, equal-size districts (we're going to skip the 6x6 grid because we don't want to have to worry about ties). We'd like to be able to do a similar analysis as in the 5x5 case, where we can examine a proposed plan and a distribution of voters to see if there is evidence that the plan was intentionally selected to favor or disfavor one of the parties. This time, however, we don't have a full list of all of the legal plans. We know that there are 158,753,814 of them, and even with a careful encoding scheme, a file containing all of these would be a couple of gigabytes. But this is okay! If we have trouble writing down all of the legal districtings of a 7x7 grid, there is no way we'll be able to do such a thing for a real jurisdiction, so we can use this as an opportunity to develop some sampling techniques.

How do we generate a random, and hopefully representative, sample of plans? There are two really simple techniques which are terrible on their own, but when combined become very powerful. The first is to have a computer generate district assignments of the cells by a uniformly random process and keep the ones which satisfy the criteria for being legal districting plans. What's great about this is that each valid plan occurs with equal probability. What's not so great is that this probability is extremely tiny. While there are a very large number of legal plans, there are way, way, way more labellings than legal plans—about $10^{33}$ of them. Even if we had a computer that could generate 1,000 attempted plans per second, we wouldn't expect to hit a valid one within a human lifetime, or the duration of life on earth.

The second strategy is to use a human to produce plans. This has the advantage of not wasting time drawing and checking labellings which are not legal plans. The downside is that, in the grand scheme of things, humans are pretty slow at this task. Some of our brave researchers tried to write down all of the legal plans for the 4x4 grid and, although they succeeded, it did take quite a few hours of time.

How can we combine these? What we can do is start with a human-drawn (or human-assisted) plan, then use a computer to randomly modify it. Remember when we defined two plans to be adjacent in the metagraph if we could transform one into the other by swapping two cells in adjacent districts? Since there are only 1176 ways to randomly pick a pair of cells to try to swap in a 7x7 grid, a computer can very easily move from one plan to the next, which is exactly what is happening in the animation below. Every half-second, an algorithm randomly picks two cells and checks if exchanging their district assignments yields a legal districting plan. If so, it moves to that plan. If not, it tries again.

We are doing a random walk on the metagraph. Each plan is represented by a node in the metagraph. Every second, the walk takes a random step to one of the neighbors of the current plan. This algorithm is remarkable for three important reasons. The first is its simplicity—the demo above is running live off of about 50 lines of JavaScript, in your browser, right now. The second is the effectiveness and speed. While it would take a long time to enumerate all of the legal districting plans, we can definitely enumerate a huge chunk of them very quickly. The third is that the process often rapidly converges to a truly representative sample of the full universe.

Introducing MCMC: A Demo

This kind of sampling algorithm is called a Markov chain Monte Carlo method, or MCMC for short. These are incredibly powerful tools for sampling, approximation, and optimization in settings where it is difficult to get your hands on the object you're interested in, such as the space of all valid districting plans.

Once again, you can select a distribution of voters below. This time you can design a plan by clicking and right-clicking the cells to change their district assignment, or press the 'Random Plan' button to have the random walk choose one for you. Once you have a valid plan, press the 'MCMC Sampler' button to generate a histogram. The Blue bar represents the number of seats your distribution of voters the Hearts Party wins under your selected plan, and the histogram bars show the seat shares for a sample of 1,000 plans generated by the MCMC random walk. If the Blue bar is to the left of most of the mass of the histogram, it means that the nearby sample found a lot of plans that gave more seats to the Hearts Party than yours. If the Blue bar is to the right of most of the mass, it means that the sampler found a lot of plans which give more seats to the Clubs Party than yours.

Design a distribution $\Delta$

Click cells to toggle

Design a plan $\mathcal D$

Click cells change their district assignments

Is your plan an outlier for this distribution?

Number of ♥ seats

If you leave your distribution of voters alone and click the Sample button multiple times, the histogram will change, but not by too much, and the general shape will still be the same. This is because even though 1,000 is a very small number compared to 158,753,814—less than one percent of one percent—it's still a large enough sample of plans to be a fairly representative of the whole collection of plans. How to do this in the real world where we need to draw districts on states and municipalities which may have hundreds or thousands of cells is a hot area of research in redistricting.

Sampling real-world districting plans is of course much more complex than sampling on the grid, but fundamentally the idea of the random walk is the same. When you change the details of how you conduct the random walk, you change the "stationary distribution"—in other words, it alters the weight that tells you how likely each plan is to come up if you were were to walk for a very long time. The scientific community is currently spending lots of energy to understand the details of how to conduct these walks to come to the strongest conclusions about the universe of redistricting possibilities. In the next page, we examine a richer model of determining election outcomes and demonstrate the use of MCMC for optimization, rather than simply sampling.