Created by Emma Pierson

Previously I introduced a mathematical way of figuring out the probability that someone is romantically interested in you; click on things.

As a brief recap, an arrow from circle A to circle B means that "A affects B", so whether you are captain of the chess team affects whether she likes you and whether she likes you affects whether she smiles at you. Clicking on a circle corresponds to asking the model, what is the probability she likes me given this evidence? [1] So the initial probability that she likes you is 7.5%, but the probability that she likes you given that you're captain of the chess team is 29.5% (this is based on hard data, by which I mean "my high school romantic preferences") and the probability that she likes you given that you're captain of the chess team, you've spoken to her, she has trouble forming complete sentences around you and asks you for private grammar tutoring and smiles at you in the hallway even though her dog died and flowers turn up from your doorstep and they're not from your mom -- well, you're golden, brah, that's 99.9%. (Of course, at none of these percentages should you assume consent: inference can only imply that you should ask for it.)

This is a pretty cool model! But how do we actually compute these numbers? The simple way to compute the probability she likes you is to make a gigantic table where each row is one possible world (she likes me = YES, she sends me flowers = YES, her dog died = NO, etc); then you add up the probabilities of all the worlds in which she likes you, and there you go. But there are a lot of possible worlds: in this example, we have 9 circles, each can be True or False, so we have 2 ^ 9 possibilities = 512. Which is fine, but if we scale up to 40 circles, we'll have 2^40 = 1.1 trillion possibilities, which will definitely kill my laptop -- and for a really complex situation, like medical diagnosis, we're completely screwed:

So we need to avoid computing the whole table. Fortunately, we can do this. Consider a simple example:

You want to compute the probability that you'll die alone (don't we all?) but you don't want to compute the full table, which has 2^5 = 32 entries. Start at the beginning of the chain. Let's say the probability she likes you is 20% (I'm bumping up your odds here because otherwise this example, like your love life, will be a little dull) and the probability that you snuggle, if she likes you, is 50%; otherwise, it's 1%. Then the probability that you snuggle overall is 20% * 50% + 80% * 1% = 10.8%. So now our chain is one shorter:

With the probability of "You Snuggle" being 10.8%. And we can just continue like that, eliminating the first circle in the chain, until we get to the end. At each step we've done a simple calculation, and we've never had to compute the full table. There's a general version of this "eliminate one variable at a time" procedure; if we choose a clever elimination order, we can drastically reduce the work we have to do. Unfortunately, it turns out that a) figuring out a clever order is really hard [2] and b) even if you do find the right order, you may still have to do a lot of work.

So we may have to give up on computing the exact right answer. But we can approximate it using a technique called sampling: we run many simulations, and take the average. This is what Nate Silver uses, for example, to predict elections and the World Cup. Here's what sampling looks like on our original example; click "Run" to start drawing samples, and "Faster" to speed it up.

In each simulation, we start by randomly setting "You have Never Spoke to Her" and "You are Captain of the Chess Team" to true or false with the correct probabilities (for example, "You are Captain of the Chess Team" is True 1% of the time); then we move downward, randomly setting each circle to true or false with probabilities that depend on its parents. Then we just compute the fraction of simulations in which she likes you. (Now instead of saying "Never in a thousand years" you can say "Never in a thousand simulations" -- a more mathematical way to reject someone).

Actually, there is a technique called "rejection sampling", which we can use to incorporate evidence: for example, to compute the probability that she likes you given that flowers have turned up at your doorstep. In this case, we're not interested in every simulation -- just in the ones where flowers turn up at your doorstep -- so we throw out (reject) all the others. You can see an example of this below. Click on circles to set them to True or False -- leaving a circle as grey means that we don't know anything about it -- and then click run to simulate.

One thing you'll notice is that we throw out a lot of samples: this technique isn't very efficient. In the flowers-on-doorstep example, I had to run 2100 simulations to get 100 that I could actually use. There are many more sophisticated ways of sampling, and of approximating probabilities in general, because we have to do it whenever we have something which is a) uncertain and b) too difficult to come up with an exact solution for. If you're curious, some places to start would be Gibbs sampling, Metropolis Hastings, or belief propagation.

Notes:
[1] There's an important difference between observing a variable's value and manipulating the world so it takes a value. For example, if I observe that flowers are on my doorstep, that's a good sign for my romantic prospects -- but if I just put the flowers on my doorstep, that doesn't tell me anything about my romantic prospects. (Actually, it's probably a negative sign, because that's kind of sad.) This is known as the difference between observation and intervention; in the examples above, I am discussing observation. Thanks to Andrew Critch for pointing this out. [2] Ie, NP-complete.