Over on Facebook, fellow Pixarian Arun pointed out this story on a problem I encountered in my undergraduate schooling as “The Marriage Problem”, which I also heard of as “The Secretary Problem”. The idea is that you are supposed to find a wife (or secretary) out of a list of $n$ possibles. Each one is assigned a score, but are presented to you in random order. You look at each possible spouse (I’m gonna stick with the Marriage formulation from now on) and can see what her score is. You then have a choice: either propose, or lose her forever and go on to the next applicant. The goal is to maximize the odds that you will find the best spouse.
This was presented to me as part of a programming assignment when I was an undergraduate. Luckily, I recalled that Martin Gardner had covered this topic in one of his Mathematical Games columns for Scientific American as The Game of Googol. He described the problem thusly:
The numbers may range from small fractions of 1 to a number the size of a “googol” (1 followed by a hundred 0’s) or even larger. These slips are turned face down and shuffled over the top of a table. One at a time you turn the slips face up. The aim is to stop turning when you come to the number that; you guess to be the largest of the series. You cannot go back and pick a previously turned slip. If you turn over all the slips, then of course you must pick the last one turned.
I’ve given you some pointers, you can either work on the problem or go find the answers with Google. But the basic idea is that you can solve the problem thusly: find the maximum score for the first $n/e$ items in the list, then propose to the next applicant whose score exceeds that value. A little bit of tricky (but not impenetrable) math will show that this is the optimal rule, and that the odds of selecting the best spouse is $1/e$. Rather than do harder math, I wrote simpler simulations, and graphed all potential cutoffs, running 10,000 trials, and graphing the number of times the best mate was selected for each threshold. Voila:
But Arun was confused, and so was I. After all, if the best spouse was in the first $n/e$ items, you can’t win. And if the second best is in the first $n/e$ items, you can’t get the second best, and so on. So what was the average performance? After all, perhaps you are one of those people who don’t believe that there is one best person for you. Twisted as that is (I can say so, since I found my optimal mate) maybe your goal would be to pick the threshold to select the best average mate. If you run the same calculations, but instead graph average score, you get something like this:
In other words, if you want to get the best wife on average, you should come to your conclusion much faster. It isn’t quite clear to me what the optimal setting for this parameter is, but I’m too tired to do the math right now. But perhaps I’ll follow up.
The first time I encountered this problem was in a probability class. It was used as an example of how to prevent worst-case results. For example, if an adversary is choosing the order in which you encounter possible matches, then you’re screwed. They might give them to you in descending order, in which case this algorithm produces the worst possible match. If you’re able to randomize the order in which you encounter possible matches, then you are almost completely protected from getting the worst possible outcome.
Thanks for reminding me about this. I was never able to find the problem again because it was presented to us as a Sultan offering you a reward of one jewel from his vault. He would offer you one jewel at a time and you could either choose that one, or skip it, but you had no idea what range of value the jewels had. If I recall correctly, it totally changes if you don’t know how many jewels there are. I suspect in that case you should just take the first one 🙂