I was standing in Tom’s office and asked him a simple probability question (a timely one, given the World Series):
If a particular team beats its opponent with some probability p, what are the odds that it will win a seven-game series?
If you have taken any probability course at all, you have probably solved this problem before. I know I have: I’ve blogged about it before. But I was trying to remember the equations. I knew they had something to do with the binomial coefficients, but I couldn’t quite recall their exact form.
Tom pointed out that you could just write it down by making a table. We are looking for the best 4 out of 7, so let’s make a 5×5 array. For each cell_{ij}, we are going to compute the odds that the series will reach a point where team A has i wins and team B has j wins. Let’s begin by writing down the table:
Let’s say that team A wins each game with probability p, and team B wins with probability q (note that q = 1 - p). Each time team A wins, let’s move one step to the right; each time team B wins, let’s move one step down. If team A wins in a clean sweep, we can fill in the first row quite easily, multiplying each previous entry by p.
Similarly, we can compute the odds that team B has a clean sweep by marching down the first column, multiplying each previous entry by q.
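| B wins \ A wins | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1 | p | p^{2} | p^{3} | p^{4} |
| 1 | q | | | | |
| 2 | q^{2} | | | | |
| 3 | q^{3} | | | | |
| 4 | q^{4} | | | | |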
But now, what are the odds that we reach (say) a 1-1 series? Well, there are two ways of reaching 1-1: either team A wins and then team B, or team B wins and then team A. If the games are independent, each path has the same probability, pq, regardless of the order, and there are two paths. So we can fill in 1-1 as 2 p q…
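| B wins \ A wins | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1 | p | p^{2} | p^{3} | p^{4} |
| 1 | q | 2 p q | | | |
| 2 | q^{2} | | | | |
| 3 | q^{3} | | | | |
| 4 | q^{4} | | | | |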
We can continue this, being careful to keep track of the number of paths: each cell is p times the cell to its left plus q times the cell above it. It will begin to look like Pascal’s triangle…
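| B wins \ A wins | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1 | p | p^{2} | p^{3} | p^{4} |
| 1 | q | 2 p q | 3 p^{2} q | 4 p^{3} q | |
| 2 | q^{2} | 3 p q^{2} | 6 p^{2} q^{2} | | |
| 3 | q^{3} | 4 p q^{3} | | | |
| 4 | q^{4} | | | | |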
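To see where Pascal’s triangle comes from, here’s a small brute-force sketch in Python (my own illustration, not part of Tom’s table; the helper name count_paths is hypothetical). It enumerates every ordering of wins that reaches a given score, deliberately ignoring for now the rule that the series stops at four wins, and checks each count against the binomial coefficient C(i+j, i).

```python
from itertools import product
from math import comb


def count_paths(a_wins, b_wins):
    """Hypothetical helper: count the game orderings that reach A=a_wins, B=b_wins.

    This ignores the stop-at-four-wins rule, so every ordering of the
    a_wins + b_wins games is counted.
    """
    games = a_wins + b_wins
    return sum(1 for seq in product("AB", repeat=games) if seq.count("A") == a_wins)


# Rows are team B's wins, columns are team A's wins, as in the table.
for b in range(4):
    print([count_paths(a, b) for a in range(4)])
# [1, 1, 1, 1]
# [1, 2, 3, 4]
# [1, 3, 6, 10]
# [1, 4, 10, 20]

# Every count matches the binomial coefficient C(a + b, a).
assert all(count_paths(a, b) == comb(a + b, a) for a in range(5) for b in range(5))
```

Brute force is perfectly adequate here: a series never runs past a handful of games, so there are only a few hundred orderings to check.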
If you are familiar with Pascal’s triangle or with binomial coefficients, those numbers are beginning to look pretty familiar. But you have to be a little careful in filling in the remaining squares. For instance, it’s perfectly possible for the series to end 4-1, but no path that ends there passes through 4-0 (once the first team wins 4 games, the series is over). So for these remaining squares you don’t add in quite the same way: a cell where team A has 4 wins only picks up p times the cell to its left, and a cell where team B has 4 wins only picks up q times the cell above it. And, of course, the series can’t end 4-4, so the final square remains unfilled.
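| B wins \ A wins | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1 | p | p^{2} | p^{3} | p^{4} |
| 1 | q | 2 p q | 3 p^{2} q | 4 p^{3} q | 4 p^{4} q |
| 2 | q^{2} | 3 p q^{2} | 6 p^{2} q^{2} | 10 p^{3} q^{2} | 10 p^{4} q^{2} |
| 3 | q^{3} | 4 p q^{3} | 10 p^{2} q^{3} | 20 p^{3} q^{3} | 20 p^{4} q^{3} |
| 4 | q^{4} | 4 p q^{4} | 10 p^{2} q^{4} | 20 p^{3} q^{4} | |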
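The whole table-filling procedure, stopping rule included, is short enough to sketch in Python. This is a minimal version under my own assumptions (the function name path_counts is hypothetical): it tracks only the number of paths into each cell, since every path to A=a, B=b has the same probability p^{a} q^{b}.

```python
def path_counts(wins_needed=4):
    """Hypothetical helper: count the paths to each score, honoring the stopping rule.

    counts[b][a] is the number of game orderings that reach A=a, B=b without
    passing through a finished series. Each cell of the table is then
    counts[b][a] * p**a * q**b.
    """
    n = wins_needed
    counts = [[0] * (n + 1) for _ in range(n + 1)]
    counts[0][0] = 1
    for b in range(n + 1):
        for a in range(n + 1):
            if a == 0 and b == 0:
                continue
            # From the left (A won the last game), only if B hadn't already clinched.
            if a > 0 and b < n:
                counts[b][a] += counts[b][a - 1]
            # From above (B won the last game), only if A hadn't already clinched.
            if b > 0 and a < n:
                counts[b][a] += counts[b - 1][a]
    return counts


for row in path_counts():
    print(row)
# [1, 1, 1, 1, 1]
# [1, 2, 3, 4, 4]
# [1, 3, 6, 10, 10]
# [1, 4, 10, 20, 20]
# [1, 4, 10, 20, 0]
```

The 0 in the corner is the 4-4 square that can never be reached. Changing wins_needed gives the analogous table for other series lengths, e.g. wins_needed=3 for a best-of-five.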
So, what are the odds that team A wins? We can just sum up the probabilities in the final column, where team A has reached four wins. That team B wins? We can sum up the final row (and the two sums had better add up to one).
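Reading the coefficients straight off the final column and final row of the table, the two sums can be written out as:

```latex
\begin{aligned}
P(\text{A wins}) &= p^{4} + 4p^{4}q + 10p^{4}q^{2} + 20p^{4}q^{3}
                  = p^{4}\left(1 + 4q + 10q^{2} + 20q^{3}\right)
                  = \sum_{k=0}^{3} \binom{3+k}{k} p^{4} q^{k}, \\
P(\text{B wins}) &= q^{4} + 4pq^{4} + 10p^{2}q^{4} + 20p^{3}q^{4}
                  = q^{4}\left(1 + 4p + 10p^{2} + 20p^{3}\right).
\end{aligned}
```

Each coefficient \binom{3+k}{k} counts the ways to place team B’s k wins among the first 3+k games; the last game has to be team A’s fourth win.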
Let’s say that p = 0.5. That means p = q, and we’d expect the two teams’ odds to be the same. Just staring at the equations, swapping p and q turns the final column into the final row, which should convince you that the probabilities are the same. If we’ve filled in our formulas correctly, we’d expect each team’s overall probability to be 0.5 as well. It’s far from obvious to me, staring at those equations, that this is true, but if you go ahead and plug in the numbers, you’ll find that it does indeed work out.
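If you’d rather let the computer do the plugging in, here’s a sketch using exact fractions, so no floating-point rounding muddies the check. It assumes independent games with a fixed p, and the function name series_win_prob is hypothetical. It just sums the final column, writing the coefficients 1, 4, 10, 20 as the binomial coefficients C(3+k, k).

```python
from fractions import Fraction
from math import comb


def series_win_prob(p, wins_needed=4):
    """Hypothetical helper: probability of winning the series, given a per-game
    win probability p.

    Sums the final column of the table: C(n-1+k, k) * p^n * q^k for k = 0..n-1,
    where n = wins_needed. Assumes independent games with a constant p.
    """
    q = 1 - p
    n = wins_needed
    return sum(comb(n - 1 + k, k) * p**n * q**k for k in range(n))


half = Fraction(1, 2)
print(series_win_prob(half))  # 1/2

# For any p, the two teams' chances add up to one (here p = 3/5).
p = Fraction(3, 5)
print(series_win_prob(p) + series_win_prob(1 - p))  # 1
```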
So, let’s say that in a Rangers vs. St. Louis matchup (for example), the Rangers would win 60% of the games. What are the odds that the Redbirds win the Series? About 29% of the time. But if the Series is all tied up 3-3, then of course their probability is 40%.
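To check the Rangers/Cardinals numbers, and to handle a series that’s already under way, here’s a sketch of a slightly different approach: a memoized recursion on how many wins each team still needs, rather than the table itself. It assumes every game is independent with the same p (no home-field or pitching effects), and win_from is a hypothetical name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def win_from(a_needs, b_needs, p):
    """Hypothetical helper: probability that team A takes the series when A
    still needs a_needs wins, B still needs b_needs wins, and A wins each
    game independently with probability p."""
    if a_needs == 0:
        return 1.0  # A has already clinched
    if b_needs == 0:
        return 0.0  # B has already clinched
    # Condition on the next game: A wins it with probability p.
    return p * win_from(a_needs - 1, b_needs, p) + (1 - p) * win_from(a_needs, b_needs - 1, p)


p_cardinals = 0.4  # the Rangers win 60% of games
print(f"{win_from(4, 4, p_cardinals):.4f}")  # 0.2898, from the start of the Series
print(f"{win_from(1, 1, p_cardinals):.4f}")  # 0.4000, with the Series tied 3-3
```

Because the recursion starts from whatever the current score is, it gives the 3-3 answer directly instead of reading it off a cell of the table.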
Of course, none of this takes into account home-field advantage, pitching matchups, or any of a million other factors, so its relation to any games being played tonight is strictly theoretical.