Daily Archives: 5/30/2004

Still more baseball statistics, and a story…

Amazing what you can find when you dig. On retrosheet.org you can find box scores for lots of games. Pretty cool. For instance, I dug up the box score from an interesting game I attended.

The date was Sept 5, 2001. I was anxious to go to this Wednesday game because this would be the final appearance of Cal Ripken Jr. at the Oakland Coliseum. Cal would go 0 for 4 that day, but that’s not why I remember the game. The A’s would win 12-6, but that’s also not what I remember.

More Baseball Statistics

Baseball-DataBank.org has statistics similar (identical?) to those from baseball1.com, but has conveniently packaged them as a 33-megabyte file that you can import directly into MySQL. With it, you can write pretty simple SQL queries and learn, for instance, the top 10 players of all time in at-bats (ABs):

mysql> select sum(Batting.AB), concat(Master.nameFirst, ' ', Master.nameLast) 
from Batting, Master where Master.playerID = Batting.playerID 
group by Master.playerID order by 1 desc limit 10 ;
+-----------------+------------------------------------------------+
| sum(Batting.AB) | concat(Master.nameFirst, ' ', Master.nameLast) |
+-----------------+------------------------------------------------+
|           14053 | Pete Rose                                      |
|           12364 | Hank Aaron                                     |
|           11988 | Carl Yastrzemski                               |
|           11551 | Cal Ripken Jr.                                 |
|           11434 | Ty Cobb                                        |
|           11336 | Eddie Murray                                   |
|           11008 | Robin Yount                                    |
|           11003 | Dave Winfield                                  |
|           10972 | Stan Musial                                    |
|           10961 | Rickey Henderson                               |
+-----------------+------------------------------------------------+
10 rows in set (3.73 sec)
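
For anyone who would rather drive this from a script, here is a minimal sketch of the same join-and-sum in Python. It uses the standard sqlite3 module with an in-memory database instead of MySQL, purely so it runs without a server. The table and column names (Batting, Master, playerID, AB, nameFirst, nameLast) come from the query above. The yearID column, the helper name top_at_bats, and the sample rows are made up for illustration and are not real statistics.

```python
import sqlite3


def top_at_bats(conn, limit=10):
    """Return (total_AB, full_name) rows, highest totals first."""
    query = """
        SELECT SUM(b.AB) AS total_ab,
               m.nameFirst || ' ' || m.nameLast AS name
        FROM Batting AS b
        JOIN Master AS m ON m.playerID = b.playerID
        GROUP BY m.playerID
        ORDER BY total_ab DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# Tiny made-up tables, only to show the shape of the query.
# The yearID column is an assumption about the schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Master (playerID TEXT PRIMARY KEY, nameFirst TEXT, nameLast TEXT)"
)
conn.execute("CREATE TABLE Batting (playerID TEXT, yearID INTEGER, AB INTEGER)")
conn.executemany(
    "INSERT INTO Master VALUES (?, ?, ?)",
    [("aaa01", "Ann", "Example"), ("bbb01", "Bob", "Sample")],
)
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?)",
    [("aaa01", 2001, 500), ("aaa01", 2002, 450), ("bbb01", 2001, 600)],
)

rows = top_at_bats(conn)
for total_ab, name in rows:
    print(total_ab, name)
```

The explicit JOIN does the same work as the comma-separated table list with a where clause in the MySQL version. I just find it a little easier to read.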

The Mathematics of Baseball

I just finished reading Moneyball, and as I woke up this morning I was wondering what good online information was available on the mathematics and statistics of baseball. Such are the questions that Google was invented for.

Little Professor Baseball: Mathematics and Statistics of Baseball Simulation is the first link that a search on “baseball” and “mathematics” produced. It’s a nice page that talks about the basic principles of baseball simulation, and gives you the rules for a simple (or advanced) game that simulates baseball games using whatever lineups you desire. From a brief glance over the ideas, the game looks a little simplistic, but it could be kind of fun.

While chasing down links from the above page, I found that baseball1.com has a downloadable database of batting and pitching statistics for 1871-2003. It is even free for research use. I downloaded it in CSV form, but other database formats are also available. I like CSVs because Python has a nice module for reading and writing them.
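
As a rough illustration of why the csv module is pleasant to work with, here is a small sketch that totals at-bats per player from batting data. The column names playerID and AB are an assumption borrowed from the Baseball-DataBank schema (I haven't checked that the baseball1.com CSV uses the same headers), and the function name total_at_bats and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def total_at_bats(csv_file):
    """Sum the AB column per playerID from an open CSV file object."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Treat a blank AB cell as zero rather than failing on int("").
        totals[row["playerID"]] += int(row["AB"] or 0)
    return dict(totals)


# Made-up rows standing in for the real file; the headers are assumed.
sample = io.StringIO(
    "playerID,yearID,AB\n"
    "aaa01,2001,500\n"
    "aaa01,2002,450\n"
    "bbb01,2001,\n"
)
totals = total_at_bats(sample)
print(totals)
```

With the real download you would pass an open file instead of the StringIO object, something like open(path, newline=""), where path is wherever you saved the batting CSV.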

Further poking through the Google results turns up a book entitled Curve Ball: Baseball, Statistics, and the Role of Chance in the Game. I may have to dig around and see whether it has good reviews.

Moneyball mentions sabermetrics: an attempt to bring some actual rationality to baseball statistics. I found this brief introduction, and those terrific guys at baseball1.com have a veritable goldmine of links and tools.