Tipping Pitches: Sports: Spreadsheet Madness


Monday, February 22, 2010


My nights and weekends have been filled with pounding away at Excel. Magic is happening. I'm determined to bring some clarity to who the 300 greatest offensive baseball players of all time were, why, and in what order.

Why stop at 300? Good question. I'm collecting data on close to 1,000 players, so there's really no reason to stop there. I could easily expand this to 500, but I may be getting ahead of myself. An awful lot of thought and analysis has to go into each selection.

Originally, I was going to divulge the number one offensive player first. Then I collected the data of 32 greats in an effort to find the best of the bunch. Suddenly, I realized that I couldn't stop with 32 to determine the best. I needed a much bigger sample size.

That doesn't mean that I don't have enough data to determine the greatest. I know who it is. However, I've been as meticulous as I've ever been through this process to make sure that every step is as accurate as possible. It would just be wrong to get sloppy with the crowning of the best ever.

For example, I could have set a single baseline for the average player to compare all players to. Could have kept it roughly the same from year to year. Could have based it on theory. Could have cut corners.

Didn't happen. The average player varies from year to year. It depends on the number of teams, the number of teams with the DH, the number of games played, and the average statistics for that season. Significant calculation goes into even that baseline determination.

And I could have used the same number of plate appearances each season for that average player, or I could have based it on the total number of games played that season. But I didn't. I realize that several factors go into the typical number of plate appearances. Different years call for different strategies when it comes to pinch hitting, use of the farm system, impact of injuries, on and on. So for each season I lined up the X highest plate appearance totals (where X equals the total number of starting positions in that season) and averaged them.
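For anyone who would rather see that step outside of Excel, here is a minimal Python sketch of the idea, not the spreadsheet itself. It assumes X counts eight non-pitcher starters per team plus one extra slot for each team using the DH; that counting rule, the function names, and the toy numbers in the usage note are all illustrative assumptions.

```python
def starting_positions(teams, dh_teams, hitters_per_team=8):
    """Count starting lineup slots in a season.

    Assumption (not from the post): 8 non-pitcher starters per team,
    plus 1 extra slot for every team that uses the DH.
    """
    return teams * hitters_per_team + dh_teams


def baseline_plate_appearances(pa_totals, teams, dh_teams):
    """Average the X highest plate appearance totals in one season,
    where X is the number of starting positions that season."""
    slots = starting_positions(teams, dh_teams)
    top = sorted(pa_totals, reverse=True)[:slots]
    if not top:
        raise ValueError("no plate appearance totals supplied")
    return sum(top) / len(top)
```

With a made-up one-team league (8 slots) and ten players, `baseline_plate_appearances([700, 650, 600, 550, 500, 450, 400, 350, 100, 50], teams=1, dh_teams=0)` averages only the top eight totals, so the two bench players do not drag the baseline down.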

I even made that calculation for all 135 seasons. While somewhat consistent, there was variation. The minimum average plate appearances per game was 3.14 while the maximum was 4.19. A gap of more than one plate appearance per game adds up to more than 100 plate appearances over a season -- which can significantly throw off our comparison point.
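To put a rough number on that gap, here is a quick back-of-the-envelope check in Python. The 3.14 and 4.19 figures come from above; applying both to the same schedule length is a simplifying assumption, since the two extremes presumably come from different seasons, and 154 and 162 games are used only as familiar schedule lengths.

```python
MIN_PA_PER_GAME = 3.14  # lowest seasonal average quoted above
MAX_PA_PER_GAME = 4.19  # highest seasonal average quoted above


def season_pa_gap(games, low=MIN_PA_PER_GAME, high=MAX_PA_PER_GAME):
    """Plate appearance gap over a season of `games` games.

    Assumption: both per-game rates are applied to the same schedule length.
    """
    return (high - low) * games


for games in (154, 162):
    print(games, "games:", round(season_pa_gap(games), 1), "plate appearances")
```

Either way the gap lands well above 100 plate appearances, which is why a fixed baseline would distort the comparison.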

I also realized that I don't want to make this study anti-climactic. If I reveal number one right away, suddenly you will lose interest. Hey, even I may lose interest. So I am going to start from the back and work my way to the front -- at least for the purpose of revealing the results.

But to be honest, I don't yet know the results. I am still loading career stats into my handy spreadsheet. That spreadsheet is becoming so large that it is turning into three and four working spreadsheets. I just pasted in Tito Francona. Yeah, I'm going deep.

I paste in Tito's stats, and the work is done for me. It's good stuff. But there's still a lot of pasting to do.

So I think I'd be doing this project a disservice by starting from the beginning, particularly before even completing the research. I am going to start from the bottom, and the results will hopefully reinforce my rankings for the top.

It may be a week or two until the first results begin trickling in, but they're coming. And I may be close to invisible on Twitter (and to my wife) in the meantime.

Until then... the 2010 baseball season is coming!

