Well, it looks like baseball-reference.com has all the data I could ever conceivably want. For each player, they have "Hit Trajectory" data -- dividing their balls in play into ground balls, fly balls, and line drives. But that's the extent of the data. The rest is going to be all theoretical.
Just like with hits and walks, which were easy, I'll have to decide how many bases a runner on each base will advance on each type of out.
For example, let's say that with men on 1st and 3rd, a fly ball out will advance the runner from 3rd 60% of the time and the runner from 1st 5% of the time. In that case a fly ball out with men on 1st and 3rd is worth 0.65 bases advanced.
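Here's that arithmetic as a rough Python sketch. The function name and the dictionary layout are made up for illustration, and it assumes a runner moves up at most one base on an out; the 60% and 5% figures are just the example guesses above, not real data.

```python
def expected_bases_on_out(advance_prob):
    """Expected bases advanced on one out for a given base state.

    advance_prob maps each occupied base to the probability that the
    runner there moves up. Assumption: at most one base per runner.
    """
    return sum(advance_prob.values())


# Example guesses from the text: fly ball out with men on 1st and 3rd.
fly_out_first_and_third = {"3rd": 0.60, "1st": 0.05}
print(round(expected_bases_on_out(fly_out_first_and_third), 2))  # 0.65
```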
I'm going to find the number of bases advanced for each type of out, so that I can tack on an additional number to my original formula that goes like:
Advancement on Outs
= (Average Bases Advanced on Fly Ball Out) x (Fly Ball Outs)
+ (Average Bases Advanced on Ground Ball Out) x (Ground Ball Outs)
+ (Average Bases Advanced on Line Drive Out) x (Line Drive Outs)
where I use baseball theory to determine the coefficients and baseball-reference's data to get the actual number of those kinds of outs.
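As a sketch of how that term could be computed once the coefficients exist, something like the following would do. The function name is hypothetical, and both the coefficients and the out counts in the example are placeholders invented purely to show the shape of the calculation; none of them come from baseball-reference or from any theory yet.

```python
def advancement_on_outs(out_counts, avg_bases):
    """Sum over out types of (average bases advanced) x (number of outs)."""
    return sum(avg_bases[kind] * count for kind, count in out_counts.items())


# PLACEHOLDER values only: the real coefficients still have to be worked out.
avg_bases = {"fly": 0.10, "ground": 0.15, "line": 0.02}
out_counts = {"fly": 120, "ground": 150, "line": 30}

print(round(advancement_on_outs(out_counts, avg_bases), 2))
```

Keeping the coefficients in a separate dictionary means they can be swapped out as the theory improves without touching the formula itself.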
The next bunch of posts will be working out that theory.
Saturday, September 12, 2009
Friday, September 11, 2009
What's Left?
Now that I finally got a satisfactory data set to help me figure out the coefficients, there are only a few minor issues to resolve...and an immensely challenging next step.
Baserunning: Somewhere along the line, I've had to invoke probabilities for a runner advancing 1st to 3rd on a single, 2nd to home on a single, and 1st to home on a double (situations where the runner may advance more bases than the batter).
Up to this point, I've pulled those numbers directly out of my ass. I believe in the latest version I assumed runners go 1st to 3rd 30% of the time; 2nd to home 50% of the time; and 1st to home 40% of the time.
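To show how those guesses turn into bases, here is a minimal Python sketch. It assumes the runner always advances at least as many bases as the batter and takes one extra base with the guessed probability; the names are illustrative, and the probabilities are the same guesses as above.

```python
# Guessed chances that the runner takes one more base than the batter.
EXTRA_BASE_PROB = {
    ("1st", "single"): 0.30,  # 1st to 3rd on a single
    ("2nd", "single"): 0.50,  # 2nd to home on a single
    ("1st", "double"): 0.40,  # 1st to home on a double
}
HIT_BASES = {"single": 1, "double": 2}


def expected_runner_advance(start_base, hit):
    """Expected bases the runner advances on the hit.

    Assumption: the runner moves up as many bases as the batter, plus
    one more with the guessed probability.
    """
    return HIT_BASES[hit] + EXTRA_BASE_PROB[(start_base, hit)]


for (base, hit) in EXTRA_BASE_PROB:
    print(base, hit, round(expected_runner_advance(base, hit), 2))
```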
What I'd like to find is league-wide data on how often a runner on first advances to 3rd on a single (as opposed to just 2nd), etc. But I have no idea where or how that information would be kept.
That would provide a minor improvement to Bases Advanced, but would be nowhere near as significant (and complicated!) as the following.
Productive Outs: Yeah, that's right. I want to figure out a way to calculate how many bases "random" teammates would advance on a player's outs other than strikeouts. How many bases is the average out worth?
Not all outs are created equal. A long flyout is much more likely to advance a hypothetical teammate than a ground ball to short (which, at the wrong times, could be turned into a double play). And, presumably, some players have penchants for making certain kinds of outs over others. For example, David Ortiz tends to fly out to medium-depth center field, while Jacoby Ellsbury pounds the ball into the ground right at the second baseman.
Is there any conceivable way to create a model, possibly based on ground ball/fly ball/line drive percentages on outs, to calculate how many bases theoretical teammates advance on a particular hitter's outs, given those out-making idiosyncrasies?
For example, I could assume a runner on 3rd advances on 30% of fly balls, 50% of ground balls, and 10% of line drive outs. (Like the baserunning numbers above, I'd really like to find a better way to pin down these figures than "out of my ass".) Then I could calculate how many fly ball outs, ground ball outs, and line drive outs have been made by that hitter, and multiply by the probability that a runner will be on 3rd in a random plate appearance, as before. And so on.
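A rough Python sketch of that calculation, for the runner on 3rd only, might look like this. The advance probabilities are the guesses from the paragraph above; the hitter's out counts are invented for illustration; and the 0.109 is roughly the share of 2009 plate appearances with a man on 3rd, taken from the league-wide counts in the "New Numbers!" post.

```python
# Guessed chances that a runner on 3rd scores on each type of out.
ADVANCE_FROM_THIRD = {"fly": 0.30, "ground": 0.50, "line": 0.10}


def bases_from_runner_on_third(out_counts, p_runner_on_third):
    """Expected bases gained by a 'random' runner on 3rd across a hitter's outs."""
    per_out_total = sum(ADVANCE_FROM_THIRD[kind] * n for kind, n in out_counts.items())
    return p_runner_on_third * per_out_total


# HYPOTHETICAL hitter: 100 fly outs, 100 ground outs, 20 line drive outs.
hitter_outs = {"fly": 100, "ground": 100, "line": 20}
print(round(bases_from_runner_on_third(hitter_outs, 0.109), 2))
```

The same pattern would repeat for runners on 1st and 2nd, each with its own set of probabilities.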
Christ, this is going to be hard.
New Numbers!
Wow, did I stumble upon a gold mine of data. Baseball-Reference apparently keeps track of the league-wide numbers I've been searching for.
So far, in 2009, there have been 161474 plate appearances across Major League Baseball.
89044 (55.1%) have come with the bases empty
29045 (18.0%) have come with a man on 1st
14155 (8.8%) have come with a man on 2nd
4798 (3.0%) have come with a man on 3rd
11642 (7.2%) have come with men on 1st and 2nd
4851 (3.0%) have come with men on 1st and 3rd
3557 (2.2%) have come with men on 2nd and 3rd
4382 (2.7%) have come with the bases juiced
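The percentages can be reproduced from the raw counts with a few lines of Python. The counts are the ones listed above; the variable and function names are just for illustration. The second helper adds up every base state that includes a given base, which gives the chance that a random plate appearance comes with a runner there.

```python
BASE_STATE_PA = {
    "bases empty": 89044,
    "1st": 29045,
    "2nd": 14155,
    "3rd": 4798,
    "1st and 2nd": 11642,
    "1st and 3rd": 4851,
    "2nd and 3rd": 3557,
    "bases loaded": 4382,
}
TOTAL_PA = sum(BASE_STATE_PA.values())  # 161474


def state_share(state):
    """Fraction of plate appearances that came in the given base state."""
    return BASE_STATE_PA[state] / TOTAL_PA


def runner_on_base_share(base):
    """Fraction of plate appearances with a runner on '1st', '2nd' or '3rd'."""
    occupied = sum(
        n for state, n in BASE_STATE_PA.items()
        if base in state or state == "bases loaded"
    )
    return occupied / TOTAL_PA


for state in BASE_STATE_PA:
    print(f"{state}: {state_share(state):.1%}")
print(f"runner on 3rd: {runner_on_base_share('3rd'):.1%}")
```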
These numbers are a little different from the sketchy ones I used a few months ago, so the coefficients for Bases Advanced will be adjusted slightly. And frankly, since I don't know how to account for productive outs yet, I'm taking out the terms for sacrifices and sac flies (those only count as positives because there happen to be men on base).
Bases Advanced
= 1.44*(BB + HBP) + 1.82*(1B) + 3.29*(2B) + 4.45*(3B) + 5.45*(HR) + SB
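For anyone who wants to plug in a stat line, here is the formula as a small Python function. The coefficients are exactly the ones above; the function name, argument names, and the example stat line are hypothetical.

```python
def bases_advanced(bb, hbp, singles, doubles, triples, hr, sb):
    """Bases Advanced using the coefficients from the formula above."""
    return (
        1.44 * (bb + hbp)
        + 1.82 * singles
        + 3.29 * doubles
        + 4.45 * triples
        + 5.45 * hr
        + sb
    )


# HYPOTHETICAL season line, not a real player.
print(round(bases_advanced(bb=50, hbp=5, singles=100, doubles=30,
                           triples=3, hr=20, sb=10), 2))
```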