Saturday, September 12, 2009

Productive Outs Theory

Well, it looks like baseball-reference.com has all the data I could ever conceivably want. For each player, they have "Hit Trajectory" data -- dividing their balls in play into ground balls, fly balls, and line drives. But that's the extent of the data. The rest is going to be all theoretical.

Just as I did with hits and walks (which were easy), I'll have to decide how many bases a runner on each base will advance on each type of out.

For example, let's say that with men on 1st and 3rd, a fly ball out will advance the runner from 3rd 60% of the time and the runner from 1st 5% of the time. In that case a fly ball out with men on 1st and 3rd is worth 0.65 bases advanced.

I'm going to find the average number of bases advanced for each type of out, so that I can tack an additional term onto my original formula, which goes like:

Advancement on Outs

= (Average Bases Advanced on Fly Ball Out) x (Fly Ball Outs)
+ (Average Bases Advanced on Ground Ball Out) x (Ground Ball Outs)
+ (Average Bases Advanced on Line Drive Out) x (Line Drive Outs)

where I use baseball theory to determine the coefficients and baseball-reference's data to get the actual number of those kinds of outs.
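Here's a minimal sketch of how that extra term could be computed once the coefficients exist. The coefficient values and the season line below are made-up placeholders (I haven't worked out the real ones yet), and the function and variable names are just for illustration. Only the structure comes from the formula above.

```python
# Value of a fly ball out with men on 1st and 3rd, using the example above:
# runner from 3rd advances 60% of the time, runner from 1st 5% of the time.
fly_out_first_and_third = 0.60 + 0.05  # about 0.65 bases

# HYPOTHETICAL placeholder coefficients: average bases teammates advance
# per out of each type. These are stand-ins, not derived values.
AVG_BASES_PER_OUT = {"fly": 0.10, "ground": 0.15, "line": 0.03}


def advancement_on_outs(out_counts, coefficients=AVG_BASES_PER_OUT):
    """Sum of (average bases advanced on out type) x (outs of that type)."""
    return sum(coefficients[kind] * n for kind, n in out_counts.items())


# Made-up out totals for an imaginary hitter.
example_outs = {"fly": 120, "ground": 150, "line": 30}
print(round(advancement_on_outs(example_outs), 2))
```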

The next bunch of posts will be working out that theory.

Friday, September 11, 2009

What's Left?

Now that I finally got a satisfactory data set to help me figure out the coefficients, there are only a few minor issues to resolve...and an immensely challenging next step.

Baserunning: Somewhere along the line, I've had to invoke probabilities for a runner advancing 1st to 3rd on a single, 2nd to home on a single, and 1st to home on a double (situations where the runner may advance more bases than the batter).

Up to this point, I've pulled those numbers directly out of my ass. I believe in the latest version I assumed runners go 1st to 3rd 30% of the time; 2nd to home 50% of the time; and 1st to home 40% of the time.

What I'd like to find is league-wide data on how often a runner on first advances to 3rd on a single (as opposed to just 2nd), etc. But I have no idea where or how that information would be kept.
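To show how those guesses feed into the math, here's a small sketch that turns each probability into an expected number of bases for the runner. It assumes the runner always advances at least as far as the batter and takes at most one extra base. The probabilities are the guesses quoted above, not measured data, and the names are made up for illustration.

```python
# Guessed probabilities from the post (not measured league data).
P_FIRST_TO_THIRD_ON_SINGLE = 0.30
P_SECOND_TO_HOME_ON_SINGLE = 0.50
P_FIRST_TO_HOME_ON_DOUBLE = 0.40


def expected_runner_bases(minimum_bases, p_extra_base):
    """Runner always takes `minimum_bases`, plus one more with probability p."""
    return minimum_bases + p_extra_base


runner_values = {
    "1st, on a single": expected_runner_bases(1, P_FIRST_TO_THIRD_ON_SINGLE),
    "2nd, on a single": expected_runner_bases(1, P_SECOND_TO_HOME_ON_SINGLE),
    "1st, on a double": expected_runner_bases(2, P_FIRST_TO_HOME_ON_DOUBLE),
}

for situation, bases in runner_values.items():
    print(f"Runner on {situation}: {bases:.1f} bases")
```

Swapping in real league-wide rates would only mean changing the three constants at the top.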

That would provide a minor improvement to Bases Advanced, but would be nowhere near as significant (and complicated!) as the following.

Productive Outs: Yeah, that's right. I want to figure out a way to calculate how many bases "random" teammates would advance on a player's outs, other than strikeouts. How many bases is the average out worth?

Not all outs are created equal. A long flyout is much more likely to advance a hypothetical teammate than a ground ball to short (which, at the wrong times, could be turned into a double play). And, presumably, some players have penchants for making certain kinds of outs over others. For example, David Ortiz tends to fly out to medium-depth center field, while Jacoby Ellsbury pounds the ball into the ground right at the second baseman.

Is there any conceivable way to create a model, possibly based on ground ball/fly ball/line drive percentages on outs, to calculate how many bases theoretical teammates advance on a particular hitter's outs, given those out-making idiosyncrasies?

For example, I could assume a runner on 3rd advances on 30% of fly balls, 50% of ground balls, and 10% of line drive outs. (Like the baserunning numbers above, I'd really like to find a better way to pin down these figures than "out of my ass".) Then I could calculate how many fly ball outs, ground ball outs, and line drive outs have been made by that hitter, and multiply by the probability that a runner will be on 3rd in a random plate appearance, as before. And so on.
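A rough sketch of that calculation for the runner on 3rd only. The 30/50/10 rates are the guesses above. The chance of having a runner on 3rd is taken from the league-wide 2009 counts in the New Numbers post (every arrangement with 3rd base occupied). The hitter's out totals are made up, and the sketch ignores how many outs there are, which is a simplifying assumption.

```python
# Guessed chance that a runner on 3rd advances on each type of out.
P_ADVANCE_FROM_THIRD = {"fly": 0.30, "ground": 0.50, "line": 0.10}

# 2009 league-wide plate appearances with 3rd base occupied:
# 3rd only, 1st and 3rd, 2nd and 3rd, bases loaded.
PA_WITH_RUNNER_ON_THIRD = 4798 + 4851 + 3557 + 4382
TOTAL_PA = 161474
P_RUNNER_ON_THIRD = PA_WITH_RUNNER_ON_THIRD / TOTAL_PA


def bases_from_third_on_outs(out_counts):
    """Expected bases advanced by runners on 3rd across a hitter's outs."""
    advances_if_on_third = sum(
        P_ADVANCE_FROM_THIRD[kind] * n for kind, n in out_counts.items()
    )
    return P_RUNNER_ON_THIRD * advances_if_on_third


# Made-up out totals for an imaginary hitter.
hypothetical_outs = {"fly": 100, "ground": 100, "line": 20}
print(f"P(runner on 3rd) = {P_RUNNER_ON_THIRD:.3f}")
print(f"Bases from 3rd on outs = {bases_from_third_on_outs(hypothetical_outs):.2f}")
```

The same pattern would repeat for runners on 1st and 2nd, each with its own set of guessed rates.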

Christ this is going to be hard.

New Numbers!

Wow, did I stumble upon a gold mine of data. Baseball-Reference apparently keeps track of the league-wide numbers I've been searching for.

So far, in 2009, there have been 161474 plate appearances across Major League Baseball.

89044 (55.1%) have come with the bases empty
29045 (18.0%) have come with a man on 1st
14155 (8.8%) have come with a man on 2nd
4798 (3.0%) have come with a man on 3rd
11642 (7.2%) have come with men on 1st and 2nd
4851 (3.0%) have come with men on 1st and 3rd
3557 (2.2%) have come with men on 2nd and 3rd
4382 (2.7%) have come with the bases juiced
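For anyone who wants to check the percentages, here's a quick sketch that recomputes them from the raw counts above. The dictionary keys are just labels I picked.

```python
# 2009 plate appearances by baserunning arrangement (counts from the post).
pa_by_arrangement = {
    "bases empty": 89044,
    "1st": 29045,
    "2nd": 14155,
    "3rd": 4798,
    "1st and 2nd": 11642,
    "1st and 3rd": 4851,
    "2nd and 3rd": 3557,
    "loaded": 4382,
}

total_pa = sum(pa_by_arrangement.values())
shares = {name: count / total_pa for name, count in pa_by_arrangement.items()}

print(f"Total PA: {total_pa}")
for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}%")
```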

These numbers are a little different from the sketchy ones I used a few months ago, so the coefficients for Bases Advanced get adjusted slightly. And frankly, since I don't know how to account for productive outs yet, I'm taking out the sacrifice and sac fly terms (those only count as positives because there happen to be men on base).

Bases Advanced

= 1.44*(BB + HBP) + 1.82*(1B) + 3.29*(2B) + 4.45*(3B) + 5.45*(HR) + SB
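As a sketch, the updated formula translates directly into a small function. The coefficients are the ones above; the function name, argument names, and the sample stat line are made up for illustration.

```python
def bases_advanced(bb, hbp, singles, doubles, triples, hr, sb):
    """Bases Advanced using the 2009 league-wide coefficients from the post."""
    return (
        1.44 * (bb + hbp)
        + 1.82 * singles
        + 3.29 * doubles
        + 4.45 * triples
        + 5.45 * hr
        + sb
    )


# Made-up stat line for an imaginary hitter.
sample = bases_advanced(bb=50, hbp=5, singles=100, doubles=30, triples=3, hr=20, sb=10)
print(round(sample, 2))
```

Dividing the result by plate appearances would give Bases Advanced per Plate Appearance.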

Friday, June 5, 2009

Bases Advanced vs. OPS

Well, technically, Bases Advanced per Plate Appearance (BAdPA) vs. OPS. These numbers are based on stats taken midway through the 2008 season.

Player             BAdPA    OPS
Kevin Youkilis     0.901    0.923
Matt Holliday      0.998    0.999
Jack Cust          0.808    0.805
Adrian Gonzalez    0.852    0.864
Vlad Guerrero      0.816    0.827
Jose Reyes         0.916    0.867
Marcus Thames      0.892    0.894
Rich Aurilia       0.738    0.756
Adam Dunn          0.943    0.952

Formatting WIN. Anyway, let's take a look. In general, BAdPA is a little lower than OPS. The big exception is Reyes, but that's because he steals a ton. The only other one whose BAdPA was higher than his OPS was Three True Outcomes poster boy Jack Cust. At the other end, the hitters whose OPS most outpaced their BAdPA were Youkilis and Aurilia. Aurilia is essentially a pure singles hitter, but Youk was in the middle of a season that saw a surge in his power numbers. So that one's a little weird.

I might run this again with the full season's 2008 stats and some different players. If anybody has ideas for seasons they want to see, let me know!

The Rest of the Numbers

Sorry I left you guys hanging there. I'm sure the suspense was killing you.

Anyway, applying that same technique to advancement on BB/HBP, 1B, 3B, and HR, I determined that each batting event is worth the following number of bases:

BB/HBP - 1.40
1B - 1.78
2B - 3.25
3B - 4.38
HR - 5.38

So in summary, a player's Bases Advanced is equal to

1.40*(BB+HBP) + 1.78*(1B) + 3.25*(2B) + 4.38*(3B) + 5.38*(HR) + SB + S + SF

where SB is stolen bases (advancing yourself a base), S is sacrifices (advancing your teammate a base), and SF is sac flies (advancing your teammate a base). If I really wanted to make this hardcore, I could try to figure out how many bases teammates advance on the average out, specifically on outs that aren't strikeouts. But what I have here is just a first treatment to present the concept.

A useful stat could be either Bases Advanced per Out, or Bases Advanced per Plate Appearance (BAdPA). BAdPA would be a decent stat to compare to OPS. I'll do a little comparison here, and then compare some actual players' numbers later.

To first order, OPS treats a walk as being 1 base, a single as 2, a double as 3, a triple as 4 and a homer as 5. So OPS appears to undervalue walks and extra-base hits compared to BAdPA, while overvaluing singles. I'd expect that BAdPA would be higher compared to OPS for Three True Outcomes hitters, while lower vs. OPS for contact hitters.
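That comparison can be laid out event by event. This sketch just subtracts the first-order OPS weights from the Bases Advanced coefficients listed above; the dictionary names are made up.

```python
# Bases Advanced coefficients from the post (2008 six-hitter sample).
BASES_ADVANCED_WEIGHTS = {"BB/HBP": 1.40, "1B": 1.78, "2B": 3.25, "3B": 4.38, "HR": 5.38}

# First-order OPS weights: 1 for reaching base (OBP) plus total bases (SLG).
OPS_FIRST_ORDER_WEIGHTS = {"BB/HBP": 1, "1B": 2, "2B": 3, "3B": 4, "HR": 5}

difference = {
    event: round(BASES_ADVANCED_WEIGHTS[event] - OPS_FIRST_ORDER_WEIGHTS[event], 2)
    for event in BASES_ADVANCED_WEIGHTS
}

for event, diff in difference.items():
    verdict = "OPS undervalues" if diff > 0 else "OPS overvalues"
    print(f"{event}: {diff:+.2f} ({verdict})")
```

The single is the only event where the difference comes out negative, which is why contact hitters should look worse by BAdPA than by OPS.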

Tuesday, May 12, 2009

What Are the Odds?

Now that we know how many bases a double is worth for each arrangement, let's figure out the average value of a double. In math terms, the equation looks like this, where B is the average number of bases hypothetical teammates advance on a double:

B = 0*(% nobody on) + 2.5*(% 1st) + 2*(% 2nd) + 1*(% 3rd) + 4.5*(% 1st and 2nd) + 3.5*(% 1st and 3rd) + 3*(% 2nd and 3rd) + 5.5*(% loaded)

The coefficients are just the number of bases teammates advance from each arrangement on a double.

We just need to find a way to figure out what those percentages are. Fortunately, ESPN has some ridiculous stat-keepers who record how many plate appearances a hitter gets with each baserunning arrangement. Unfortunately, they only do it on a player-by-player basis, nothing league-wide. (And league-wide is what I'm looking for.)

So...I just have to take a good cross-section of hitters across different leagues, at different spots in their batting order, and on both good and crappy-hitting teams. I chose to honor the following 6 hitters:

Freddy Sanchez (top/NL/crappy)
Luke Scott (middle/AL/decent)
Ryan Howard (middle/NL/good)
Orlando Cabrera (top/AL/decent)
Mark Ellis (bottom/AL/crappy)
Jason Kendall (bottom/NL/good)

Using their summed 2008 stats, I got the following percentages for coming to bat with the following arrangements:

Nobody on: 56.6%
1st: 17.5%
2nd: 9.7%
3rd: 2.9%
1st and 2nd: 6.4%
1st and 3rd: 2.3%
2nd and 3rd: 2.0%
Loaded: 2.6%

Now we can calculate that sum from above and add the 2 bases the batter takes himself. The total number of bases that the average double is worth is... 3.25 bases (the batter's own 2 plus the teammates' average B). This is actually a little higher than what OPS tells us it's worth, which is about 3 bases (1 for OBP, 2 for SLG).
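Here's the weighted sum written out as a sketch, using the coefficients from the equation for B and the rounded percentages listed above. The variable names are made up. With these rounded inputs the total lands around 3.23, slightly below the 3.25 quoted, which may just reflect rounding in the listed percentages.

```python
# Teammates' bases advanced on a double, by arrangement (from the post).
BASES_ON_DOUBLE = {
    "nobody on": 0.0, "1st": 2.5, "2nd": 2.0, "3rd": 1.0,
    "1st and 2nd": 4.5, "1st and 3rd": 3.5, "2nd and 3rd": 3.0, "loaded": 5.5,
}

# Share of plate appearances in each arrangement
# (six-hitter 2008 sample, rounded as listed in the post).
PA_SHARE = {
    "nobody on": 0.566, "1st": 0.175, "2nd": 0.097, "3rd": 0.029,
    "1st and 2nd": 0.064, "1st and 3rd": 0.023, "2nd and 3rd": 0.020, "loaded": 0.026,
}

BATTER_BASES = 2  # the batter advances himself two bases on a double

# B: average bases advanced by teammates on a double.
teammate_avg = sum(BASES_ON_DOUBLE[k] * PA_SHARE[k] for k in BASES_ON_DOUBLE)
double_value = BATTER_BASES + teammate_avg

print(f"B = {teammate_avg:.3f}, double = {double_value:.3f}")
```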

So that's how I calculated the number of Bases Advanced a double is worth. Later I'll post how many Bases each of the other plate outcomes is worth and show you all my formula for Bases Advanced.

Feedback is welcome! So if there's something you think I fucked up or overlooked, lemme know.

How Valuable Is a Double?

(...Or a walk, hit-by-pitch, single, triple, home run, stolen base, sacrifice, or non-strikeout?)

As I claimed yesterday, the value of a plate appearance is assessed by the number of bases a hitter moves himself and his teammates. Well, it's pretty easy to tell the number of bases a batter moves himself with any one of those actions. The hard part is figuring out how far teammates advance in an average plate appearance through each of those actions.

While watching a game, it's pretty easy to tell. When Mark Teixeira doubles home Derek Jeter from 3rd and Johnny Damon from 2nd, he's created 5 Bases Advanced (2 for himself, 1 for Jeter, and 2 for Damon). But we want to know how much a generic double is worth. In order to do that, we need to invoke some basic baseball knowledge and a little math.

The basic idea is this: there are 8 possible baserunning arrangements for a batter when he comes to the plate (nobody on, man on 1st, man on 2nd, man on 3rd, men on 1st and 2nd, 1st and 3rd, 2nd and 3rd, and the bases loaded), and we can predict how many total bases all runners will advance from each arrangement for a particular outcome at the plate. What we need to do is determine the probability that a hitter will come up in each of those situations, and then find the average number of Bases Advanced.

Take a double, for example. Obviously, the batter advances himself 2 bases. And the runners?
Nobody on: Uhh, there's nobody on. 0 Bases.
Man on 1st: Advances to at least 3rd base, scores 50% of the time. 2.5 Bases.
Man on 2nd: Scores. 2 Bases.
Man on 3rd: Scores. 1 Base.
1st and 2nd: Guy on 2nd scores, guy on first scores half the time. 4.5 Bases.
1st and 3rd: Guy on 3rd scores, guy on first scores half the time. 3.5 Bases.
2nd and 3rd: They both score. 3 Bases.
Loaded: Guys on 2nd and 3rd score, guy on first scores half the time. 5.5 Bases.
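The list above can be rebuilt from just three numbers, one per runner, if you assume each runner advances independently of the others (that's my simplification, and the names below are made up). A sketch:

```python
from itertools import combinations

# Expected bases each runner advances on a double (from the list above):
# runner on 1st reaches 3rd and scores half the time, the others score.
RUNNER_BASES_ON_DOUBLE = {1: 2.5, 2: 2.0, 3: 1.0}


def teammates_bases(occupied_bases):
    """Total expected bases advanced by all runners in one arrangement."""
    return sum(RUNNER_BASES_ON_DOUBLE[base] for base in occupied_bases)


# All 8 arrangements: every subset of {1st, 2nd, 3rd}, from empty to loaded.
arrangements = [combo for n in range(4) for combo in combinations((1, 2, 3), n)]
bases_by_arrangement = {combo: teammates_bases(combo) for combo in arrangements}

for combo, bases in bases_by_arrangement.items():
    label = ", ".join(str(b) for b in combo) or "nobody on"
    print(f"{label}: {bases} Bases")
```

Doing the same for a single or a triple would only take a different set of three per-runner numbers.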

But wait. We're not done. Now we need to know the odds that a batter comes up with each of those arrangements.