We begin here with a Tweet sent out by former MLB player-turned-pundit Marlon Anderson:
Baseball fan tell me what you think about these numbers! #MLB #postseason #Cardinals #Dodgers #MLBPlayoffs pic.twitter.com/rwBgVyuFEG
— Marlon Anderson (@MarlonAnderson8) October 4, 2014
Clayton Kershaw had a nightmare seventh inning against the Cardinals on Friday night, in which he coughed up eight runs and the substantial lead that the Dodgers had staked him to. Kershaw’s last three playoff appearances (Friday, plus Games Two and Six of last year’s NLCS) have all been losses to the Cardinals. And yes, the numbers that the MLB Network graphic showed are true, in the sense that Kershaw really is 3-7 against the Cardinals over the last few years. And of course, Mr. Anderson wants you to believe that there can be only one possible explanation: The Cardinals have somehow solved Clayton Kershaw. They are in his head.
Ah, but how the numbers can deceive!
Dissecting the graphic, we see won-loss record (*sigh*), win percentage (which is just the same info as above), ERA (that’s at least reasonable), opponent average (a clue hiding in plain sight!), and strangely, innings per start (huh?).
Let’s keep the same timeframe (2011 to 2014, including the postseason) and look at some other numbers that could have gone on that screen.
Stat            Kershaw vs. Cardinals    Kershaw vs. Everyone Else
K/9             9.39                     9.61
BB/9            2.74                     1.96
HR/9            0.52                     0.52
FIP             2.78                     2.46
BABIP           .343                     .280
Batters Faced   298                      3,303
We see that even on these stats, Clayton Kershaw has pitched somewhat worse against the Cardinals than against the rest of the league. This shouldn’t surprise anyone. It’s comparing Kershaw’s performance against a good team to his performance against the league average. In fact, from 2011 to 2014, the Cardinals finished 5th, 5th, 3rd, and, uh, 24th in runs per game in all of MLB. But we see that Kershaw hasn’t fallen apart against the Cards. He’s just done a tiny bit worse.
There might be people out there who aren’t familiar with why I picked these stats to highlight. One of the foundations of baseball research is that if you want to get to know a pitcher, look at his strikeout, walk, and home run rates. The reason is that a pitcher who is good at striking hitters out in one year is very likely to be good at striking hitters out in the following year. Same goes for a pitcher who is good at avoiding giving up walks and home runs. What happens when you take away walks, strikeouts, and home runs? You’re left largely with singles, doubles, triples, and outs on balls in play. These are all events which depend, at least in part, on the fielders behind the pitcher. What we find is that the results of these types of events (this is called “batting average on balls in play” or BABIP) do not stay the same from year to year. In fact, the results from one year to the next are nearly random. There’s been a lot of work showing that BABIP isn’t completely and totally random, but over a few hundred batters faced, there’s a lot of luck that drives a pitcher’s performance.
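For readers who want to see the arithmetic behind BABIP, here is a minimal sketch using the commonly used definition: non-home-run hits divided by balls in play, (H - HR) / (AB - K - HR + SF). The function name is illustrative, and the sample stat line is made up for demonstration; it is not Kershaw’s.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Home runs and strikeouts are removed because they never put a ball
    in play for the fielders; sacrifice flies are added back because they do.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play

# Hypothetical season line, for illustration only:
# 135 non-HR hits on 440 balls in play.
print(round(babip(hits=150, home_runs=15, at_bats=650, strikeouts=200, sac_flies=5), 3))
```

Note that strikeouts, walks, and home runs never enter the numerator, which is exactly why BABIP isolates the part of a pitcher’s line that depends on where batted balls happen to land.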
Yes. Luck. Sure, the fact that the ball got through the infield for a hit still counts on the scoreboard, but the way that stats like wins and ERA (and the #NarrativeMachine) give out credit assumes that the pitcher is completely at fault (or, on a ball that’s caught, completely deserving of the credit) for what just happened. That just isn’t the case. A pitcher might give up a screaming line drive that just happens to be hit right at the shortstop.
In general, most pitchers end up with a BABIP of around .300 in each year. But because of the luck factor, some end up in the .260s and some end up in the .340s. The swings in fortune can be that big. Their ERAs either rejoice in or suffer for this luck, but that swing in ERA is not indicative of who they are as pitchers. If you want to know what a pitcher is going to do in the future, a guess based on his strikeouts, walks, and home runs allowed (there are several such estimators; the one I used here is Fielding Independent Pitching, or FIP) is actually better than using his ERA at the time. It’s better because it strips out the effects of luck. And Kershaw’s FIP is only slightly worse against the Cardinals than against the rest of the league. We see from the table that Kershaw’s FIP against the Cardinals is 2.78. That means that when we account for how unlucky Kershaw has been in the past, he has actually pitched much more like a guy with a 2.78 ERA than a guy with a 4.83 ERA. Suddenly, Kershaw looks like an ace again.
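FIP is usually written as (13×HR + 3×(BB + HBP) − 2×K) / IP plus a league constant chosen each season so that league FIP matches league ERA. Because the table gives per-nine-inning rates, dividing each count by IP/9 turns the formula into (13×HR/9 + 3×BB/9 − 2×K/9) / 9 plus the constant. The sketch below assumes a constant of 3.20 and leaves out HBP, since the table doesn’t list it; both are assumptions, not necessarily how the table’s figures were produced.

```python
def fip_from_rates(k9: float, bb9: float, hr9: float, constant: float = 3.20) -> float:
    """Approximate FIP from per-nine-inning rates.

    FIP = (13*HR + 3*BB - 2*K) / IP + constant. Since HR/IP = (HR/9) / 9,
    the per-9 version divides the weighted sum of rates by 9.
    HBP is omitted (not in the table) and the constant of 3.20 is an assumption;
    the real constant varies by season.
    """
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant

print(round(fip_from_rates(9.39, 2.74, 0.52), 2))  # Kershaw vs. Cardinals
print(round(fip_from_rates(9.61, 1.96, 0.52), 2))  # Kershaw vs. everyone else
```

With that assumed constant, the rates land on 2.78 against the Cardinals and about 2.47 against everyone else, within a hundredth of the table; the small gap plausibly comes from HBP and season-specific constants. The point survives either way: the 0.3-run difference is driven mostly by walk rate, not by anything dramatic.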
How unlucky has Clayton been against the Redbirds? When we look at Kershaw’s performance against the Cardinals, we see that his BABIP is quite high at .343. I know that during the postseason everyone likes to pretend that games are won and lost based on magical fairy dust, grit, and character. But frankly, a lot of what drives a baseball game is dumb luck. That’s not comfortable for people to hear, but the sooner that you accept that, the sooner we can have a real conversation about baseball. Mr. Anderson, you can take the Dodger blue pill and accept it or the Cardinal red one and this will all be a fantasy. Clayton Kershaw has gotten very unlucky over the last four years against the St. Louis Cardinals, and luck is not a character trait. Luck just kinda happens. If you made bets on a series of coin flips and won seven in a row, that would be an unlikely event (though possible). Yes, you still have the money you just won in your pocket, but it’s not because you have a special skill for calling coin flips or because you are a morally righteous person. You caught a run of good luck. Congrats. Don’t expect it to last.
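To put a number on “unlikely (though possible)”: assuming a fair coin and independent flips, calling seven in a row correctly has probability (1/2)^7. A quick check:

```python
from fractions import Fraction

# Seven independent calls, each right with probability 1/2 (fair coin assumed).
p_streak = Fraction(1, 2) ** 7
print(p_streak, float(p_streak))
```

That prints 1/128 and 0.0078125: rare for any one bettor, but across enough bettors (or enough pitcher-versus-team matchups), somebody is going to hit it.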
Dodger fans who are worried about what might happen in Game Five (if the series gets that far) or Game Four (if Kershaw goes on short rest) against the Cardinals because St. Louis seems to have CK’s number, take a nice deep breath. Your ace does not crumble whenever he sees two red birds balanced on a bat. The Cardinals should get credit for being a good team and even, in the past, for adapting their strategy at the plate to suit the situation. But they do not “own” Clayton Kershaw. In fact, on the things that Clayton Kershaw controls, he’s actually pitched like an ace against the Cardinals over the past few years. And that’s what you should continue to expect if the Claw gets another chance to pitch against the Cards.
This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.
I do wonder, though, how much we'll ever be able to change the mind of mainstream fans and many pundits.
Human beings like to see meaning in events (I know psychologists have a word for this). And human beings like drama and narratives. And some pundits are being paid to make things seem exciting; this is entertainment.
The gritty Redbirds rising to the occasion once again, or the pitcher who lacks guts for the postseason, or the wily Cardinals stealing signs, all make for better copy and a more gripping narrative than "And then once again Clayton Kershaw was a victim of the vagaries of BABIP. Tune in to game 5 and watch to see if random chance will allow the Cardinals to once again defeat Kershaw. Only on Fox Sports 1!"
It seems to me that there is a fundamental tension between those who want baseball to be about entertaining stories, drama, and character, and those who want to scientifically analyze and understand it. It doesn't have to be a dichotomy for everyone, but for some people, it probably always will be.
I suspect that a .340 BABIP for a pitcher is roughly a 2-sigma deviation from the norm over the number of at-bats the Cardinals have had against Kershaw, which puts it just on the fringe of that "maybe, maybe not" territory, but I don't know that for sure. Do you know, off the top of your head, whether the scatter around the average BABIP does look Gaussian, and if so, how many sigma above the average this "event" is? Just what IS the standard deviation of BABIP over 300 batters faced?
I had misread the plate appearances in Russell's table above as "plate appearances leading to balls in play." They aren't; that number is much smaller. The actual number of Cardinals hitters who put balls in play against him from 2011 through 2014, ignoring sacrifices and SFs (not sure how they factor into all this), was 197 if I count correctly. (There may be an error of 2 or 3 in this, for reasons I do not understand.) A BABIP of .343 means they got 67 or 68 non-home-run hits in this time; again, I can't explain why this isn't an integer. Over that same time, CK had an overall BABIP of about .280 (a third mystery: I can't make this number agree exactly with Russell's, but to adequate precision, it doesn't matter). He would therefore have been "expected" to allow roughly 55 non-HR hits to the Cardinals. Assuming normal distributions, the excess of Cardinals hits works out to a shade under 2 sigma, at least as compatible with luck as with anything real, and a bit less than I'd been hypothesizing previously.
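The calculation in the comment above can be sketched as follows. It treats each of the 197 balls in play as an independent trial that becomes a hit with probability .280 (Kershaw's overall BABIP), which is the simplifying binomial assumption behind "assuming normal distributions"; the function name is illustrative. The same model also gives a rough answer to the earlier question about the standard deviation of BABIP at this sample size.

```python
import math

def babip_z_score(balls_in_play: int, observed_babip: float, expected_babip: float) -> float:
    """Binomial standard deviations by which observed hits exceed expectation.

    Assumes independent balls in play, each a hit with probability
    expected_babip, and uses the normal approximation to the binomial.
    """
    expected_hits = balls_in_play * expected_babip
    observed_hits = balls_in_play * observed_babip
    sd_hits = math.sqrt(balls_in_play * expected_babip * (1 - expected_babip))
    return (observed_hits - expected_hits) / sd_hits

n_bip = 197
print(round(n_bip * 0.280, 1))                     # expected non-HR hits
print(round(math.sqrt(0.280 * 0.720 / n_bip), 3))  # SD of BABIP itself at ~200 BIP
print(round(babip_z_score(n_bip, 0.343, 0.280), 2))
```

This gives about 55.2 expected hits, a BABIP standard deviation of roughly .032 on about 200 balls in play, and a z-score of about 1.97, consistent with "a shade under 2 sigma." Real BABIP scatter may not be exactly binomial, as the next comment points out.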
The basic question remains, though: just what is the distribution of observed BABIPs around the "average" value? There are reasons to believe it isn't strictly Gaussian. Some factors would broaden it (e.g., wind will affect the likelihood of balls falling uncaught, and wind velocities in a park certainly aren't Gaussian) and some would actually narrow it (e.g., the fact that certain important classes of batted balls, for example foul popups, have zero chance of contributing positively to batting average under any conditions). So Russell, I'm still interested in hearing your analysis of just what BABIP really does look like. It's relevant.
In statistical terms, we start with var(obs) = var(true) + var(error). DIPS assumes that the variance in true talent between pitchers is much smaller than the error variance (the statistical way of saying "it's mostly luck").
I suppose the twin questions are "What does the distribution of true talent look like?" (DIPS, the way that it's practiced, basically says mean = .300, SD = 0) and "What does the distribution of error look like?"
We do know that for your typical starting pitcher's sample size for one season (call it 700-800 batters faced? Maybe 500 balls in play, give or take), no matter how you do it, the reliability estimate (whether that's year-to-year, intraclass, split-half, or KR-21) gets up to, at best, .2 or so.
Using .260 and .340 as admitted guesses at your basic +/- 2 SD range around observed BABIPs, and assuming that var(true) and var(error) aren't correlated, it means that we can regress back toward .300 and suggest that true talents have a spread between .292 and .308. (Even if we fiddle with those numbers a bit, we're going to get the same basic results.) We know that the noise is deafening.
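The .292 to .308 range above comes from shrinking each end of the guessed .260 to .340 range toward the .300 mean by the reliability of about .2: estimate = mean + r × (observed − mean). A minimal sketch of that regression-to-the-mean step, with an illustrative function name and the comment's own numbers as inputs:

```python
def regress_to_mean(observed: float, reliability: float, league_mean: float = 0.300) -> float:
    """Shrink an observed rate toward the league mean by the sample's reliability.

    estimate = mean + r * (observed - mean). With r near 0 the estimate stays
    near the mean; with r near 1 the observed value is trusted as-is.
    """
    return league_mean + reliability * (observed - league_mean)

for obs in (0.260, 0.340):
    print(obs, "->", round(regress_to_mean(obs, reliability=0.2), 3))
```

This maps .260 to .292 and .340 to .308. The same step applied to a smaller sample, such as one pitcher's balls in play against one team, would use an even lower reliability and pull the estimate even closer to .300.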
Now, that's (as you point out) all nice and neat and Gaussian, and we know that there's research that suggests that the error term contains both random error and measurement error (need to treat GB/FB pitchers differently, knuckleballers are weird, park factors matter, etc.) There may even be an error covariance and local variations in there that the simple Gaussian model doesn't account for. However, the research all points out that while we can account for some of that var (error) with these measurement error/bias issues, the amount that we can account for is still relatively minor.
All that said, I get the desire to account for every last scrap of variance that we can, but I don't think that you escape the conclusion that even if it's not exactly right, the mnemonic "everyone will regress toward .300" is going to be right much more often than it is wrong.
The point at which random variation is equal to the true-talent spread (i.e., r = .50) is at around 3,000 to 4,000 BIP.
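One way to connect that figure back to var(obs) = var(true) + var(error): if the error variance shrinks like 1/n with n balls in play, reliability is r = var(true) / var(obs) = n / (n + k), where k is the sample size at which the two variances are equal, so r = .5 at n = k. The sketch below assumes k = 3,500, the midpoint of the 3,000 to 4,000 range quoted above; the function name and that choice of k are assumptions for illustration.

```python
def reliability(balls_in_play: float, half_point: float = 3500.0) -> float:
    """Reliability of BABIP as a function of sample size, r = n / (n + k).

    k (half_point) is the number of balls in play at which true-talent
    variance equals random variance, so r = 0.5 there. The default of
    3500 is an assumed midpoint of the 3,000-4,000 BIP range.
    """
    return balls_in_play / (balls_in_play + half_point)

for n in (197, 500, 3500, 10000):
    print(n, round(reliability(n), 3))
```

Under that assumption, a season's worth of about 500 balls in play gives r of roughly .125, under the ".2 at best" ceiling mentioned earlier, and the 197 Cardinals balls in play give only about .05, which is why so little of that .343 should be attributed to anything but luck.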