Matt Cain and After-the-Fact Conclusions

Now in the midst of his eighth full season, the Giants’ Matt Cain has been the subject of much sabermetric analysis over the course of his career. He has been a statistical enigma for a few reasons, the most popular being his ability to prevent hits on balls in play.

Here are the facts. Cain’s career BABIP is .263, about 30 to 40 points lower than the league median, and he has posted a lower BABIP than the median of other regular starters in every single season since 2006*. Those facts have prompted many comments like these:

“It’s not like a guy can get lucky for eight straight years” and “You have to start believing it’s a skill by now.”

*In 2005, Cain made just seven starts. His BABIP was below the starters’ median that season too, but it came in an awfully small sample.

I’m not going to argue definitively that Matt Cain has zero control over his BABIP, nor that he has been lucky for eight-plus seasons. However, I am going to examine both as possibilities while introducing the probabilistic concept of multiple comparisons. The Wikipedia article on the topic has a classic example about testing many coins to see whether any are weighted (we really like flipping coins, we statisticians), but maybe I’ll introduce the concept with a different example.

Ever heard of Yao Ming? He’s a 7-foot-6 Chinese man. The probability of any Chinese boy growing up to be 90 inches tall is quite small, even for a boy whose parents were 82 and 74 inches tall. Are we to believe that Yao has an extraordinary skill that allowed him to grow taller? No, that’s silly. Genetics explains most of the variation in height. So how can we explain this anomaly?

Easy. There are 1.3 billion Chinese people.

OK, let me explain. When we focus on an individual, the probability of an extreme event, like being really tall, is often quite small. But when we look at a large population, the probability that at least one individual in it is extreme in some way is actually quite high. It’s like getting a lot of chances to hit the jackpot. Multiple comparisons procedures exist to take exactly this into account.
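The “lots of chances” idea has a simple closed form: if each of n individuals independently has probability p of some extreme outcome, the chance that at least one of them has it is 1 − (1 − p)^n, the complement of “nobody does.” Here is a minimal Python sketch of that formula. The function name is hypothetical, independence is an assumption, and the 1-in-1,000 event is purely illustrative, not a real height figure.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent individuals experiences an
    event whose per-individual probability is p (complement of 'nobody does').
    Assumes the n individuals are independent of one another."""
    return 1 - (1 - p) ** n


# An illustrative 1-in-1,000 event: rare for one person,
# but close to certain somewhere in a large enough crowd.
for n in (1, 10, 100, 1_000, 10_000):
    print(f"n = {n:>6,}: {prob_at_least_one(0.001, n):.4f}")
```

As n grows, the printed probability climbs from 0.1% toward certainty, which is the whole point: extremes become expected once the population is big enough.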

If we had identified Matt Cain, before he ever threw a pitch, as someone who could maintain a low BABIP in the show, and then watched him beat the median BABIP eight straight times, we would have a case. After all, the chance that a pitcher who is truly no better than the median goes on to beat the median eight straight times is one-half to the eighth power: just 1 in 256, or about 0.4%. So you’d be right to argue that he’s special, that it can’t be luck. But judging from Google searches by year, people didn’t start discussing Cain as a BABIP fiend until 2009, maybe 2010.
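To make the 1-in-256 arithmetic concrete, here is a small Python check. It assumes, as the “no better than the median” scenario does, that each season is an independent 50/50 shot at beating the median; the variable names are illustrative. Using `fractions.Fraction` keeps the result exact rather than a rounded decimal.

```python
from fractions import Fraction

# Assumption: a truly median pitcher has a 50/50 chance of beating the
# median BABIP in any season, independently of his other seasons.
p_one_season = Fraction(1, 2)
seasons = 8

# Probability of beating the median in all eight seasons.
p_streak = p_one_season ** seasons
print(p_streak, f"= {float(p_streak):.2%}")  # prints: 1/256 = 0.39%
```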

Here is a better description of what really happened. No one identified Cain as a BABIP god ahead of time. Rather, a bunch of starters (about 150 with qualified innings each season, to be clear) went out and all tried to do the same thing: prevent hits. We noticed that one of them seemed to allow low BABIPs all the time. So our question should become, “What is the probability that at least one of those 150 pitchers beats the median eight straight years?” That’s a very different question. The answer is one minus the chance that all 150 of them fail, or 1 - (255/256)^150, and that probability is not 0.4% but 44.4%. Wow! It’s entirely within the realm of possibility that at least one of 150 guys could, despite being purely average (or median, I guess), do what Matt Cain has done.
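One way to sanity-check the 44.4% figure is a quick Monte Carlo simulation: give 150 purely median pitchers eight coin-flip seasons each, repeat many times, and count how often at least one of them beats the median every year. This is a rough Python sketch under simplifying assumptions: every pitcher is exactly median, seasons are independent, and the same 150 pitchers throw all eight seasons (real rosters turn over). The function name, trial count, and seed are arbitrary choices for illustration.

```python
import random


def simulate_any_streak(n_pitchers: int = 150, seasons: int = 8,
                        trials: int = 10_000, seed: int = 2013) -> float:
    """Estimate P(at least one of n_pitchers beats the median every season).

    Each pitcher is assumed to be exactly median, so every season is an
    independent 50/50 coin flip (a deliberate simplification).
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        # all() stops at a pitcher's first season below the median;
        # any() stops at the first pitcher who beats it in every season.
        if any(all(rng.random() < 0.5 for _ in range(seasons))
               for _ in range(n_pitchers)):
            hits += 1
    return hits / trials


# Closed form for comparison: 1 minus the chance that all 150 pitchers miss.
exact = 1 - (1 - 0.5 ** 8) ** 150
estimate = simulate_any_streak()
print(f"exact: {exact:.3f}  simulated: {estimate:.3f}")
```

With 10,000 trials the simulated share should land within about a percentage point of the exact value, which is roughly 0.444.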

After the fact, many good articles have been written trying to figure out what makes up Cain’s magic dust. Here’s one discussing his clever use of changeups against left-handers, for instance. But the key words there are “after the fact,” and we’ve already seen how the probabilities change when we pick out the most extreme guy from a larger group without having identified him ahead of time.

There is still a good chance that Cain has some sort of BABIP-prevention skill. I didn’t account for how far below the median he has performed, and there are good arguments suggesting he has some real ability here. But it’s important to realize that these traits were recognized after the fact, and features that seem important only after the fact should always raise caution flags in observational studies.

In stats, we’d call this Matt Cain thing a case study. Case studies are helpful (and this one has set in motion a lot of good PITCHf/x research), but they can’t establish causal relationships. If baseball were science, we would find a bunch of kids like Cain and a bunch of kids not like Cain. Then we would follow them for ten seasons, controlling which hitters they face and meticulously recording BABIP data. In sports, that is impossible. So I guess all I’m trying to say is that we shouldn’t jump to conclusions about causation when extreme cases are identified after the fact.