by Theo Mackie
For nearly as long as baseball has existed, ERA has been the king of pitching statistics for the vast majority of fans. But, as most fans know, Fielding Independent Pitching (FIP) has risen to the forefront of the discussion around pitching statistics in recent years. FIP attempts to strip defense and luck out of a pitcher’s ERA. It is based solely on the “three true outcomes” (BB and HBP, K, and HR), which removes the effects of batting average on balls in play (BABIP) and sequencing. xFIP, developed by The Hardball Times earlier this decade, expands on FIP by setting every pitcher’s home run to fly ball ratio (HR/FB) to league average, because HR/FB is not consistent from season to season. Thanks to these adjustments, xFIP is the most accurate “common” statistic for predicting future performance (SIERA is very slightly more accurate but much more complicated and rarely used).
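As a minimal sketch of the two formulas, here are FIP and xFIP in Python, using the standard 13/3/2 weights that FanGraphs publishes. The season constant, the league HR/FB rate, and the pitcher’s line in the usage lines are assumed placeholder values, not real data. In practice the constant is recalculated every season so that league-wide FIP matches league ERA, and innings must be true decimals (200⅓ innings is 200.333, not the box-score “200.1”).

```python
def fip(hr: float, bb: float, hbp: float, k: float, ip: float, constant: float) -> float:
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls: float, league_hr_per_fb: float, bb: float, hbp: float,
         k: float, ip: float, constant: float) -> float:
    """xFIP: FIP with actual home runs replaced by fly balls * league-average HR/FB."""
    expected_hr = fly_balls * league_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical season: 200 IP, 25 HR on 180 fly balls, 50 BB, 5 HBP, 200 K.
# The constant (3.10) and league HR/FB (0.10) are illustrative assumptions.
print(round(fip(25, 50, 5, 200, 200, 3.10), 3))
print(round(xfip(180, 0.10, 50, 5, 200, 200, 3.10), 3))
```

Because this made-up pitcher allowed 25 home runs where a league-average HR/FB would predict 18, his xFIP (about 3.10) lands well below his FIP (3.55). That gap is exactly the home run “luck” xFIP is designed to neutralize.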
But while xFIP is undeniably a fantastic statistic, I noticed something strange during one of my many late nights ogling Clayton Kershaw’s FanGraphs page: he has outperformed his xFIP by over half a run for his career. That’s over a sample of 1,932 innings, and it’s a massive discrepancy: exactly 12 runs per 200 innings, enough to swing multiple games for the Dodgers. And while Kershaw is still the second-best pitcher by xFIP since 2002 (as far back as xFIP goes), that’s a far cry from his absurd .58-run lead in ERA. But if Kershaw, a transcendent pitcher who has a Cooperstown plaque locked up before his 30th birthday, can outperform his xFIP so consistently (he’s done so in eight of his last nine seasons), it has to be a skill, right?
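The runs figure is a simple rate conversion: a gap measured per nine innings scales linearly to any innings total. The 0.54 input below is an assumption, chosen because it is the per-nine gap that produces exactly 12 runs per 200 innings, which is consistent with “over half a run.”

```python
def runs_over_innings(gap_per_nine: float, innings: float) -> float:
    """Convert a per-nine-inning run gap (xFIP minus ERA) into total runs."""
    return gap_per_nine * innings / 9


# Assumed gap of 0.54 runs per nine innings.
print(round(runs_over_innings(0.54, 200), 2))
```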
To answer this question, I looked at all 304 starting pitchers who have thrown at least 500 innings since 2002. Among this group, xFIP-ERA (that is, how much a pitcher outperformed his xFIP) ranged from .95 (Hector Santiago) to -1.18 (Luke Hochevar), with a standard deviation of .36. Using a mean of 0 (the actual mean was -.02, but xFIP is scaled to league ERA, so the mean would be 0 if we analyzed every pitcher) and assuming a normal distribution (meaning pitchers can’t control xFIP-ERA), the likelihood of an outlier as far below the mean as Hochevar’s occurring in a sample of 304 is approximately 15%. That may be unlikely, but it is absolutely possible, and far less unlikely than I expected for a pitcher who underperformed his xFIP by more than a run over 128 starts. After Hochevar, the next-biggest outlier, Jordan Lyles at -1.06, has a 39% chance of occurring.
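The 15% and 39% figures can be reproduced with a short calculation: find the chance that a single normal draw (mean 0, standard deviation .36) lands at or below the outlier, then the chance that at least one of 304 independent draws does. This sketch assumes the figures are one-tailed (only the underperforming side counts), which is the reading that matches both numbers.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def chance_of_outlier(value: float, sd: float, n: int, mean: float = 0.0) -> float:
    """Probability that at least one of n independent normal draws
    falls at or below `value` (one-tailed, lower side)."""
    p_single = normal_cdf((value - mean) / sd)
    return 1 - (1 - p_single) ** n


print(f"Hochevar: {chance_of_outlier(-1.18, 0.36, 304):.0%}")  # 15%
print(f"Lyles:    {chance_of_outlier(-1.06, 0.36, 304):.0%}")  # 39%
```

Counting both tails instead would roughly double the per-pitcher probability and raise Hochevar’s figure to about 27%.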
None of this proves that outperforming xFIP is luck, but it means there’s a real chance it’s luck. Furthermore, among pitchers who qualified for the ERA title in each of the last two seasons, xFIP-ERA showed no correlation from one season to the next.
Despite this, there might be some way to explain the pitchers who consistently over- or underperform their xFIP. To find out, we need to determine whether any stats that are consistent from season to season correlate strongly with xFIP-ERA. After comparing this difference with dozens of stats pulled from FanGraphs’ dashboard, batted ball, pitch value, and plate discipline data, I found a correlation for only three statistics: LOB%, BABIP, and HR/FB. These are exactly the three statistics that xFIP is designed to ignore.
While the statistical basis behind xFIP is extremely solid, I wanted to confirm, using my ERA-qualified pitchers, that none of these three is consistent from season to season. Sure enough, none showed any correlation. So the next step was to see whether any of the stats I had tested against xFIP-ERA correlated with any of these three. Among these hundreds of tests, only one pair of statistics showed a legitimate correlation: LOB% and strikeout rate, with an r-value of .346.
But while this is a somewhat strong correlation, and one that definitely makes sense, it is two steps removed from xFIP-ERA (strikeouts relate to LOB%, which in turn relates to xFIP-ERA), and the direct correlation between K/9 and xFIP-ERA is basically nonexistent, at an r-value of .003. So while I concede that a high strikeout rate might carry some extremely minor benefit that xFIP doesn’t account for, it doesn’t explain Luke Hochevar or Jordan Lyles.
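One way to see how small these relationships are is to square the r-values: r² is the share of one stat’s variance that a straight-line fit on the other accounts for. This is only a quick arithmetic check on the two r-values quoted in the text.

```python
def variance_explained(r: float) -> float:
    """Share of variance in one stat linearly explained by the other (r squared)."""
    return r * r


for label, r in [("LOB% vs. strikeout rate", 0.346), ("K/9 vs. xFIP-ERA", 0.003)]:
    print(f"{label}: r = {r}, r^2 = {variance_explained(r):.6f}")
```

An r of .346 accounts for about 12% of the variance in LOB%, while .003 accounts for essentially none of the variance in xFIP-ERA.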
At this point, I’ve given up on determining whether a pitcher can consistently outperform his xFIP, and I want to find out whether any external factors can help predict when a pitcher is going to outperform it. To do this, I analyzed park factors and defense, which, if strongly correlated, could help explain Kershaw, who has pitched in front of a solid defense in a pitcher-friendly park for most of his career. For park factors, I tested the home xFIP-ERA of starting pitchers with at least 60 home innings in 2017 against their home park factors, as found on teamrankings.com. Nothing.
For defense, I tested the xFIP-ERA of starting pitchers with at least 120 innings in 2017 against their teams’ UZR.
Not nothing, but nothing significant. Even UZR versus raw ERA shows a correlation of only .07.
This final chart is the most intriguing. The entire point of FIP is to be independent of fielding, so logically, good fielding should lower ERA and make xFIP-ERA larger. Instead, there’s almost no correlation. Testing UZR against BABIP backs this up, again showing no effect. Good fielding and a high strikeout rate may help a pitcher outperform his xFIP ever so slightly, but ultimately there is no quantifiable way to explain these differences. They are likely equal parts the quality of batters a pitcher faces and luck, with maybe a dash of unquantifiable mental toughness that lets a once-in-a-generation pitcher like Kershaw consistently lock down hitters with runners on base, even if LOB% is luck for most pitchers.