Barrels, Normative Analysis, and the Beauty of Statcast

by Billy Stampfl

Introduction

Statcast—MLB’s player-tracking, ball-tracking, everything-tracking tool—has improved in accuracy and volume each year since its inception. The data it provides is uniquely valuable. Thus, we need to ask an important question: How can we put this data to good use?

My purpose in writing this article is to create a set of statistics that measures how well a player should have performed based on Statcast data. I did this by creating three new measurements: eSLG, eISO, and eHR/G. We’ll go into these terms in depth later, but for now, it’s important to know what my original intent was.

Each year, some players who performed brilliantly the season before underachieve the next year. Then there’s another set of players who post career-high numbers just a summer after struggling through statistically depressing seasons. Regression, be it positive or negative, is a staple of Major League Baseball. So how can we predict which players are most likely to succumb to regression? The answer lies in Statcast. Using Statcast data, I developed expected results for 407 eligible batters from 2015 and 2016. This is where eSLG, eISO, and eHR/G come from.

Basic Process

I examined Statcast results for batters with at least 150 batted ball events (balls put in play). I combined sabermetric and Statcast data in a spreadsheet of 407 hitters from the 2015 and 2016 seasons, then paired different variables to look for positive or negative correlations. I wanted to see which Statcast variables correlated most strongly with basic and advanced statistics; once I knew that, I could move on to normative analysis and expected output. I used R to run linear regressions and to build scatterplots with least-squares lines that show the trends. Some of the most interesting discoveries came from Barrels, a statistic recently unveiled by Major League Baseball.
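
The screening and correlation step can be sketched in Python. This is a minimal illustration, not the R workflow used for this article: the `Hitter` fields and the sample rows are hypothetical, and R-squared is computed as the squared Pearson correlation, which matches the R-squared of a one-variable least-squares fit.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    # Hypothetical fields; the article's spreadsheet columns are not published.
    name: str
    bbe: int      # batted ball events
    barrels: int
    pa: int       # plate appearances
    slg: float


MIN_BBE = 150  # eligibility cutoff described in the article


def eligible(hitters):
    """Keep only hitters with at least MIN_BBE batted ball events."""
    return [h for h in hitters if h.bbe >= MIN_BBE]


def r_squared(xs, ys):
    """Squared Pearson correlation (equal to R-squared for simple linear regression).

    Assumes neither list is constant, otherwise the denominator is zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


hitters = [
    Hitter("Player A", 300, 30, 600, 0.520),
    Hitter("Player B", 250, 10, 500, 0.390),
    Hitter("Player C", 120, 12, 300, 0.450),  # dropped: fewer than 150 BBE
    Hitter("Player D", 400, 24, 650, 0.440),
]

pool = eligible(hitters)
b_per_pa = [h.barrels / h.pa for h in pool]  # barrels per plate appearance
slg = [h.slg for h in pool]
print(len(pool), round(r_squared(b_per_pa, slg), 3))
```

With a real data set, the same `r_squared` call would be repeated for each Statcast/FanGraphs pair.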

The New ‘Barrels’ Statistic

MLB’s newest Statcast treasure is called Barrels. It measures a player’s ability to put the barrel of the bat on the ball and generate good contact. Per MLB.com, “A barrel is defined as a well-struck ball where the combination of exit velocity and launch angle generally leads to a minimum .500 batting average and 1.500 slugging percentage.”

[Graphic: the Statcast “barrel zone” by exit velocity and launch angle]

The “barrel zone” is shown in the graphic above. It starts at an exit velocity of 98 mph with a launch angle between 26 and 30 degrees, and the range of qualifying launch angles widens as exit velocity increases.
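
To make the shape of the zone concrete, here is a rough Python classifier. It is an approximation, not MLB’s official lookup: the 98 mph, 26-30 degree starting point comes from the description above, while the linear widening to roughly 8-50 degrees at 116 mph (held fixed beyond that) is an assumption made for illustration.

```python
def launch_angle_window(exit_velo_mph):
    """Approximate (low, high) launch-angle window for a barrel.

    ASSUMPTION: the window is 26-30 degrees at 98 mph and widens linearly
    to 8-50 degrees at 116 mph, then stays fixed. MLB's real zone uses its
    own definition, so treat this as a sketch.
    """
    if exit_velo_mph < 98:
        return None  # no barrels below 98 mph
    mph_over_floor = min(exit_velo_mph, 116) - 98
    frac = mph_over_floor / (116 - 98)  # 0.0 at 98 mph, 1.0 at 116+ mph
    low = 26 - frac * (26 - 8)
    high = 30 + frac * (50 - 30)
    return low, high


def is_barrel(exit_velo_mph, launch_angle_deg):
    window = launch_angle_window(exit_velo_mph)
    if window is None:
        return False
    low, high = window
    return low <= launch_angle_deg <= high


print(is_barrel(98, 28))   # True: inside the 26-30 degree window
print(is_barrel(98, 32))   # False: too steep at 98 mph
print(is_barrel(110, 20))  # True: the window has widened at 110 mph
```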

Mixing and Matching: Statcast and Sabermetrics

As preliminary research, I ran linear regressions pairing Statcast variables with advanced statistics from FanGraphs, as displayed in Table 1 below. The table lists each pair’s R-squared value, which measures how much of the variation in one variable is explained by a straight-line fit on the other. It ranges from 0 to 1, and a higher value means the two variables are more closely associated.

Variable 1 (Statcast) | Variable 2 (FanGraphs) | R-squared
Barrels/PA | wRC | 0.4034
Barrels/PA | SLG | 0.5900
Barrels/PA | BA | 0.0021
Barrels/PA | wOBA | 0.3970
Barrels/PA | HR/G | 0.7513
Barrels/PA | ISO | 0.7647
Avg Exit Velocity | wRC | 0.3173
Avg Exit Velocity | wOBA | 0.3336
Avg Exit Velocity | SLG | 0.3953
Avg Distance | wRC | 0.2440
Avg Distance | wOBA | 0.2698

Table 1
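
For reference, the R-squared of a one-variable least-squares fit can be written as follows, where the hatted y values are the line’s fitted values and r is the Pearson correlation coefficient. Because R-squared equals r squared in this setting, the 0.0021 for B/PA and BA corresponds to a correlation of only about 0.046 in magnitude, while the 0.7647 for B/PA and ISO corresponds to about 0.87.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = r^2,
\qquad
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
```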

Barrels: Relationships with Other Statistics

The first thing to note is that Barrels per Plate Appearance, hereafter B/PA, is strongly associated with three statistics: Isolated Power (ISO), Home Runs Per Game (HR/G), and Slugging Percentage (SLG), with R-squared values of 0.7647, 0.7513, and 0.5900, respectively. Graph 1 shows the B/PA-SLG relationship.

[Graph 1: B/PA vs. SLG scatterplot with least-squares line]

Graph 2 shows the B/PA-ISO relationship.

[Graph 2: B/PA vs. ISO scatterplot with least-squares line]

Graph 3 shows the B/PA-HR/G relationship.

[Graph 3: B/PA vs. HR/G scatterplot with least-squares line]

Slugging Percentage represents the total number of bases that a player records per at-bat. It attempts to correct a flaw in Batting Average, which treats every hit the same even though not all hits are created equal. Thus, when calculating SLG, extra weight is given to doubles, triples, and home runs. ISO goes a step further by subtracting batting average from slugging percentage, which leaves only the extra bases a hitter collects per at-bat. For homers, I had to use HR/G rather than HR to account for the fact that players who played more games would dominate the home run projections simply because they had more opportunities. Measuring on a per-game basis evens out the totals and highlights which players hit homers at higher rates.
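
The three target statistics follow standard definitions, sketched below in Python. The stat line used in the example is hypothetical.

```python
def total_bases(singles, doubles, triples, homers):
    """Each hit is weighted by the number of bases it earns."""
    return singles + 2 * doubles + 3 * triples + 4 * homers


def slugging(singles, doubles, triples, homers, at_bats):
    """SLG = total bases / at-bats."""
    return total_bases(singles, doubles, triples, homers) / at_bats


def batting_average(singles, doubles, triples, homers, at_bats):
    return (singles + doubles + triples + homers) / at_bats


def isolated_power(singles, doubles, triples, homers, at_bats):
    """ISO = SLG - BA, i.e. extra bases per at-bat."""
    return (slugging(singles, doubles, triples, homers, at_bats)
            - batting_average(singles, doubles, triples, homers, at_bats))


def homers_per_game(homers, games):
    """A rate, so players with more games don't dominate the totals."""
    return homers / games


# Hypothetical season: 100 singles, 30 doubles, 2 triples, 25 HR in 550 AB.
line = (100, 30, 2, 25, 550)
print(round(slugging(*line), 3))          # 0.484
print(round(isolated_power(*line), 3))    # 0.198
print(round(homers_per_game(25, 150), 3)) # 0.167 (25 HR in 150 games)
```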

So why do ISO, SLG, and HR/G have stronger positive relationships with Barrels than other stats do? Consider how those three measurements differ from statistics like On-Base Percentage (OBP) and Weighted On-Base Average (wOBA). ISO, SLG, and HR/G don’t deal with walks and hit-by-pitches; they rely on the ball being hit. Barrels can only occur on batted balls. OBP and wOBA (a more advanced stat that assigns a value to each walk, hit, or hit-by-pitch and combines them into a single number) rely heavily on walks and hit-by-pitches, which clouds the correlations between B/PA and these statistics. (For those who might not fully understand wOBA, it’s helpful to think of SLG as a less sophisticated, hits-only version of wOBA.)

It makes sense that hitting more balls on the barrel of the bat will lead to more hard-hit balls, which will result in more extra-base hits, a higher slugging percentage, more isolated power, and more home runs.

Locating Luck

I wanted to see which players in 2015 got “unlucky,” meaning they hit a high percentage of balls on the barrel of the bat and at a good launch angle, but weren’t rewarded with high slugging percentages, high isolated power numbers, or an appropriate number of home runs. In the next sections, I’ll run through how we can establish who was “lucky” and who was not. Using linear regression models, I found the equation of the least-squares regression line for each of the three relationships graphed above. Using these equations, I then determined what every qualified player should have recorded in 2015 for each statistic. I named each expected statistic by putting an “e” in front of the stat’s abbreviation, and I use an “a” for the actual value. For example, the expected Slugging Percentage (eSLG) for Jon Jay in 2015 was .365, while his actual slugging percentage (aSLG) was .257. I’ll go into more detail for each of the three statistics below.

Finding eSLG

To find expected slugging percentage (eSLG) based on B/PA, I first ran the linear regression, then used R’s numerical summaries to determine the equation of the least-squares regression line. The equation was y = 2.0553X + 0.349, where X is B/PA and y is eSLG.

Plugging in B/PA as the x-variable, I found eSLG for each qualifying player. Finally, I subtracted eSLG from aSLG to show whether a player slugged above (a positive SLG +/-) or below (a negative SLG +/-) what he should have based on how often he put the barrel of the bat on the ball.
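
Here is a minimal Python sketch of that calculation using the fitted line above. It assumes B/PA is entered as a rate (barrels divided by plate appearances, such as 0.05); the 30-barrel hitter is hypothetical, while the Jon Jay check uses only the eSLG and aSLG reported in this article.

```python
SLOPE, INTERCEPT = 2.0553, 0.349  # least-squares fit reported in the article


def expected_slg(b_per_pa):
    """eSLG from barrels per plate appearance (a rate such as 0.05)."""
    return SLOPE * b_per_pa + INTERCEPT


def slg_plus_minus(actual_slg, exp_slg):
    """aSLG - eSLG: negative means the hitter slugged below expectation ("unlucky")."""
    return actual_slg - exp_slg


# Hypothetical hitter: 30 barrels in 600 plate appearances.
print(round(expected_slg(30 / 600), 3))      # 0.452

# Jon Jay, 2015 (eSLG and aSLG from the tables in this article).
print(round(slg_plus_minus(0.257, 0.365), 3))  # -0.108
```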

As a side note, I believe other analysts attempted something similar with Exit Velocity, and even Launch Angle, before Statcast released Barrels. However, Exit Velocity doesn’t correlate nearly as strongly with slugging percentage and other statistics: Table 1 shows an R-squared of 0.3953 between average exit velocity and SLG, compared with 0.5900 for B/PA. Thus, I think we can safely use Barrels now that it has been released and has shown such strong relationships with these statistics.

Here are the “unluckiest” players of 2015, based on what they should have slugged:

Player | eSLG | aSLG | SLG +/-
Brandon Moss | .522 | .407 | -.115
Giovanny Urshela | .441 | .330 | -.111
Jon Jay | .365 | .257 | -.108
Kevin Plawecki | .398 | .296 | -.102
Chris Carter | .528 | .427 | -.101
Chris Iannetta | .433 | .335 | -.098
Leonys Martin | .402 | .313 | -.089
Michael Bourn | .370 | .282 | -.088
Wilson Ramos | .444 | .358 | -.086
Tyler Flowers | .439 | .356 | -.083
Justin Smoak | .550 | .470 | -.080
Yasmani Grandal | .481 | .403 | -.078
Justin Maxwell | .417 | .341 | -.076

Now, the “luckiest”:

Player | eSLG | aSLG | SLG +/-
Bryce Harper | .534 | .649 | .115
Francisco Lindor | .400 | .482 | .082
AJ Pollock | .419 | .498 | .079
Joey Votto | .462 | .541 | .079
David Peralta | .448 | .522 | .074
Joe Panik | .388 | .455 | .067
Michael Brantley | .415 | .480 | .065
Nick Hundley | .407 | .467 | .060
Andres Blanco | .444 | .502 | .058
Nolan Arenado | .518 | .575 | .057
Maikel Franco | .441 | .497 | .056
Mark Teixeira | .495 | .548 | .053
Dustin Pedroia | .388 | .441 | .053

Notice that some of the “luckiest” players are some of the game’s best hitters. Bryce Harper had one of the greatest seasons ever in 2015. Can we really attribute any of that to luck?

MLB talent is generally thought to be roughly normally distributed, and any single season’s results mix that talent with random variation. So it would make sense that the players who overperformed or underperformed their expected slugging percentages based on Barrels would regress toward the mean the following year.
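
The reasoning above can be illustrated with a small simulation. This is a toy model under stated assumptions, not an analysis of real Statcast data: each hypothetical player has a fixed, normally distributed true talent, and each season’s observed result is that talent plus independent random noise. The sketch selects the best first-season performers and compares their two seasons.

```python
import random
import statistics


def simulate_regression(n_players=10_000, noise_sd=1.0, top_frac=0.1, seed=42):
    """Return (year1_mean, year2_mean) for the top year-1 performers.

    ASSUMPTIONS: talent ~ Normal(0, 1); each season = talent + Normal(0, noise_sd).
    """
    rng = random.Random(seed)
    talent = [rng.gauss(0, 1) for _ in range(n_players)]
    year1 = [t + rng.gauss(0, noise_sd) for t in talent]
    year2 = [t + rng.gauss(0, noise_sd) for t in talent]

    # Select the players who looked best in year 1.
    n_top = int(n_players * top_frac)
    top = sorted(range(n_players), key=lambda i: year1[i], reverse=True)[:n_top]

    y1 = statistics.fmean([year1[i] for i in top])
    y2 = statistics.fmean([year2[i] for i in top])
    return y1, y2


y1, y2 = simulate_regression()
print(f"top performers: year 1 mean {y1:.2f}, year 2 mean {y2:.2f}")
```

With talent and noise given equal variance, the selected group’s second-season mean lands near half of its first-season mean: those players are still above average, just not as far above as their first season suggested.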

I looked at each of these players’ slugging percentages in 2016 to see whether they did in fact regress.

The “unlucky” ones:

Player | 2015 eSLG | 2015 aSLG | SLG +/- | 2016 aSLG | Δ (2016 aSLG - 2015 eSLG) | Δ (2016 aSLG - 2015 aSLG)
Brandon Moss | .522 | .407 | -.115 | .500 | -.022 | +.093
Giovanny Urshela | .441 | .330 | -.111 | N/A | N/A | N/A
Jon Jay | .365 | .257 | -.108 | .383 | +.018 | +.126
Kevin Plawecki | .398 | .296 | -.102 | .247 | -.151 | -.049
Chris Carter | .528 | .427 | -.101 | .486 | -.042 | +.059
Chris Iannetta | .433 | .335 | -.098 | .331 | -.102 | -.004
Leonys Martin | .402 | .313 | -.089 | .383 | -.019 | +.070
Michael Bourn | .370 | .282 | -.088 | .372 | +.002 | +.090
Wilson Ramos | .444 | .358 | -.086 | .491 | +.047 | +.133
Tyler Flowers | .439 | .356 | -.083 | .410 | -.029 | +.054
Justin Smoak | .550 | .470 | -.080 | .401 | -.149 | -.069
Yasmani Grandal | .481 | .403 | -.078 | .489 | +.008 | +.086
Justin Maxwell | .417 | .341 | -.076 | N/A | N/A | N/A

The “lucky” ones:

Player | 2015 eSLG | 2015 aSLG | SLG +/- | 2016 aSLG | Δ (2016 aSLG - 2015 eSLG) | Δ (2016 aSLG - 2015 aSLG)
Bryce Harper | .534 | .649 | .115 | .439 | -.095 | -.210
Francisco Lindor | .400 | .482 | .082 | .436 | +.036 | -.046
AJ Pollock | .419 | .498 | .079 | .390 | -.029 | -.108
Joey Votto | .462 | .541 | .079 | .529 | +.067 | -.012
David Peralta | .448 | .522 | .074 | .433 | -.015 | -.089
Joe Panik | .388 | .455 | .067 | .379 | -.009 | -.076
Michael Brantley | .415 | .480 | .065 | .282 | -.133 | -.198
Nick Hundley | .407 | .467 | .060 | .440 | +.033 | -.027
Andres Blanco | .444 | .502 | .058 | .406 | -.038 | -.096
Nolan Arenado | .518 | .575 | .057 | .573 | +.055 | -.002
Maikel Franco | .441 | .497 | .056 | .417 | -.024 | -.080
Mark Teixeira | .495 | .548 | .053 | .343 | -.152 | -.205
Dustin Pedroia | .388 | .441 | .053 | .449 | +.061 | +.008

As expected, most of the players in the tables regressed toward the mean, or at least moved a little closer to it. Of the “unlucky” players who remained in MLB in 2016, only Plawecki, Iannetta, and Smoak didn’t see their slugging percentages rise. And Plawecki has actually played most of 2016 in the minor leagues, where he’s slugged an impressive 0.484.

The “lucky” players mostly showed regression, too. Bryce Harper’s drop is the most dramatic, but every other player except Dustin Pedroia also saw his SLG fall in 2016. It should be noted that AJ Pollock and Michael Brantley are both recovering from injuries, and though their slugging percentages have fallen, they’ve each played in just a handful of games.
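
The Δ columns and the count of “unlucky” hitters whose SLG didn’t rise can be reproduced with a short Python sketch. The numbers are copied from the “unlucky” SLG table (Urshela and Maxwell, who have no 2016 data, are left out); the function and variable names are only illustrative.

```python
# (player, 2015 eSLG, 2015 aSLG, 2016 aSLG) from the "unlucky" SLG table
unlucky = [
    ("Brandon Moss", .522, .407, .500),
    ("Jon Jay", .365, .257, .383),
    ("Kevin Plawecki", .398, .296, .247),
    ("Chris Carter", .528, .427, .486),
    ("Chris Iannetta", .433, .335, .331),
    ("Leonys Martin", .402, .313, .383),
    ("Michael Bourn", .370, .282, .372),
    ("Wilson Ramos", .444, .358, .491),
    ("Tyler Flowers", .439, .356, .410),
    ("Justin Smoak", .550, .470, .401),
    ("Yasmani Grandal", .481, .403, .489),
]


def deltas(e2015, a2015, a2016):
    """Return (2016 aSLG - 2015 eSLG, 2016 aSLG - 2015 aSLG)."""
    return a2016 - e2015, a2016 - a2015


# Hitters whose slugging percentage did not rise from 2015 to 2016.
no_rise = [name for name, e, a15, a16 in unlucky if deltas(e, a15, a16)[1] <= 0]
print(no_rise)  # ['Kevin Plawecki', 'Chris Iannetta', 'Justin Smoak']
```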

Finding eISO

Determining Expected Isolated Power (eISO) for a player works much like finding eSLG. The equation for eISO was y = 1.982412X + 0.083254. Simply plug in the player’s B/PA (as a rate, such as 0.07 barrels per plate appearance), and the result is what his ISO should have been based on how often he hit the ball on the sweet spot of the bat.
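
Because the eSLG and eISO lines are both fit on B/PA, and ISO = SLG - BA, subtracting one line from the other gives the batting average the two fits jointly imply. The Python sketch below uses the article’s two equations; the `implied_ba` function is an illustrative derivation, not one of the article’s statistics, and B/PA is assumed to be entered as a rate.

```python
def expected_slg(b_per_pa):
    return 2.0553 * b_per_pa + 0.349       # eSLG line from the article


def expected_iso(b_per_pa):
    return 1.982412 * b_per_pa + 0.083254  # eISO line from the article


def implied_ba(b_per_pa):
    """eSLG - eISO: since ISO = SLG - BA, this is the BA the two fits imply."""
    return expected_slg(b_per_pa) - expected_iso(b_per_pa)


for rate in (0.02, 0.05, 0.08):
    print(rate, round(expected_iso(rate), 3), round(implied_ba(rate), 3))
```

Across that range of barrel rates, eISO roughly doubles while the implied batting average moves by only about five points, which fits the near-zero R-squared between B/PA and BA in Table 1.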

Here are the “unluckiest” players of 2015, based on what they should have posted in terms of ISO:

Player | eISO | aISO | ISO +/-
Brandon Moss | .250 | .181 | -.069
Giovanny Urshela | .172 | .105 | -.067
Jorge Soler | .200 | .137 | -.063
JD Martinez | .313 | .253 | -.060
Giancarlo Stanton | .400 | .341 | -.059
Michael Bourn | .103 | .045 | -.058
Anthony Rendon | .157 | .100 | -.057
Jacoby Ellsbury | .143 | .088 | -.055
Kevin Plawecki | .131 | .077 | -.054
Tyler Flowers | .170 | .118 | -.052

And the “luckiest”:

Player | eISO | aISO | ISO +/-
Mark Teixeira | .224 | .293 | .069
Rajai Davis | .121 | .182 | .061
Bryce Harper | .262 | .319 | .057
Jed Lowrie | .129 | .178 | .049
Stephen Drew | .135 | .180 | .045
Maikel Franco | .172 | .217 | .045
Evan Gattis | .174 | .217 | .043
Russell Martin | .176 | .218 | .042
Nolan Arenado | .246 | .287 | .041
Ben Zobrist | .135 | .173 | .038

Now let’s do the same thing we did with Slugging Percentage—that is, take a look at how these players have fared in 2016. Did regression occur with ISO as it did (for the most part) with SLG?

The “unlucky” players in terms of eISO:

Player | 2015 eISO | 2015 aISO | ISO +/- | 2016 aISO | Δ (2016 aISO - 2015 eISO) | Δ (2016 aISO - 2015 aISO)
Brandon Moss | .250 | .181 | -.069 | .265 | +.015 | +.084
Giovanny Urshela | .172 | .105 | -.067 | .105 | -.067 | .000
Jorge Soler | .200 | .137 | -.063 | .200 | .000 | +.063
JD Martinez | .313 | .253 | -.060 | .230 | -.083 | -.023
Giancarlo Stanton | .400 | .341 | -.059 | .254 | -.146 | -.087
Michael Bourn | .103 | .045 | -.058 | .112 | +.009 | +.067
Anthony Rendon | .157 | .100 | -.057 | .175 | +.018 | +.075
Jacoby Ellsbury | .143 | .088 | -.055 | .114 | -.029 | +.026
Kevin Plawecki | .131 | .077 | -.054 | .063 | -.068 | -.014
Tyler Flowers | .170 | .118 | -.052 | .143 | -.027 | +.025

The “lucky” ones:

Player | 2015 eISO | 2015 aISO | ISO +/- | 2016 aISO | Δ (2016 aISO - 2015 eISO) | Δ (2016 aISO - 2015 aISO)
Mark Teixeira | .224 | .293 | .069 | .146 | -.078 | -.147
Rajai Davis | .121 | .182 | .061 | .144 | +.023 | -.038
Bryce Harper | .262 | .319 | .057 | .197 | -.065 | -.122
Jed Lowrie | .129 | .178 | .049 | .059 | -.070 | -.119
Stephen Drew | .135 | .180 | .045 | .258 | +.123 | +.078
Maikel Franco | .172 | .217 | .045 | .181 | +.009 | -.036
Evan Gattis | .174 | .217 | .043 | .255 | +.081 | +.038
Russell Martin | .176 | .218 | .042 | .178 | +.002 | -.040
Nolan Arenado | .246 | .287 | .041 | .279 | +.033 | -.008
Ben Zobrist | .135 | .173 | .038 | .159 | +.024 | -.014

The results resemble those we obtained for Expected Slugging Percentage. Players who overperformed in 2015, those who likely benefited from luck, saw their ISOs decrease by an average of about 0.041 in 2016. Those who underperformed based on their B/PA saw their ISOs increase by an average of about 0.022. So it appears that some players simply have bad luck in a given year: they hit the ball on the sweet spot of the bat more often than most, but aren’t rewarded with extra-base hits.
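
Both averages can be checked in a few lines of Python, using the year-over-year ISO change column from the two 2016 eISO tables (Urshela’s .000 is included, as it appears in the table).

```python
import statistics

# Year-over-year ISO change (2016 aISO - 2015 aISO) from the eISO tables
unlucky_change = [.084, .000, .063, -.023, -.087, .067, .075, .026, -.014, .025]
lucky_change = [-.147, -.038, -.122, -.119, .078, -.036, .038, -.040, -.008, -.014]

print(round(statistics.mean(unlucky_change), 4))  # 0.0216
print(round(statistics.mean(lucky_change), 4))    # -0.0408
```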

Finding eHR/G

The final statistic we’ll develop is Expected Home Runs Per Game, or eHR/G. Once again, we’re focusing on home runs because HR/G has such a strong correlation with Barrels (an R-squared of 0.7513). The process is pretty much the same as it was for finding eSLG and eISO, so I won’t go into great detail.

The equation for eHR/G was y = 339.348X + 2.1723. We make B/PA the input and eHR/G the output. If a player hit more home runs per game than he should have based on how often he barreled the ball, we call him “lucky,” at least for the 2015 season. If he hit fewer homers per game than expected, we call him “unlucky.”
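
The labeling rule above is simple enough to express in Python. This is a sketch: the home run and game counts in the example are hypothetical, and eHR/G is passed in directly rather than recomputed from the regression line.

```python
def hr_per_game(home_runs, games):
    return home_runs / games


def hr_luck_label(expected_hr_g, actual_hr_g):
    """'lucky' if actual HR/G beat the Barrels-based expectation, 'unlucky' if it fell short."""
    diff = actual_hr_g - expected_hr_g
    if diff > 0:
        return "lucky"
    if diff < 0:
        return "unlucky"
    return "as expected"


# Hypothetical hitter: 20 HR in 125 games, with an eHR/G of 0.20.
actual = hr_per_game(20, 125)
print(round(actual, 2), hr_luck_label(0.20, actual))  # 0.16 unlucky
```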

The “unluckiest” players:

Player | eHR/G | aHR/G | HR/G +/-
Justin Smoak | 0.26 | 0.14 | -0.12
Brandon Moss | 0.22 | 0.13 | -0.09
Randal Grichuk | 0.25 | 0.17 | -0.08
Abraham Almonte | 0.14 | 0.06 | -0.07
Brandon Belt | 0.20 | 0.13 | -0.07
Stephen Piscotty | 0.18 | 0.11 | -0.07
Andres Blanco | 0.13 | 0.07 | -0.06
Clint Robinson | 0.14 | 0.08 | -0.06
Jorge Soler | 0.16 | 0.10 | -0.06
Kendrys Morales | 0.20 | 0.14 | -0.06

And the “luckiest”:

Player | eHR/G | aHR/G | HR/G +/-
Mark Teixeira | 0.19 | 0.28 | 0.09
Albert Pujols | 0.18 | 0.25 | 0.07
Dustin Pedroia | 0.07 | 0.13 | 0.06
Carlos Correa | 0.16 | 0.22 | 0.06
Carlos Gonzalez | 0.21 | 0.26 | 0.06
Jed Lowrie | 0.08 | 0.13 | 0.05
Brian McCann | 0.14 | 0.19 | 0.05
Nelson Cruz | 0.24 | 0.29 | 0.05
Nolan Arenado | 0.22 | 0.27 | 0.05
Edwin Encarnacion | 0.22 | 0.27 | 0.05

You know the drill: we’ll now look at home runs per game in 2016 to see whether regression to the mean occurred for the players in both of these tables.

Here are the “unlucky” ones:

Player | 2015 eHR/G | 2015 aHR/G | HR/G +/- | 2016 aHR/G | Δ (2016 aHR/G - 2015 eHR/G) | Δ (2016 aHR/G - 2015 aHR/G)
Justin Smoak | 0.26 | 0.14 | -0.12 | 0.11 | -0.15 | -0.03
Brandon Moss | 0.22 | 0.13 | -0.09 | 0.22 | 0.00 | +0.09
Randal Grichuk | 0.25 | 0.17 | -0.08 | 0.18 | -0.07 | +0.01
Abraham Almonte | 0.14 | 0.06 | -0.07 | 0.03 | -0.11 | -0.03
Brandon Belt | 0.20 | 0.13 | -0.07 | 0.11 | -0.09 | -0.02
Stephen Piscotty | 0.18 | 0.11 | -0.07 | 0.15 | -0.03 | +0.04
Andres Blanco | 0.13 | 0.07 | -0.06 | 0.05 | -0.08 | -0.02
Clint Robinson | 0.14 | 0.08 | -0.06 | 0.05 | -0.09 | -0.03
Jorge Soler | 0.16 | 0.10 | -0.06 | 0.14 | -0.02 | +0.04
Kendrys Morales | 0.20 | 0.14 | -0.06 | 0.20 | 0.00 | +0.06

And the “lucky” ones:

Player | 2015 eHR/G | 2015 aHR/G | HR/G +/- | 2016 aHR/G | Δ (2016 aHR/G - 2015 eHR/G) | Δ (2016 aHR/G - 2015 aHR/G)
Mark Teixeira | 0.19 | 0.28 | 0.09 | 0.12 | -0.07 | -0.16
Albert Pujols | 0.18 | 0.25 | 0.07 | 0.21 | +0.03 | -0.04
Dustin Pedroia | 0.07 | 0.13 | 0.06 | 0.10 | +0.03 | -0.03
Carlos Correa | 0.16 | 0.22 | 0.06 | 0.14 | -0.02 | -0.08
Carlos Gonzalez | 0.21 | 0.26 | 0.06 | 0.17 | -0.04 | -0.09
Jed Lowrie | 0.08 | 0.13 | 0.05 | 0.02 | -0.06 | -0.11
Brian McCann | 0.14 | 0.19 | 0.05 | 0.15 | +0.01 | -0.04
Nelson Cruz | 0.24 | 0.29 | 0.05 | 0.28 | +0.04 | -0.01
Nolan Arenado | 0.22 | 0.27 | 0.05 | 0.26 | +0.04 | -0.01
Edwin Encarnacion | 0.22 | 0.27 | 0.05 | 0.27 | +0.05 | 0.00

Reviewing the tables above, it looks as though the data isn’t quite as telling for players who supposedly underperformed in home run rate in 2015. But for the overachievers, it’s a whole different story. The average “lucky” player in 2015 saw his HR/G fall by 0.06 bombs per contest. That’s almost 10 home runs over the stretch of a 162-game season. Using this model, we probably could have predicted that Mark Teixeira, who somehow belted 31 homers while averaging only 0.07 barrels per plate appearance, would take a big step backward in power numbers in 2016. Analysis like this could be valuable to a team deciding which players to go after in the trade market and which ones to pass on in free agency.
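
The group averages and the 162-game conversion can be checked with a short Python sketch, using the year-over-year HR/G change column from the two 2016 eHR/G tables.

```python
import statistics

# Year-over-year HR/G change (2016 aHR/G - 2015 aHR/G) from the eHR/G tables
unlucky_change = [-.03, .09, .01, -.03, -.02, .04, -.02, -.03, .04, .06]
lucky_change = [-.16, -.04, -.03, -.08, -.09, -.11, -.04, -.01, -.01, .00]

GAMES = 162
for label, changes in (("unlucky", unlucky_change), ("lucky", lucky_change)):
    avg = statistics.mean(changes)
    print(label, round(avg, 3), "HR/G, or", round(avg * GAMES, 1), "HR per 162 games")
```

The “lucky” group’s average decline of 0.057 HR/G works out to a bit more than nine home runs over 162 games, while the “unlucky” group gained only about 0.011 HR/G, which matches the observation that the underperformers’ numbers are less telling.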

Using eSLG, eISO, and eHR/G

How can we use the three expected statistics? They shouldn’t be the deciding factor when a ball club makes choices about acquiring players or letting them go. But the concept is similar to Pythagorean Wins, which estimate how many games a team should have won given its runs scored and runs allowed. For example, the Texas Rangers have the best record in the AL in 2016, but Pythagorean Wins says they should have 13 fewer wins because they don’t outscore opponents by much. This type of normative analysis can help teams evaluate players without bias.
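
For readers unfamiliar with Pythagorean Wins, here is the classic formulation (Bill James’s exponent of 2) in Python. The run totals below are hypothetical, and some versions use an exponent closer to 1.83.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage = RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    return pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games


# Hypothetical team: 700 runs scored and 680 allowed over 162 games.
print(round(pythagorean_wins(700, 680, 162), 1))  # 83.3
```

Comparing this figure with a team’s actual win total gives the kind of gap described in the Rangers example, just as comparing eSLG with aSLG does for a hitter.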

Conclusion

Using my model, we can plug in a player’s Barrels/PA to find what his slugging percentage, isolated power, and home run rate should be. This isn’t always telling, since many factors decide the fate of every batted ball. But if the gap between eSLG and aSLG is abnormally large, if eISO is much lower than aISO, or if eHR/G is twice as high as aHR/G, regression might be coming in the near future.
