(Jayne Kamin-Oncea / Getty Images)
by Nick Ceraso and Julian Frankel
A fascinating and purely statistical take on projecting free agent salaries.
Update 12/7, 9:34 PM ET: The model was very successful in predicting the salary of pitcher Rich Hill, who recently inked a deal with the Dodgers. So far, so good for the model.
Just as Cubs fans have begun to lead normal lives again now that the World Series is a month behind us, the baseball world is returning to normality after being turned upside down. To baseball fans of all ages, the offseason is a time when hope runs rampant and rumors are a dime a dozen. It can be confusing to sift through the thousands of tweets, articles, and whispers through the grapevine about your team’s star and where he might be headed. At the end of the day, most of this information is irrelevant, but as fans, we need something to cling to.
MLB free agency is particularly interesting, as baseball is the only one of the four major sports to not adhere to a salary cap. Baseball’s offseason truly is an open market, with only a relatively small luxury tax to be paid for teams with the biggest pockets. With 30 teams vying for the services of 200 players, things can get murky. But with the data overload in baseball, there must be a way to quantify this process.
The amount of money a free agent will make is difficult to predict. Many outside factors affect these contracts, such as a slow-developing market for the position, a glaring need by a large-market team, or an impatient owner who wants to win now. Age also plays a big role, as older players are often unwilling to take short-term deals, and teams are unwilling to offer them long-term ones. Because of this difficulty, we decided to make our projections as simple as possible, focusing exclusively on WAR. WAR stands for Wins Above Replacement, which essentially tells you how many more wins a player earns for his team than a replacement-level player, such as his Triple-A counterpart, would.
We made two models based on WAR, one linear and the other quadratic, both fit to the past three years’ free agent signings. For each model, we found the optimal coefficients, along with the weighting of the player’s WAR over the previous three years, that minimized the difference between actual and predicted salary, measured as the sum of squared errors (SSE).
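To make the fitting step concrete, here is a minimal sketch of one way it could be done. It is not our actual code: the grid search over weights, the function names (`fit_poly`, `sse`, `best_fit`), and the sample numbers are all assumptions made up for illustration. The idea is to try each blend of the three WAR seasons, fit a linear (degree 1) or quadratic (degree 2) curve to salary, and keep the blend with the lowest sum of squared errors.

```python
def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit; returns [c0, c1, ...] for c0 + c1*x + ..."""
    n = degree + 1
    # Normal equations: a[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def sse(xs, ys, coeffs):
    """Sum of squared errors between actual ys and the polynomial's predictions."""
    return sum(
        (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


def best_fit(players, degree, steps=10):
    """Grid-search WAR weights (summing to 1) and fit a polynomial for each.

    players: list of ((war_last, war_two_ago, war_three_ago), salary).
    Returns (sse, weights, coeffs) for the lowest-SSE combination.
    """
    salaries = [salary for _, salary in players]
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            xs = [sum(w * v for w, v in zip(weights, wars)) for wars, _ in players]
            coeffs = fit_poly(xs, salaries, degree)
            err = sse(xs, salaries, coeffs)
            if best is None or err < best[0]:
                best = (err, weights, coeffs)
    return best


# Made-up sample, NOT real signings:
# (WAR last year, two years ago, three years ago), salary in $ millions.
sample = [
    ((3.2, 2.8, 1.9), 11.0),
    ((0.8, 1.1, 1.5), 2.5),
    ((4.5, 3.9, 4.2), 17.0),
    ((1.7, 0.4, 0.9), 5.0),
    ((2.4, 2.6, 0.1), 8.0),
    ((0.3, 0.9, 2.2), 1.0),
]

for degree, label in ((1, "linear"), (2, "quadratic")):
    err, weights, coeffs = best_fit(sample, degree)
    print(label, weights, [round(c, 3) for c in coeffs], round(err, 3))
```

A coarse grid keeps the sketch short and dependency-free. A finer grid or a proper optimizer would do the same job with more precision.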
We also attempted to measure the impact that the availability of players at each position has on the market. By dividing individual players’ WAR by the total WAR available at their position, we were able to gauge their relative market strength. We then multiplied our WAR weighted average by (1 + (Player WAR) / (Total Position WAR)) in order to get a position-adjusted WAR. This reduced the SSE significantly and improved the accuracy of both models as a result. Because some players are signed to Minor League contracts, which do not guarantee a roster spot or, usually, any money, it is difficult to assess their true value. Based on historical data on the percentage of players who earn roster spots and further incentives from minor league deals, we assigned a dollar amount of $200,000 to each MiLB contract.
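The position adjustment can be sketched in a few lines. This is only an illustration: the function name, the dictionary layout, and the sample players are hypothetical, and it assumes the same weighted WAR figure is used both as the base and as the “Player WAR” in the market-share term.

```python
from collections import defaultdict

# Flat dollar value the article assigns to a minor league deal.
MILB_CONTRACT_VALUE = 200_000


def position_adjusted_war(free_agents):
    """Return {name: WAR * (1 + WAR / total WAR available at that position)}."""
    totals = defaultdict(float)
    for fa in free_agents:
        totals[fa["position"]] += fa["war"]

    adjusted = {}
    for fa in free_agents:
        total = totals[fa["position"]]
        # Assumption: if a position has no positive WAR on the market, skip the bump.
        share = fa["war"] / total if total > 0 else 0.0
        adjusted[fa["name"]] = fa["war"] * (1 + share)
    return adjusted


# Hypothetical free agent class, not real players or real WAR values.
market = [
    {"name": "Third Baseman A", "position": "3B", "war": 3.0},
    {"name": "Third Baseman B", "position": "3B", "war": 1.0},
    {"name": "Starter C", "position": "SP", "war": 2.0},
]

for name, war in position_adjusted_war(market).items():
    print(f"{name}: {war:.2f}")
```

A player who holds a large share of the WAR available at his position gets a bigger bump, which is how a thin market pushes his projected salary up.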
The most precise blend of the three WARs for both models weights WAR for the last year very heavily, and the third year almost not at all. While this makes sense conceptually, it can cause our model to miss on some players.
For example, let’s look at outfielder Yoenis Cespedes. After defecting from Cuba and signing with the Oakland Athletics, Cespedes has enjoyed success during his time in the majors. Looking at his WAR from the past three seasons, he was worth 4.1 wins in 2014, 6.3 in 2015, and 2.9 this past season. As our model places a heavy emphasis on the most recent year’s performance, his 2.9 WAR is the driving force behind his projected salary. However, his talent level exceeds his 2016 WAR figure, and he will most likely be paid a higher salary than our model projects. After two great years of contributing 4+ wins to his team, he will not be valued as heavily on his 2016 performance as our model suggests. With that in mind, we both agree that our model’s prediction of $12,390,000 is on the low side for Yoenis Cespedes this year.
On the other side of the ball, our model favors two types of players: those who display consistency throughout their careers, and “late bloomers” who experience great campaigns after a few consistently average years. Let’s look at an example of each. First, the most highly touted starting pitcher in this year’s market is a late bloomer: Rich Hill. After bouncing around the majors, Hill found himself in the Red Sox organization as a reclamation project. Looking at his WAR from 2014 to 2016 (0.2, 1.6, and then 4.1), it seems like it worked. In a unique case like this, it appears that his salary will be driven by his performance this year more than by past years, and we believe our model’s prediction of $16,540,000 is right about what he’ll end up taking home.
Another example of an ideal player for our model is third baseman Justin Turner. Turner has been consistently good-to-great for the Dodgers, averaging a WAR of 4.33 since 2014. This past year, he fell right in line with that, being worth 4.9 wins. With his 2016 performance being indicative of the type of player that he is, we believe his projected salary of $20,000,000 will ultimately be what he ends up signing for.
All in all, this formula is far from a final product but provides a solid basis for predicting contracts that teams will either regret or be proud of. A large portion of the data that made up our model comes from free agents who signed contracts with average annual salaries in the $1,000,000 to $5,000,000 range. Because of this, and partially because higher-salary players are less common in free agency, a lot of variance still exists among those upper-level players. Other models out there, such as FanGraphs’ “value” metric, tend to have a different problem.
Instead of consistently making predictions on the low end, FanGraphs tends to overvalue the true worth of a player on the open market. As any smart mathematician would, they don’t disclose too much information about their metric. Essentially, they attempt to place a dollar value on each player in terms of what he would make on the open market, and it is based entirely on WAR. On the surface, that seems very similar to the model we’ve made ourselves. However, the two sets of predictions seem to represent opposite extremes. For example, let’s look at three players, two of whom have already been discussed. Starting with Yoenis Cespedes, FanGraphs values his 2016 season as being worth $25,300,000. That’s more than double our prediction of $12,390,000 for 2017. Rich Hill, for whom our model predicts a 2017 salary of $16,540,000, was worth $30,700,000 according to FanGraphs. Saving the largest contrast for last, Aroldis Chapman was worth $21,700,000 as a closer last year for the Yankees and Cubs, but our model only predicts a salary of $7,480,000 for next season.
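The size of the gap is easy to check with the figures quoted above. The short sketch below simply divides each FanGraphs 2016 value by our 2017 projection; the variable and function names are made up for illustration, and the two numbers measure slightly different things (a past season’s value versus a projected salary).

```python
# Figures quoted in the article, in dollars:
# (FanGraphs 2016 value, our model's 2017 salary projection).
comparisons = {
    "Yoenis Cespedes": (25_300_000, 12_390_000),
    "Rich Hill": (30_700_000, 16_540_000),
    "Aroldis Chapman": (21_700_000, 7_480_000),
}


def value_ratio(fangraphs_value, model_projection):
    """How many times larger the FanGraphs value is than our projection."""
    return fangraphs_value / model_projection


ratios = {name: value_ratio(*pair) for name, pair in comparisons.items()}
for name, ratio in ratios.items():
    print(f"{name}: FanGraphs value is {ratio:.2f}x our projection")
```

The ratios work out to roughly 2.0 for Cespedes, 1.9 for Hill, and 2.9 for Chapman.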
So what’s going on here? It is easy to rule out the idea that each of these three players is expected to undergo an enormous regression in 2017, losing roughly half or more of his production. No, it is not a matter of the players’ actual talent level. Instead, it appears that both our model and FanGraphs’ model are far from perfect. While both models are based on performance (WAR), our model focuses solely on the technical analysis of that performance and leaves out almost everything else. FanGraphs, on the other hand, seems to place a lot of value on “market intangibles,” the various factors that account for two players with equal productivity being paid differently. The most common market intangible is inflation, which can occur for a number of reasons. Inflation could mean a bidding war in a weak market or a new TV deal for a large-market team. Essentially, FanGraphs is trying to capture a feature that is almost impossible to quantify. The result: a much larger “value” on the open market. Meanwhile, our model sticks to what is known, which is why our predicted salaries may be lower than what is actually paid on the open market.
With that in mind, our model has a decent track record of success, albeit with certain groups of players. For example, during the 2015-2016 offseason, we experienced great success in predicting some starting pitchers’ contracts. Let’s take a look at a few examples. First, John Lackey signed a two-year, $32,000,000 deal with the Cubs; our model predicted an annual salary of $16,000,000. Hisashi Iwakuma signed a one-year, $12,000,000 deal with the Mariners; we predicted he’d make $11,925,600. Finally, a name you might be tired of seeing at this point, Rich Hill, was signed by Oakland to a one-year, $6,000,000 contract; we predicted it would be $6,043,300.
Not only were these all starting pitchers, but they were starting pitchers who were not the best in their free agent class (David Price, Zack Greinke), so they were not subject to as many market intangibles. These three starters all had above-average seasons in 2015, but none of them is a franchise building block. On the other hand, one of the largest misses last year was 2B Daniel Murphy, who signed a three-year, $37,500,000 contract with the Nationals. Our model predicted him to earn $4,510,000 based on his performance. However, beyond the usual market intangibles, Murphy changed his swing during the 2015 playoffs. With that change, he was able to help carry the Mets to the World Series, winning NLCS MVP along the way. Without accounting for his new swing (and therefore increased performance), our model vastly undershot his salary on the open market. These cases seem few and far between, and it is reasonable to expect that there will not be many like this in the future.
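To put the hits and the miss on the same scale, the contracts above can be converted to average annual value and compared with the model’s predictions. This is a small sketch using only the figures quoted in this article; the names `signings` and `percent_error` are hypothetical, and it assumes each prediction is an annual salary that can be compared directly to total dollars divided by years.

```python
# (total contract dollars, years, model's predicted annual salary), as quoted above.
signings = {
    "John Lackey": (32_000_000, 2, 16_000_000),
    "Hisashi Iwakuma": (12_000_000, 1, 11_925_600),
    "Rich Hill": (6_000_000, 1, 6_043_300),
    "Daniel Murphy": (37_500_000, 3, 4_510_000),
}


def percent_error(total, years, predicted):
    """Signed percent error of the prediction against average annual value."""
    actual_aav = total / years
    return 100 * (predicted - actual_aav) / actual_aav


errors = {name: percent_error(*deal) for name, deal in signings.items()}
for name, err in errors.items():
    print(f"{name}: {err:+.2f}%")
```

The three starters all land within about one percent of their actual annual salary, while the Murphy prediction comes in roughly 64 percent below his $12,500,000 average annual value.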
One of the great things about baseball is the uncertainty tied to every event, and it will be exciting to see how these predictions turn out. We hope you enjoyed reading this as much as we enjoyed creating it.
Nick Ceraso and Julian Frankel are sophomores at the University of Michigan in the Stephen M. Ross School of Business. They can be reached at firstname.lastname@example.org and email@example.com respectively.