Image: (Brad Penner/Imagn Images)
Introduction
To identify which ballparks in Major League Baseball (MLB) are most conducive to scoring runs, baseball statisticians rely on a statistic called “park factors.” Variations of the statistic have been developed since the late 1990s, but they all use outcomes (singles, doubles, triples, home runs, runs scored, etc.) to calculate the factors. As currently calculated, the stat compares offensive results across different ballparks to determine which ones are most or least amenable to offensive output. This is done for every MLB player and team, as well as some minor league stadiums. However, a potential problem with this approach is that it may be subject to statistical noise and small sample sizes, stemming from sources such as umpires, player injuries, and limited matchups.
Furthermore, traditional park factors cannot concretely predict how a park will affect run scoring. A prominent statistic, xwOBA (expected weighted on-base average, also known as the estimated speed-angle estimate), only uses exit speed and launch angle (and in some cases sprint speed) to determine what the batter deserved on a ball hit into play. If the plate appearance instead ended in a strikeout, both the wOBA (weighted on-base average) and the xwOBA are 0.
The stat does not consider park factors or environmental conditions, two components that matter a great deal to the flight of the ball. This is apparent in Baseball Savant’s Park Factors: Coors Field, easily the most run-conducive environment in MLB due to its high elevation, had an xwOBA (on batted balls) factor that was just 1% higher than the league average in 2024. While an xwOBA without park factors is useful in certain contexts because it shows how a batter or team would perform in a neutral context, I also believe that it would be prudent to create an xwOBA that takes park factors into account. The goal would be to discern how well teams and players would perform in specific environments. This research paper proposes that using physical ballpark dimensions and environmental factors could provide more accurate, stable, and predictable park factor measurements than traditional outcome-based calculations, and could yield a new and improved xwOBA.
Literature Review
When attempting to identify exactly how ballparks affect run scoring in MLB, past research can be of great use. Obtaining information from other studies or articles helps provide a comprehensive overview of a specific topic, allowing researchers to gain more knowledge on the subject and build off of it. In my study, I used literature focusing on park dimensions, environmental factors, teams moving into new ballparks, and home field advantage to get a better understanding of the work other people have done within the scope of this research paper.
Park Dimensions in Major League Baseball
There has been limited work done to further our understanding of how ballparks affect run scoring. However, Biesiadny (2016) used shape analysis and regression to determine which park dimensions are most conducive to on-base percentage and home runs. Using just 2015 MLB data, Biesiadny posited that a deeper than average left field wall and a shallower than average center field wall maximize on-base percentage, while a shape almost the same as that of the mean MLB ballpark would maximize home runs. In my study, I further developed this understanding of how ballparks affect player statistical outcomes.
Environmental factors of ballparks
Atmospheric conditions are crucial in our understanding of playing fields. For example, air density, altitude, temperature, barometric pressure, and relative humidity can affect how a ball spins or travels in midair, specifically when a pitcher is throwing a pitch to a batter (Bahill et al., 2009). Depending on the type of pitch and location, a ball can spin in the air differently, meaning that the pitch could be easier or harder for a batter to hit. Denver has a higher altitude than other cities in MLB, so the aerodynamic forces on the ball (drag and the Magnus lift force) are smallest there, which makes the Rockies’ Coors Field the park where a batted ball travels the farthest.
For every ten-percent decrease in air density (which takes into account altitude, temperature, relative humidity, and barometric pressure), there is a four-percent increase in the distance of a batted ball. Certain pitches, like fastballs and breaking balls, can also be affected by the change in air density. For example, a curveball in Coors Field, while increased in speed by one percent with a ten percent decrease in air density, also loses nine percent of its vertical drop, making it easier to hit. Air density is an important factor to consider when observing team performance in a certain environment.
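The air density relationship above can be sketched in a few lines. This is a minimal illustration, not part of the original study: it assumes the standard-atmosphere pressure formula, the Tetens approximation for saturation vapor pressure, and a linear reading of the "four percent per ten percent" rule of thumb. The function names and the 70 °F, 50% humidity inputs are hypothetical.

```python
import math

R_DRY = 287.058    # specific gas constant of dry air, J/(kg*K)
R_VAPOR = 461.495  # specific gas constant of water vapor, J/(kg*K)


def station_pressure_pa(elevation_ft: float) -> float:
    """Standard-atmosphere pressure (Pa) at an elevation, ignoring daily weather."""
    elevation_m = elevation_ft * 0.3048
    return 101325.0 * (1.0 - 2.25577e-5 * elevation_m) ** 5.25588


def air_density(temp_f: float, pressure_pa: float, rel_humidity: float) -> float:
    """Moist-air density (kg/m^3); rel_humidity is a fraction between 0 and 1."""
    temp_c = (temp_f - 32.0) * 5.0 / 9.0
    temp_k = temp_c + 273.15
    # Tetens approximation for saturation vapor pressure, in Pa.
    saturation_pa = 610.78 * 10 ** (7.5 * temp_c / (temp_c + 237.3))
    vapor_pa = rel_humidity * saturation_pa
    dry_pa = pressure_pa - vapor_pa
    return dry_pa / (R_DRY * temp_k) + vapor_pa / (R_VAPOR * temp_k)


def distance_change_pct(density: float, reference_density: float) -> float:
    """Assumed linear rule of thumb: +4% distance per 10% drop in air density."""
    density_drop_pct = (1.0 - density / reference_density) * 100.0
    return 0.4 * density_drop_pct


# Hypothetical comparison: same weather at sea level and at 5,190 ft.
sea_level = air_density(70.0, station_pressure_pa(0.0), 0.5)
mile_high = air_density(70.0, station_pressure_pa(5190.0), 0.5)
print(f"sea level: {sea_level:.3f} kg/m^3, 5,190 ft: {mile_high:.3f} kg/m^3")
print(f"estimated distance change: {distance_change_pct(mile_high, sea_level):+.1f}%")
```

The 5,190 ft input is the maximum elevation listed in Table 1. Extending the rule linearly beyond a ten-percent density change is an assumption of this sketch.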
Cloud cover along with the time of the game may also be an important consideration. According to Kent and Sheridan (2011), strikeouts and errors increase in clear sky conditions. In addition, flyouts and groundouts decrease in daytime games compared to nighttime games. Certain stadiums showed significant changes in other categories, like batting average (e.g. Wrigley Field) and ERA (e.g. Oakland Coliseum), depending on cloud cover or time of the game. Kent and Sheridan also found that the home team wins a higher percentage of its games under clear skies than under cloudy skies. A possible reason is that the home team knows how to deal with specific angles of sunshine within its field of play. Understanding how cloud cover and game time contribute to a run scoring environment is key for ballpark analysis.
Moving into a new ballpark
Research examining the impact of moving into a new ballpark and how that affects a team’s performance is limited because this type of event is rare. Sommers (2010) analyzed five teams that did just that. He wanted to find out if a new ballpark added a significant home field advantage in the inaugural season. Before conducting his research, Sommers suggested that maybe these teams could perform better after moving into their new stadium due to the new facility attracting more fans and the home team taking advantage of the shape and conditions of the ballpark.
All five teams that Sommers tested were from the ‘90s, and ultimately, he found that the building of a new stadium did not have a significant impact on a club’s home win percentage. More recently, the Baltimore Orioles adjusted their ballpark to create better conditions for their hitters before the 2025 season, as did the Detroit Tigers in 2023. However, no extensive research has yet been done to see if those changes had significant effects on win percentage. I sought to contribute to this literature by identifying exactly what dimensions a team or player should want to play in to maximize their win percentage or statistical outcomes.
Assessing home field advantage
When aiming to understand a ballpark’s run scoring environment, I considered how a team is already using their facility to their advantage. Recent studies have found that playing at home does significantly increase the chances of a team winning (Risser et al., 2018; Higgs & Stavness, 2021). Not only is this the case in baseball, but it is also the case in most North American and European professional sports. Higgs and Stavness also analyzed home field advantage in the NBA, NFL, NHL, and MLB during COVID-19, when there were no fans in the stands. They found that home field advantage was negatively impacted in both the NHL and NBA but not in MLB and the NFL. While these leagues had different setups (the NBA had the “bubble”), what I focused on was MLB, where the crowd, specifically, does not seem to have a discernible impact on the outcome of the game.
Overall, these findings have important implications for how some ballparks may be easier to hit in than others. Specifically, teams and players might want to take ballpark dimensions, atmospheric conditions, and overall home field advantage into consideration within the realm of game strategy or when determining the run scoring environment.
Data
To assess how a ballpark’s environment impacts player performance, I combined data from multiple sources. I obtained pitch-by-pitch data for the 2024 MLB season from Statcast (Baseball Savant), weather data from Retrosheet (Retrosheet), and park dimensions from Andrew Clem Baseball (Clem Baseball). From the Statcast API, I filtered for pitches with batted ball outcomes that could be assigned an xwOBA value.
In the 2024 season, there were 133,436 balls hit into play. Using 32 variables, I was able to create an xwOBA for these balls based not only on the batted ball metrics but also on environmental and park factors. Launch speed (in mph), launch angle, attack angle, and attack direction were used to identify how the player hit the ball. Fielders and alignment of the defense were also taken into consideration to capture how good the defense is and where exactly the fielders are stationed on any given batted ball.
Dimensions of the park, including distances (in feet) to left field, center field, and right field as well as wall heights (in feet) and area (in square feet) of the field (fair, foul, backstop) were accounted for to determine how much space there was on the field as well as how far a player had to hit the ball for it to be a home run. Environmental conditions, such as temperature (in degrees Fahrenheit), wind direction (specified as the part of the field the wind is blowing from and toward), wind speed (in mph), level of precipitation, the surface played on, elevation (in feet), sky (sunny, dome, cloudy, overcast), and whether it is day or night were also factored in.
Table 1
| Variable | N | Mean | SD | Min | Max |
| Area Fair | 133436 | 109.857 | 3.741 | 104.2 | 119.2 |
| Backstop Area | 133327 | 51.910 | 4.920 | 42 | 65 |
| CF Dim | 133436 | 403.229 | 5.708 | 385 | 420 |
| CF Wall Height | 133285 | 9.510 | 3.896 | 6 | 25 |
| Elevation | 133150 | 519.373 | 938.775 | 0 | 5190 |
| Foul | 133150 | 23.456 | 4.144 | 16.5 | 40.7 |
| LC Dim | 110406 | 383.584 | 9.847 | 363 | 399 |
| LC Wall Height | 133285 | 10.350 | 4.082 | 7 | 25 |
| LF Dim | 133436 | 331.662 | 9.618 | 310 | 355 |
| LF Wall Height | 133285 | 9.948 | 6.299 | 4 | 37 |
| RC Dim | 110406 | 383.672 | 9.002 | 370 | 403 |
| RC Wall Height | 133285 | 11.210 | 4.882 | 5 | 25 |
| RF Dim | 133436 | 328.300 | 10.216 | 302 | 353 |
| RF Wall Height | 133285 | 10.940 | 5.598 | 3 | 25 |
| Attack Angle | 118973 | 8.523 | 9.067 | -86.770 | 86.088 |
| Attack Direction | 118973 | 0.353 | 13.517 | -179.463 | 179.675 |
| Launch Angle | 133436 | 13.182 | 28.692 | -90 | 90 |
| Launch Speed | 133436 | 88.509 | 14.536 | 8.8 | 121.5 |
| Temperature | 133436 | 72.593 | 10.636 | 0 | 103 |
| Wind Speed | 133436 | 6.834 | 4.900 | 0 | 26 |
| wOBA value | 133436 | 0.375 | 0.572 | 0 | 2 |
Method
I implemented an XGBoost (Extreme Gradient Boosting) regression model to predict wOBA values from batted ball characteristics as well as environmental conditions.
The model incorporated 32 predictor variables:
Table 2
| Ball flight characteristics | Launch angle, launch speed, attack angle, attack direction |
| Defensive positioning | Fielder positions, infield and outfield alignment |
| Environmental conditions | Day/night games, temperature, wind direction and speed, precipitation |
| Ballpark characteristics | Playing surface, fair territory area, roof coverage, dimensional measurements (left field, center field, right field distances and wall heights), backstop distance, foul territory, elevation |
| Atmospheric conditions | Sky conditions |
The dataset was randomly split into training (80%) and testing (20%) sets. Observations with missing values in either the predictor variables or the target variable (wOBA) were excluded from the analysis, leaving 112,324 entries.
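As a rough illustration of this preparation step, the sketch below drops incomplete rows and then makes a seeded 80/20 random split using only the standard library. The helper names, the seed, the toy rows, and the choice to filter before splitting are all assumptions for illustration; the article does not specify them.

```python
import random


def drop_missing(rows, required_keys):
    """Keep only rows where every required field is present and not None."""
    return [r for r in rows if all(r.get(k) is not None for k in required_keys)]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy rows; the real data has 32 predictors plus the wOBA target.
rows = [
    {"launch_speed": 100.0 + i, "launch_angle": 20.0, "woba_value": 0.9}
    for i in range(9)
]
rows.append({"launch_speed": None, "launch_angle": 12.0, "woba_value": 0.0})

complete = drop_missing(rows, ["launch_speed", "launch_angle", "woba_value"])
train, test = train_test_split(complete)
print(len(complete), len(train), len(test))
```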
After creating and testing the model using all of the valid observations, I simulated every batted ball in the 2024 dataset across all ballparks, applying each venue’s typical weather patterns to evaluate how ball flight outcomes would differ by location. This was used to create park factors as well as batter and pitcher xwOBA values for every ballpark. To qualify, a batter had to hit, or a pitcher had to allow, at least 100 batted balls, so I ended up analyzing 398 batters and 436 pitchers.
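The simulation idea can be sketched as follows: keep each batted ball's own metrics, overwrite the park and weather fields with a venue's profile, and average the predictions. The `toy_predict` function is a hypothetical stand-in for a trained model and is not the article's XGBoost model; the field names and park profiles are also invented for illustration.

```python
from statistics import mean


def simulate_across_parks(batted_balls, park_profiles, predict):
    """Average predicted wOBA per park after moving every batted ball there."""
    results = {}
    for park, profile in park_profiles.items():
        # Overwrite park and weather fields; keep the batted ball metrics.
        predictions = [predict({**ball, **profile}) for ball in batted_balls]
        results[park] = mean(predictions)
    return results


def toy_predict(features):
    """Hypothetical stand-in for a trained model, for illustration only."""
    value = 0.01 * (features["launch_speed"] - 60.0)
    value += 0.00002 * features["elevation_ft"]
    return min(max(value, 0.0), 2.0)


batted_balls = [
    {"launch_speed": 95.0, "launch_angle": 25.0},
    {"launch_speed": 80.0, "launch_angle": 5.0},
]
park_profiles = {
    "high_park": {"elevation_ft": 5000.0, "temperature_f": 75.0},
    "low_park": {"elevation_ft": 0.0, "temperature_f": 70.0},
}
by_park = simulate_across_parks(batted_balls, park_profiles, toy_predict)
print(by_park)
```

Grouping the same predictions by batter or pitcher instead of by park alone would give the per-player, per-park xwOBA values described above.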
A minimum threshold of 100 batted balls per player was selected to balance the need for reliable individual player estimates against the need to keep a sufficient number of qualifying players. This resulted in an effective total sample size of at least 39,800 batted balls across 398 qualifying batters (and similarly for 436 qualifying pitchers) from the full dataset of 112,324 batted balls. A 95% confidence level was assumed, which corresponds to a standard normal critical value of z = 1.96. The population standard deviation for wOBA was calculated from the full dataset to be 0.572. At this combined sample size, the margin of error for estimating mean wOBA across all qualifying players was estimated using this formula:
$$ME = z \cdot \frac{\sigma}{\sqrt{n}}$$
Where z is 1.96, σ is the estimated population standard deviation for wOBA (0.572), n is the number of batted balls, and ME is the margin of error. With the total sample size of at least 39,800 batted balls, this yielded a ME of approximately 0.006 for the overall population estimate. Per player, the ME was at most 0.112, which is the value at the 100-batted-ball minimum.
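The quoted margins of error can be checked with the standard normal-approximation formula for a mean, ME = z · σ / √n. The short sketch below plugs in the article's values; the function name is hypothetical.

```python
import math


def margin_of_error(sigma: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a mean."""
    return z * sigma / math.sqrt(n)


SIGMA_WOBA = 0.572  # standard deviation of wOBA reported in the article

pooled = margin_of_error(SIGMA_WOBA, 39_800)   # 398 batters x 100 batted balls
per_player = margin_of_error(SIGMA_WOBA, 100)  # a player at the minimum threshold
print(f"pooled ME: {pooled:.3f}, per-player ME at n=100: {per_player:.3f}")
```

Because the margin shrinks as n grows, a player with more than 100 batted balls has a smaller margin than the value at the threshold.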
The XGBoost model was created with the following key parameters:
Table 3
| Objective function | Squared error loss (reg:squarederror) |
| Evaluation metric | Root Mean Square Error (RMSE) |
| Tree structure | Maximum depth of 6 levels |
| Learning rate (eta) | 0.1 |
| Subsampling | 80% of training samples per tree |
| Feature Sampling | 80% of features per tree (colsample_bytree) |
| Regularization | L2 regularization (lambda = 1), L1 regularization (alpha = 0) |
| Tree constraints | Minimum child weight = 1, gamma = 0 |
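For readers who want to reproduce the configuration, Table 3 could be written as a parameter dictionary using the parameter names from the XGBoost documentation. This is only a sketch: the article does not say which XGBoost interface was used, so the Python form and the variable name are assumptions.

```python
# Table 3 expressed with XGBoost's documented parameter names.
xgb_params = {
    "objective": "reg:squarederror",  # squared error loss
    "eval_metric": "rmse",            # root mean square error
    "max_depth": 6,                   # maximum tree depth
    "eta": 0.1,                       # learning rate
    "subsample": 0.8,                 # share of training rows sampled per tree
    "colsample_bytree": 0.8,          # share of features sampled per tree
    "lambda": 1,                      # L2 regularization
    "alpha": 0,                       # L1 regularization
    "min_child_weight": 1,
    "gamma": 0,
}

# With the xgboost package installed, a dictionary like this is what would be
# passed as the `params` argument of xgboost.train or xgboost.cv.
print(sorted(xgb_params))
```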
I employed 5-fold cross-validation to determine the optimal number of boosting rounds. The training process continued for up to 1,000 rounds, with early stopping triggered if no improvement in validation RMSE occurred for 50 consecutive rounds.
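The early stopping rule can be illustrated independently of any library. The sketch below takes a per-round validation RMSE series (in cross-validation this would be the mean across the five folds) and returns the best round along with the round at which training would halt. The function name and the synthetic RMSE curve are hypothetical.

```python
def best_round_with_early_stopping(val_rmse, patience=50, max_rounds=1000):
    """Return (best_round, rounds_run) for a per-round validation RMSE series.

    Rounds are 1-indexed. Training stops once `patience` consecutive rounds
    pass without improving on the best RMSE seen so far.
    """
    best_rmse = float("inf")
    best_round = 0
    rounds_run = 0
    for round_number, rmse in enumerate(val_rmse[:max_rounds], start=1):
        rounds_run = round_number
        if rmse < best_rmse:
            best_rmse, best_round = rmse, round_number
        elif round_number - best_round >= patience:
            break
    return best_round, rounds_run


# Hypothetical RMSE curve: improves for 120 rounds, then slowly overfits.
curve = [0.60 - 0.0015 * i for i in range(120)]
curve += [curve[-1] + 0.0001 * i for i in range(1, 300)]
best, stopped_at = best_round_with_early_stopping(curve)
print(best, stopped_at)
```

The best round, not the stopping round, is the number of boosting rounds that would be used for the final model.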
The final model was trained using the optimal number of boosting rounds identified through cross-validation. XGBoost builds decision trees in which each new tree attempts to correct the errors made by the previous trees.
Raw model predictions were constrained to the valid wOBA range [0, 2] through clipping, ensuring realistic output values.
Model performance was evaluated using multiple metrics, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R-squared.
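A minimal sketch of the clipping step and the three evaluation metrics, using their standard definitions, is shown below. The sample values are hypothetical points on the 0 to 2 wOBA scale and are not results from the study.

```python
import math


def clip(value, low=0.0, high=2.0):
    """Constrain a raw prediction to the valid wOBA range."""
    return max(low, min(high, value))


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, and R-squared for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical values on the 0-2 wOBA scale.
actual = [0.0, 0.9, 2.0, 0.0]
raw_predictions = [-0.1, 0.7, 2.3, 0.2]
predicted = [clip(p) for p in raw_predictions]
metrics = regression_metrics(actual, predicted)
print(predicted, metrics)
```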
Results
Model performance was compared against existing speed-angle based wOBA estimates (from the test data of 23,775 entries) to quantify the improvement achieved by incorporating ballpark, environmental, and defensive alignment factors into the calculation.
Table 4
| Site | N | xgb_RMSE | xgb_MAE | xgb_R2 | sa_RMSE | sa_MAE | sa_R2 | RMSE_Improvement | MAE_Improvement |
| T-Mobile Park (Mariners) | 645 | 0.378 | 0.272 | 0.566 | 0.418 | 0.293 | 0.467 | 9.569 | 7.167 |
| Target Field (Twins) | 833 | 0.405 | 0.288 | 0.519 | 0.446 | 0.306 | 0.423 | 9.193 | 5.882 |
| Tropicana Field (Rays) | 720 | 0.396 | 0.282 | 0.468 | 0.432 | 0.293 | 0.375 | 8.333 | 3.754 |
| Great American Ball Park (Reds) | 801 | 0.413 | 0.295 | 0.463 | 0.443 | 0.305 | 0.384 | 6.772 | 3.279 |
| Busch Stadium (Cardinals) | 876 | 0.411 | 0.292 | 0.434 | 0.44 | 0.307 | 0.355 | 6.591 | 4.886 |
| Progressive Field (Guardians) | 800 | 0.397 | 0.284 | 0.486 | 0.425 | 0.293 | 0.411 | 6.588 | 3.072 |
| Comerica Park (Tigers) | 831 | 0.404 | 0.285 | 0.457 | 0.432 | 0.299 | 0.38 | 6.481 | 4.682 |
| Oracle Park (Giants) | 849 | 0.392 | 0.275 | 0.477 | 0.419 | 0.287 | 0.406 | 6.444 | 4.181 |
| Coors Field (Rockies) | 857 | 0.393 | 0.295 | 0.534 | 0.418 | 0.291 | 0.481 | 5.981 | -1.375 |
| Daikin Park (Astros) | 747 | 0.442 | 0.298 | 0.457 | 0.47 | 0.314 | 0.388 | 5.957 | 5.096 |
| Petco Park (Padres) | 773 | 0.401 | 0.28 | 0.485 | 0.426 | 0.297 | 0.418 | 5.869 | 5.724 |
| Oakland Coliseum (Athletics) | 714 | 0.401 | 0.275 | 0.534 | 0.425 | 0.289 | 0.472 | 5.647 | 4.844 |
| Truist Park (Braves) | 777 | 0.402 | 0.29 | 0.477 | 0.426 | 0.301 | 0.413 | 5.634 | 3.654 |
| Kauffman Stadium (Royals) | 806 | 0.4 | 0.279 | 0.453 | 0.423 | 0.292 | 0.391 | 5.437 | 4.452 |
| Rate Field (White Sox) | 808 | 0.42 | 0.294 | 0.456 | 0.444 | 0.304 | 0.395 | 5.405 | 3.289 |
| Yankee Stadium (Yankees) | 831 | 0.412 | 0.286 | 0.491 | 0.435 | 0.295 | 0.436 | 5.287 | 3.051 |
| Rogers Centre (Blue Jays) | 832 | 0.412 | 0.291 | 0.441 | 0.435 | 0.299 | 0.375 | 5.287 | 2.676 |
| Citi Field (Mets) | 735 | 0.394 | 0.28 | 0.491 | 0.415 | 0.29 | 0.433 | 5.06 | 3.448 |
| Nationals Park (Nationals) | 872 | 0.382 | 0.273 | 0.518 | 0.402 | 0.284 | 0.467 | 4.975 | 3.873 |
| Fenway Park (Red Sox) | 860 | 0.4 | 0.289 | 0.495 | 0.419 | 0.287 | 0.45 | 4.535 | -0.697 |
| American Family Field (Brewers) | 815 | 0.404 | 0.284 | 0.536 | 0.423 | 0.295 | 0.489 | 4.492 | 3.729 |
| Citizens Bank Park (Phillies) | 781 | 0.417 | 0.303 | 0.473 | 0.436 | 0.309 | 0.423 | 4.358 | 1.942 |
| Globe Life Field (Rangers) | 813 | 0.394 | 0.266 | 0.509 | 0.411 | 0.283 | 0.464 | 4.136 | 6.007 |
| Dodger Stadium (Dodgers) | 686 | 0.389 | 0.274 | 0.526 | 0.405 | 0.285 | 0.485 | 3.951 | 3.86 |
| PNC Park (Pirates) | 825 | 0.399 | 0.28 | 0.512 | 0.414 | 0.283 | 0.475 | 3.623 | 1.06 |
| Angel Stadium (Angels) | 824 | 0.39 | 0.261 | 0.578 | 0.404 | 0.27 | 0.548 | 3.465 | 3.333 |
| loanDepot Park (Marlins) | 751 | 0.399 | 0.296 | 0.535 | 0.412 | 0.297 | 0.504 | 3.155 | 0.337 |
| Oriole Park (Orioles) | 782 | 0.405 | 0.285 | 0.533 | 0.414 | 0.29 | 0.513 | 2.174 | 1.724 |
| Chase Field (Diamondbacks) | 797 | 0.427 | 0.307 | 0.448 | 0.435 | 0.311 | 0.427 | 1.839 | 1.286 |
| Wrigley Field (Cubs) | 734 | 0.391 | 0.271 | 0.467 | 0.395 | 0.273 | 0.455 | 1.013 | 0.733 |
Table 5
| Site | Identifier for the ballpark |
| N | Number of test samples (batted balls) from this site |
| xgb | XGBoost model |
| sa | Speed-angle (xwOBA) estimate model |
| RMSE_Improvement | Percent improvement in RMSE of XGBoost over the speed-angle method |
| MAE_Improvement | Percent improvement in MAE of XGBoost over the speed-angle method |
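The improvement columns appear to be the relative reduction in error against the speed-angle baseline. This is an inference, since the article does not give the formula, but it reproduces the printed rows of Table 4 up to rounding. The function name below is hypothetical.

```python
def percent_improvement(baseline_error: float, model_error: float) -> float:
    """Percent reduction in error relative to the baseline (positive = better)."""
    return (baseline_error - model_error) / baseline_error * 100.0


# T-Mobile Park row of Table 4 (rounded values as printed).
rmse_gain = percent_improvement(0.418, 0.378)
mae_gain = percent_improvement(0.293, 0.272)
# Coors Field row: the XGBoost MAE (0.295) is higher than speed-angle (0.291).
coors_mae_gain = percent_improvement(0.291, 0.295)
print(f"{rmse_gain:.3f} {mae_gain:.3f} {coors_mae_gain:.3f}")
```

A negative value, as in the Coors Field and Fenway Park MAE columns, means the XGBoost model had the larger error at that site for that metric.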
Table 6
| Rank | Feature | Gain | Cover | Frequency |
| 1 | Launch Angle | 0.4422 | 0.1491 | 0.1761 |
| 2 | Launch Speed | 0.3457 | 0.1726 | 0.1549 |
| 3 | Attack Direction | 0.0887 | 0.164 | 0.1255 |
| 4 | Attack Angle | 0.0398 | 0.1293 | 0.1149 |
| 5 | Temperature | 0.0092 | 0.0423 | 0.0395 |
| 6 | Third Baseman | 0.0062 | 0.0283 | 0.0325 |
| 7 | Center Fielder | 0.0057 | 0.0328 | 0.032 |
| 8 | Second Baseman | 0.0053 | 0.0326 | 0.029 |
| 9 | Left Fielder | 0.0053 | 0.0296 | 0.0293 |
| 10 | Catcher | 0.005 | 0.0289 | 0.028 |
| 11 | Right Fielder | 0.005 | 0.0326 | 0.0286 |
| 12 | Wind Speed | 0.0041 | 0.0214 | 0.0228 |
| 13 | First Baseman | 0.004 | 0.0213 | 0.0222 |
| 14 | Shortstop | 0.0039 | 0.0188 | 0.0209 |
| 15 | Area Fair | 0.0032 | 0.0067 | 0.0139 |
| 16 | Wind Direction | 0.0028 | 0.0128 | 0.0141 |
| 17 | Backstop Area | 0.0028 | 0.0064 | 0.0128 |
| 18 | Elevation | 0.0026 | 0.0065 | 0.0124 |
| 19 | Foul | 0.0026 | 0.0067 | 0.0112 |
| 20 | RF Wall Height | 0.0023 | 0.0053 | 0.0095 |
| 21 | LF Dim | 0.0022 | 0.0063 | 0.0098 |
| 22 | LF Wall Height | 0.0021 | 0.0107 | 0.0091 |
| 23 | CF Dim | 0.002 | 0.0068 | 0.0094 |
| 24 | IF fielding alignment | 0.0019 | 0.0102 | 0.0116 |
| 25 | RF Dim | 0.0015 | 0.0046 | 0.0075 |
| 26 | CF Wall Height | 0.001 | 0.0026 | 0.0054 |
| 27 | Sky | 0.0008 | 0.0034 | 0.0053 |
| 28 | Day or night | 0.0007 | 0.0029 | 0.0042 |
| 29 | Precipitation | 0.0004 | 0.0035 | 0.0019 |
| 30 | OF Fielding Alignment | 0.0004 | 0.0006 | 0.0027 |
| 31 | Cover | 0.0003 | 0.0004 | 0.0018 |
| 32 | Surface | 0.0002 | 0.0003 | 0.001 |
Table 7
| Gain | The average improvement in model accuracy (loss reduction) brought by splits using this feature; measures how useful the feature is |
| Cover | The average number of observations affected by splits on this feature; measures how broadly the feature is used |
| Frequency | The proportion of times the feature is used in all trees; measures how often the feature is selected |
Table 8
| Metric | XGBoost | xwOBA | Improvement |
| RMSE | 0.4027 | 0.4252 | 5.29% |
| MAE | 0.2844 | 0.2941 | 3.3% |
| R-squared | 0.4948 | 0.4366 | 5.82 percentage points |
As presented in the table above, the XGBoost model outperformed the model used for traditional xwOBA. It achieved a 5.29% lower root mean squared error and a 3.3% reduction in mean absolute error compared to the traditional, publicly used model. Additionally, it explained 5.82 percentage points more of the variance in wOBA (an R-squared of 0.4948 versus 0.4366).
After producing the xwOBA model, I wanted to see how it would rank the stadiums in terms of their respective run scoring environments. To accomplish this, I calculated the following for each ballpark over the course of the 2024 season: the average temperature, the average wind speed, the most common time of day (day or night), the most common wind direction, the most common level of precipitation, and the most common sky condition. I took each ballpark’s dimensions into account, and every other variable in the XGBoost model was either set to 0 or set to its league-wide median so that every ballpark would have the same values for the variables that a park does not control.
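The per-park summary described above amounts to taking means of the numeric weather fields and modes of the categorical ones. A minimal sketch follows; the field names and the three game records are hypothetical.

```python
from collections import Counter
from statistics import mean


def park_profile(games):
    """Summarize one park's typical conditions from per-game weather records."""
    def most_common(key):
        return Counter(g[key] for g in games).most_common(1)[0][0]

    return {
        "temperature_f": mean(g["temperature_f"] for g in games),
        "wind_speed_mph": mean(g["wind_speed_mph"] for g in games),
        "day_night": most_common("day_night"),
        "wind_direction": most_common("wind_direction"),
        "precipitation": most_common("precipitation"),
        "sky": most_common("sky"),
    }


# Hypothetical game records for a single park.
games = [
    {"temperature_f": 70.0, "wind_speed_mph": 5.0, "day_night": "night",
     "wind_direction": "out to CF", "precipitation": "none", "sky": "cloudy"},
    {"temperature_f": 80.0, "wind_speed_mph": 9.0, "day_night": "night",
     "wind_direction": "out to CF", "precipitation": "none", "sky": "sunny"},
    {"temperature_f": 75.0, "wind_speed_mph": 7.0, "day_night": "day",
     "wind_direction": "in from LF", "precipitation": "drizzle", "sky": "sunny"},
]
profile = park_profile(games)
print(profile)
```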
Table 9
| Rank | Site | Average Predicted wOBA | Park Factor |
| 1 | Coors Field (Rockies) | 0.418 | 1.120 |
| 2 | Fenway Park (Red Sox) | 0.390 | 1.046 |
| 3 | Target Field (Twins) | 0.390 | 1.044 |
| 4 | Great American Ball Park (Reds) | 0.389 | 1.042 |
| 5 | Chase Field (Diamondbacks) | 0.387 | 1.038 |
| 6 | Angel Stadium (Angels) | 0.386 | 1.035 |
| 7 | Citizens Bank Park (Phillies) | 0.385 | 1.032 |
| 8 | Truist Park (Braves) | 0.383 | 1.025 |
| 9 | Kauffman Stadium (Royals) | 0.381 | 1.020 |
| 10 | Progressive Field (Guardians) | 0.376 | 1.008 |
| 11 | American Family Field (Brewers) | 0.376 | 1.007 |
| 12 | Daikin Park (Astros) | 0.373 | 1.000 |
| 13 | loanDepot Park (Marlins) | 0.372 | 0.998 |
| 14 | Yankee Stadium (Yankees) | 0.372 | 0.997 |
| 15 | PNC Park (Pirates) | 0.371 | 0.993 |
| 16 | Dodger Stadium (Dodgers) | 0.370 | 0.991 |
| 17 | Rogers Centre (Blue Jays) | 0.370 | 0.991 |
| 18 | Oriole Park (Orioles) | 0.369 | 0.990 |
| 19 | Petco Park (Padres) | 0.366 | 0.980 |
| 20 | Citi Field (Mets) | 0.365 | 0.978 |
| 21 | Oracle Park (Giants) | 0.364 | 0.976 |
| 22 | Oakland Coliseum (Athletics) | 0.364 | 0.975 |
| 23 | Rate Field (White Sox) | 0.363 | 0.973 |
| 24 | Comerica Park (Tigers) | 0.362 | 0.971 |
| 25 | Busch Stadium (Cardinals) | 0.362 | 0.971 |
| 26 | Nationals Park (Nationals) | 0.361 | 0.967 |
| 27 | Tropicana Field (Rays) | 0.359 | 0.961 |
| 28 | T-Mobile Park (Mariners) | 0.358 | 0.960 |
| 29 | Globe Life Field (Rangers) | 0.357 | 0.957 |
| 30 | Wrigley Field (Cubs) | 0.355 | 0.953 |
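The article does not state how the Park Factor column is derived. One reading that is consistent with Table 9, up to rounding of the printed averages, is each park's average predicted wOBA divided by the mean of that column across all 30 parks. The sketch below applies that assumed definition to the printed values.

```python
from statistics import mean

# Average predicted wOBA by park, copied from Table 9.
AVG_PREDICTED_WOBA = {
    "Coors Field": 0.418, "Fenway Park": 0.390, "Target Field": 0.390,
    "Great American Ball Park": 0.389, "Chase Field": 0.387,
    "Angel Stadium": 0.386, "Citizens Bank Park": 0.385, "Truist Park": 0.383,
    "Kauffman Stadium": 0.381, "Progressive Field": 0.376,
    "American Family Field": 0.376, "Daikin Park": 0.373,
    "loanDepot Park": 0.372, "Yankee Stadium": 0.372, "PNC Park": 0.371,
    "Dodger Stadium": 0.370, "Rogers Centre": 0.370, "Oriole Park": 0.369,
    "Petco Park": 0.366, "Citi Field": 0.365, "Oracle Park": 0.364,
    "Oakland Coliseum": 0.364, "Rate Field": 0.363, "Comerica Park": 0.362,
    "Busch Stadium": 0.362, "Nationals Park": 0.361, "Tropicana Field": 0.359,
    "T-Mobile Park": 0.358, "Globe Life Field": 0.357, "Wrigley Field": 0.355,
}


def park_factors(avg_by_park):
    """Assumed definition: park average divided by the mean across parks."""
    league_mean = mean(avg_by_park.values())
    return {park: value / league_mean for park, value in avg_by_park.items()}


factors = park_factors(AVG_PREDICTED_WOBA)
for park in ("Coors Field", "Daikin Park", "Wrigley Field"):
    print(f"{park}: {factors[park]:.3f}")
```

Small differences from the printed factors are expected because the table's averages are rounded to three decimals.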
More detailed information on park factors as well as player evaluations for each ballpark can be viewed and analyzed through this link: https://drive.google.com/drive/folders/1ZWQBMpdefusP-_E-LdiuopWB6JogQMeN?usp=drive_link
Unsurprisingly, Coors Field ranked as the most run-conducive environment and T-Mobile Park ranked in the bottom three. The majority of these rankings generally correlated with the traditional park factors ranking for 2024 (based on Baseball Savant). However, there were a couple of surprises. For example, American Family Field was 11th in this ranking, while being 25th according to Baseball Savant. Truist Park was 8th in this ranking, but 21st in Baseball Savant’s. Fenway Park and Angel Stadium also ranked much higher on this list, with Fenway, of note, ranked as the 2nd most hitter-friendly environment. At the opposite end of the spectrum, Oriole Park, Daikin Park, Yankee Stadium, and loanDepot Park were more hitter-friendly according to Baseball Savant.
Almost every batter in the dataset would have had their best year if they played all their games at Coors Field. The only two exceptions were Connor Norby (108 balls analyzed) and Juan Soto (438 balls analyzed), both of whom were designated as ground ball hitters according to the model. Norby’s best ballpark was Fenway while Soto’s best ballpark was Great American even though his xwOBA was greater for 61% of his batted balls in Colorado compared to Cincinnati.
However, 84.4% of Soto’s batted balls hit at 105+ mph would have performed better in Great American Ball Park, compared to just 19.6% of all batted balls hit at that velocity or harder across the entire dataset. Even Soto’s hard-hit ground balls would have performed better in Cincinnati. While Coors Field was clearly the best ballpark for offense, Great American was the most home run friendly, and at least with Soto and the way he was hitting the ball in 2024, he would have gotten the ball closer to or over the outfield wall more often in Cincinnati than in Denver.
The most common worst ballpark for batters was Wrigley Field, with Globe Life Field (128 batters) and T-Mobile Park (87) not far behind. In total, there were ten distinct parks in which at least one batter would have had his lowest xwOBA if all of his batted balls had occurred there. The other seven were Rate Field (1), Comerica Park (2), PNC Park (1), Oracle Park (3), Busch Stadium (1), Tropicana Field (7), and Nationals Park (1). In all ballparks, lefties generally performed better than righties.
For pitchers, the best ballparks to pitch in included most of the ballparks where batters would perform their worst, which makes sense. The best park to pitch in was Wrigley Field, which was the top venue for 253 pitchers. The other venues in which pitchers would perform their best were Globe Life Field (113), T-Mobile Park (46), Tropicana Field (21), Comerica Park (1), Busch Stadium (1), and Nationals Park (1). Every pitcher’s worst park to pitch in would have been Coors Field.
Discussion and Conclusion
This study aimed to find an alternative xwOBA to the one that is currently being used publicly. This, in turn, would lead to my other goal of creating better park factors. To achieve this, I found factors that the xwOBA was not considering, like field dimensions and weather, and used them in an XGBoost model to create a new xwOBA. While launch angle and launch speed were the two most important factors in predicting the statistic, temperature was the most important variable outside of the batted ball metrics, and it was interesting that the identity of the third baseman ranked higher in importance than that of any other position.
While this model ultimately performed better than the traditional method of just using exit velocity and launch angle, the margin was small. One reason could be that I used the wrong model. I tried random forest, k-nearest neighbors, and other types of algorithms, and XGBoost performed the best. Even so, in some cases I thought that it should have classified certain hits as home runs due to ballpark factors, such as when a player hit the ball toward the short right field distance and fence at Fenway Park. After some review, I believe it came up short in this regard.
I also believe that more years of data would have helped. For this project, I only worked with Statcast data from the 2024 MLB regular season, which, looking back, may not have been enough to work with. However, it may have proved difficult to account for certain variables over the course of multiple seasons, like fielders, for example, who might improve or decline in performance more drastically between years.
I was impressed with the park factors that the XGBoost model produced for every ballpark. In my opinion, its ranking was better than Baseball Savant’s because it is based simply on dimensions and environment rather than outcomes. Citizens Bank Park, for example, was ranked 16th in park factors in Baseball Savant but 7th in my model, which I believe is more accurate, as it was easier to hit home runs in that ballpark due to smaller fence distances.
When it comes to specifically evaluating players, the amount of data and intriguing tidbits I could discuss is endless, so I included a Google Drive link to my findings in the results section. With this, one could analyze specific players, ballparks, trends, and more to uncover details that were not included in this writing.
The fact that the model had lefty batters performing better than righty batters in all ballparks was not reasonable. Dodger Stadium, for example, is a better ballpark for right-handed batters. It is also worth mentioning that I think this type of model is best used for batters, as pitchers do not have as much control over where the ball ends up within the field of play. That being said, given the finite number of plate appearances over the course of a season, the variation in wOBA between ballparks may not play much of a role from year to year, at least at the player level.
Overall, I still believe that this work is a significant step toward understanding how ballpark and environmental factors can impact balls hit into play. Furthermore, this may help set the groundwork for front offices and baseball enthusiasts to better evaluate players, teams, and run scoring environments. In addition, this model may provide insights into how some organizations might want to modify or construct their ballparks in the future to try and increase their win totals.
References
Bahill, A. T., Baldwin, D. G., & Ramberg, J. S. (2009). Effects of altitude and atmospheric conditions on the flight of a baseball. International Journal of Sports Science and Engineering, 3(2), 109-128.
Biesiadny, S. (2016). Modeling the relationship between MLB ballparks and home team performance using shape analysis (Doctoral dissertation).
Clem, A. G. (n.d.). Clem’s Baseball ~ Introduction / navigation page. http://www.andrewclem.com/Baseball/
Higgs, N., & Stavness, I. (2021). Bayesian analysis of home advantage in North American professional sports before and during COVID-19. Scientific Reports, 11, 14521. https://doi.org/10.1038/s41598-021-93533-w
Kent, W. P., & Sheridan, S. C. (2011). The Impact of Cloud Cover on Major League Baseball. Weather, Climate, and Society, 3(1), 7-15. https://doi.org/10.1175/2011WCAS1093.1
Retrosheet Event Files. (n.d.). https://www.retrosheet.org/game.htm
Risser, M. S., Gray, B. R., & Kelly, R. A. (2018). Impact of home field advantage: Analyzed across three professional sports. Student Publications, 611. https://cupola.gettysburg.edu/student_scholarship/611
Sommers, P. M. (2010). Do new ballparks affect the Home-Field advantage? (Report No. 10–15). Middlebury College. https://core.ac.uk/download/pdf/6851604.pdf
Statcast Search CSV Documentation. (n.d.). baseballsavant.com. https://baseballsavant.mlb.com/csv-docs