What if you were able to correctly predict the winners of professional baseball games 80 percent of the time? 90 percent? Even 95 percent?
For the remainder of the 2011 MLB season, I will keep refining my prediction algorithm using the large body of statistical data that is readily available to every baseball fan.
The winner of each game is a function of four key domains, namely:
- Winning Ability – Some teams are just winners.
- Offense – Which team is able to outscore the other?
- Pitching – Who can stop the other from scoring?
- Home Field (Dis)Advantage – Does the home team really have home field advantage against a team that dominates on the road?
These domains are meant to encompass arguably all aspects of a baseball game. If a prediction is incorrect, we can look to these domains to better understand why.
MLB Prediction Engine
The intermediary function Z(AwayTeam, HomeTeam) outputs the variable z, which represents the aggregate value of the four domains. The AwayTeam and HomeTeam input parameters encapsulate the team-specific information for the away team and the home team, respectively. The z variable is then passed to the prediction engine PE such that PE(z) = PredictedWinner, where PredictedWinner is either the away team or the home team.
As mentioned earlier, the variable z is the quantification of the four key domains. It is defined between 0.7 and 1.3, inclusive. The most important characteristic of z is that it maps to the difficulty of predicting the winner of a game.
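To make the flow from Z to PE concrete, here is a minimal sketch. Only the signatures Z(AwayTeam, HomeTeam) and PE(z) and the 0.7 to 1.3 range come from this post. The Team fields, the ratio-averaging formula, and the rule that a z above 1 favors the home team are all hypothetical stand-ins, not the actual engine.

```python
from dataclasses import dataclass

Z_MIN, Z_MAX = 0.7, 1.3  # range of z stated in the post


@dataclass
class Team:
    name: str
    # Hypothetical per-domain scores, each assumed to be positive.
    winning: float
    offense: float
    pitching: float
    venue: float  # home-field or road strength, depending on the team's role


def Z(away: Team, home: Team) -> float:
    """Aggregate the four domains into one z-value (illustrative formula only)."""
    ratios = [
        home.winning / away.winning,
        home.offense / away.offense,
        home.pitching / away.pitching,
        home.venue / away.venue,
    ]
    z = sum(ratios) / len(ratios)
    # Clamp into the range on which z is defined.
    return min(Z_MAX, max(Z_MIN, z))


def PE(z: float) -> str:
    """Assumed rule: z above 1 favors the home team, otherwise the away team."""
    return "home" if z > 1.0 else "away"
```

Under these assumptions, PE(Z(away, home)) returns "home" or "away", and a z near 1 means the two teams are close across the four domains.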
The PredictionDifficulty graph shown in the image demonstrates the parabolic nature of the PD(z) function. As the graph shows, a z-value of 1 denotes the most difficult game to predict accurately, while a z-value of approximately 0.7 or 1.3 represents the easiest game to predict.
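For readers who want a feel for the shape, one parabola with these properties can be sketched as follows. This is not the actual PD(z) function. It is simply a downward-opening parabola that peaks at z = 1 and reaches its minimum at 0.7 and 1.3, and the scaling of difficulty to the range 0 to 1 is my assumption for illustration.

```python
Z_MIN, Z_CENTER, Z_MAX = 0.7, 1.0, 1.3  # range and peak location from the post


def prediction_difficulty(z: float) -> float:
    """Illustrative parabola: 1.0 (hardest) at z = 1, 0.0 (easiest) at the ends."""
    if not Z_MIN <= z <= Z_MAX:
        raise ValueError(f"z must lie in [{Z_MIN}, {Z_MAX}], got {z}")
    half_width = Z_MAX - Z_CENTER
    # Normalized squared distance from the center, subtracted from the peak value.
    # max() guards against a tiny negative result from floating-point rounding.
    return max(0.0, 1.0 - ((z - Z_CENTER) / half_width) ** 2)
```

With this stand-in, games at z = 0.85 and z = 1.15 get the same difficulty, since the curve is symmetric about 1.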
What to Expect
In future entries, I will provide my predictions for the subsequent day’s games, with an in-depth look at one particular prediction.
Information that I will share with you includes:
- Visual analysis of the four key domains for one game
- The PredictionDifficulty for every game
- The PredictedWinner for every game
- Prediction Statistics (overall and based on PredictionDifficulty)
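The prediction statistics in the last item could be tallied along these lines. This is a sketch only: the function name, the three equal-width difficulty buckets, and the assumption that difficulty is scaled from 0 to 1 are mine, chosen for illustration.

```python
from collections import defaultdict


def accuracy_by_difficulty(results, n_buckets=3):
    """Return (overall accuracy, accuracy per difficulty bucket).

    results: iterable of (difficulty, was_correct) pairs, with difficulty
    assumed to lie in [0, 1]. Bucket 0 holds the easiest games.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for difficulty, was_correct in results:
        # Map difficulty to a bucket index; difficulty == 1.0 goes in the top bucket.
        bucket = min(int(difficulty * n_buckets), n_buckets - 1)
        totals[bucket] += 1
        hits[bucket] += int(was_correct)
    per_bucket = {b: hits[b] / totals[b] for b in sorted(totals)}
    games = sum(totals.values())
    overall = sum(hits.values()) / games if games else 0.0
    return overall, per_bucket
```

Splitting accuracy by bucket shows whether misses are concentrated in the games that were expected to be hard to call.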
Specific information that will not be disclosed includes:
- The PredictionDifficulty(z) function
- Domain-specific values for any game
- The z-value for any game
- Any prediction engine rules