What if you were able to correctly predict the winners of professional baseball games 80 percent of the time? 90 percent? Even 95 percent?

The Challenge

Over the remainder of the 2011 MLB season, I will continuously refine my prediction algorithm by drawing on the wealth of statistical data readily available to every baseball fan.

Predictive Domains

The winner of each game is a function of four key domains, namely:

  1. Winning Ability – Some teams are just winners.
  2. Offense – Which team is able to outscore the other?
  3. Pitching – Who can stop the other from scoring?
  4. Home Field (Dis)Advantage – Does the home team really have home field advantage against a team that dominates on the road?

Together, these domains are meant to cover essentially every aspect of a baseball game. When a prediction is wrong, we can look to them to understand why.

MLB Prediction Engine

The intermediary function Z(AwayTeam, HomeTeam) outputs the variable z, which represents the aggregate value of the four domains. The input parameters AwayTeam and HomeTeam encapsulate the team-specific information for the away team and the home team, respectively. Finally, z is passed to the prediction engine PE, such that PE(z) = PredictedWinner, where PredictedWinner is either the away team or the home team.
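Because the domain values and engine rules are deliberately kept private, the following is only a hypothetical sketch of how a Z → PE pipeline of this shape could be wired together. Everything specific in it is an assumption: the `Team` fields standing in for each domain, the equal weights, scoring each domain as a home-to-away ratio, combining the ratios with a weighted geometric mean clamped to the stated [0.7, 1.3] range, and the rule that z ≥ 1 picks the home team. The two teams' numbers are made up for illustration.

```python
import math
from dataclasses import dataclass


# Hypothetical team inputs: the post does not say which statistics feed each domain.
@dataclass
class Team:
    name: str
    win_pct: float          # Winning Ability: overall winning percentage
    runs_scored_pg: float   # Offense: runs scored per game
    runs_allowed_pg: float  # Pitching: runs allowed per game
    home_win_pct: float     # Home Field: winning percentage at home
    road_win_pct: float     # Home Field: winning percentage on the road


Z_MIN, Z_MAX = 0.7, 1.3  # range of z stated in the post


def domain_ratios(away: Team, home: Team) -> list[float]:
    """One ratio per domain; by assumption, a ratio above 1 favors the home team."""
    return [
        home.win_pct / away.win_pct,                  # Winning Ability
        home.runs_scored_pg / away.runs_scored_pg,    # Offense
        away.runs_allowed_pg / home.runs_allowed_pg,  # Pitching (fewer runs allowed is better)
        home.home_win_pct / away.road_win_pct,        # Home Field (Dis)Advantage
    ]


def Z(away: Team, home: Team, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Hypothetical aggregate: weighted geometric mean of the ratios, clamped to [0.7, 1.3]."""
    log_z = sum(w * math.log(r) for w, r in zip(weights, domain_ratios(away, home)))
    return min(Z_MAX, max(Z_MIN, math.exp(log_z)))


def PE(z: float) -> str:
    """Hypothetical engine rule: z >= 1 picks the home team (ties go home, arbitrarily)."""
    return "home" if z >= 1.0 else "away"


# Illustrative, made-up teams (names mirror the post's Z/PE notation, not real clubs).
away = Team("Away Club", 0.50, 4.2, 4.5, 0.52, 0.48)
home = Team("Home Club", 0.58, 4.8, 3.9, 0.62, 0.54)
z = Z(away, home)
print(f"z = {z:.3f}, PredictedWinner = {PE(z)}")
```

The geometric mean is just one possible choice: it treats a domain that favors the home team by a factor of r as the mirror image of one that favors the away team by 1/r, and it yields exactly z = 1 when all four domains are even.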

PredictionDifficulty (PD)

[Figure: PredictionDifficulty graph]

As mentioned earlier, the variable z quantifies the four key domains. It takes values between 0.7 and 1.3, inclusive. Its most important characteristic is that it maps to the difficulty of predicting the winner of a game.

The PredictionDifficulty graph illustrates the parabolic shape of the PD(z) function: a z-value of 1 denotes the most difficult game to predict, while a z-value near 0.7 or 1.3 denotes the easiest.
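The real PD(z) function is not disclosed, so this is only a hypothetical stand-in with the shape the graph describes: a downward-opening parabola that peaks at z = 1 and bottoms out at the ends of the [0.7, 1.3] range. The normalization (PD = 1 at z = 1, PD = 0 at z = 0.7 and z = 1.3) is an assumption chosen for readability.

```python
Z_MIN, Z_MAX, Z_TOSS_UP = 0.7, 1.3, 1.0  # z range and peak-difficulty point from the post


def prediction_difficulty(z: float) -> float:
    """Hypothetical parabolic PD(z): 1.0 at z = 1 (hardest), 0.0 at z = 0.7 or 1.3 (easiest)."""
    if not Z_MIN <= z <= Z_MAX:
        raise ValueError(f"z must lie in [{Z_MIN}, {Z_MAX}], got {z}")
    half_width = Z_MAX - Z_TOSS_UP  # distance from the peak to either end of the range
    return 1.0 - ((z - Z_TOSS_UP) / half_width) ** 2


for z in (0.7, 0.85, 1.0, 1.15, 1.3):
    print(f"z = {z:.2f} -> PD = {prediction_difficulty(z):.2f}")
```

Because the parabola is symmetric about z = 1, a game with z = 0.85 is scored exactly as hard to call as one with z = 1.15, which matches the graph's description of both ends of the range being equally easy.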

What to Expect

In future entries, I will post my predictions for the next day's games, along with an in-depth look at one particular prediction.

Information that I will share with you includes:

  • Visual analysis of the four key domains for one game
  • The PredictionDifficulty for every game
  • The PredictedWinner for every game
  • Prediction Statistics (overall and based on PredictionDifficulty)
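The post promises prediction statistics both overall and broken down by PredictionDifficulty, but does not say how games are grouped. One hypothetical way to tally them is sketched below, assuming PD has been normalized to [0, 1]. The bucket thresholds and the five games are invented purely for illustration and are not real results.

```python
from collections import defaultdict

# Hypothetical bucket boundaries: the post does not say how games are grouped by PD.
BUCKETS = ((0.33, "easy"), (0.67, "medium"), (1.0, "hard"))


def bucket(pd: float) -> str:
    """Map a PredictionDifficulty value in [0, 1] to a named bucket."""
    for upper, name in BUCKETS:
        if pd <= upper:
            return name
    raise ValueError(f"PD must lie in [0, 1], got {pd}")


def prediction_stats(results):
    """results: iterable of (pd, predicted_winner, actual_winner) tuples.

    Returns accuracy overall and per difficulty bucket.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for pd, predicted, actual in results:
        for key in ("overall", bucket(pd)):
            totals[key] += 1
            hits[key] += int(predicted == actual)
    return {key: hits[key] / totals[key] for key in totals}


# Made-up games, for illustration only.
games = [
    (0.10, "home", "home"),
    (0.20, "away", "away"),
    (0.50, "home", "away"),
    (0.90, "away", "away"),
    (0.95, "home", "away"),
]
print(prediction_stats(games))
```

Reporting accuracy per bucket makes it possible to check whether the hard, near-toss-up games really are the ones the engine misses most often.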

Specific information that will not be disclosed includes:

  • The PredictionDifficulty(z) function
  • Domain-specific values for any game
  • The z-value for any game
  • Any prediction engine rules

Enjoy!

 

One Response to “Developing a Heuristic Approach to Predicting the Winners of MLB Games”

  1. I think this is among the most significant information for me. And i’m glad reading your article. But wanna remark on few general things, The website style is perfect, the articles is really great : D. Good job, cheers


© 2011 Athletalytics