Machine Learning Model Updates: 12/27

We are back with a much more exciting update. We’ll take this in stages again, as I’ve done a few things since my last post. The model hasn’t necessarily become more accurate, but it has become much more valuable. The five areas of improvement were: making the model easier to update, expanding it to new leagues, adding some of the teams’ transfer market data, engineering new variables, and choosing the best model.

Easy Updates: This process was much easier than I thought it would be, as were many of the other improvements, if I’m being honest. I pulled all of the code into one script and removed everything that was not relevant. Before, I had all of the data combined on a single spreadsheet. To make updates easier, I now keep all past data in one file and new data in a separate file, and I merge the two in R before the cleaning process begins. Thankfully the idea worked as seamlessly as I had hoped: a simple download and run gives me predictions for my upcoming games.
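The merge step can be sketched as follows. This is a minimal Python illustration, not the R code used here, and it assumes both files are CSVs sharing the hypothetical key columns `Date`, `HomeTeam`, and `AwayTeam`. Rows from the update file overwrite historical rows with the same key, so re-downloading a fixture never creates a duplicate.

```python
import csv
import io

# Assumed columns that identify one fixture in both files (hypothetical names).
KEY_COLUMNS = ("Date", "HomeTeam", "AwayTeam")


def read_rows(text):
    """Parse CSV text into a list of dicts, one per match."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_match_files(history_text, update_text):
    """Append new matches to the historical set, dropping duplicates.

    Update rows are read last, so a fixture present in both files keeps
    the update file's version (e.g. one that now has a final score).
    """
    merged = {}
    for row in read_rows(history_text) + read_rows(update_text):
        key = tuple(row[c] for c in KEY_COLUMNS)
        merged[key] = row  # later rows replace earlier ones with the same key
    return list(merged.values())


if __name__ == "__main__":
    # Made-up placeholder data, for illustration only.
    history = "Date,HomeTeam,AwayTeam,FTHG,FTAG\n2020-12-20,Team A,Team B,1,1\n"
    update = (
        "Date,HomeTeam,AwayTeam,FTHG,FTAG\n"
        "2020-12-20,Team A,Team B,1,1\n"
        "2020-12-26,Team C,Team D,3,1\n"
    )
    for row in merge_match_files(history, update):
        print(row["Date"], row["HomeTeam"], "vs", row["AwayTeam"])
```

Keying on the fixture rather than simply stacking the two files is what makes "download and run" safe to repeat.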

New Leagues: Again, I was delighted at how much easier this was than I expected. There were definitely some hiccups, but some minor error correction (and reviewing of data sets) made the process very easy. The main problem was actually caused by the next step, because the team names didn’t line up between data sets. Luckily, the mismatch completely broke the model, so it was very clear that something was wrong until it was all corrected. I now have the English Premier League, Serie A, La Liga, and the Bundesliga, and am ready for the upcoming MLS season as well.
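One way to catch mismatched team names before they break the model is to normalize names through an alias table and list any team that still has no match in the other data set. The sketch below is a Python illustration under that assumption; `TEAM_ALIASES` and its entries are hypothetical examples, not the mapping used here.

```python
# Hypothetical alias table: spellings seen in one source -> canonical name.
TEAM_ALIASES = {
    "Man United": "Manchester United",
    "Man Utd": "Manchester United",
    "Wolves": "Wolverhampton Wanderers",
}


def canonical(name):
    """Return the canonical spelling of a team name (trimmed, alias applied)."""
    name = name.strip()
    return TEAM_ALIASES.get(name, name)


def unmatched_teams(match_teams, value_teams):
    """Teams in the match data that have no record in the second data set.

    A non-empty result means a join on team name would drop or mangle rows,
    so the alias table should be fixed before training.
    """
    known = {canonical(t) for t in value_teams}
    return sorted({canonical(t) for t in match_teams} - known)


if __name__ == "__main__":
    missing = unmatched_teams(
        ["Man United", "Wolves", "Fulham"],
        ["Manchester United", "Wolverhampton Wanderers"],
    )
    print(missing)  # teams still needing an alias or a data record
```

Failing loudly on a non-empty list is a cheaper check than waiting for the model's accuracy to collapse.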

Transfer Market: This could likely be improved with proper web scraping from the Wayback Machine, and in an ideal world this information could be combined with injury and suspension information to get a sense of active player value and weigh the absence of key players. For now, it gives the model a way to weigh a team’s value, if that turns out to be a key factor. The variables in general were quite disappointing, and most were removed from the model. The two left in were average player value and team value. I considered removing one of them, as both were still low-weight factors, but removing it caused a significant drop in accuracy.
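Attaching the two surviving transfer-market variables to each match might look like the Python sketch below. It assumes a hypothetical lookup of team value and squad size per team, and it assumes average player value is team value divided by squad size; the post does not say how that figure was derived. A missing team raises `KeyError` instead of silently producing an empty feature.

```python
def add_value_features(match, values):
    """Return a copy of `match` with team value and average player value
    for both sides.

    `values` is a hypothetical lookup: team -> (team_value, squad_size),
    with team_value in whatever currency unit the source uses.
    Average player value = team_value / squad_size (an assumption).
    """
    row = dict(match)
    for side, team in (("home", match["HomeTeam"]), ("away", match["AwayTeam"])):
        team_value, squad_size = values[team]  # KeyError if the names don't line up
        row[f"{side}_team_value"] = team_value
        row[f"{side}_avg_player_value"] = team_value / squad_size
    return row


if __name__ == "__main__":
    # Made-up placeholder numbers, for illustration only.
    values = {"Team A": (500.0, 25), "Team B": (100.0, 25)}
    match = {"HomeTeam": "Team A", "AwayTeam": "Team B"}
    print(add_value_features(match, values))
```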

New Variables: The rest of the new variables were created through feature engineering. I plan to improve this in the future with a sort of Elo system or team strength measure, but for now I wanted variables that show the strength of the opponent as well as the teams’ recent defensive performance. I created average goals conceded per match for both the home and away teams. I also added average shots per game and average shots on target per game, to let our trees factor in each team’s attacking presence as well as how well it converts opportunities. We may add more variables like this to future models, and depending on how leagues handle yellow cards, those could be incredibly valuable to add as well.
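A minimal Python sketch of these per-team averages is below. It assumes matches are sorted by date and uses hypothetical column names (`FTHG`/`FTAG` for goals, `HS`/`AS` for shots, `HST`/`AST` for shots on target) plus an assumed window of the last five matches. Each feature is computed only from games played before the current one, so the match result never leaks into its own inputs.

```python
from collections import defaultdict, deque


def add_form_features(matches, window=5):
    """Attach each team's average goals conceded, shots, and shots on target
    over its previous `window` matches (home or away).

    `matches` must be sorted by date; column names are assumptions.
    A team's first match gets None because it has no history yet.
    """
    # team -> most recent (goals_conceded, shots, shots_on_target) tuples
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for m in matches:
        row = dict(m)
        for side, team in (("home", m["HomeTeam"]), ("away", m["AwayTeam"])):
            past = history[team]
            n = len(past)
            for i, name in enumerate(("conceded", "shots", "sot")):
                row[f"{side}_avg_{name}"] = sum(p[i] for p in past) / n if n else None
        # Record this match only after its features are computed.
        history[m["HomeTeam"]].append((m["FTAG"], m["HS"], m["HST"]))
        history[m["AwayTeam"]].append((m["FTHG"], m["AS"], m["AST"]))
        out.append(row)
    return out


if __name__ == "__main__":
    # Made-up placeholder matches, for illustration only.
    games = [
        {"HomeTeam": "A", "AwayTeam": "B", "FTHG": 2, "FTAG": 1, "HS": 10, "AS": 6, "HST": 4, "AST": 2},
        {"HomeTeam": "B", "AwayTeam": "A", "FTHG": 0, "FTAG": 3, "HS": 8, "AS": 12, "HST": 3, "AST": 6},
    ]
    for r in add_form_features(games):
        print(r["HomeTeam"], r["home_avg_conceded"], r["away_avg_shots"])
```

A fixed window is one choice among several; an expanding season-to-date average would also fit the description above.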

Best Model: This part of the process was mostly trial and error. My goal was to get a model accurate enough to have confidence in WITHOUT factoring in bookie odds. I’m looking to beat the bookie, so I want to avoid using their data, even though it’s one of the strongest predictors. Removing it came at a cost in accuracy, but after comparing different models we reached above 50% accuracy for all leagues, with draws still poorly predicted. We will not take a draw in future predictions unless the odds show well over a 20% differential. Our predictive accuracy for wins hovers between 60% and 70%. After a great day today, with Fulham taking home a +470 win, I’m feeling very confident. That said, our model loves the underdog, and that needs to be investigated. Check out my recent predictions posts, where I discuss that a bit more.
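Two calculations sit behind this paragraph: reading American odds as a probability, and splitting accuracy by outcome so that weak draw predictions don't hide behind good win predictions. The Python sketch below shows both. The outcome labels `"H"`, `"D"`, `"A"` are assumptions, and "accuracy for wins" is interpreted here as the share of predicted wins that came true, which may differ from how it was measured in this post. The implied probability ignores the bookmaker's margin.

```python
from collections import Counter


def implied_probability(american_odds):
    """Convert American (moneyline) odds to the bookmaker's implied probability.

    +470: a 100 stake wins 470, so p = 100 / (470 + 100), roughly 17.5%.
    -150: a 150 stake wins 100, so p = 150 / (150 + 100) = 60%.
    """
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


def accuracy_by_outcome(predicted, actual):
    """Share of correct predictions, split by predicted label ("H", "D", "A")."""
    hits, totals = Counter(), Counter()
    for p, a in zip(predicted, actual):
        totals[p] += 1
        hits[p] += p == a  # bool counts as 0 or 1
    return {label: hits[label] / totals[label] for label in totals}


def overall_accuracy(predicted, actual):
    """Share of all matches predicted correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    print(round(implied_probability(470), 3))
    # Made-up placeholder predictions, for illustration only.
    pred = ["H", "H", "A", "D"]
    real = ["H", "A", "A", "H"]
    print(accuracy_by_outcome(pred, real), overall_accuracy(pred, real))
```

Splitting by predicted label also makes an underdog bias easier to spot: if away-win predictions at long odds dominate the hits, the per-label numbers will show it before the overall figure does.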

I’ll soon update this post with a page for code, but for now check out the new picks history page that I’ve added! I’m looking forward to enjoying some matches this weekend while adding more variables. Let me know what you think I could improve upon.
