A review of our World Cup prediction competition
Just over a month ago, we announced a World Cup prediction competition, inviting entrants to write a probabilistic model to predict the World Cup in Brazil. Now that the dust has settled on one of the greatest football tournaments in recent memory, we’d like to share some of the things we’ve learned.
Having never run a coding competition before, we didn’t know what to expect, but we were thrilled at the enthusiasm with which people took to the challenge. Many of the solutions were highly original and challenged some of our assumptions about how to tackle a deeply complex problem.
It is generally agreed that the number of goals each team scores in a football match can be modelled as a Poisson distribution, whose mean is the average number of goals scored, weighted by the offensive and defensive strengths of the two teams. However, we were pleasantly surprised by some of the alternative solutions that were presented.
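To make the Poisson approach concrete, here is a minimal sketch in Python. It is not our production model: the function names, the league-average figure of 1.3 goals and the strength ratings are all hypothetical, and it assumes the two teams' goal counts are independent.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_probabilities(home_attack, home_defence, away_attack, away_defence,
                        avg_goals=1.3, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Strengths are multipliers around 1.0 (hypothetical scale): a higher
    attack rating means more goals scored, a higher defence rating means
    more goals conceded. avg_goals is an assumed average, not a fitted value.
    """
    lam_home = avg_goals * home_attack * away_defence
    lam_away = avg_goals * away_attack * home_defence

    home = draw = away = 0.0
    # Sum the joint probability of every scoreline up to max_goals each.
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p

    # Renormalise to account for the truncated tail beyond max_goals.
    total = home + draw + away
    return home / total, draw / total, away / total
```

Calling `match_probabilities(1.5, 0.8, 1.0, 1.0)` gives the home side an expected 1.95 goals against 1.04 for the away side, so the home win comes out as the most likely of the three results.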
One entry built a weighted graph of all teams in the tournament, taking inspiration from Google’s PageRank algorithm, assigning various weights to matches of different prestige or importance. Another modelled relative team strengths as a function of player skill and prior results. And we were delighted by the number of entries that used the Monte Carlo method to simulate the tournament.
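For readers unfamiliar with the Monte Carlo method, the idea is to play out the tournament many times at random and count how often each team wins. The sketch below is a simplified illustration and not any entrant's code: the team names and strength ratings are made up, it covers only a knockout bracket, and it assumes a team's chance of winning a match is its strength divided by the combined strength of both teams.

```python
import random
from collections import Counter


def play_match(team_a, team_b, strengths, rng):
    """Pick a winner; the chance is proportional to strength (an assumption)."""
    p_a = strengths[team_a] / (strengths[team_a] + strengths[team_b])
    return team_a if rng.random() < p_a else team_b


def simulate_knockout(bracket, strengths, rng):
    """Play one knockout tournament; adjacent teams in the list meet first."""
    remaining = list(bracket)
    while len(remaining) > 1:
        remaining = [
            play_match(remaining[i], remaining[i + 1], strengths, rng)
            for i in range(0, len(remaining), 2)
        ]
    return remaining[0]


def estimate_title_odds(bracket, strengths, n_sims=10_000, seed=0):
    """Estimate each team's chance of winning from repeated simulation."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wins = Counter(
        simulate_knockout(bracket, strengths, rng) for _ in range(n_sims)
    )
    return {team: wins[team] / n_sims for team in bracket}


# Hypothetical four-team bracket: the number of teams must be a power of two.
bracket = ["A", "B", "C", "D"]
strengths = {"A": 4.0, "B": 1.0, "C": 1.0, "D": 1.0}
odds = estimate_title_odds(bracket, strengths)
```

A fuller simulation would also cover the group stage and draw match results from a scoring model rather than a single strength ratio.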
But overall, it was the simpler entries that gave the most reliable results. Although we assessed the entries on a variety of factors, we did consider how closely each entry matched the market position before the tournament. In light of an unpredictable World Cup, it is almost tempting to disregard this as a factor, but producing prices that do not deviate far from the rest of the market is typically a very good litmus test. After all, predicting that Spain would go out in the group stages might have been deeply prophetic, but is unlikely to have any basis in statistical reality.
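One way to compare a model with the market is sketched below. This is an illustration rather than the exact measure we used: the function names are hypothetical, and it assumes decimal odds, with the bookmaker's margin removed by simple normalisation.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities that sum to 1.

    Dividing by the total removes the bookmaker's margin in the simplest
    possible way (an assumption; other methods exist).
    """
    raw = {outcome: 1.0 / price for outcome, price in decimal_odds.items()}
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


def mean_absolute_deviation(model_probs, market_probs):
    """Average absolute gap between model and market across all outcomes."""
    gaps = [abs(model_probs[o] - market_probs[o]) for o in market_probs]
    return sum(gaps) / len(gaps)
```

A deviation near zero means the model's prices sit close to the market, while a large one flags predictions that deserve a second look.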
What we will do differently next time
Be clearer about the requirements
From a very early stage, we noticed that people were unsure about what we were asking them to do. It was difficult for us to quantify the judging criteria, because the numbers produced by the model were not the most important part. Even our own models would have struggled to correctly predict certain results!
We were keen to assess solutions on softer criteria: things like code quality, originality, choice of data. And from those who entered, we were certainly not disappointed! All the entries tackled the problem admirably, but we suspect that there are many more who did not enter, simply because they didn’t know what was being asked of them.
Perhaps it didn’t help that we had created a challenge that better suited data scientists and statisticians, but were keen to target software engineers. This dissonance meant that the competition possibly did not reach the heights that it otherwise could have.
Be careful about asking for code
There was some disquiet that we were asking for code, with the suggestion that we might use entrants’ ideas for our own gain. After all, what we do - modelling football matches - is lucrative and competitive, but also deeply secretive. We found this concerning, mostly because it had never even occurred to us that this is what people would think. We were interested in seeing the model - not the output of the model - but we can understand the scepticism this received. It was never our intention to profit from other people’s ideas, but we understand how this may have looked.
Spend more time on promotion
Knowing how, where and when to promote the competition - not only during launch, but also throughout - would have given it far greater exposure. We met with modest success on Hacker News and various other forums and mailing lists, but this is certainly something we’d like to give more careful consideration in the future.
When all is said and done, would we run another competition like this? Unequivocally, yes. In the last eight weeks we’ve learnt a lot about what it takes to run a successful coding competition. And even more than that, we’ve discovered that there is a genuine interest in football statistics in audiences far more diverse than we could have ever imagined.