Sunday, July 4, 2010

Pythagorean formula for soccer, the European one

As we have learned during this edition of the World Cup, predicting the outcome of a single soccer match is hard. But that's the excitement of a once-every-four-years tournament.
In our regular, everyday soccer, we have a somewhat larger set of data to play with. The simplest method used in other popular sports, Bill James's Pythagorean formula, is not really accurate in soccer. There have been attempts to tweak the formula by changing the exponent, as was done for basketball, but swings of fortune are common in relatively short seasons, and on top of that you have to deal with three possible outcomes.

So, let's tilt at the windmill again.

A few observations first. The main difference between soccer and pretty much any other sport is the possibility of a draw. Since all win estimator formulas work on the principle of a clear-cut winner, the usual straightforward relation between runs/goals/points scored and the number of victories falls short in the case of soccer. The usual way to compensate for this is to count points won, not victories.

In the Premier League, or any other elite league with a more or less even dispersion of talent among teams, tie games deduct around 8 % of the maximum points per season. For the EPL it's 20 teams, 380 games and 1140 points to distribute. Usually around 100 points are lost every year. If there is a winner, it's a three-point game; if it's a draw, it's only a two-point game, so each draw takes one point out of the pool. In a tie game both teams win and lose the same amount of points.
Let's start with a simplified Pythagorean formula with exponent 1 and see how it plays out.

                                   Win % = ( goals scored / ( goals scored + goals allowed ) ) X 0,92

The calculation is for last year's champions Chelsea. The deduction is roughly 8 %; for the 2009/10 season it is exactly 8,42 %, and that exact figure is the one that gives the result below.

GF = 103

GA = 32

Win % = 0,699 ( or 69,9 % of possible points won )

Multiply Win % by the maximum number of points available to a single team.

Win % X 114 ( for a 20-club league; for the Bundesliga it's 102 )

In the end, we get 80 points for Chelsea. They actually won 86, but you would expect a first-place team to outperform expectations.
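The Chelsea arithmetic above can be retraced in a few lines of Python. This is only a sketch of the calculation as described: the function name `expected_points` and its parameters are made up for illustration, and the draw deduction used is the exact 2009/10 figure of 8,42 %.

```python
def expected_points(goals_for, goals_against, draw_share, games):
    """Exponent-1 Pythagorean estimate, scaled down by the share of
    points lost to draws, then converted to league points."""
    goal_ratio = goals_for / (goals_for + goals_against)
    win_pct = goal_ratio * (1 - draw_share)
    max_points = 3 * games  # 38 games -> 114, 34 games -> 102
    return win_pct, win_pct * max_points


# Chelsea 2009/10: GF = 103, GA = 32, 38 league games.
win_pct, points = expected_points(103, 32, draw_share=0.0842, games=38)
print(f"Win % = {win_pct:.3f}, expected points = {points:.0f}")
```

With the rounded 8 % deduction the Win % comes out slightly higher, at about 0,702, which still rounds to 80 points.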




That is, of course, 20/20 hindsight.

And now the future.

Calculations were done taking into account the last three seasons ( weighted 60/30/10 ). Some numbers, the yellowish ones, are averages or estimations for the Championship or any other lower league. Results from the Championship are simply reduced by a bit more than 1/3 ( 2/5 to be precise; the assumption is that the Championship is 60 % the strength of the Premier League ). The same goes for other lower leagues. After we calculate PP ( predicted points ), we adjust them by the three-year average of over/under performance ( for example, in the last three seasons Everton overachieved by 2 points and Fulham underachieved by 3 ). In the end we have adjusted points and a predicted ranking of teams accordingly.
This is how things stand today, with no knowledge of starting lineups, the amount of money poured into clubs, injuries, or the bad/good shape of key players. Here are the predictions for the final standings of the English Premier League for the 2010/11 season.
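The three adjustments described above can be sketched roughly as follows. The post does not spell out which quantities are weighted or discounted, so everything here is an assumption: the helper names are hypothetical, the weights are applied most recent season first, the over/under figure is simply added to the predicted points, and the input numbers are purely illustrative, not real club data.

```python
WEIGHTS = (0.6, 0.3, 0.1)        # assumed order: most recent season first
CHAMPIONSHIP_STRENGTH = 0.6      # Championship taken as 60 % of the Premier League


def weighted_average(values, weights=WEIGHTS):
    """Blend three seasons of a statistic using the 60/30/10 weights."""
    return sum(v * w for v, w in zip(values, weights))


def discount_lower_league(value, strength=CHAMPIONSHIP_STRENGTH):
    """Scale a lower-league figure down to its assumed top-flight equivalent."""
    return value * strength


def adjusted_points(predicted_points, over_under):
    """Add the three-year average over/under performance to the PP."""
    return predicted_points + over_under


# Illustrative numbers only.
print(weighted_average([60, 55, 50]))   # three seasons of some statistic
print(discount_lower_league(80))        # a Championship figure, discounted
print(adjusted_points(60, 2))           # an overachiever by 2 points
print(adjusted_points(50, -3))          # an underachiever by 3 points
```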







I would be stunned if all this plays out as above, but some interesting questions pop up. Some of the points allotted to the bottom teams will end up in the accounts of the top teams, so the disparity will probably be larger than shown. We'll see if 87 points will be enough for the title and 39 good enough to avoid relegation. We'll also see how many fewer draws, if any, we get, since the prediction is 1055 points in aggregate for the season. That is only 7,46 % of points lost, or around 85 tie games. The three-year average for the Premier League is a little less than 100 per year.
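The link between aggregate points and the number of draws follows from each draw removing exactly one point from the 1140 available. A minimal sketch of that check, with a hypothetical helper name:

```python
def draws_implied(total_points, games=380):
    """Each draw hands out 2 points instead of 3, so the number of
    draws equals the gap between the maximum and the points awarded."""
    max_points = 3 * games
    lost = max_points - total_points
    return lost, lost / max_points


lost, share = draws_implied(1055)
print(f"{lost} draws, {share * 100:.2f} % of points lost")
```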
Here are the others:





Blue corrective points for Freiburg and Mainz indicate that the figure is not an average but last season's achievement, since they played in the second league in the previous years. Also, as in the case of Blackpool above, the correction for promoted teams with no history in the first league ( in this table, within the last three seasons ) is 0.
So the title goes to Bayern and the cut-off for relegation will be 35 points. The dynamics of the season as predicted are pretty much in line with previous years: 75 to 80 draws.




For Calcio, things will remain mostly the same. Around 100 draws; get over 80 points and the title is yours, don't get over 40 and it's Serie B for you.


And, last but not least:





Fewer tie games next year, almost 90 points for the title and above 40 points to avoid the Segunda División.

And that is it. This method shows good results in retrospect, and now it's time for the test drive. There are visible flaws, since we treat every league the same, but in many ways they are very alike. Each has one or two ultra-dominant teams with a few dark horses in wait. These are not exporter leagues; very few outstanding players play outside their country, and if they do, they do so in one of the leagues mentioned ( Ballack, Luca Toni … ). Since the creation of the UEFA Champions League in 1992, only three of the winners have come from outside these four leagues ( 3 out of 19 ), which speaks to the superior quality of competition. And so on.
Now all we have to do is wait for the summer of 2011 to see how horribly wrong these predictions were.
