Saturday, October 30, 2010

Virtual Euro league

This is an informal combo table of the European top 20 teams before this weekend's round. I could have waited until Monday, but Howard's Pythagorean calculations wouldn't fit then. Anyway, after a quarter of the season the best team in Europe is Chelsea, and they are underachieving by 3 points. Money makes the world go round indeed. Click to enlarge.

  


Explanation: PPG is points per game; Handicap is goal difference per game; PPG+ is PPG adjusted for league strength according to UEFA rankings; H+ is handicap with the same adjustment; HP+ is the sum of adjusted PPG and handicap; Δ is Howard's Pythagorean estimation.

Rest of the order is more or less as would be expected, except for these guys.


 
Close to the bottom of the barrel, as they should be, yet fighting tooth and nail.
Full standings for all top 5 leagues here.



Thursday, October 28, 2010

West Bromwich Albion Story

Away from the bright lights of London, childish bargaining in Manchester or the blind alleys of Liverpool, a true feel good story is developing.
It's about West Bromwich Albion.

Well, not exactly these guys. Younger ones.


When you lay a bet against the spread on a football or a basketball match, you factor in the handicaps the bookie gave each team. So the concept of handicapping should be familiar.

Goal difference is a telling statistic. You cannot, in the long run, fake your ability to score. Or to defend against it.


When we calculate this ability as a goal margin per game ( handicap ), we get numbers which show us that, for example, last season Chelsea was a 1,87-goal favorite in each of their games. Manchester Utd was good for 1,57, Arsenal 1,11, Liverpool 0,68 and so on. The biggest underdog was Wigan with -1,11 goals per game. WBA's number in this calculation is from their previous EPL season ( -0,82 ). That shouldn't be far off.

And for the sake of better understanding, remember these two facts.

First, the value of a victory in the EPL last season was 2,6 goals scored.
Second, the best team has the easiest schedule for the season, by definition, since they don't have to play against themselves.
In other words, it's not that Chelsea really is a 1,87-goal ( almost a win ) favorite from the start, no. Add the opponent's value to the equation too. Against Manchester Utd they would be 1,87 – 1,57 = 0,3 goals favorite. And when we translate that to the whole season ( the sum of all handicaps across the league equals 0 ) their strength of schedule was 0 – 1,87 = -1,87 goals. A complimentary win. As if they needed one.
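The handicap arithmetic above fits in a few lines of Python. This is only a sketch: the function names are mine and purely illustrative, while the handicap values and the 2,6 goals-per-win figure are the ones quoted in the text.

```python
# Last season's goal difference per game (handicap), as quoted in the post.
HANDICAP = {
    "Chelsea": 1.87,
    "Manchester Utd": 1.57,
    "Arsenal": 1.11,
    "Liverpool": 0.68,
    "Wigan": -1.11,
    "WBA": -0.82,  # taken from their previous EPL season
}

GOALS_PER_WIN = 2.6  # value of a victory in goals, per the post


def matchup_handicap(team, opponent):
    """Goals by which `team` is favored over `opponent` (negative = underdog)."""
    return HANDICAP[team] - HANDICAP[opponent]


def schedule_strength(team):
    """Season-long strength of schedule: all handicaps sum to 0,
    so the rest of the league is worth 0 minus the team's own handicap."""
    return 0.0 - HANDICAP[team]


def goals_to_wins(goals):
    """Convert a goal margin into wins using the goals-per-win value."""
    return goals / GOALS_PER_WIN


print(round(matchup_handicap("Chelsea", "Manchester Utd"), 2))  # 0.3
print(round(schedule_strength("Chelsea"), 2))                   # -1.87
print(round(goals_to_wins(1.87), 2))                            # 0.72
```

The last line is the "almost a win" from the text: 1,87 goals is about 0,72 of a victory.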

And here is a table of what WBA has done so far.



What we see in the first row is that Chelsea was such a heavy favorite in the opener that it was almost an automatic win. They scored six for good measure. True goal difference ( True GD ) reduces or increases the number of goals conceded by the amount of the handicap. And it shows that, all other things being equal ( I'm omitting home field advantage, form, injuries... ), the Baggies conceded 3 “real” goals. That was their only negative performance so far.

So, what gives?

The sum of the handicaps so far ( blue number ) shows that they were 12-goal underdogs and that they came out 10 goals ahead of that. Expressed in wins ( at 2,6 goals per win ), they are nearly 4 wins above expectations.
All of this gets its true shape in comparison with the second table. These are the teams that had a positive goal difference per game last season, along with their true handicap/strength of schedule in the first 9 weeks of this season.




30 (thirty) goals difference in handicap between WBA and Chelsea so far.

Last column shows us how well each team played so far.
Is Manchester Utd under performing or what? That's a literal half-ass job.
Chelsea is at 22 % above expectation, Arsenal at 12%. Liverpool is already lost for the season, I'm afraid. Aston Villa's percentage is a consequence of the Newcastle game blunder. They are not that bad.
And WBA is playing on rocket fuel.
Is it sustainable? No, it's not. But when you overperform, you had better do it in the thick of your schedule when it matters the most, and that is exactly what Di Matteo's squad is doing right now. With this stretch, they've probably secured their spot in the Premiership for next season.
Go Baggies!




  







  


Tuesday, October 26, 2010

Goalkeepers after 9

Well, it looks like the fog is lifting. Data are less and less skewed as time passes. Goalkeepers standings and defensive standings for teams after 9 weeks.



How about them Chelsea? Who cares. But WBA, Bolton and Wigan in top ten defenses and Liverpool, Manchester Utd and Aston Villa not. Cold winter is coming, cold indeed for some. I'll refrain from further comment until I look at the strength of schedule so far ( which will be the topic of my next post ) and some other things.

Short explanation: points are a negative category; they represent the opponent's position on the current form table X goals you conceded. Two goals from West Ham get you 40 points, a six-pack from Chelsea gets you only 6 points. That red 22 for Aston Villa means that their defense has not reached Premier League level thus far, since the league has 20 teams and their opponent rank is 22. Not really fair, but things will straighten themselves out by the end of the season.
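As a rough Python sketch of the points rule just described (the function names are hypothetical, and West Ham's 20th and Chelsea's 1st place on the form table are inferred from the two examples above):

```python
def conceded_points(goals_conceded, opponent_form_rank):
    """Negative points for one game: goals conceded times the
    opponent's position on the current form table."""
    return goals_conceded * opponent_form_rank


def total_points(games):
    """Sum of negative points over a list of (goals_conceded, opponent_rank)."""
    return sum(conceded_points(goals, rank) for goals, rank in games)


print(conceded_points(2, 20))           # two goals from West Ham -> 40
print(conceded_points(6, 1))            # six from Chelsea -> 6
print(total_points([(2, 20), (6, 1)]))  # 46
```

Lower totals are better, since cheap goals from bottom teams are the expensive ones.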

New feature is OPS+, which is borrowed from baseball. In essence, 100 is average and bigger is better. This is a development and upgrade of, and replacement for, my previous descriptive stat ReV, and includes, along with basic goals and saves, replacement level performance, opponent adjustments, park factors, time on the pitch … In short, a lot of relevant things. I kept the same name as in baseball for the sake of name recognition. The meaning of the abbreviation is of course different. Soccer has no bases or bats. So OPS+ stands for, I don't know, Overall Player's Score Adjusted ( that's for the + sign ). Yes, I'm reaching here, but look at it as an homage to sabermetrics.


Sunday, October 10, 2010

Who won?

I saw this and tried something similar. These are last year salaries.

As always click to enlarge.

Wednesday, September 29, 2010

Losing it

Just to add my 2 cents, since this is related to something I've been working on.

For the last four seasons in England's Premier League, a clean sheet for both teams, or the most unpopular score in soccer, 0–0, was the final outcome 8,8% of the time.

But fortunately for us spectators it's not an easy thing to achieve. In 91% of the cases someone scores.
Your chances of surviving the 90 minutes unscathed ( scores like 1-0, 2-0, 0-1, 0-2 ... ) are 35,3% for the home team or 24,6% for the away team.

If your attacking capabilities average out to one goal per game ( the result being your win or a tie: 1-0, 0-1, 1-1 ), your chances of losing that game are 69,6% for the home team or 72% for the away team.

If your forwards know how to put the ball behind the goalie's back at least 2 times per game, you will lose in only 1,38% of the cases for the home team or 2,7% for the away team.

The ball is round and the grass is slippery, score in each half at least once and you are the champions my friend.

Wednesday, September 15, 2010

Goalkeepers week 4

Better late than never. Week four standings for EPL goalies, like in the previous post, plus standings for save rate per game. In the next few days team standings and some other stuff.







Tuesday, August 31, 2010

United European League

Following the same line of thought as in the previous post, I applied the formula to each team's defense and offense. In the case of offense I just reversed the order of the standings so that, for example, a goal from Chelsea carries 1 point for that team's defense ( negative points ), but a goal against Chelsea carries 20 points for the offense, since Chelsea defend their goal as well as they tear up the opponent's net. As the season proceeds I'll add other leagues. So far England and France, separate and combo. I wonder how the combo table will look at season's end?
Data from http://www.leagueday.com/. I've used the form table for calculations, not the regular table ( although they are the same now, they will differ over time ).





Goalkeepers of England

It's time. After three weeks of Premier League soccer there is enough data to start with goalkeepers ranking as promised. According to formula, current ranking looks like this:


Friday, August 20, 2010

Premier and Primera 2009/10

Links for Google docs. Some stats are simplified for practical reasons and differ from those previously published, but not by much. Correlations still hold. Everything is explained in the comments on top of the columns. Download and play. Previous seasons are coming, though the further you go into the past, the more doubtful the data become.

Source, as always, ESPNsoccernet

EPL        and        Primera

Tuesday, August 17, 2010

Shearer and Wright

This time strikers. No special introduction needed.  






Wright has a slightly better team winning percentage and point rate because he spent a few years at both ends of his career with teams at a lower competition level. Shearer, on the other hand, has the advantage in career grade, win shares and relative value percentage ( average = 100 ).
Wright had a late start to his career, but his years as a Gunner were memorable. Shearer took part in Blackburn's historic run and in the greatness of Kevin Keegan's Newcastle United. These teams, along with Liverpool's youth movement ( McManaman, Fowler ) and post-Cantona-incident Manchester United, were prolific, fast and extremely fun to watch.

Their careers according to age. Significant head start for Shearer.






Sunday, August 15, 2010

Zizu and Becks

Time to start with comparisons now that I have the tools. I've decided to start with Pele's or FIFA's list since it is as good a starting point as any.

Each table will contain some obvious items: Goals, Appearances, Age and the Team or teams for which the player suited up that year.
Percentage of wins is one of two team statistics and refers to the percentage of games the team won that season. The other team stat is Points rate, or how many points, on average, the team gained per game ( 0 – 3 ).
I'm using these two stats to describe the relative strength or weakness of a team for that particular season, so that the player's performance numbers aren't taken out of context.

Individual stats are: Grade ( there's a heat chart for help ), tweaked a little to fit everyone equally; Win Shares ( number of points per season that the player earned ); and RVP ( ReV ), or the player's relative value for that season compared to his peers, in percent, where 100 is average performance.
Below the line are the career numbers or up to date achievements.
Let's roll.






And totally random soccer player only for the sake of establishing the base line:




Actually I was quite surprised to see that Becks is better than Zizu was. But blame it on luck, a late start to his serious career and early retirement. And don't forget that Zizu did it in a span of really only ten years.
Notice also the span in team statistics for these three guys. Almost the same between them.
Teams for which Beckham played were roughly 10 % better at winning and 0,3 points better per game than the teams Zidane played for ( that Galaxy gig killed the numbers ). Again, roughly the same ratio applies to Zidane and Gibbs.
Here is what their careers look like charted.





Wednesday, August 11, 2010

Last step

Last step in finalizing our new statistic is merging offense and defense numbers in some meaningful way. Simplest way is to add PPG to D, adjusted with accuracy Ac. What we get is an aggregate number AG that represents individual player's input in team's defensive performance ( through saves and goals conceded ratio ), discipline and speed ( fouls committed ) mixed with offensive contribution ( goals and assists ), all calculated on per game basis.
Now, that was a mouthful... For a single player :






That composite number AG can be a standalone index. But it's not that intuitive and comparable, right?
The next step is to calculate the D and AG numbers for the entire league, calculate their average or median, whichever works better ( the median ), and finally calculate each player's deviation from that average/median and express the number as a percentage.
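One way to read that step, sketched in Python. The assumption here is mine: "deviation expressed as a percentage" is taken to mean the player's number divided by the league median, times 100, so that the median player lands exactly on 100. The function name and the AG values are made up for illustration.

```python
from statistics import median


def relative_index(values):
    """Express each player's number as a percentage of the league median.

    `values` maps player name -> raw number (D or AG).
    The median player comes out at 100.
    """
    league_median = median(values.values())
    return {name: round(100 * v / league_median) for name, v in values.items()}


# Hypothetical AG numbers, not taken from any real table.
ag = {"Player A": 0.90, "Player B": 0.30, "Player C": 0.15}
for name, index in relative_index(ag).items():
    print(name, index)
```

With these inputs Player B is the median and sits at 100, Player A is at three times that level and Player C at half of it.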

I know I've lost you by now, but if you are still with me, I promise, no more math. What have we achieved with all this ?

Now we have both intuitive and comparable WYSIWYG stat. Something like OPS in baseball. In this case, the “sea level” is at 100, both for D% and ReV ( player's defense relative to league average and player's overall, both defensive and offensive value, relative to league average ).

You can go scuba diving ( numbers can go below 100 for subpar achievements ) or you can go free climbing like the guy above. His defensive value was 48% better than the league average last season and his overall value was three times higher than your average Placeholder Jones or Warm Body Smith. As simple as that.

A hidden benefit of this stat is that it is an average of anything you want to put into the calculation. If you want to improve ReV by adding, for example, passing stats, it can be done without disturbing the essence. Sea level would still be at 100 and all previous comparisons would still hold. Computations can be more precise, but the value ratio and validity of comparisons between players will remain the same.
As soon as I finish the beautification process, I'll post the spreadsheets for EPL.
And now, the grand finale!
ReV can be used for historic comparisons too. I've thrown a few hundred numbers into a cruncher to see how ReV relates to the previous method I've used for player evaluation ( Grade ).

Well, picture speaks a thousand words.


This is encouraging. Methods are interchangeable. This means that it's possible to compare players from different eras, with different sets of data used for their evaluation, with statistically significant level of correlation.

What remains now, is to enjoy the soccer by simply watching it.  


Tuesday, August 10, 2010

Attack, attack and then again attack some more

Offense is much simpler than defense. There are only two offensive categories that we really care about: goals and assists. First, we double the points for scoring, i.e. a 2:1 ratio for goals to assists; then we divide the sum of those two by the number of games played. That gives us the number of points per game. That's it. Nothing else. That's offense.

                                   PPG = ( 2 X G + A ) / ( GS + SB / 2 )



Where PPG stands for points per game, G for goals, A for assists, GS for games started and SB for substitute appearances. Substitute games are divided by 2 because of Kevin Phillips. No, really: if you are a successful “pinch hitter” ( shooter/scorer in this case ) in the vein of Roger Milla or Ole Gunnar Solskjaer, you have to be rewarded a little.
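For those who prefer code to formulas, here is the same calculation as a small Python sketch. It follows the description above (goals count double, substitute appearances count as half a game); the function name and the sample inputs are illustrative and not quoted from any table.

```python
def points_per_game(goals, assists, games_started, sub_appearances):
    """PPG = (2*G + A) / (GS + SB/2).

    Goals are worth twice an assist; a substitute appearance
    counts as half a game played.
    """
    games = games_started + sub_appearances / 2
    if games == 0:
        raise ValueError("player has no appearances")
    return (2 * goals + assists) / games


# Illustrative season line: 29 goals, 10 assists, 31 starts, 1 sub appearance.
print(round(points_per_game(29, 10, 31, 1), 3))  # 2.159
```

A pure super-sub profits from the halved denominator: 5 goals in 10 substitute appearances gives (2 X 5) / 5 = 2 points per game.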
This statistic can be used alone, but that would be a little dry. Let's consider something else. How accurate is the player's kick? Does he waste a lot of opportunities, or do his balls have eyes? Does he bend it like Beckham?
Call it the Accuracy percentage. SH stands for shots in general, not to be confused with shots on goal.




To demonstrate, a selected few from last season's EPL:


  

Can you spot two defensive players in there?

Ac is written in American-style percentage format. Drogba's 0,219 translates to 21,9 % shooting precision. PPG is what it looks like. Drogba again: 2,159 translates to roughly 1 goal or 2 assists per game for the 2009/10 season. Scary.

Data from ESPNsoccernet






Sunday, August 8, 2010

Defense

Before I begin I'll say a few words about the state of soccer statistics today.

Not long ago we didn't have any. Today we have them in amounts that are overwhelming. Just visit FIFA's World Cup page.
We are beginning to hoard more and more data. These are all valid data, but fragmentation is a great problem for grasping the big picture. There are some indexes that can help, like the Castrol Index and the like, but the problem with them is that they are not intuitive. For example, Frank Lampard has 776 index points for last season on the official EPL site. On Castrol he has 845 and is 4th in his league.
Meaning?
What is the lowest possible number? What is the highest possible number? Can the values be negative...? Questions that cannot be answered offhand.
Higher is better, but in comparison to what “natural” measure ( percentage, scale with upper and lower limit, some zero point... )?
You get the picture. Not very intuitive. So, read the raw data instead, some might say...
Too much data can have the same effect. We still remain clueless. For example, until this World Cup we never had passing data in our statistic sheets, at least in Europe we didn't. Now?
As I said, FIFA's stat page. Dive in.

Enough rambling. My attempt with win shares and points is a little more intuitive, but it emanates from a small set of data. For historic purposes it's OK. For the future, why not use everything that we can? But since this enterprise of ours has the common fan as its target audience, my aim is for the synthetic soccer statistics I'm developing to be:

                                                                        Intuitive  and  Comparable


On the foundation of the previous post about saves, I'll try to build a gauge that can capture soccer defense.
If we add the number of saves to the goals conceded during the season, we come up with the number of times the defense collapsed and allowed the opposing strikers to take their shots. Obviously, a lower number of collapses equals better defense. Until I come up with a better name, let's call it the average number of negative events per game. Catchy, right? Here is the table for last EPL season:



Or like this; strength of defenses in England’s Premier League for 2009/10 season



One other thing that indicates better defense is a lower average number of fouls committed. Roughly speaking, in the case of the EPL, teams in the lower part of the above graph tend to commit a larger number of fouls, since their defenses are slower in reacting and positioning. There are exceptions of course: some teams have that style of play, some teams have better goalkeepers, other teams gave up trying, hence the lower number of fouls than expected. But to be on top, after the smoke clears, you need to be nice to your opponents. That's why they invented fair play to begin with. Here is last season's order:



Finally, a combo of these two rates can, in my opinion, give a somewhat accurate measure of individual defensive quality. I have excluded cards and penalty kicks since they are rare and dependent on circumstances.
On the other hand, over the span of a season we accumulate enough numbers for saves, goals and fouls to give us some level of certainty in the trends that we observe.
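The two ingredient rates can be sketched in Python as below. The helper names and the season totals are made up for illustration; only the definitions ( saves plus goals conceded per game, fouls committed per game ) come from the text, and the sketch deliberately stops short of combining them into a single defensive number.

```python
def negative_events_per_game(saves, goals_conceded, games):
    """Team rate: how often per game the defense let a shot through
    to the goalkeeper (saved or not)."""
    return (saves + goals_conceded) / games


def fouls_per_game(fouls_committed, games_played):
    """Individual rate of fouls committed per game."""
    return fouls_committed / games_played


# Made-up season totals, for illustration only.
teams = {
    "Team A": {"saves": 110, "goals_conceded": 32, "games": 38},
    "Team B": {"saves": 190, "goals_conceded": 66, "games": 38},
}

# Fewer negative events per game means a better defense, so sort ascending.
ranking = sorted(teams, key=lambda name: negative_events_per_game(**teams[name]))
for name in ranking:
    print(name, round(negative_events_per_game(**teams[name]), 2))
```

Sorting ascending puts the tightest defense first, which is the same order a "lower is better" table would show.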
And here is the formula: D stands for defense, FC/G for the individual foul rate per game, and N/G for the team's sum of saves and goals conceded per game ( it should read NEG/G, but the program for writing formulas has an issue with NEG so I cannot write it in the formula; you'll understand it anyway ). Other mathematical operations are for cosmetic purposes only. Calculated like this, the D number looks like a percentage, goes upwards to indicate better performance and irons out the wrinkles that occur in cases of small sample size.



Math can be scary, so let's visualize. Here are some gentlemen from the EPL's last season in no particular order ( the median, at the bottom of the table, works better here than the average because of the big gap between the elite and average teams ); reds are below the league median:



Enough for today. That's defense. Next time offense. Data, as always, from ESPNsoccernet.

Saves and Goals

After a long summer break I'm back. Since the soccer seasons are about to start in Europe, it's time to get back to stats. For the upcoming season I'll follow a few new stats that I came up with this summer. We'll start with goalkeepers. Along with the stats I've explained here ( from now on they will be updated weekly, so the current form of opponents will be taken into consideration; thank you, Howard ), I'll monitor the saves to goals ratio. It's a simple enough statistic: just divide saves by goals conceded. And to give you the coordinates: the higher the rate, the better. Here are the standings for EPL 2009/10; GS stands for games started, SV for saves and GC for goals conceded. Rate numbers in bold are the starters' numbers and they are colored, as you can guess, in traffic light fashion ( data from ESPNsoccernet ):
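The ratio itself is a one-liner; a minimal Python sketch follows. The keepers and their numbers are invented, and treating a keeper with no goals conceded as having an infinite rate is my own choice for the edge case, not something the stat defines.

```python
def saves_to_goals(saves, goals_conceded):
    """Saves divided by goals conceded; higher is better."""
    if goals_conceded == 0:
        return float("inf")  # assumption: a perfect record tops any ranking
    return saves / goals_conceded


# Invented rows: (name, GS, SV, GC)
keepers = [("Keeper A", 30, 96, 32), ("Keeper B", 38, 150, 60)]

# Rank from the best rate to the worst.
ranked = sorted(keepers, key=lambda k: saves_to_goals(k[2], k[3]), reverse=True)
for name, gs, sv, gc in ranked:
    print(f"{name}: GS={gs} SV={sv} GC={gc} rate={saves_to_goals(sv, gc):.2f}")
```

Note that Keeper B makes more saves in total but still ranks lower, because the rate rewards saves relative to goals let in.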



Or like this, just starters:



Monday, July 5, 2010

Diego in Europe

Addendum to previous post. Maradona's numbers in European teams. Only regular league appearances, no cup games. On the left are Diego's statistics ( darker fields in win and draw columns are approximations ), on the right team's numbers.



Leo

That was only one game. Don't worry, you are GOOD.

Points are a mix of goals, wins, draws, red cards, shutouts and blowouts, as explained.
Win shares are the number of team points directly produced by the player. 3 WS points = one win contributed. Percentages are the same thing but expressed as a % of the team's final points count.

The minimum grade is 0 points, the theoretical maximum is 5. To give you a sense of perspective, Pele's best season was 4,39.

Anyway, Lionel Messi's career in Barcelona so far:



England's other goalkeepers

As explained here. You can find David James there too.










Hand of God and God himself

A few words.
Calculations were done as explained here. In short: goals, wins, draws and shutouts on the positive side; games lost without scoring a goal on the negative. All that divided by the number of games played.
Only regular national championship matches: no all-star games, no cup games, no national squad games, no Puerto Rico 12:0 blowouts. Only meaningful games played against balanced competition.

Pele's statistics are perfect. Every game, every goal, every squad. So, calculations for his career are as good as they can get. Spreadsheet can be found here. Though, keep in mind that soccer was a different game then. Substitutions were rare, if allowed at all, games were played all year round, and Brazil didn't have a unified league, so results are from the São Paulo state league ( Campeonato Paulista )...

Maradona's statistics are accurate for his Spanish and Italian career, not so much for the Argentinian part. But when in doubt, err on the side of the player. If I was biased in my approximations, I was biased in favor of Diego. Future exact calculations for the Argentinian part of his career ( when data become available ) will not stray much from these. He was what he was. A wasted talent. So great, but so immature.

The graphics are straightforward: the middle column represents age; years are colored in accordance with achievements for that year; the squads in which they played are next, with the calculated grade for that year in the outside columns.


They both had the same career span, started and finished at the same age, played the same position and wore the same number.
Here they are:







Sunday, July 4, 2010

Pythagorean formula for soccer, the European one

As we have learned during this edition of the World Cup, predicting the outcome of a single soccer match is hard. But that's the excitement of a once-every-four-years tournament.
In our regular, everyday soccer, we have a somewhat larger set of data to play with. The simplest method used in other popular sports, Bill James's Pythagorean formula, is not really accurate in soccer. There are attempts to tweak the formula by changing exponents, as was done for basketball, but swings of fortune in relatively short seasons are common, and you have to deal with three possible outcomes on top of that.

So, let's tilt at the windmill again.

A few observations first. The main difference between soccer and pretty much any other sport is the possibility of a draw. Since all win estimator formulas work on the principle of a clear-cut winner, the usual straightforward relation between runs/goals/points scored and the number of victories falls short in the case of soccer. The usual way to compensate for this is by counting points won, not victories.

In the Premier League or any other elite league with a more or less fair dispersion of talent among teams, tie games deduct around 8 % of the maximum points per season. For the EPL it's 20 teams, 380 games and 1140 points to distribute. Usually around 100 points are lost every year. If it's a win, then it's a three-point game; if it's a draw, it's only a two-point game. In a tie game both teams win and lose the same amount of points.
Let's start with a simplified Pythagorean formula with exponent 1 and see how it plays out.

                                   Win % = ( goals scored / ( goals scored + goals allowed ) ) X 0,92

The calculation is for last year's champions Chelsea with the 8 % deduction. For the 2009/10 season it's exactly 8,42 %.

GF = 103

GA = 32

Win % = 0,699 ( or 69,9 % of possible points won )

Multiply Win % by the maximum amount of points for a single team.

Win % X 114 ( for a 20-club league; for the Bundesliga it's 102 )

In the end, we get 80 points for Chelsea. They actually won 86, but you would expect a first place team to outperform expectations.
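The Chelsea calculation, step by step, as a Python sketch. The function name and parameter names are mine; the goal totals, the 8,42 % deduction and the 38-game season are the figures given above.

```python
def pythagorean_points(goals_for, goals_against, games=38, draw_deduction=0.0842):
    """Simplified Pythagorean estimate with exponent 1.

    Returns (win_pct, predicted_points), where win_pct is the share of the
    maximum possible points and predicted_points = win_pct * games * 3.
    """
    goal_share = goals_for / (goals_for + goals_against)
    win_pct = goal_share * (1 - draw_deduction)
    return win_pct, win_pct * games * 3


win_pct, points = pythagorean_points(103, 32)
print(round(win_pct, 3))  # 0.699
print(round(points))      # 80
```

For a 34-game league such as the Bundesliga, pass `games=34` so the maximum becomes 102 points; the deduction would need that league's own share of points lost to draws.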




That is, of course, 20/20 hindsight.

And now the future.

Calculations were done taking into account the last three seasons ( weighted 60/30/10 ). Some numbers, the yellowish ones, are averages or estimations for the Championship or any other lower league. Results from the Championship are simply reduced by about 1/3 ( 2/5 to be precise; the assumption is that the Championship is 60 % the strength of the Premier League ). The same goes for other lower leagues. After we calculate PP ( predicted points ), we adjust them with the three-year average of over/under performance ( for example, in the last three seasons Everton overachieved by 2 points and Fulham underachieved by 3 ). In the end we have adjusted points and the predicted ranking of teams in accordance.
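A loose Python sketch of that weighting, under assumptions of my own: the 60/30/10 weights are applied to one per-season figure ( here a points estimate ), a season spent in a lower league is scaled by 0,6 before weighting, and the over/under correction is simply added at the end. The names and sample numbers are hypothetical.

```python
WEIGHTS = (0.6, 0.3, 0.1)      # most recent season first
LOWER_LEAGUE_STRENGTH = 0.6    # Championship taken as 60 % of the Premier League


def weighted_value(seasons):
    """Weighted three-season figure.

    `seasons` is a list of (value, in_top_flight) pairs, most recent first.
    Lower-league seasons are scaled down before the weights are applied.
    """
    total = 0.0
    for weight, (value, in_top_flight) in zip(WEIGHTS, seasons):
        if not in_top_flight:
            value *= LOWER_LEAGUE_STRENGTH
        total += weight * value
    return total


def adjusted_points(predicted_points, avg_over_under):
    """Apply the three-year average over/under performance as a correction."""
    return predicted_points + avg_over_under


# Hypothetical club: two top-flight seasons, then one in the Championship.
pp = weighted_value([(50, True), (45, True), (80, False)])
print(round(pp, 1))                       # 48.3
print(round(adjusted_points(pp, 2), 1))   # 50.3 for a habitual overachiever
print(round(adjusted_points(pp, -3), 1))  # 45.3 for a habitual underachiever
```

The corrections of +2 and -3 mirror the Everton and Fulham examples; everything else in the sample is invented.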
As things are today, with no knowledge of starting lineups, the amount of money poured into clubs, injuries, bad/good shape of key players... here are the predictions for the final standings of the English Premier League for the 2010/11 season.







I would be stunned if all this plays out as above, but some interesting questions are popping up. Some of the points allotted to bottom teams will end up in the accounts of top teams, so the disparity will probably be larger than shown. We'll see if 87 points will be enough for the title and 39 good enough to avoid relegation. We'll see how many fewer draws, if any, we'll get, since the prediction is 1055 points aggregate for the season. That is only 7,46 % of points lost, or around 85 tie games. The three-year average for the Premier League is a little less than 100 per year.
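The draw arithmetic is easy to check in Python. A sketch, with a function name of my own; it relies only on the fact stated earlier that a win distributes three points and a draw two, so every draw costs the league exactly one point.

```python
def implied_draws(total_points, teams=20):
    """Number of draws implied by a league's aggregate points total.

    Each team plays every other team home and away; a decided game hands
    out 3 points and a draw only 2, so each draw "loses" one point.
    Returns (draws, share_of_maximum_points_lost).
    """
    games = teams * (teams - 1)
    max_points = games * 3
    lost = max_points - total_points
    return lost, lost / max_points


draws, share = implied_draws(1055)
print(draws)                  # 85
print(round(100 * share, 2))  # 7.46
```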
Here are the others:





Blue corrective points for Freiburg and Mainz indicate that it is not average but last season achievement, since they played in second league previous years. Also, like in the above case of Blackpool, correction for promoted teams with no history in first league ( in this table, last three ) is 0.
So title goes to Bayern and the cut off for relegation will be 35 points. Dynamics of the season as predicted is pretty much in line with previous years, 75 to 80 draws.




For Calcio, things will remain mostly the same. Around 100 draws; get over 80 points and the title is yours, don't get over 40 and it's Serie B for you.


And, last but not least:





Fewer tie games next year, almost 90 points for the title and above 40 points to avoid the Segunda División.

And that is it. This method shows good results in retrospect and now it's time for the test drive. There are visible flaws since we treat every league the same, but in many ways they are very alike. Each has one or two ultra-dominant teams with a few dark horses in waiting. These are not exporter leagues; very few outstanding players play outside their country and if they do, they do so in one of the leagues mentioned ( Ballack, Luca Toni … ). Since the creation of the UEFA Champions League in 1992, only three winners have come from outside these four leagues ( 3 out of 19 ), which speaks of the superior quality of competition. And so on.
Now, all we have to do is wait for the summer of 2011 to see how horribly wrong these predictions were.

Monday, June 21, 2010

Die Angst des Tormanns beim Elfmeter or The Goalie's Anxiety at the Penalty Kick ( Fine, but bizarre movie )

Only he can touch the ball with his hands; he is the only player confined to a limited space and he's protected by the offside rule. Only to him does this apply: the less contact with the ball, the better. The lion's share of his value is in his team's quality and skill. For him, defense must start in the opponent's half. If the ball reaches his 10 yard perimeter it's usually too late.

Goalkeeper, the one with the ugliest jersey on the field.

I'll try to explain a quick and easy method of rating goalkeepers' seasons. It can be used in retrospect on some illustrious and colorful characters like Jose Luis Chilavert and Rene Higuita, and even further in the past. Maybe Peter Shilton or Lev Yashin, if we had the data.

What is the point of the game? To score a goal and not to allow one. If you somehow allow a few, isn't it easier to digest 2 or 3 from Manchester United than from Portsmouth? Yes, it is. There is no excuse for a triplet from Pompey.
The easiest way is to rate the pain relative to the opponent's table position. So...


                                      Goal conceded  X  Team's position  =  Goal stress value



Fewer goals from bottom feeders equals better defense, better goalkeeper, better team and better final rank.
This method can also show why some guys are a treasure and some... well, not so much.




Games painted in green are victories, yellow ones are draws, reds are losses. Games that are not in color are the games played by substitute goalkeepers. We count only goals allowed, multiply the number of goals by the scoring team's table position ( X 3 for Arsenal's goal, X 20 for Portsmouth's ), and in the bottom line we get the sum of "pain points". Divide the points by the number of games. Finally, we get the average value for the goals. Their "weight". In this case, smaller is better. Plainly, on average, to score a goal against Petr Cech you have to play like Liverpool ( 7,32 points = 7th place team ). To score a goal against David James, play like Bolton and you'll be fine.
Unfortunately for Portsmouth, substitutes played only 13 games last year.
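The whole method fits in a short Python sketch. The function names and the three-game sample are made up. The text divides the pain points by the number of games; dividing by the number of goals instead would give the average table position of the teams that scored, which is how the "7th place team" reading works, so the sketch prints both and leaves the choice open.

```python
def pain_points(games):
    """Sum of goals conceded times the scoring team's table position.

    `games` is a list of (opponent_position, goals_conceded) pairs,
    one pair per game the goalkeeper played.
    """
    return sum(position * conceded for position, conceded in games)


def pain_per_game(games):
    """Pain points divided by games played, as the text describes."""
    return pain_points(games) / len(games)


def pain_per_goal(games):
    """Pain points divided by goals conceded: the average table position
    of the teams that scored (an alternative reading, not from the post)."""
    goals = sum(conceded for _, conceded in games)
    return pain_points(games) / goals if goals else 0.0


# Hypothetical sample: two goals from the 3rd placed team,
# two from the 20th, and a clean sheet against the 11th.
sample = [(3, 2), (20, 2), (11, 0)]
print(pain_points(sample))              # 46
print(round(pain_per_game(sample), 2))  # 15.33
print(pain_per_goal(sample))            # 11.5
```

Either way, smaller is better: a low number means the goals that did go in came from the top of the table.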