Michael Lewis’ Moneyball thrust sabermetrics into the baseball mainstream. The story of Moneyball revolves around Oakland A’s general manager Billy Beane, who used objective analysis to manage his team better. Under his leadership, Oakland used statistics to work out which traits (like on-base percentage) and which players were undervalued by the market. Beane would acquire these undervalued players cheaply, and watch the rest of baseball discover their value as they scored runs and made plays for Oakland. Then, when they became sought-after stars, he’d sell them to other teams for a profit (see Jason Giambi).
Cricket is perhaps a decade or two behind baseball in its use of objective analysis to understand players and manage teams. I’d speculate that the main reason is that, until the IPL, there was little money in cricket. No matter how good Stuart MacGill was, he couldn’t play Tests for any team but Australia. Now, if one IPL team doesn’t select a player, another team will; there is finally a competitive market for talent. That market is making a Moneyball scenario possible in cricket as well. Check out this quote from Amrit Mathur, COO of the Delhi Daredevils (article from Cricinfo):
Happy New Year to you all! I’m happy to announce that Against the Spin has been voted the “Best New Cricket Blog” of 2009 over at World Cricket Watch — thanks for your support and votes. I’m flattered by this recognition, and hope to live up to that billing in 2010. Here are Against the Spin’s New Year’s Resolutions:
Any cricket fan will tell you that a good-length ball outside leg stump will go for more runs than a good-length ball just outside off stump. But how many more runs, on average? Answering that takes real data. So, in keeping with the theme of this blog, let’s try to quantify scoring and wicket-taking by area of the pitch.
First, let’s look at a bowler’s line and length. Obviously, this will differ depending on whether the batsman and bowler are right- or left-handed, and whether the bowler is a pace bowler or a spinner. The batsman’s handedness is accounted for by measuring pitch locations relative to his off stump. The pitch map below shows exactly where pace bowlers land their deliveries.

The totals in the margins indicate that 73% of balls are bowled in the channel outside off stump, and nearly a quarter of balls are bowled short of a good length. Now let’s look at how fruitful each of these delivery areas is. Shown on the pitch maps below are the strike rates of batsmen (runs/ball) for balls pitching in that area.

We can now quantify the age-old cricketing wisdom that line and length are rewarded. Good-length balls just outside off stump go for only about 7 runs per over, while fuller balls are punished at over 9 runs an over. Similarly, if the bowler strays onto the stumps, the easy on-side runs come at about 9 per over.
The data used to produce the above charts includes all Twenty20 internationals from June to November 2009, plus the 2009 Twenty20 Champions League: approximately 4800 balls bowled by fast and medium-fast bowlers. These pitch maps include only right-arm pace bowlers; data on left-armers and spinners may appear in a future post.
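For anyone who wants to build a similar pitch map from their own ball-by-ball data, here is a minimal sketch. It assumes each delivery has already been assigned a line bucket (measured relative to the batsman's off stump, as above) and a length bucket; the `Delivery` record and the bucket labels are hypothetical, not the format behind the charts in this post. Each cell gets its share of all deliveries and its scoring rate, with runs per ball converted to runs per over by multiplying by six.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Delivery:
    # Hypothetical record; the bucket labels are illustrative only.
    line: str    # e.g. "outside off", "on the stumps", "outside leg"
    length: str  # e.g. "full", "good", "short"
    runs: int    # runs conceded off this ball


def pitch_map(deliveries):
    """Return {(line, length): (share_of_all_balls, runs_per_over)}."""
    balls = defaultdict(int)
    runs = defaultdict(int)
    for d in deliveries:
        cell = (d.line, d.length)
        balls[cell] += 1
        runs[cell] += d.runs
    total = len(deliveries)
    return {
        cell: (balls[cell] / total, 6 * runs[cell] / balls[cell])
        for cell in balls
    }


def margin_share(deliveries, line):
    """Share of all balls pitched in one line, i.e. a margin total."""
    return sum(d.line == line for d in deliveries) / len(deliveries)


# Made-up sample of four deliveries.
sample = [
    Delivery("outside off", "good", 0),
    Delivery("outside off", "good", 1),
    Delivery("outside off", "full", 4),
    Delivery("on the stumps", "good", 2),
]
print(pitch_map(sample)[("outside off", "good")])
print(margin_share(sample, "outside off"))
```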


First, apologies for the lack of posts over the past couple of months. I’m in the midst of a busy university semester, so I haven’t been able to dedicate much time to cricket analysis. But I wanted to share an interesting paper I came across, by James, Carré, and Haake. They build a Newtonian model of what happens when a cricket ball bounces on a pitch. It turns out that the “pace” of a pitch (i.e., the percentage of its velocity a ball retains after bouncing) can be predicted fairly well by this model, which uses some physical pitch measurements as parameters.
The abstract of this paper is at http://www.springerlink.com/content/g551516250368338/. The full text is probably available through an institutional subscription, if you are affiliated with an institution that has one.
With the Ashes heating up to a tense finale, Karl van der Merwe wondered on the Against the Spin Facebook page how England would have fared if Australia still had Glenn McGrath and Shane Warne. The short answer: not well. Australia have been a dominant side with the two bowling greats, and merely a competitive one without them.
I used Cricinfo’s Statsguru to look at how Australia have done in Ashes Tests since 1990, with and without this extraordinary pair. The admittedly arbitrary cutoff is an attempt to control for the quality of both teams while still leaving several matches without McGrath and Warne. It’s not perfect, though; for example, an Aussie team containing McGrath and Warne was also more likely to contain Gilchrist, Ponting and Steve Waugh, making it a better team even beyond the presence of the two bowlers. There are too many complicating factors for this to be considered anything like a definitive statement, but it’s still a fun fact for Aussie fans to rub in the faces of their English friends. Even with one or both of McGrath and Warne absent, Australia still win nearly half the Tests they play against England, and lose only about a quarter.
When both McGrath and Warne’s names were on the team sheet, Australia won a remarkable 76% of the Ashes Tests they played, and lost only 3 out of 25. With just one of the two in the lineup, Australia played 16 Ashes Tests, winning 8, and losing 5. And even with both of these greats gone, Australia won four matches out of ten, and lost just two. England have been more competitive when facing an Australian side devoid of McGrath and Warne, but have still struggled against their arch-nemesis.
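The percentages in this post follow directly from the win/loss counts quoted above, and the short sketch below just does that arithmetic. The 19 wins in the first row are derived from 76% of 25 Tests rather than taken from Statsguru directly. Pooling the last two rows gives the Tests in which at least one of the pair was missing.

```python
# (Tests played, won, lost) in Ashes Tests since 1990, from the counts above.
records = {
    "both McGrath and Warne": (25, 19, 3),  # 19 wins = 76% of 25 Tests
    "one of the two": (16, 8, 5),
    "neither": (10, 4, 2),
}


def rates(played, won, lost):
    """Return (win %, loss %), each rounded to one decimal place."""
    return round(100 * won / played, 1), round(100 * lost / played, 1)


for label, rec in records.items():
    print(label, rates(*rec))

# Pool every Test in which at least one of the pair was absent.
pooled = tuple(a + b for a, b in zip(records["one of the two"], records["neither"]))
print("at least one absent", pooled, rates(*pooled))
```

The pooled row works out to 12 wins and 7 losses from 26 Tests, which is where "nearly half" and "about a quarter" come from.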
I think one of the biggest barriers to the widespread use of statistics to understand team and player performance in cricket (the way sabermetrics is used in baseball) is the scarcity of freely available data. I made an amateurish attempt at creating some structured data on my own, and posted it in the data section of this site, but it’s far from perfect. That’s why I’m excited that Stephen Rushe has put together some data using an improved version of the YAML format I used. It’s available at:
http://deeden.co.uk/misc/cricket/
Specifically, there’s better information about wickets that fell, more player names, and non-striker information. And if you have ideas or data sources of your own, do leave a note in the comments.
Conventional wisdom in cricket says that taking wickets helps slow the scoring rate. Not surprisingly, the data supports this. As the keen fan is no doubt aware, cricket is very much a game of momentum. When a bowler is in full swing, taking wickets and bowling economically, it’s likely that the next balls and overs will bring more of the same: wickets falling, and not many runs. Similarly, when a batsman hits the first three balls of an over to the fence, you can expect some carnage off the last three as well. Using Twenty20 data from the World Twenty20 and IPL available at http://data.againstthespin.com, I ran the numbers on some of these generalizations. All of the differences you see below are highly statistically significant.
As you’d expect, the scoring rate drops on the delivery after a wicket is taken. Usually there’s a new batsman trying to get his eye in and to deny the bowler the confidence boost that comes from taking two wickets in two balls. What may surprise you is how pronounced the effect is: the scoring rate drops by nearly 30%, from 7.0 runs per over to 5.0 runs per over when a wicket fell on the previous delivery.
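Each of these comparisons boils down to grouping deliveries by what happened on the ball before. Here is a minimal sketch, assuming a single innings stored as an ordered list of hypothetical (runs, wicket fell?) tuples; it illustrates the idea and is not the code behind the figures in this post. Swapping the predicate gives the dot-ball and boundary versions of the same comparison.

```python
from typing import Callable, List, Tuple

# Hypothetical record for one delivery: (runs off the ball, did a wicket fall?)
Ball = Tuple[int, bool]


def conditional_run_rate(
    innings: List[Ball], condition: Callable[[Ball], bool]
) -> Tuple[float, float]:
    """Runs per over on balls whose previous ball met `condition`, and on the rest.

    Consecutive list entries are treated as consecutive deliveries of one
    innings; extras get no special handling in this sketch.
    """
    after_event: List[int] = []
    otherwise: List[int] = []
    for prev, cur in zip(innings, innings[1:]):
        (after_event if condition(prev) else otherwise).append(cur[0])

    def per_over(runs: List[int]) -> float:
        # Six balls per over; NaN if a group happens to be empty.
        return 6 * sum(runs) / len(runs) if runs else float("nan")

    return per_over(after_event), per_over(otherwise)


def after_wicket(ball: Ball) -> bool:
    return ball[1]


def after_boundary(ball: Ball) -> bool:
    return ball[0] in (4, 6)


# A made-up six-ball innings fragment.
innings = [(1, False), (0, True), (0, False), (4, False), (6, False), (1, False)]
print(conditional_run_rate(innings, after_wicket))
print(conditional_run_rate(innings, after_boundary))
```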

Bowlers also turn on the pressure after a dot ball. Batsmen score at only 6.4 runs per over on the delivery after a dot ball, compared to 7.2 runs per over if they scored off the previous delivery, a reduction of 11% in the scoring rate. So it’s important for batsmen to keep the scoreboard ticking, and thus keep the momentum on their side. There’s a confounding factor here, though: the ability of the bowler. A bowler who bowls a dot ball is likely to be a good bowler, and thus likely to have a lower economy rate even without the effect of the dot ball. Controlling for the bowler’s economy would make this result more meaningful; unfortunately, I wasn’t able to do that for this analysis. I think the results are interesting nonetheless, and I suspect the difference would remain significant after controlling for the bowler’s economy. The same confounding factor (player ability) is present in all three graphs, but probably affects this one the most.
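One way such a control could look, sketched as an assumption rather than anything done for this post, is to stratify by bowler: compare post-dot and post-scoring run rates within each bowler, then average those within-bowler differences, weighting each bowler by his number of balls. The (bowler, previous-ball-was-a-dot, runs) layout and the two made-up bowlers are hypothetical. They are chosen so that neither bowler shows any dot-ball effect, yet pooling them produces one, because the tighter bowler supplies most of the post-dot balls.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# Hypothetical layout: (bowler, previous ball was a dot?, runs off this ball).
# "Not a dot" is treated as "previous ball was a scoring ball".
Row = Tuple[str, bool, int]


def stratified_dot_effect(rows: Iterable[Row]) -> float:
    """Weighted mean of within-bowler differences in runs per over
    (after a dot ball minus after a scoring ball).

    Bowlers missing either group are skipped; each remaining bowler is
    weighted by his total number of balls in the sample.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for bowler, prev_dot, runs in rows:
        groups[bowler][prev_dot].append(runs)

    weighted_sum, total_weight = 0.0, 0
    for g in groups.values():
        after_dot, after_score = g[True], g[False]
        if not after_dot or not after_score:
            continue
        diff = 6 * (sum(after_dot) / len(after_dot) - sum(after_score) / len(after_score))
        weight = len(after_dot) + len(after_score)
        weighted_sum += weight * diff
        total_weight += weight
    return weighted_sum / total_weight if total_weight else float("nan")


def naive_dot_effect(rows: Iterable[Row]) -> float:
    """The same difference with all bowlers pooled together (no control)."""
    rows = list(rows)
    dot = [r for _, d, r in rows if d]
    score = [r for _, d, r in rows if not d]
    return 6 * (sum(dot) / len(dot) - sum(score) / len(score))


# Bowler A is tight (3 runs/over) and bowls more dots; B is expensive (12 runs/over).
rows = [
    ("A", True, 0), ("A", True, 1), ("A", True, 0), ("A", True, 1),
    ("A", False, 1), ("A", False, 0),
    ("B", True, 2), ("B", True, 2),
    ("B", False, 2), ("B", False, 2), ("B", False, 2), ("B", False, 2),
]
print(naive_dot_effect(rows), stratified_dot_effect(rows))
```

Here the pooled comparison suggests batsmen score 3 runs per over less after a dot ball, while the within-bowler comparison finds no effect at all.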

It’s not only the bowlers who can pile on the pressure with dot balls and wickets. If a batsman hits one ball for a boundary, more runs are likely to be scored off the next ball as well — 18% more runs in fact. The scoring rate of 6.7 runs an over when the previous ball wasn’t a four or six jumps to 7.9 runs per over following a boundary.
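The post doesn't say which test established that these differences are significant. With thousands of balls, one common choice is a large-sample two-sample z-test on mean runs per ball, using each group's own variance (a Welch-style standard error). The sketch below assumes that test; the two input lists are made up.

```python
import math
from statistics import mean, variance


def two_sample_z(a, b):
    """Large-sample z statistic and two-sided p-value for mean(a) - mean(b).

    Uses each sample's own variance for the standard error and the normal
    approximation, which is reasonable when both samples are large.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Made-up runs per ball: after a boundary vs. after any other ball.
after_boundary_runs = [0, 2] * 50
other_runs = [0, 1] * 50
z, p = two_sample_z(after_boundary_runs, other_runs)
print(f"z = {z:.2f}, p = {p:.1e}")
```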

It would be interesting to see if these effects, particularly the post-wicket scoring reduction, linger on for several balls after the event. But that’s a topic for a later post. If you liked this post, you can subscribe to future posts (2-3 per month) by email.
Commentators often talk about players planning out an over. A batsman may want to see off the first couple of deliveries, and a bowler may try to keep it tight off the last ball of the over. It’s interesting to look at the actual data behind those strategies. Using ball-by-ball data from the Against the Spin data repository, we can look at run-scoring and wicket-taking across the six deliveries of each over.
This particular analysis used all matches from the 2007 and 2009 World Twenty20 tournaments, and the 2009 IPL. The patterns are interesting, and are largely preserved when looking at these tournaments individually. There is some evidence that batsmen score less off the first ball of the over, possibly because they are being cautious against the new bowler. For their part, bowlers take some time to settle down, conceding significantly more in wides and no-balls off the first delivery of the over. Batsmen are also more likely to take a single off the last ball of an over, presumably in an attempt to keep the strike (graph not shown). The error bars on the graphs represent two standard errors of the mean.
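Error bars of two standard errors of the mean are straightforward to compute for each ball position. A minimal sketch follows, assuming the data has already been reduced to hypothetical (position in over, runs) pairs; how wides and no-balls affect the position numbering is left out, since the post doesn't spell out how they were handled.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def by_ball_of_over(balls):
    """Mean runs and a two-standard-error half-width for each ball position.

    `balls` is an iterable of (position_in_over, runs) pairs, positions 1-6.
    Returns {position: (mean_runs, 2 * standard_error)}.
    """
    groups = defaultdict(list)
    for position, runs in balls:
        groups[position].append(runs)
    out = {}
    for position, runs in sorted(groups.items()):
        # Standard error of the mean = sample standard deviation / sqrt(n).
        se = stdev(runs) / math.sqrt(len(runs)) if len(runs) > 1 else float("nan")
        out[position] = (mean(runs), 2 * se)
    return out


# Made-up data: two first balls and three second balls.
result = by_ball_of_over([(1, 0), (1, 2), (2, 1), (2, 1), (2, 4)])
print(result)
```

The error bar for each position is then drawn from mean minus the half-width to mean plus the half-width.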
Batsmen are less likely to hit the first ball of a bowler’s over for a boundary than they are to hit subsequent balls to the fence. They also score less off the first ball; that graph is very similar, and hence omitted here.

The third ball of an over has a higher likelihood of taking a wicket, though this may just be a statistical fluke.

This graph is perhaps the most interesting. The average runs conceded in no balls & wides (these are the extras where the bowler is at fault) is significantly higher off the first ball of an over, compared to the middle of the over. This suggests that bowlers may be a little tight at the beginning of an over, resulting in overstepping, or a wild delivery.

Here’s a challenge to the reader: find another publicly available cricket data source that gives you the data to build the graphs above. I’m curious as to what’s out there, since I haven’t been able to find such a source myself.
Pakistan grabbed the World Cup and the headlines in Sunday’s final. But Sri Lanka fought remarkably well, first to post 138 after losing 4 wickets in the first 6 overs, and then to defend that total stoutly for the first 17 overs of the chase. In the end, Pakistan won by 8 wickets with plenty of balls to spare, but the match was much closer than that margin suggests.
To get a better idea of how close the match was at various points in the chase, I came up with a measure (based on the Duckworth-Lewis system) of the batting team’s “momentum” during the second innings. To be precise, I used the ratio of the proportion of the target remaining to the proportion of resources remaining, scaled so that 100 means the two are equal. This plot shows which team had the momentum during the second innings of the ICC World Twenty20 Final. Pakistan are the batting team, and Sri Lanka are the bowling team. Scores over 100 indicate that the bowling team has the advantage under the standard Duckworth-Lewis system, while scores under 100 indicate that the batting team does. The distance from 100 indicates the size of the advantage. Two balls before Afridi hit that six off Udana’s bowling, Sri Lanka were actually ahead on Duckworth-Lewis! (Note: this is the Standard Edition of the D-L system, used in international matches until 2003; the Professional Edition of the D-L formula is currently used, and that isn’t available to the public.)
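The momentum measure is a ratio of two proportions, multiplied by 100 so that 100 means neither side is ahead. Here is a minimal sketch. The resource percentages must come from a Duckworth-Lewis resource table for the overs and wickets in hand, which isn't reproduced here, so the values in the example are made up; "proportion of resources remaining" is taken as resources left divided by resources available at the start of the chase.

```python
def chase_momentum(target, runs_scored, resources_start_pct, resources_left_pct):
    """Momentum index for a run chase (100 = level).

    Ratio of the share of the target still needed to the share of batting
    resources still available, times 100. Above 100 favours the bowling
    side; below 100 favours the batting side. Both resource percentages
    are assumed to be looked up in a D-L table (not included here).
    """
    target_left = (target - runs_scored) / target
    resources_left = resources_left_pct / resources_start_pct
    return 100 * target_left / resources_left


# Hypothetical: chasing 139, 70 scored, with half the starting resources left.
print(round(chase_momentum(139, 70, 40.0, 20.0), 1))
```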

Last week I was reading Duckworth and Lewis’ original 1998 paper. They came up with a brilliant concept, and avoided many of the pitfalls that plagued earlier systems. It’s somewhat unfair to think of it merely as a system for resetting the target in case of rain. It’s really a better way of thinking about the whole game of limited-overs cricket, using the notion of resources. Indeed, the concepts can be used to build arguably better measures of player performance, as Lewis did here and here.
That said, if there’s a chink in its armor, I think it’s the fact that (under the original model) the team batting second is expected to consume resources in the same manner as the team batting first. And that pattern of resource consumption is estimated from data on thousands of first innings. I can sum up my thoughts in the following two sentences.
- The first team’s optimal strategy will be one that maximizes the expected value of their score.
- The second team’s optimal strategy while batting will instead be one that maximizes the probability their score exceeds that of the first team.
These two objectives can lead to fairly different strategies, because the margin of victory or defeat is (usually) irrelevant; only the outcome matters. In particular, if a team is chasing a very high target, their best approach may be to hit out from the beginning and hope to get lucky, because trying to maximize their expected total will probably produce a losing score. By hitting out, they at least give themselves a small chance of victory, even though they could suffer a heavy defeat if the strategy backfires. In their 2004 paper, Duckworth and Lewis did address the fact that the resource consumption pattern should be different when the team batting second is chasing a high target, and came up with an even better model for this scenario. I believe those revisions produced the model that is used today; is that correct? I’m still working my way through that paper, and once I’ve finished it, I’ll see how well it addresses my concerns.
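A toy calculation shows how maximizing the expected total and maximizing the chance of winning can pull in different directions. Suppose, purely hypothetically, that a chasing side can choose between a steady approach whose final total is roughly normal with mean 150 and standard deviation 15, and an all-out approach with a lower mean of 140 but a standard deviation of 35. The numbers are invented. The point is only that against a modest target the steady plan wins more often, while against a big target the riskier plan, despite its lower expected total, gives the better chance of victory.

```python
from statistics import NormalDist

# Hypothetical strategies: distribution of the chasing side's final total.
STEADY = NormalDist(mu=150, sigma=15)
ALL_OUT = NormalDist(mu=140, sigma=35)


def win_probability(total: NormalDist, target: float) -> float:
    """Chance that the chasing side's total reaches the target (normal model)."""
    return 1 - total.cdf(target)


for target in (130, 190):
    steady = win_probability(STEADY, target)
    all_out = win_probability(ALL_OUT, target)
    better = "steady" if steady > all_out else "all-out"
    print(f"target {target}: steady {steady:.3f}, all-out {all_out:.3f} -> {better}")
```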