


ABSTRACT 
Batting performance measures containing strike rate adjustments take into account the important fact that if two batsmen had scored the same number of runs in a match, the one with the better strike rate had performed best. But match conditions can influence the batting and bowling performances of cricket players. On a good pitch a batsman can get a good score at a high strike rate, but if the pitch was bad, a similar good score is normally accompanied by a much lower strike rate. The main objective of this study is to propose a method that can be used to make batsmen’s scores comparable despite the fact that playing conditions might have been very different. The number of runs scored by a batsman is adjusted by comparing his strike rate with the overall strike rate of all the players in the specific match. These adjusted runs are then used in the most appropriate formula to calculate the average of the batsman. The method is illustrated by using the results of the Indian Premier League 2009 Twenty20 Series played during May and June 2009. The main conclusion is that the traditional average is not the most appropriate measure to compare batsmen’s performances after conclusion of a short series. 
Key words:
Batting average, Indian Premier League, ratings, sports

Key Points
 It is unfair to compare the score of a batsman obtained on a good pitch under ideal batting conditions with that of a batsman who had to battle under severe conditions.
 By comparing a batsman’s strike rate with the overall strike rate of the players in the specific match, his score can be adjusted to get a better figure for his true performance.
 The results demonstrate clearly that the use of adjusted scores leads to rankings that differ from those based on the traditional measures.

During the past decade or two a large number of papers have been published on cricket performance measures and prediction methods. The majority of these papers concentrate on limited overs matches. Methods based on the utilization of remaining resources are found in the Duckworth/Lewis approach (Duckworth and Lewis, 2002) and its references, (Johnston et al., 1993), (Beaudoin and Swartz, 2003) and (De Silva et al., 2001). Optimal batting orders are discussed in (Swartz et al., 2006) and in (Norman and Clarke, 2010), batting strategies using dynamic programming in (Preston and Thomas, 2000) and (Johnston et al., 1993), the effect of winning the coin toss in (De Silva and Swartz, 1997). Prediction methods are found in (Cohen, 2002), (Gilfillan and Nobandla, 2000) and (Swartz et al., 2009). Graphical methods have been discussed in (Kimber, 1993), (Barr et al., 2008), (Bracewell and Ruggiero, 2009) and (van Staden, 2009). Batting performance measures rely heavily on the batting average. Various authors have defined measures which take the average and also the strike rate into account, e.g. (Croucher, 2000), (Barr and Kantor, 2004), (Basevi and Binoy, 2007) and (Barr et al., 2008). (Barr and van den Honert, 1998) defined a measure based on the average and a consistency measure. (Lemmer, 2004) went a step further by combining the strike rate with the former two. In (Lemmer, 2008a) it was shown that the batting average can be unsatisfactory in the case of a small number of scores if the player had a large proportion of not out scores. (Lemmer, 2008b) proposed an estimator for the average which behaves better than the batting average under such circumstances. Details are given in the next section. The present study focuses on the construction of a batting performance measure specifically designed for the case of a small number of scores per batsman. 
The results are of importance for ODI and Twenty20 series because most batsmen obtain only a small number of scores, many with a large proportion of not out scores, and it is fair that batsmen’s performances should be compared by means of the most appropriate measure.
Preliminary results
It is well known that the batting average, AVE, can be misleading (see Croucher, 2000, p. 96) if a batsman had a relatively high proportion of not-out scores. In (Lemmer, 2008a) it was mentioned that in the 1999 World Cup Series Lance Klusener scored 281 runs in eight innings and was out only twice. His highest score was 52 but his average was 140.5! Nobody would argue that this was a good estimate of his next score or that his average in the next series would be on the same level. (Kimber and Hansford, 1993) used the product limit of the survivor function to find an estimator of the average, but (Lemmer, 2008a) showed that it behaves almost as badly as the batting average in the case of a high proportion of not-out scores, especially if a small number of innings had been played.

(Lemmer, 2008a) used a fundamental method, making proper provision for not-out scores, to find an estimator for the average. It was found that the best estimator for the average is
e_{6} = (sumout + (2.2 - 0.01 x avno) x sumno)/n,
where n denotes the number of innings played, sumout the sum of the out scores, sumno the sum of the not-out scores and avno the average of the not-out scores. The simpler estimator
e_{2} = (sumout + 2 x sumno)/n
generally gave values close to e_{6} (see Figure 2 in Lemmer, 2008a). From the latter study it follows that a good estimator should have a value in the band between e_{6} and e_{2}. In the case of a small number of scores containing very large not-out scores, e_{2} can be unrealistically large and e_{6} unrealistically small. In (Lemmer, 2008b) it was proposed that
e_{26} = (sumout + (2.1 - 0.005 x avno) x sumno)/n
should be used in such cases, and this is the estimator that is used in the present study.

In a conference lecture (van Staden et al., 2010) compared various estimators specifically in the case of a small number of innings containing large not-out scores. They considered the product limit estimator of (Danaher, 1989), a Bayesian type estimator of (Damodaran, 2006), an estimator based on exposure-to-risk by (Maini and Narayanan, 2007) and their own variation on the latter. All of them have values smaller than e_{6} for all the batsmen who had not-out scores. These estimators will not be considered in the present study.
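The estimators discussed above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes that each estimator has the form (sumout + f x sumno)/n with f = 2.2 - 0.01 x avno for e_{6}, f = 2 for e_{2} and f = 2.1 - 0.005 x avno for e_{26}. The function names and the (runs, not_out) data layout are hypothetical. The example uses Pandey's four scores of 2*, 114*, 48 and 4 quoted later in the paper, for which the assumed form of e_{26} reproduces the reported values AVE = 84 and e_{26} = 65.5.

```python
def not_out_estimator(scores, intercept, slope):
    """Estimate a batting average as (sumout + f * sumno) / n.

    scores is a list of (runs, not_out) pairs, one per innings.
    The weight on the not-out runs is f = intercept - slope * avno
    (assumed form of the estimators, see the lead-in).
    """
    n = len(scores)
    sumout = sum(runs for runs, not_out in scores if not not_out)
    not_out_runs = [runs for runs, not_out in scores if not_out]
    sumno = sum(not_out_runs)
    avno = sumno / len(not_out_runs) if not_out_runs else 0.0
    f = intercept - slope * avno
    return (sumout + f * sumno) / n


def e6(scores):
    return not_out_estimator(scores, 2.2, 0.01)


def e2(scores):
    return not_out_estimator(scores, 2.0, 0.0)


def e26(scores):
    return not_out_estimator(scores, 2.1, 0.005)


def batting_average(scores):
    """Traditional average: total runs divided by number of dismissals."""
    outs = sum(1 for _, not_out in scores if not not_out)
    if outs == 0:
        return None  # the traditional average is undefined without a dismissal
    return sum(runs for runs, _ in scores) / outs


# Pandey's scores in the IPL 2009 series: 2*, 114*, 48 and 4
pandey = [(2, True), (114, True), (48, False), (4, False)]
print(batting_average(pandey))    # 84.0
print(round(e26(pandey), 1))      # 65.5
```

Under these assumptions e_{26} always lies midway between e_{6} and e_{2}, which is in line with the requirement that a good estimator should fall in the band between the two.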
(Croucher, 2000) motivated why both AVE and the batting strike rate SR should be taken into account. Here AVE = R/out where R is the total number of runs scored and 'out' the number of times the batsman was out. SR = 100 x R/B with B the total number of balls the batsman faced, i.e. SR = the number of runs scored for every 100 balls faced. (Croucher, 2000) defined the batting index BI = AVE x SR and used this to rank batsmen. (Basevi and Binoy, 2007) used an equivalent measure CALC = R^{2}/(out x B). Hence CALC = (R/out) x (R/B) = AVE x (SR/100) = BI/100. (Lemmer, 2008b) concluded that CALC weights SR too highly. (Barr and Kantor, 2004) proposed a criterion AVE^{1-α} x SR^{α} where 0 ≤ α ≤ 1 is a measure of balance between average and strike rate. The subjective choice of the value of α is not an attractive feature of this measure. See also (Barr et al., 2008). If α = 0.5 their weighting is the same as in CALC. None of these measures addresses the problem of the present study specifically.

In the case of an ODI or Twenty20 series each player has only a small number of scores and this necessitates special consideration. (Lemmer, 2008b) proposed the following formula to compare the batting performances in the first Twenty20 World Cup Series:
BP = e_{26} x (SR/AVSR)^{0.5},
where AVSR denotes the average of the strike rates of all the batsmen in the series. In a short series it is better to replace the average strike rate by the global strike rate of all the players in the series, i.e. GSR = 100 x total number of runs scored divided by the total number of balls bowled. The main reason is that every batsman who did not score any runs has a zero strike rate, which can influence the average strike rate markedly. Denote the revised formula by
TG = e_{26} x (SR/GSR)^{0.5}.
This will be studied in the sequel. Other approaches of using strike rates follow.
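The relationships between these measures can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, the batsman's figures are invented, and the form TG = e_{26} x (SR/GSR)^{0.5} is an assumption based on the exponent 0.5 that the paper mentions for its strike rate adjustments.

```python
def batting_average(runs, outs):
    """AVE = R / out."""
    return runs / outs


def strike_rate(runs, balls):
    """SR = runs scored per 100 balls faced."""
    return 100.0 * runs / balls


def batting_index(runs, outs, balls):
    """Croucher's BI = AVE x SR."""
    return batting_average(runs, outs) * strike_rate(runs, balls)


def calc(runs, outs, balls):
    """Basevi and Binoy's CALC = R^2 / (out x B)."""
    return runs ** 2 / (outs * balls)


def barr_kantor(ave, sr, alpha):
    """Barr and Kantor's criterion AVE^(1 - alpha) x SR^alpha, 0 <= alpha <= 1."""
    return ave ** (1 - alpha) * sr ** alpha


def tg(e26_value, sr, gsr):
    """Assumed form of TG: e26 scaled by the square root of SR / GSR."""
    return e26_value * (sr / gsr) ** 0.5


# Invented figures for one batsman: 300 runs, out 6 times, 240 balls faced
runs, outs, balls = 300, 6, 240
print(batting_index(runs, outs, balls))   # 6250.0
print(calc(runs, outs, balls))            # 62.5
```

The two printed values differ by a factor of exactly 100, so BI and CALC always rank batsmen identically.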
Single match approach
The ability to score at a fast rate is very important in limited overs matches, firstly in one-day matches and especially in Twenty20 matches. The appropriate use of the strike rate is therefore very important. Batting conditions may differ substantially between matches due to factors like pitch and weather conditions. The assessment of a batsman's performance should take these into account. It is therefore useful to introduce a strike rate adjustment into the scores of every match individually. This approach was introduced by the author in an unpublished paper presented at the 57^{th} Session of the International Statistical Institute in Durban, South Africa, during August 2009. The scores of the batsmen in the two ODI series between South Africa and Australia in January and April 2009 were used.

Denote the batsman's match strike rate by MSR = 100 x R/B where R denotes the number of runs scored by the batsman in the match and B the number of balls he faced. His adjusted score for the match is
T = R x (MSR/GMSR)^{0.5},
where GMSR denotes the global strike rate of all the batsmen in the specific match, defined in the same way as GSR but for the single match. The importance of using GMSR is that batsmen's scores are adjusted according to the specific match conditions, which may differ markedly between venues. In the second ODI between South Africa and Australia played in April 2009 in Centurion, GMSR was equal to 60.4 and in the fourth ODI played in Port Elizabeth GMSR was equal to 95.5. A batsman who had a strike rate of 90 in the Centurion match had a strike rate adjustment of (90/60.4)^{0.5} = 1.22 because he had batted much faster than the group. The same strike rate of 90 in the Port Elizabeth match would only give him a strike rate adjustment of (90/95.5)^{0.5} = 0.97.

The T values are adjusted runs and therefore e_{26} can be calculated from them to obtain a performance measure, ET, for the case of more than one match. Thus
ET = (sumoutT + (2.1 - 0.005 x avnoT) x sumnoT)/n,
where n denotes the number of innings, sumoutT the sum of the T values of the innings in which the batsman was out, sumnoT the sum of the T values of his not-out innings and avnoT the average of the latter.
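The single match adjustment can be illustrated with a short Python sketch. It assumes that the adjusted score is T = R x (MSR/GMSR)^{0.5} and that ET is obtained by applying an e_{26}-type estimator, (sumout + (2.1 - 0.005 x avno) x sumno)/n, to the T values. These forms reproduce the adjustments of 1.22 and 0.97 quoted above, but the function names and the data layout are hypothetical.

```python
from math import sqrt


def strike_rate_adjustment(msr, gmsr):
    """Square root of the relative strike rate RSR = MSR / GMSR."""
    return sqrt(msr / gmsr)


def adjusted_runs(runs, balls, gmsr):
    """Adjusted score T for one innings (assumed form, see the lead-in)."""
    if balls == 0:
        return 0.0  # no balls faced, so no runs to adjust
    msr = 100.0 * runs / balls
    return runs * strike_rate_adjustment(msr, gmsr)


def e26(values):
    """e26-type estimator applied to (value, not_out) pairs."""
    n = len(values)
    sumout = sum(v for v, not_out in values if not not_out)
    not_out_values = [v for v, not_out in values if not_out]
    sumno = sum(not_out_values)
    avno = sumno / len(not_out_values) if not_out_values else 0.0
    return (sumout + (2.1 - 0.005 * avno) * sumno) / n


def et(innings):
    """ET from a list of (runs, balls, not_out, gmsr) tuples, one per match."""
    adjusted = [(adjusted_runs(r, b, g), not_out) for r, b, not_out, g in innings]
    return e26(adjusted)


# Strike rate of 90 in the Centurion (GMSR 60.4) and Port Elizabeth (GMSR 95.5) ODIs
print(round(strike_rate_adjustment(90, 60.4), 2))   # 1.22
print(round(strike_rate_adjustment(90, 95.5), 2))   # 0.97
```

When a batsman's strike rate equals the global match strike rate in every match, each T equals the original score and ET reduces to e_{26} of the unadjusted scores.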
The effect of using match strike rates rather than series strike rates will be studied by comparing the values of TG and ET for each player who had batted in at least three matches and had an average larger than 20 in the IPL Twenty20 2009 Series. The data were obtained from (Cricinfo, 2009) and the results are given in (Table 1), where p_{o} denotes the proportion of not-out scores, D = (ET - TG) and CV a quantity to be defined later. Pandey had scores of 2*, 114*, 48 and 4 (* denotes a not-out score) which gave him the large AVE = 84 compared to e_{26} = 65.5 which is, in the author's opinion, much more realistic if his scores are taken into account. After four matches which contained 50% not-out scores, his top ranking was mainly due to a single very good score. Other batsmen with large differences between AVE and e_{26} also have small values of n and large values of p_{o}: Agarkar (5; 0.60), Harris (5; 0.80), Powar (3; 0.67), van Wyk (5; 0.40), etc.

When a batsman starts batting, he has to be cautious in order to get acquainted with the conditions. The longer he stays in, the faster he can score runs. It is logical that in a Twenty20 match there will normally be a strong correlation between the number of runs scored and the strike rate. By using the full set of scores of the batsmen in the present study it was found that the correlation coefficient between R and MSR was r(R,MSR) = 0.57, which is highly significant. Even the relative strike rate RSR = MSR/GMSR is highly significantly correlated with R: r(R,RSR) = 0.51. A batsman who scores a large number of runs at a high rate should be rewarded appropriately. This is exactly what the measure ET achieves. ET is therefore much more sensible than TG, which scales all scores up or down according to an overall series strike rate. The mechanism of ET can best be illustrated by looking at the scores of a specific player, say K. Sangakkara, in (Table 2).
His best relative strike rate, in his fourth and twelfth matches, was 24% higher than the global strike rate of the group, so his best scores of 60 and 56 were scaled up by (1.24)^{0.5} to 66.8 and 62.4. In his first match his score was scaled down. His good scores were accompanied by good relative strike rates and his low scores by low relative strike rates. It is not surprising that the correlation between his scores and his relative strike rates was high and highly significant: r(R,RSR) = 0.83 with p-value 0.00. Apparently, the exciting batsmen are those who have high strike rates when they score many runs.

From the results of the study on the ODI scores between South Africa and Australia the author concluded that large differences between ET and TG occurred when the batsman's strike rates varied substantially. In ET the GMSR values play an important role, so one has to work with the RSR values. If a player's RSR values are very similar the strength of ET is inhibited and one can expect the difference D between ET and TG to be small. The strength of ET comes out when the RSR values vary in size, accentuating high RSR values by multiplying them with good scores, leaving low RSR values for the low scores. To study the variation in RSR it is necessary to standardize by using the coefficient of variation, CV, of the RSR values, i.e. CV = (standard deviation of RSR)/(mean of RSR). Based on the high correlation between R and RSR it can be expected that larger values of CV will be associated with large differences D. This is confirmed by the fact that for the data set r(D,CV) = 0.37 with p = 0.01. Ojha had both the largest D and the largest CV values because his highest scores of 68, 52 and 22 had the highest RSR values of 1.01, 1.42 and 2.26, resulting in r(R,RSR) = 0.56 with p = 0.15.
McCullum had the second largest CV value, but a small D value mainly because his highest RSR values of 2.77 and 1.97 were multiplied by 9 and 6, resulting in a low correlation of r(R,RSR) = 0.16 with p = 0.61.

The ranking of players according to TG differs from that according to ET, as can be seen in the last column of Table 1. JP Duminy ranked 6^{th} according to ET but 9^{th} according to TG. This is due to the fact that his good RSR values were accompanied by very good scores: r(R,RSR) = 0.65 with p = 0.02. Sangakkara ranked 18^{th} according to ET but 22^{nd} according to TG and he had r(R,RSR) = 0.83 with p = 0.00. Jayawardene ranked 14^{th} according to ET and 10^{th} according to TG. He had r(R,RSR) = 0.74 with p = 0.02. His RSR values were fairly modest, which explains why ET lagged behind TG in the respective rankings. In the ODI series between South Africa and Australia, Duminy ranked third according to ET, sixth according to TG and seventh according to AVE.

An important question is whether D, the difference between ET and TG, will diminish as the number of innings increases. This is apparently not the case for a small or moderate number of scores, as reflected by r(n,D) = 0.09 with p = 0.54. After 12 matches JP Duminy had the second highest difference D = 3.58 and after 13 matches Sangakkara had the third highest difference D = 3.45. Thus even after a moderately large number of innings the motivation for ET remains valid because it weights each score according to the batsman's relative strike rate in the specific match and this weighting does not become less important as the number of scores increases.

The importance of taking strike rates into account in batting performance measures can clearly be seen by comparing the rankings according to ET and AVE. M. van Wyk ranked fourth according to ET but second according to AVE, whereas Symonds ranked eighth according to ET and 11^{th} according to AVE. Large differences in rankings occurred in the case of D. Smith (19^{th} according to ET, 27^{th} according to AVE) and Y. Pathan (31^{st} according to ET, 46^{th} according to AVE).
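The two diagnostic quantities used in this analysis, the coefficient of variation of a batsman's RSR values and the correlation r(R,RSR), can be computed as in the Python sketch below. The scores and relative strike rates are invented for illustration and are not taken from Table 1 or Table 2. The use of the sample standard deviation (divisor n - 1) is an assumption, since the paper does not state which version was used.

```python
from math import sqrt
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented example: a batsman whose larger scores come at higher relative strike rates
runs = [5, 12, 30, 48, 61]
rsr = [0.70, 0.85, 1.05, 1.20, 1.30]

print(coefficient_of_variation(rsr))  # spread of RSR relative to its mean
print(pearson_r(runs, rsr))           # close to 1 for this invented batsman
```

A batsman with a large CV and a high positive r(R,RSR), like the invented one here, is the kind of player for whom ET can be expected to exceed TG noticeably.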
In the assessment of batting performances it is imperative that batting conditions should be taken into account. A very useful way to do this is to calculate the global strike rates in the different matches. From Table 2 it is clear that there was great variation in the global strike rates, which varied in Sangakkara's case from 81.3 to 147.5. In the whole series the smallest global strike rate was 81.1 (53^{rd} match) and the largest 160.6 (34^{th} match). The essence of this study can best be understood by comparing the number of runs scored with the adjusted number of runs, T, in each match (Table 2). By using the relative strike rates of the batsmen a much better assessment is made of their performances in the different matches. In a serious ranking of batsmen it is reasonable to require that each one should have batted in at least, say, eight matches in order to avoid the situation where a single good score could give the batsman a top position, as was the case with Pandey, who had played in only four matches.

The use of ET requires additional computational effort compared to TG, but it has been shown that it makes a difference in the ranking of the players. It is therefore imperative that ET should be used for the assessment of player performances in any series of limited overs matches. In limited overs cricket a batsman should not only get a good score, but the faster he gets it the better. This is especially true if he scores much faster than the overall scoring rate of all the players in the match. The global strike rate takes match conditions (pitch, weather, etc.) into account. ET is based on the relative strike rate, which compares the batsman's strike rate with the global strike rate, and is thus a measure that takes all these factors into account. ET is undoubtedly superior to the traditional average and measures based on the latter.
A possibility for further research is to check to what extent the exponent 0.5 in T should be modified for Twenty20 data. This will entail adjusting the formula of BP to be suitable for Twenty20 data. Up to now very few batsmen had a sufficient number of international Twenty20 scores to merit such a study.
Cricket players’ performances are influenced by many factors. When career performance measures are calculated, most of these factors tend to average out. In a short series this does not happen and therefore a method like the foregoing is very important in order to compare performances in a fair way. Researchers must persevere in the creation of more realistic measures and they must use them to convince cricket authorities that there are measures that are much better than the traditional ones.

AUTHOR BIOGRAPHY 

Hermanus H. Lemmer 
Employment: Emeritus (Research) Professor, Department of Statistics, University of Johannesburg, Auckland Park, South Africa. 
Degree: MSc (Mathematical Statistics), DSc 
Research interests: Cricket statistics 
Email: hoffiel@uj.ac.za 



REFERENCES 
Barr G.D.I., Holdsworth G.C., Kantor B.S. (2008) Evaluating performances at the 2007 cricket world cup. South African Statistical Journal 42, 125-142.

Barr G.D.I., Kantor B.S. (2004) A criterion for comparing and selecting batsmen in limited overs cricket. Journal of the Operational Research Society 55, 1266-1274.

Barr G.D.I., van den Honert R. (1998) Evaluating batsmen's scores in test cricket. South African Statistical Journal 32, 169-183.

Basevi T., Binoy G. (2007) . http://contentrsa.cricinfo.com/columns/content/story/311962.html.

Beaudoin D., Swartz T.B. (2003) The best batsmen and bowlers in one-day cricket. South African Statistical Journal 37, 203-222.

Bracewell P.J., Ruggiero K. (2009) A parametric control chart for monitoring individual batting performances in cricket. Journal of Quantitative Analysis in Sports 5, 1-19.

Cohen G.L. (2002) Cricketing chances. In: Proceedings of the Sixth Australian Conference on Mathematics and Computers in Sport. Eds: Cohen G., Langtry T. Sydney University of Technology, NSW.

Cricinfo (2009) Indian Premier League 2009 Results. http://www.cricinfo.com/ipl2009/engine/series/374163.html

Croucher J.S. (2000) Player ratings in one-day cricket. In: Proceedings of the Fifth Australian Conference on Mathematics and Computers in Sport. Eds: Cohen G., Langtry T. Sydney University of Technology, NSW.

Damodaran U. (2006) Stochastic dominance and analysis of ODI batting performance: the Indian cricket team, 1989-2005. Journal of Sports Science and Medicine 5, 503-508.

Danaher P.J. (1989) Estimating a cricketer's batting average using the product limit estimator. The New Zealand Statistician 24, 2-5.

De Silva M.B., Pond G.R., Swartz T.B. (2001) Estimation of the magnitude of victory in one-day cricket. Australian and New Zealand Journal of Statistics 43, 259-268.

De Silva M.B., Swartz T.B. (1997) Winning the coin toss and the home team advantage in one-day international cricket matches. The New Zealand Statistician 32, 16-22.

Duckworth F., Lewis T. (2002) Review of the application of the Duckworth/Lewis method of target resetting in one-day cricket. In: Proceedings of the Sixth Australian Conference on Mathematics and Computers in Sport. Eds: Cohen G., Langtry T. Sydney University of Technology, NSW.

Gilfillan C., Nobandla N. (2000) Modelling the performance of the South African national cricket team. South African Journal for Research in Sport, Physical Education and Recreation 22, 97-110.

Johnston M.I., Clarke S.R., Noble D.H. (1993) Assessing player performance in one-day cricket using dynamic programming. Asia-Pacific Journal of Operational Research 10, 45-55.

Kimber A.C. (1993) A graphical display for comparing bowlers in cricket. Teaching Statistics 15, 84-86.

Kimber A.C., Hansford A.R. (1993) A statistical analysis of batting in cricket. Journal of the Royal Statistical Society, Series A 156, 443-455.

Lemmer H.H. (2004) A measure for the batting performance of cricket players. South African Journal for Research in Sport, Physical Education and Recreation 26, 55-64.

Lemmer H.H. (2008a) Measures of batting performance in a short series of cricket matches. South African Statistical Journal 42, 83-105.

Lemmer H.H. (2008b) An analysis of players' performances in the first cricket Twenty20 World Cup Series. South African Journal for Research in Sport, Physical Education and Recreation 30, 71-77.

Maini S., Narayanan S. (2007) The flaw in batting averages. The Actuary May-07, 30-31.

Norman J.M., Clarke S.R. (2010) Optimal batting orders in cricket. Journal of the Operational Research Society 61, 980-986.

Preston I., Thomas J. (2000) Batting strategy in limited overs cricket. The Statistician 49, 95-106.

Swartz T.B., Gill P.S., Beaudoin D., de Silva B.M. (2006) Optimal batting orders in one-day cricket. Computers and Operations Research 33, 1939-1950.

Swartz T.B., Gill P.S., Muthukumarana S. (2009) Modelling and simulation of one-day cricket. Canadian Journal of Statistics 37, 143-160.

van Staden P.J. (2009) Comparison of cricketers' bowling and batting performance using graphical displays. Current Science 96, 764-766.

van Staden P.J., Meiring A.T., Steyn J.A., Fabris-Rotelli I.N. (2010) Meaningful batting averages in cricket. Annual Congress of the South African Statistical Association, November 2010, Potchefstroom, South Africa. Unpublished conference paper.






