CRICKET: PREDICTING GLORIOUS UNCERTAINTIES

Remember the now-famous no-ball bowled by Jasprit Bumrah in the ICC Champions Trophy 2017 final? Pakistan’s Fakhar Zaman, then on just 3, was caught behind off Bumrah’s bowling, only to be reprieved because the delivery was a no-ball. Zaman went on to score 114.

In the wake of the recently concluded Asia Cup, and in particular the two damp-squib contests in which India swatted Pakistan aside without breaking a sweat, you may be wondering whether the Champions Trophy final would have told an altogether different tale had Bumrah not overstepped in the fourth over of the Pakistan innings. You certainly wouldn’t be alone.

The quest for answers to questions like this gave birth to CricketML (http://cricket.mit.edu), a collaboration between researchers at the Massachusetts Institute of Technology (MIT) and Columbia University that aims to address a wide variety of questions in cricket through modern statistical and machine learning lenses. The aim is to capture cricketing “wisdom” and intuition statistically. We begin by using the “context” of a game to forecast the entire remainder of an innings, and by proposing an alternative to the Duckworth-Lewis-Stern (DLS) method. The project is led by researchers from Pakistan and India, whose teams remain bitter rivals on the field, and we are proud that our shared love for the game has allowed us to merge our passion (cricket) with our work (research).

Researchers at MIT and Columbia are using machine-learning algorithms to predict cricket scores. One of the lead researchers explains to Eos how they do so and why they believe they have a better alternative to Duckworth-Lewis-Stern.

But for now, back to Bumrah’s no-ball. In imagining that counterfactual, we are acknowledging that the fall of a wicket, and in particular an early wicket, would have significantly altered the context for the remainder of the innings. An early wicket could have triggered a few more, or slowed the pace of the innings altogether, a pattern all too familiar to followers of the Pakistan cricket team. As observers of the game will attest, context determines much of how a typical innings evolves. It includes everything from the fall of wickets and the relative strengths of the teams to the significance of the match-up itself and the pitch and overhead conditions. Indeed, when we declare that a team is ascendant in a game, we are articulating our human interpretation of the context of the game. However, context is not static: it evolves as the innings progresses and shapes our understanding of the game.

At CricketML, we attempt to capture the complex components of an innings’ context automatically. This allows us to better understand the game, quantify the value of particular passages of play and make predictions about the future. Our hope is to understand, capture and articulate the typical behaviors in the game. Importantly, this excludes the unpredictable, often freakish, events that make sport so exhilarating to watch. Specifically, we do not attempt to predict or explain, and could not have predicted, Wasim Akram’s famous hat-tricks or Rohit Sharma’s double centuries.

FORECASTING THE FUTURE

Score forecasting is an exercise all fans of the sport engage in. Take the Champions Trophy 2017 final, for instance. At the 30-over mark, Pakistan had made 179 runs for the loss of one wicket, and everyone watching tried to estimate the total Pakistan would put on the board. Our work extends this exercise by proposing a score forecasting algorithm that produces an estimate of the entire remainder of an innings. Mathematically, we show that the evolution of an innings, that is, the pattern of scoring and loss of wickets, captures the entire context of a cricket innings, so no other information is needed. Naturally, the more of an innings we observe, the better we capture its context. Next, we look across all past innings in the relevant format of the game to find those that can be combined into a hypothetical version that closely resembles the innings under consideration. This is the idea behind a technique known as synthetic control. Once we have produced this hypothetical version of the current innings, we use it to forecast the remainder of the innings. For reference, our algorithm would have forecast a final score of 336; Pakistan ended their innings on 338.
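The synthetic-control step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not CricketML's actual implementation: each innings is summarised as a list of cumulative runs after each over (wickets are ignored for brevity), weights over past "donor" innings are fitted by ridge-regularised least squares on the overs observed so far, and the same weighted combination is projected forward. The names `synthetic_forecast` and `solve` and the `ridge` parameter are hypothetical. Robust variants of synthetic control also de-noise the donor data (for example, by keeping only its leading singular values) before fitting; this sketch omits that step.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def synthetic_forecast(observed: List[float], donors: List[List[float]],
                       ridge: float = 1e-3) -> List[float]:
    """Hypothetical sketch: forecast the rest of an innings as a weighted mix of past innings.

    observed: cumulative runs of the current innings for its first t overs.
    donors:   full-length cumulative-run trajectories of past innings (equal lengths).
    Returns forecast cumulative runs for the overs after the first t.
    """
    t, k = len(observed), len(donors)
    # Normal equations of ridge regression on the observed overs:
    # (D D^T + ridge * I) w = D y, where row j of D is donor j's first t overs.
    gram = [[sum(donors[i][s] * donors[j][s] for s in range(t)) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(donors[i][s] * observed[s] for s in range(t)) for i in range(k)]
    weights = solve(gram, rhs)
    horizon = len(donors[0])
    # Apply the fitted weights to the donors' unobserved overs.
    return [sum(weights[i] * donors[i][s] for i in range(k)) for s in range(t, horizon)]
```

With two made-up donor innings, `synthetic_forecast([4.5, 10, 16.5], [[5, 10, 15, 20, 25], [4, 10, 18, 28, 40]], ridge=0.0)` recovers equal weights and forecasts approximately 24 and 32.5 for the last two overs. The small ridge term keeps the system solvable when there are more donor innings than observed overs, which is the usual situation early in an innings.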

Does this mean that we now have a holy grail that can forecast everything that will happen in the rest of a cricket innings? If so, is there any point in watching a game when we could simply stop halfway and forecast the remainder? No. We certainly do not claim to have discovered the holy grail. In fact, our forecasts are predicated on the assumption that nothing out of the ordinary will happen during the remainder of the innings. A Shahid Afridi blitz, Yuvraj Singh’s six sixes or an Imran Tahir hat-trick are inherently unpredictable events, and they make up the beauty of the sport. Such extraordinary feats make cricket, like all sports, such compelling viewing, and algorithms cannot forecast them.


In the Champions Trophy 2017 final, Pakistan were 294-4 after 45 overs and, by modern batting standards, most of us expected them to finish near 350. Sure enough, our algorithm updated its forecast to 347 runs. However, some excellent death-overs bowling by India restricted Pakistan to 338. In the end, that slowdown did not cost Pakistan, but might they have come to rue the missed opportunity had Mohammad Amir not induced rare back-to-back mistakes from the bat of Virat Kohli? Had Pakistan failed to defend the total, analysts would have spent a good deal of time dissecting the relative slowdown in the final few overs of the Pakistan innings, a shortfall our algorithm had pointed to well before the Indian innings began.

DECLARING WINNERS

While score forecasting can project how an innings is likely to progress from any point onward, deciding which team is in the ascendancy at any given point during the second innings of a limited-overs game is, arguably, of much greater interest. Put differently, if no more play is possible, can one declare a winner? The most common examples of such scenarios are games in which weather-related interruptions lead to shortened innings. The ICC’s solution is the DLS method, a statistical answer to “who is in the ascendancy?” at every point during the second innings. The DLS method was introduced to decide the outcomes of such games fairly. However, its use often leads to murmurs of dissatisfaction among players, and we often hear captains declaring their intent to bat second if rain is forecast. Should a fair method cause such consternation among cricketers? Through our work, we show that the DLS method does indeed introduce a bias in favor of the chasing team.

Further, we establish that this bias is not due to chance: it is statistically significant. Over the fifteen years from 2003 to 2017, teams batting second won 51 percent of the games in which DLS was not needed. In the same period, however, teams batting second won 59 percent of the games decided by the DLS method, a statistically significant gap of more than 8 percentage points, across nearly 2000 games.
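How might one check that a gap like 51 versus 59 percent is not down to chance? The article does not say which test was used or exactly how many games were decided by DLS, so the following is only an illustrative sketch: a one-sample z-test that treats the 51 percent non-DLS rate as the null win rate and asks how surprising a 59 percent rate would be over `n` DLS-decided games. The function names and the sample sizes in the loop are hypothetical, chosen only to show how the verdict depends on `n`.

```python
import math


def one_sample_z(p_hat: float, p0: float, n: int) -> float:
    """z-statistic for an observed proportion p_hat against a null proportion p0 over n games."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis
    return (p_hat - p0) / standard_error


def two_sided_p_value(z: float) -> float:
    """Two-sided normal p-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical sample sizes: the true number of DLS-decided games is not given here.
for n_dls_games in (50, 200):
    z = one_sample_z(0.59, 0.51, n_dls_games)
    print(n_dls_games, round(z, 2), round(two_sided_p_value(z), 3))
```

Under these assumptions the same 8-point gap is not significant at the 5 percent level over 50 games but is over 200, which is why the number of DLS-decided matches matters as much as the size of the gap.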

We approach the target-resetting problem by taking context into account. Here, the context is the typical path (runs and wickets) that the chasing team needs to take to reach its target exactly. We look across historical data and re-calibrate hundreds of innings so that each hits the desired target precisely. We then combine those innings into the context for a typical successful chase of the target score. This provides a reference trajectory against which the current innings can be compared. At any point in the innings, if the team batting second has made as many runs as the reference trajectory or more, for the loss of the same number of wickets, that team is declared the winner. Otherwise, the team that batted first is declared the winner.
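The reference-trajectory rule can be sketched as follows. The details are assumptions for illustration, not the published algorithm: each past innings is a list of cumulative runs after each over, "re-calibrating" is done by simple proportional rescaling so that each innings finishes exactly on the target, the combination is a plain average, and wickets are left out even though the method described above also matches on wickets. The function names are hypothetical.

```python
from typing import List


def reference_trajectory(past_innings: List[List[float]], target: float) -> List[float]:
    """Rescale each past innings to finish exactly on `target`, then average them over by over.

    Each innings is a list of cumulative runs after overs 1, 2, ..., N (all the same
    length) with a positive final total.
    """
    rescaled = [[runs * target / innings[-1] for runs in innings] for innings in past_innings]
    n_overs = len(rescaled[0])
    return [sum(traj[over] for traj in rescaled) / len(rescaled) for over in range(n_overs)]


def chasing_team_wins(runs_so_far: float, overs_completed: int, reference: List[float]) -> bool:
    """If play stops after `overs_completed` overs, the chase wins when level with or ahead of par."""
    if not 1 <= overs_completed <= len(reference):
        raise ValueError("overs_completed must lie within the reference trajectory")
    # reference[k] holds the par cumulative score after k + 1 overs.
    return runs_so_far >= reference[overs_completed - 1]


# Two made-up four-over innings, rescaled to a target of 80.
ref = reference_trajectory([[10, 20, 30, 40], [5, 15, 30, 60]], target=80)
print(ref)
print(chasing_team_wins(31, 2, ref))
```

Because the reference is built from how past chases actually unfolded, any change in scoring patterns (such as faster death overs) flows into the par scores as soon as newer innings are added to the pool, with no parameters to re-tune by hand.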

We let the context of actual past games determine how targets are typically chased down. This is done algorithmically, so the significant increase in run rates over the last few overs is captured automatically through the context the algorithm learns. Therefore, unlike the DLS method, which requires frequent updates to its parameters, our algorithm adapts to the changing nature of the modern game without any human intervention. As a result, our algorithm produces revised targets that are a little higher than those produced by the DLS method, reducing the bias in favor of the chasing team, which would have to make a few more runs than the DLS method requires. We claim that our algorithm produces fairer outcomes than the DLS method. For instance, our method would have set South Africa a higher target than the DLS method did in the famous tie against Sri Lanka that knocked South Africa out of their own World Cup in 2003. At the very least, it would surely have prevented Mark Boucher from erroneously playing out a dot ball on the last ball of the over, believing they had already won the game!

The writer is a lecturer in Machine Learning at the Massachusetts Institute of Technology.
He tweets @jehangiramjad

Published in Dawn, EOS, November 18th, 2018
