Match-winning data

In Bennett Miller’s Oscar-nominated movie Moneyball (2011), Brad Pitt plays the real-life figure of Billy Beane, general manager of the Oakland Athletics, a struggling team in Major League Baseball (MLB) in the United States.

Based on Michael Lewis’s book “Moneyball: The Art of Winning an Unfair Game”, the movie tells the story of how Beane single-handedly changed the face of baseball with data analytics in his quest to build a successful team on a small budget, competing against teams with massive payrolls such as the New York Yankees.

The global sports industry today is worth between $480 billion and $620 billion, according to a 2011 study of sports teams conducted by A.T. Kearney, a leading consulting firm. Spanning everything from food and memorabilia at stadiums to media rights and licensing, clothing and equipment, and immense endorsement deals, sport has become a juggernaut of the global economy.

With that kind of money at stake, the competition between sports teams, and the search for an added edge in individual sports, have reached unprecedented heights. With the aid of science, athletes are getting bigger, stronger, faster and sharper, while teams and organisations analyse every facet of their sport and collect mountains of data that seemed incomprehensible… until now.

With the emergence of data mining and the field of analytics often referred to as “Big Data”, the vast amounts of statistics collected for each player, team, game and season are taking on meaning beyond a simple cumulative measure of an athlete’s or team’s performance. Sports organisations can use data mining for statistical analysis, pattern discovery and outcome prediction, because patterns in past data often help forecast future events.

In a pivotal scene from Moneyball, the character Peter Brand (Jonah Hill) says to Beane, “Baseball thinking is medieval. It’s stuck in the Dark Ages. I have a more scientific view of the game.”

And science is what made all the difference.

Drawing on a philosophy first put forward in the 1970s by Bill James, widely regarded as one of baseball’s foremost pioneers of statistical analysis, Beane and his staff used statistics to draft players they could afford but who could still produce competitive results.

The answer was in the numbers. Beane decided to exploit the fact that players drafted out of high school and college could be signed for a fraction of the salaries that veterans on the free-agent market demanded. Beane and his management team also concluded that traditional baseball statistics were misleading.

In his book, Michael Lewis writes that Beane found runs batted in (RBI) totals, for example, which most general managers prized, to be deceptive. To rack up RBIs, a batter needs the players ahead of him in the lineup to get on base first.

The data showed that the statistics most strongly linked to a player’s contribution to scoring runs were on-base percentage and slugging percentage. The Athletics therefore set out to build a team that could get players on base and then bring them home to score.
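Both statistics have standard definitions. On-base percentage is (hits + walks + hit-by-pitch) divided by (at-bats + walks + hit-by-pitch + sacrifice flies), and slugging percentage is total bases divided by at-bats, where a single counts as one base and a home run as four. The short Python sketch below computes both from a season’s counting stats. The player and his numbers are invented for illustration, and this is not a reconstruction of the Athletics’ own model.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season batting totals for one (hypothetical) player."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def total_bases(self) -> int:
        # A single is worth 1 base, a double 2, a triple 3, a home run 4.
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.home_runs


def on_base_percentage(p: BattingLine) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = p.hits + p.walks + p.hit_by_pitch
    opportunities = p.at_bats + p.walks + p.hit_by_pitch + p.sacrifice_flies
    return times_on_base / opportunities if opportunities else 0.0


def slugging_percentage(p: BattingLine) -> float:
    """SLG = total bases / at-bats."""
    return p.total_bases / p.at_bats if p.at_bats else 0.0


# Invented season totals, for illustration only.
player = BattingLine(at_bats=500, singles=90, doubles=30, triples=2,
                     home_runs=20, walks=80, hit_by_pitch=5, sacrifice_flies=5)
print(f"OBP {on_base_percentage(player):.3f}  SLG {slugging_percentage(player):.3f}")
# prints: OBP 0.385  SLG 0.472
```

Unlike RBI, neither figure depends on whether teammates happened to be on base ahead of the batter, which is part of why they were seen as a cleaner measure of a hitter’s own contribution.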

In 2002, this methodology helped the Oakland Athletics finish first in the American League West with a record of 103 wins and 59 losses, despite having lost three marquee players to larger-market teams. They also set an American League record with 20 consecutive wins.

The following year, Boston Red Sox owner John Henry hired Bill James as a consultant. A season later, in 2004, with his data-based approach in place, the Red Sox won the World Series, becoming champions for the first time in 86 years.

“But statistics themselves can be very misleading,” writes Osama K. Solieman in the book “Sports Data Mining”. “Certain players are able to build impressive stats but have little effect on a game. On the other hand, there are players who make a significant impact on the game without having impressive statistics.”

In American football, he suggests comparing a defensive back who is prone to taking risks with one who plays solid cover defence. “The former may accumulate more interceptions than the latter – a statistic often used to indicate a defensive back’s value – but will also allow the opponent greater success when the gamble does not pay off.”

Jeff Van Gundy, former head coach of the National Basketball Association’s (NBA) Houston Rockets, called defensive rebounds off missed free throws “the biggest selfish glut of all time” in an article by Sports Illustrated’s Chris Ballard about the shortcomings of traditional statistics.

The player who boxes out an opponent to keep him from grabbing the ball, Van Gundy argued, is just as important as the player who collects the rebound. Some players have become adept at exploiting certain situations to boost their statistics, because higher statistics, even misleading ones, eventually lead to bigger paydays. For this reason, professional teams hire in-house statisticians to help make sense of all the information.

Then there is Formula 1, a sport driven by technology like no other. It produces torrents of data that must be constantly analysed and interpreted, and that analysis can lead to split-second decisions that change the outcome of a race. Race engineers and technicians have developed a razor-sharp focus on extracting every last bit of performance from a car’s design, something that would be impossible without comprehensive data.

Today, teams test car configurations thoroughly, collecting huge amounts of data rather than relying solely on test drivers’ feedback. Marc Gallagher, former head of engine-maker Cosworth’s Formula 1 Business Unit and a veteran of more than 25 years in the sport, is no stranger to this.

Gallagher says that there are 200 different streams of data coming from sensors within the engine alone. “Engineers can actually monitor more than 450 data streams for real-time insights during a race,” he says. “To win in Formula 1, you need to have the right data at the right time.”

McLaren’s cars, for example, send enormous amounts of data back to the pit crew during a race, where it is analysed in real time with the help of data compression technology. As a result, the team can act on incoming data quickly enough to make adjustments that could be the difference between winning and losing.

While F1 may be the ultimate user of data, analytics can be applied to any sport, including Pakistan’s national pastime: cricket.

Traditionally, cricket captains and coaches have made decisions based on gut feeling rather than systematic analysis. But that may be about to change.

Mu Sigma, an analytics company based in Bangalore, India, is developing real-time cricket analysis software that can analyse structured and unstructured cricket data using mathematical and statistical modelling and offer suggestions as a match unfolds.

Imagine Pakistan playing arch-rival India in a nail-biting final, with India, batting second, needing just four runs from the last over. The captain faces a host of decisions.

Who should bowl: a spinner or a fast bowler? If a spinner, a leg-spinner or an off-spinner? If a fast bowler, which one? Who has the best bowling record against that particular batsman, and what are that batsman’s tendencies?

According to Mu Sigma, the software can sift through historical data to find the closest comparable situation, match it against the current one and predict the outcome in real time. Beyond that, it can help individual cricketers analyse their opponents by identifying their strengths and weaknesses.
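Matching the current situation against the closest comparable ones in historical data resembles a nearest-neighbour search. The Python sketch below is not Mu Sigma’s software, whose internals are not described here; it is a hypothetical illustration in which each past final over is reduced to runs needed, balls left, wickets in hand and bowler type, and each bowling option is scored by how often the most similar past overs were successfully defended. The feature set, the distance weights, the choice of k = 3 and all of the records are invented assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical summary of a chase going into the final over."""
    runs_needed: int   # runs the batting side still needs to win
    balls_left: int    # legal deliveries remaining
    wickets_left: int  # wickets in hand
    bowler_type: str   # e.g. "pace", "leg-spin", "off-spin"


@dataclass
class PastOver:
    situation: Situation
    runs_conceded: int  # runs actually scored off that over


def distance(a: Situation, b: Situation) -> float:
    """Euclidean distance on the numeric features plus a flat penalty when
    the bowler type differs. The weights are arbitrary assumptions."""
    numeric = math.sqrt(
        (a.runs_needed - b.runs_needed) ** 2
        + (a.balls_left - b.balls_left) ** 2
        + (2 * (a.wickets_left - b.wickets_left)) ** 2
    )
    return numeric + (0.0 if a.bowler_type == b.bowler_type else 5.0)


def defend_rate(current: Situation, history: list[PastOver], k: int = 3) -> float:
    """Fraction of the k most similar past overs in which the batting side
    did not get the runs it needed (a k-nearest-neighbour estimate)."""
    nearest = sorted(history, key=lambda p: distance(current, p.situation))[:k]
    held = sum(p.runs_conceded < p.situation.runs_needed for p in nearest)
    return held / len(nearest)


def suggest_bowler(runs_needed, balls_left, wickets_left, options, history, k=3):
    """Score each bowler type against history and return the best one."""
    scores = {
        t: defend_rate(Situation(runs_needed, balls_left, wickets_left, t), history, k)
        for t in options
    }
    return max(scores, key=scores.get), scores


# Invented records, for illustration only.
history = [
    PastOver(Situation(4, 6, 3, "pace"), 5),
    PastOver(Situation(5, 6, 2, "pace"), 6),
    PastOver(Situation(4, 6, 4, "pace"), 2),
    PastOver(Situation(4, 6, 3, "leg-spin"), 3),
    PastOver(Situation(6, 6, 3, "leg-spin"), 4),
    PastOver(Situation(3, 6, 2, "leg-spin"), 3),
    PastOver(Situation(12, 6, 5, "off-spin"), 7),
    PastOver(Situation(4, 6, 3, "off-spin"), 6),
    PastOver(Situation(5, 6, 3, "off-spin"), 5),
]

# Four runs needed off the last over, three wickets in hand.
choice, scores = suggest_bowler(4, 6, 3, ["pace", "leg-spin", "off-spin"], history)
print(choice)  # → leg-spin
```

A real system would need far richer features, such as the identities of the batsman and bowler, the venue and pitch conditions, and much more data; the sketch only shows the mechanism of looking up comparable past situations and letting their outcomes inform the choice.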

But analytics aren’t just being used to improve player or team performance.

The proliferation of smartphones, coupled with the explosion in social media, has fuelled sports fans’ thirst for information as they clamour to consume and share the latest news, facts and insights, wherever they are.

IBM has been working closely with the Wimbledon tennis championships, providing players, coaches and spectators with everything from live scores to serve speeds, unforced-error statistics and match insights. It has also developed mobile apps that provide real-time information such as queuing times and seat availability.

Thus, not only are data and analytics helping athletes and sports organisations improve their performance and take sport to new levels; they are also giving spectators a chance to be involved in the experience like never before.

The remarkable part is that this field of analytics is still in its infancy. Sport, with its neatly defined rules and parameters, has proved an ideal arena in which to realise its practical potential, and only time will reveal what further possibilities it holds.
