In Bennett Miller’s Oscar-nominated movie Moneyball (2011), Brad Pitt plays Billy Beane, the real-life general manager of the Oakland Athletics, a struggling team in Major League Baseball (MLB) in the United States.
Based on Michael Lewis’s book “Moneyball: The Art of Winning an Unfair Game”, the movie tells the story of how Beane single-handedly changed the face of baseball by using data analytics in his quest to build a successful team on a small budget, competing against teams with massive payrolls such as the New York Yankees.
The global sports industry today is worth $480 billion to $620 billion, according to a 2011 study of sports teams conducted by A.T. Kearney, a leading consulting firm. With revenue ranging from food and memorabilia at stadiums to media rights and licensing, clothing and equipment, and immense endorsement deals, sports have become a juggernaut in the global economy.
With that kind of money at stake, the competition between sports teams and the search for that added edge in individual sports has risen to heights never seen before. With the aid of science, athletes are getting bigger, stronger, faster and sharper, while teams and organisations are analysing every facet of each sport and collecting mountains of data that seem incomprehensible… until now.
With the emergence of data mining and the field of analytics, known as “Big Data”, the vast amounts of statistics collected for each player, team, game, and season are beginning to take on new meaning, beyond a cumulative measure of an athlete’s or team’s performance. Sports organisations can use data mining for statistical analysis, pattern discovery, and outcome prediction, because patterns in data are often helpful in forecasting future events.
In a pivotal scene from Moneyball, the character Peter Brand (Jonah Hill) says to Beane, “Baseball thinking is medieval. It’s stuck in the Dark Ages. I have a more scientific view of the game.”
And science is what made all the difference.
Drawing on a philosophy first posited in the ’70s by Bill James, widely heralded as one of the foremost pioneers of statistical analysis in baseball, Beane and his staff used statistics to draft players they could afford while still producing competitive results.
The answer was in the numbers. Beane decided to take advantage of the fact that players drafted from high school and college could be acquired for salaries that are a fraction of what veterans on the free-agent market would demand. Beane and his management team also concluded that traditional baseball statistics were misleading.
Michael Lewis writes in his book that Beane found RBI (runs batted in) totals, for example, which most general managers prize, to be deceptive. To post high RBI numbers, a batter needs the players ahead of him to get on base.
The data showed that the statistics that best indicated a player’s ability to score runs were on-base and slugging percentages. The Athletics looked to build a team based on the ability to get people on base and then bring them in to score.
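For readers who want to see what those two numbers measure, here is a minimal sketch using the standard definitions: on-base percentage is times on base divided by plate appearances that count toward it, and slugging percentage is total bases divided by at-bats. The BattingLine class, the function names, and the example season are hypothetical and chosen purely for illustration. They are not taken from Lewis’s book or from the Athletics’ own models.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One batter's season totals (hypothetical container for illustration)."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs


def on_base_percentage(b: BattingLine) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    if denominator == 0:
        return 0.0
    return (b.hits + b.walks + b.hit_by_pitch) / denominator


def slugging_percentage(b: BattingLine) -> float:
    """SLG = total bases / AB, where a home run counts as four bases."""
    if b.at_bats == 0:
        return 0.0
    total_bases = b.singles + 2 * b.doubles + 3 * b.triples + 4 * b.home_runs
    return total_bases / b.at_bats


# Made-up season totals, not a real player's record.
example = BattingLine(
    at_bats=500, singles=90, doubles=30, triples=5, home_runs=25,
    walks=80, hit_by_pitch=5, sacrifice_flies=5,
)

if __name__ == "__main__":
    print(f"OBP: {on_base_percentage(example):.3f}")
    print(f"SLG: {slugging_percentage(example):.3f}")
```

Unlike RBI totals, neither figure depends on what the batters ahead in the line-up did, which is why they say more about the individual player.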
In 2002, this methodology led the Oakland Athletics to finish first in the American League West with a record of 103 wins and 59 losses, despite losing three marquee players to larger-market teams. They also set an American League record by winning 20 consecutive games.
After that season, Boston Red Sox owner John Henry hired Bill James as a consultant. In 2004, following his data-based approach, the Red Sox won the World Series, becoming champions for the first time in 86 years.
“But statistics themselves can be very misleading,” writes Osama K. Solieman in the book “Sports Data Mining”. “Certain players are able to build impressive stats but have little effect on a game. On the other hand, there are players who make a significant impact on the game without having impressive statistics.”
In American football, he writes, “[compare] the defensive back that is more prone to taking risks to one who plays solid cover defence. The former may accumulate more interceptions than the latter – a statistic often used to indicate a defensive back’s value – but will also allow the opponent greater success when the gamble does not pay off.”
Jeff Van Gundy, former head coach of the Houston Rockets in the National Basketball Association (NBA), called defensive rebounds off missed free throws “the biggest selfish glut of all time” in an article by Sports Illustrated’s Chris Ballard about the inaccuracies of traditional statistics.
The player who helps block out the opposition from grabbing the basketball, Van Gundy argued, is just as important as the player who collects the rebound. Some players have become adept at using certain situations to boost their statistics, because higher statistics, even if misleading, will eventually result in bigger paydays. For this reason, professional teams hire in-house statisticians to help make sense of all the information.
Then there is Formula 1, a sport that is technology-driven like no other. It produces torrents of data that need to be constantly analysed and interpreted, feeding split-second decisions that can change the outcome of a race. The engineers and technicians behind the race cars have developed a razor-sharp focus on extracting every last bit of performance out of a car’s design, which is impossible without comprehensive data.
Today, teams thoroughly test configurations in their cars, which involves collecting huge amounts of data rather than relying on the test drivers’ feedback alone. Mark Gallagher, the former head of engine-maker Cosworth’s Formula 1 Business Unit, with more than 25 years of experience in the sport, is no stranger to this.
Gallagher says that there are 200 different streams of data coming from sensors within the engine alone. “Engineers can actually monitor more than 450 data streams for real-time insights during a race,” he says. “To win in Formula 1, you need to have the right data at the right time.”