Offering Big Money for Crunching Data
Why would anyone start a company that asks strangers to compete in data crunching? And how could this idea possibly be profitable?
The answer, according to Kaggle's 28-year-old founder, Anthony Goldbloom, lies in something called the Roger Bannister effect. The company, based in San Francisco, operates a data analysis site that runs predictive modeling competitions. In 1954, Bannister, as you may remember, became the first person to run a mile in less than four minutes. Only weeks after he broke that barrier, Australian John Landy beat his record.
“Competition ends up making people better than they otherwise were,” Goldbloom says.
The Goal of the Game
The goal of Kaggle's sponsored competitions is to solve complex, real-world problems for companies. It contracts with organizations that need data modeling or algorithms and invites data scientists (including PhDs from computer science, statistics, econometrics, math and physics) to develop and refine analytical techniques to solve those problems.
"Before the site came along, companies hired consultants to develop predictive models, but with no guarantee that the results would work," Goldbloom says.
According to Goldbloom, the competitions are intense and the results have far exceeded his expectations.
“In every single competition, and we’ve run 80 of them, we have way outperformed the best research any company or academic researcher has done,” he says. “People don’t have preconceptions of how [a problem] ought to be solved.”
Kaggle tests the algorithms in the background and features a live leaderboard showing which competitor's entry is the most accurate. Kaggle's revenue comes from charging a setup fee and taking a portion of the prize money, Goldbloom says. In return, the competition host gets the intellectual property behind the winning entry.
Big Data Down Under
Goldbloom started Kaggle in Australia after working as a research economist at the Reserve Bank of Australia and the Australian Treasury. He was restless with his work there and won an internship at The Economist magazine, where he became obsessed with researching so-called “big data.” The term refers to the information compiled by big corporations, such as Wal-Mart, or big organizations, such as the Obama campaign. Having data isn’t the same as using it, and often companies and campaigns need to hire brainpower to analyze the numbers they’ve got.
Goldbloom realized that he could start a company that invites hungry geeks to work on real-world data in exchange for a cash prize offered by the host company. Not long after the company's launch, he moved Kaggle to California, where it currently operates with 13 employees.
Goldbloom says he hopes to run 100 competitions this year, including private ones, which usually involve sensitive data sets that can’t be released to the public.
Playing the Numbers
The site has been used by both small and large businesses. For example, tech company Jetpac put up $5,000 in prize money, asking for an algorithm for its iPad app, which collects the best travel photos from your Facebook friends and assembles them into an album. And Heritage Provider Network, a California-based managed healthcare firm, asked for an algorithm to predict how many days a patient will spend in the hospital in the next year. With that known, providers might develop new care plans to help people before they need emergency care. The top prize for this one is a whopping $3 million.
Goldbloom’s favorite contest, he says, is being run for the William and Flora Hewlett Foundation. The nonprofit wants to find a commercially viable algorithm that will automatically grade student essays. The foundation says it hopes to find a better way to assess student performance than multiple-choice tests, and it will award $60,000 to the winner.
“When they came to us, I didn’t think we should run it. I didn’t think it was possible,” Goldbloom says. “I thought it would be hard to match the performance of a human being. I’ve been very surprised.”