Even in hype-filled Silicon Valley, few buzz phrases are freighted with higher expectations than big data. Salespeople are knocking on the doors of Fortune 500 companies, promising to help them analyze a mounting flood of information from websites, smartphones, social networks and an increasing array of sensor-laden devices.
A brick-and-mortar retailer, for instance, might discover that a returning customer, based on her purchase history, social-media feed and location, is an expectant mother and ping her smartphone with a discount on diapers the moment she enters the store.
Underpinning the big-data craze is Hadoop, a software suite named for a toy elephant belonging to the son of a Yahoo programmer who helped develop the software in the mid-2000s. While traditional databases like those offered by Oracle Corp. store predefined information in rows and columns on individual servers, Hadoop can spread uncategorized data across a network of thousands of cheap computers, making it a less costly, more scalable way to catalog multiplying streams of input.
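To make the contrast concrete, here is a minimal sketch in Python of the general idea: raw, uncategorized records are scattered across several machines, each machine works on its own share, and the partial results are merged. It is an illustration only and does not use Hadoop's actual interfaces. The node count, the function names and the sample records are all hypothetical, and hashing each record to a node is a stand-in for Hadoop's real approach of splitting files into blocks.

```python
import zlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def assign_node(record: str, num_nodes: int = NUM_NODES) -> int:
    """Pick a node for a raw record using a stable hash of its bytes.

    Assumption: hashing stands in for Hadoop's block-based file splitting.
    """
    return zlib.crc32(record.encode("utf-8")) % num_nodes


def distribute(records, num_nodes: int = NUM_NODES):
    """Spread records across nodes without imposing rows or columns."""
    nodes = [[] for _ in range(num_nodes)]
    for record in records:
        nodes[assign_node(record, num_nodes)].append(record)
    return nodes


def map_words(partition):
    """Map step: a node counts the words in its own records only."""
    counts = Counter()
    for record in partition:
        counts.update(record.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-node counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up sample input: free-form text with no predefined schema.
records = [
    "customer bought diapers",
    "sensor reading 71F",
    "customer entered store",
]

nodes = distribute(records)
totals = reduce_counts(map_words(partition) for partition in nodes)
```

Because each node counts only its own records, adding machines spreads the work rather than requiring one larger server, which is the scaling argument made for Hadoop.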
The software, distributed under an open-source license, is free to use, share and modify, and many vendors, from database stalwarts like Microsoft Corp. to analytics services like Splunk Inc., have embraced it to push big data beyond its Silicon Valley stronghold.
The market for big-data tools may be valued at $41.5 billion by 2018, International Data Corp. says. Investors have poured over $2 billion into businesses built on Hadoop, including Hortonworks Inc., which went public last week, its rivals Cloudera Inc. and MapR Technologies, and a growing list of tiny startups.
Yet companies that have tried to use Hadoop have met with frustration. Bank of New York Mellon used it to locate glitches in a trading system. It worked well enough on a small scale, but it slowed to a crawl when many employees tried to access it at once, and few of the company’s 13,000 information-technology workers had the expertise to troubleshoot it. David Gleason, the bank’s chief data officer at the time, said that while he was a proponent of Hadoop, “it wasn’t ready for prime time.”
“The dirty secret is that a significant majority of big-data projects aren’t producing any valuable, actionable results,” said Michael Walker, a partner at Rose Business Technologies, which helps enterprises build big-data systems. According to a recent report from the research firm Gartner Inc., “through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation and will be abandoned.”
It turns out that faith in Hadoop has outpaced the technology’s ability to bring big data into the mainstream. Demand for Hadoop is on the rise, yet customers have found that a technology built to index the Web may not be sufficient for corporate big-data tasks, said Nick Heudecker, research director for information management at Gartner.
It can take a lot of work to combine data stored in legacy repositories with data stored in Hadoop. And while Hadoop can be much faster than traditional databases for some purposes, it often isn’t fast enough to respond to queries immediately or to work on incoming information in real time. Satisfying requirements for data security and governance also poses a challenge.
“Venture capitalists were sold on this idea that Hadoop was going to supplant traditional database technology in the enterprise,” Mr. Heudecker said. “But enterprises didn’t just jump on the bandwagon.”
Even as Hortonworks’ IPO boosts the technology’s profile, a new generation of tools is emerging to fill the gaps.
Hortonworks has suffered not only from immature technology but also from a firm commitment to base its business on free software. The company’s revenue comes mainly from providing tech support to companies experimenting with Hadoop.
In November, Hortonworks reported its revenue for the first nine months of 2014 was $33.4 million—far short of the $100 million that Chief Executive Rob Bearden had said in March he expected for the year. It racked up an $87 million loss in the period, nearly double its loss in the previous quarter and a number that “set the new high-water mark for the scale of operating losses public investors are willing to tolerate,” said Amplify Partners founder Sunil Dhaliwal.
Hortonworks priced its first batch of public stock 34% below what investors had paid in a private funding round in March. The move underscored some observers’ doubts about the prospects for a company based solely on Hadoop. But investors in last Friday’s IPO pushed Hortonworks’ capitalization to $1.1 billion, excluding stock awarded to employees.
“It’s hard to sell free stuff,” said John Schroeder, chief executive of rival MapR. Although many startups have sprung up to commercialize open-source software, only one public company in that line is widely regarded as successful: Red Hat, which distributes and supports the open-source Linux operating system. And Red Hat doesn’t look that successful compared with leading companies, from Amazon to VMware, that augment open-source software with proprietary code, notes Peter Levine, a general partner at Andreessen Horowitz.
In an interview Friday, Hortonworks’ Mr. Bearden said the company’s IPO was “certainly validating that open source is an incredibly viable business model.”
Hortonworks’ rivals MapR and Cloudera offer proprietary accessories to Hadoop intended to make it more valuable to large companies. Cloudera, which pioneered the Hadoop market in 2008, has raised more than $1 billion at a valuation of about $4.1 billion. MapR, founded the following year, has raised $174 million. Both Mr. Schroeder and Cloudera CFO Jim Frankola acknowledged challenges in bringing Hadoop to corporate America. “We’ve learned what Hadoop is good at and what Hadoop is not good at,” Mr. Frankola said.
Meanwhile, enterprises are eager to forge into areas where Hadoop falls short, especially tasks that require processing incoming data in real time, such as using smartphone location data to offer just-in-time deals.
For corporate big-data projects, Hadoop may be only one arrow in an expanding quiver. Databricks, with $47 million in venture funding, commercializes Spark, which is open-source software that’s more adept than Hadoop at handling real-time data. Altiscale, with $42 million, offers Hadoop as a service delivered in the cloud. Splice Machine, which has raised $22 million, makes a tool that queries Hadoop as though it were a traditional database. Other tools, including the recent Google spinoff Metanautix, aim to supplant Hadoop entirely.
The Hadoop vendors are responding with improvements and additions. Hortonworks spearheaded an update that lets other applications run on top of Hadoop. Cloudera and MapR have extended the software with proprietary, enterprise-grade features like automatic backup, and MapR is building solutions tailored to specific industries, including financial services, health care and telecommunications. All three will contend with an increasingly chaotic, rapidly evolving marketplace.
“Right now, there’s a whole alphabet soup of technologies out there, which in many ways makes the market more confusing,” says T.M. Ravi, founder of The Hive, an incubator for big-data companies. “In the end, there may be room for one stand-alone company—if that.”