Link: A useful big data story

In 2011 Friedberg decided to sell exclusively to farmers, and WeatherBill changed its name to The Climate Corporation. “We needed to feel a little less Silicon Valley and less whimsical,” said Friedberg. For the next few years he would spend half his time on the road, explaining himself to people whose first step was toward mistrust. “Farmers don’t believe anything,” he said. “There’s always been some bullshit product for farmers. And the people selling it are usually from out of town.”

He’d sit down in some barn or wood shop, pull out his iPad, and open up a map of whatever Corn Belt state he happened to be in. He’d let the farmer click on his field. Up popped the odds of various unpleasant weather events—a freeze, a drought, a hailstorm—and his crops’ sensitivity to them. He’d show the farmer how much money he would have made in each of the previous thirty years if he had bought weather insurance. Then David Friedberg, Silicon Valley kid, would teach the farmer about his own fields. He’d show the farmer exactly how much moisture the field contained at any given moment—above a certain level, the field would be damaged if worked on. He’d show him the rainfall and temperature every day—which you might think the farmer would know, but then the farmer might be managing twenty or thirty different fields, spread over several counties. He’d show the farmer the precise stage of growth of his crop, the best moments to fertilize, the optimum eight-day window to plant his seeds, and the ideal harvest date.

From The Fifth Risk.
Original source: A useful big data story

Link: Hadoop Needs To Be A Business, Not Just A Platform

Financial goop on Cloudera and Hortonworks merging:

The deal for the merger of the two companies is surprisingly simple. Shareholders in Hortonworks will get 1.305 shares in Cloudera and Cloudera will be the remaining company in fact, if not necessarily in name. This means that Cloudera shareholders will own 60 percent of the combined company and Hortonworks shareholders will own the remaining 40 percent. The combined companies had a fully diluted equity value of $5.2 billion before the merger was announced. At the time the deal was announced, the combined firms had more than $500 million in cash, no debt, and 2,500 customers who largely do not overlap. There are more than 120 customers who spend $1 million a year and another 800 customers who spend more than $100,000 a year for subscriptions and such.
Original source: Hadoop Needs To Be A Business, Not Just A Platform

Reactions to Cloudera’s IPO, prospects – Notebook

There are lots of opinions on Cloudera’s IPO today. Here are some that I’ve collected in my notebook.

Not valued high enough?

Despite the share price being up 20% at close, some negative commentary focuses on how far the valuation has dropped since Intel’s funding round, e.g., from Brenon at 451:

The chipmaker paid up for the privilege, putting a ‘quadra unicorn’ valuation of $4.1bn on Cloudera. Altogether, Cloudera raised more than $1bn from private market investors, making the $225m raised from public market investors seem almost like lunch money.

And then there’s the small matter of valuation. In its debut, Cloudera is only worth about half of what Intel thought it was worth when it made its bet.

https://twitter.com/alex/status/857992394595119104

The counter-point goes a little something like this (as pointed out by Derrick Harris):

“Much has been made of the huge valuation of that Intel-led round, but that’s all misguided noise,” according to IPO Candy, a website founded by Kris Tuttle, the director of research at Soundview Technology Group. “Intel didn’t make the investment for a financial return so the valuation isn’t relevant.”

Back in 2014, Intel was still smarting from missing the shift to mobile computing and Big Data was a favorite as the next big thing. The Santa Clara chip giant’s bet was placed chasing a strategic return, not so much banking a direct return on investment.

You know, all of this is a little bit of ¯\_(ツ)_/¯. As I recall, Facebook’s IPO was all wiggly-woggly. If Cloudera makes a lot of money, gets bought for a lot of money, etc., no one will care to remember, just like with Facebook. Success is the best deodorant.

Their business, finances

Also from 451, earlier this month, a profile of their business:

Cloudera is nearly one-third bigger than Hortonworks, recording $261m in sales in its most recent fiscal year compared with $184m for Hortonworks. Both are growing at roughly 50%.

Since 2008, the company has grown steadily. As of January 31, it reports more than 1,000 customers. However, Cloudera is currently emphasizing and banking its success on what it calls the Global 8,000, which are the largest enterprises worldwide. The company notes that its number of Global 8,000 customers increased from 255 as of January 31, 2015, to 381 as of January 31, 2016, and 495 as of January 31. For the year ended January 31, the Global 8,000 represented 73% of Cloudera’s total revenue, while a further 10% of total sales came from the public sector. The company reports 1,470 fulltime employees as of January 31, a slight increase from its headcount of 1,140 the prior year.

More from Katie Roof at TechCrunch:

Cloudera’s market cap is now about $2.3 billion, significantly less than the $4.1 billion valuation Intel gave in 2014. This increasingly common phenomenon is now nicknamed a “down round IPO.”

In an interview with TechCrunch, CEO Tom Reilly insisted that this was not a problem for the company because of the “growth prospects ahead of us.” If it performs well in the stock market, it could ultimately achieve the $4 billion-plus value. Square, which went public in 2015 at half its private market valuation, has since seen its share prices more than double.

(Side-note: comparisons of companies, Square and Cloudera, that have nothing to do with each other except being “tech” – and Square is payment processing, not “pure tech,” at that! – drive me a bit crazy, as listeners know.)

And a quick revenue/spend write-up from her:

Cloudera’s revenue is growing, totaling $261 million for the fiscal year that ended in January. The company brought in $166 million at the same time last year.

Losses were $186.32 million, down from $203 million in the same period the year before.

And, according to Jonathan Vanian: “Cloudera spent $203 million on sales and marketing in its latest fiscal year, up 26% from the previous year.”
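To put those numbers side by side, here’s a back-of-the-envelope sketch in Python. It only uses the figures quoted above, and it assumes (my assumption, not something either article states) that the TechCrunch and Vanian numbers cover the same fiscal year. The variable names are just mine for illustration.

```python
# Figures quoted above, in millions of dollars.
revenue_latest = 261.0
revenue_prior = 166.0
net_loss_latest = 186.32
sales_marketing_latest = 203.0
sales_marketing_growth = 0.26  # "up 26% from the previous year"


def growth(prior, latest):
    """Year-over-year growth as a fraction (0.5 means 50%)."""
    return (latest - prior) / prior


# How fast revenue grew year over year.
revenue_growth = growth(revenue_prior, revenue_latest)

# How large the loss and the sales and marketing spend are relative to revenue.
loss_share_of_revenue = net_loss_latest / revenue_latest
sales_marketing_share = sales_marketing_latest / revenue_latest

# Back out the prior year's sales and marketing spend from the 26% increase.
sales_marketing_prior = sales_marketing_latest / (1 + sales_marketing_growth)

print(f"Revenue growth: {revenue_growth:.1%}")
print(f"Net loss as a share of revenue: {loss_share_of_revenue:.1%}")
print(f"Sales and marketing as a share of revenue: {sales_marketing_share:.1%}")
print(f"Implied prior-year sales and marketing: ${sales_marketing_prior:.0f}m")
```

The takeaway is the ratios rather than the precision: revenue is growing fast, and sales and marketing alone eats most of it.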

TAM

I don’t really follow this space well enough anymore to quickly figure out the TAM: I suspect Cloudera operates in several data- and BI-related ones.

Cloudera isn’t only Hadoop, but 451 put the Hadoop market at $1.3b in 2016, growing to $4.4b in 2020, with a CAGR of 38.3% between 2015 & 2020.
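As a sanity check on that growth rate, here’s a small Python sketch of the standard CAGR formula, (end / start)^(1 / years) - 1. Note that the dollar figures quoted are for 2016 and 2020 while the 38.3% covers 2015 to 2020, so the two won’t line up exactly. The function names are mine, and the 2015 number it backs out is an implied figure, not one 451 published here.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% a year)."""
    return (end_value / start_value) ** (1 / years) - 1


def implied_start(end_value, rate, years):
    """Starting value that reaches end_value after compounding at rate."""
    return end_value / (1 + rate) ** years


# Figures quoted above, in billions of dollars.
hadoop_2016 = 1.3
hadoop_2020 = 4.4

# Growth rate over the four years the two dollar figures actually span.
rate_2016_to_2020 = cagr(hadoop_2016, hadoop_2020, years=4)

# Market size in 2015 implied by the quoted 38.3% CAGR over five years.
base_2015 = implied_start(hadoop_2020, rate=0.383, years=5)

print(f"2016-2020 CAGR: {rate_2016_to_2020:.1%}")
print(f"Implied 2015 market size: ${base_2015:.2f}b")
```

Run on the quoted figures, the 2016 to 2020 rate comes out at roughly 36%, a little under the quoted 38.3%, which fits with the first year of growth off a smaller 2015 base being the fastest.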

If you throw data warehousing, BI, analytics, and an injection of the mega-databases TAM together, you get a really big TAM, anyhow. Keep in mind though that one of the traps of (definitionally orthodox) disruptors in this space is lowering the TAM of their respective markets, a la Red Hat in operating systems. I don’t get the sense that Cloudera is on that game plan, but others in the market might be.

Buyers’ plans & needs

With respect to what people would do with Cloudera and others in this space (including Pivotal), here’s a good ranking of the information infrastructure priorities Gartner recently found in enterprises:

[Image: info plans survey]

Also of public/private cloud interest from the summary of that survey: “Based on survey responses, plans for on-premises deployments for production uses of data will drop from today’s 45% to 14% in 2018.”

Looking forward

People in the tech industry care a great deal about IPOs like this. For our own benefit, we’re all curious how The Market reads the valuation of enterprise IT business models, and we want a general sense of the health of the sector. There are also usually people you know at the company, so “yay” for people you know.

One day isn’t long enough to tell anything, though, cf., in a completely different space, that Facebook debut weirdness. People got all excited about Cisco buying AppDynamics because that seemed to show some “healthy” signs that money valued this kind of software/SaaS.

At any rate, people still seem to love the Big Data and such. From Cloudera’s CEO, Tom Reilly: “We’re competing with IBM and Watson, so our customers seeing the strength of our finances allows us to do more.” Think of all the free marketing!

And, Mike Olson (original CEO) adds:

The ensuing years have been remarkable. Our company has grown with the market. The original technology has morphed almost beyond recognition, adding real-time, SQL, streaming, machine learning capabilities and more. That’s driven adoption among some of the very biggest enterprises on the planet. They’re running a huge variety of applications, solving a wide variety of critical business problems.

Our early bet has proven correct: Data is changing the world. In applications like fraud detection and prevention, securing networks against cyberattacks and optimizing fleet performance in logistics and trucking, we’re delivering value. We’re helping to address big social challenges, improving patient outcomes in healthcare and helping law enforcement find and shut down human trafficking networks.

Against that background, an IPO takes on a more appropriate scale. We started Cloudera because we believe that data makes things that are impossible today, possible tomorrow. There’s more data coming, and there are plenty of impossible things to work on. Our journey is only well begun.

I admittedly don’t know Cloudera’s business model too well, but my sense is that they align well with the “have something to sell” model that many open source companies in the enterprise space forget to put in place.

Programming society with big data and small cash payments

The mathematical modeling of society is made possible, according to Pentland, by the innate tractability of human beings. We may think of ourselves as rational actors, in conscious control of our choices, but in reality most of what we do is reflexive. Our behavior is determined by our subliminal reactions to the influence of other people, particularly those in the various peer groups we belong to. “The power of social physics,” he writes, “comes from the fact that almost all of our day-to-day actions are habitual, based mostly on what we have learned from observing the behavior of others.” Once you map and measure all of a person’s social influences, you can develop a statistical model that predicts that person’s behavior, just as you can model the path a billiard ball will take after it strikes other balls.

Source: Big data and the limits of social engineering

Team work bringing down the average

The workers were told, essentially, that they were to be rewarded for collective achievement rather than individually. So instead of maximizing individual satisfaction, which often comes through competition with other people, employees considered their impact on colleagues. The theory, which plays out in the results, is that with relative rankings, top performers reduce their effort to avoid hurting their co-workers’ egos and to prevent schisms in the team.

That’s kind of sweet actually. One would also think that the incentives are disconnected from the thing you’re trying to fix: if you had to pay for all that fuel yourself, out of your take of the margin, would you be more efficient or less? That’s probably unreliable as well. Also: you’d think these tricks of fleet management would be long solved, e.g., all that lore about UPS and FedEx trucks. But, there’s probably tons of ongoing change and variability in all that.

Also: notice the Big Data angle, the technology that enabled the study.

Team work bringing down the average

NBC Universal turned to Spark to analyze all the content meta-data for its international content distribution. Metadata associated with the media clips is stored in an Oracle database and in broadcast automation playlists. Spark is used to query the Oracle database and distribute the metadata from the broadcast automation playlists into multiple large in-memory resilient distributed datasets (RDDs). One RDD stores Scala objects containing media IDs, time codes, schedule dates and times, channels for airing etc. It then creates multiple RDDs containing broadcast frequency counts by week, month, and year and uses Spark’s map/reduceByKey to generate the counts. The resulting data is bulk loaded into HBase where it is queried from a Java/Spring web application. The application converts the queried results into graphs illustrating media broadcast frequency counts by week, month, and year on an aggregate and a per channel basis.

NBC Universal runs Apache Spark in production in conjunction with Mesos, HBase and HDFS and uses Scala as the programming language. The rollout in production happened in Q1 2014 and was smooth.

Apache Spark Improves the Economics of Video Distribution at NBC Universal – Databricks
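To make the “map/reduceByKey” counting step a bit more concrete, here’s a minimal sketch in plain Python. It is not Spark and not NBC Universal’s code: the record layout, the sample airings, and the helper names (`reduce_by_key`, `frequency_counts`, and the key functions) are all made up for illustration. It only mimics the pattern the write-up describes: map each airing to a (key, 1) pair, then sum the values per key.

```python
from datetime import date

# Hypothetical airing records: (media_id, channel, air_date).
airings = [
    ("clip-1", "channel-a", date(2014, 1, 6)),
    ("clip-1", "channel-a", date(2014, 1, 7)),
    ("clip-1", "channel-b", date(2014, 2, 3)),
    ("clip-2", "channel-a", date(2014, 1, 6)),
]


def week_key(record):
    """Key an airing by media ID, channel, ISO year, and ISO week."""
    media_id, channel, air_date = record
    iso_year, iso_week, _ = air_date.isocalendar()
    return (media_id, channel, iso_year, iso_week)


def month_key(record):
    """Key an airing by media ID, channel, year, and month."""
    media_id, channel, air_date = record
    return (media_id, channel, air_date.year, air_date.month)


def year_key(record):
    """Key an airing by media ID, channel, and year."""
    media_id, channel, air_date = record
    return (media_id, channel, air_date.year)


def reduce_by_key(pairs, combine):
    """Merge the values of (key, value) pairs that share a key."""
    totals = {}
    for key, value in pairs:
        totals[key] = combine(totals[key], value) if key in totals else value
    return totals


def frequency_counts(records, key_fn):
    """Map each record to (key, 1), then sum the ones per key."""
    pairs = map(lambda record: (key_fn(record), 1), records)
    return reduce_by_key(pairs, lambda a, b: a + b)


weekly = frequency_counts(airings, week_key)
monthly = frequency_counts(airings, month_key)
yearly = frequency_counts(airings, year_key)
```

In real Spark the same shape would run over distributed datasets instead of a Python list, which is the whole point of the exercise; the sketch is just the logic with the cluster taken out.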

Shit’s bonkers out there. If I’d proposed that to an architect “back in my day,” they’d have told me to go shoot myself. They’d say: “uh, so, how about we just make a database table and ETL tool that does that?”

The last part – all those different things used – is amazing. Again, the architect would say: “we write things in Java. Try again.”

Granted, the point is: things like Spark and friends let you move beyond dealing with just tidy data and analitects. But, still, sloppy is as sloppy does, right?