Why open source?

Kees van Bochove

12-08-2011

When I explain to people what The Hyve does, I try to observe their reactions. This is fun, but it also tells me a lot about how they feel about our business model, which is important feedback for me. Often they smile and nod, until I get to the point where I explain that we deliver our services by using and providing open source software. "But how do you make money, then?" is a question I get very often. "Isn't it contradictory to give away the products you develop as open source and at the same time try to make money with them?" If I then go on to tell them that most of our current projects are funded by the EU or national governments, or by generous clients like TNO Quality of Life who care about sharing and disseminating their research, they feel relieved. Their model of the world doesn't have to fall apart: business is still business, and IP is still crucial in making a biotech startup successful.

However, frankly, that's not what I want to achieve in the end! Put very boldly, I would like to turn the bioinformatics business upside down. Many bioinformatics companies have become big by perfecting and maintaining their software, and they generate a stable income by selling licenses for it. This is a perfectly valid business model, in which the money you earn from licenses allows you to maintain the products and keep adding new and exciting features to them. This worked in the nineties, and as of 2011, it still works. It is the safe choice for pharma and nutrition companies that are struggling to justify their R&D costs in bioinformatics and especially systems biology, which, as I learned yesterday, still makes only a poor to moderate contribution to actual drug development.

This isn't the path I want to tread, though. My prime example is a technology startup called Cloudera. Cloudera has a close relationship with an open source project called Apache Hadoop, recognizable by its elephant logo. Hadoop is a software framework for storing and processing big data on commodity hardware. By default it keeps three copies of the data, so that it doesn't matter when a few machines fail.

Interestingly, it is also the best-known openly available implementation of the MapReduce programming model, which was published by Google and undoubtedly implemented in-house at Google first. But the first production-grade implementation available to everyone else was (and is) an open source one! Cloudera contributes to Hadoop and even supplies a freely available distribution of Hadoop and a number of related open source projects such as Pig and Hive. So how does Cloudera make money? They make money by implementing Hadoop solutions for their customers, some of which are really large companies. In this way, customers profit directly from the innovations in the open source project. Also, the money they invest is converted into improvements to the software, which Cloudera commits back into the open source project. The end result is a real collaboration between the companies involved at the level of software development, without them ever having to interact directly.

So what about trust? What if Cloudera someday decides to become evil and double all their service rates, or sell the product for an absurdly high price? The answer is clear: they will soon be out of business. Because the product is open source, any other company can decide to specialize in Hadoop and sell services around it. This means that Cloudera has to stay innovative and continue to satisfy their clients with great service, or else they will be pushed out of business by better competition. This sounds like the ideal side of capitalism, and frankly I think it is. It's reality right now for Hadoop and Cloudera. I think bioinformatics can learn a tremendous amount from them. Isn't science all about collaboration and working together? Wouldn't an open source collaboration between industry and academia be by far the easiest and legally least demanding way to collaborate? Aren't the pharma companies, for example, showing that they want to move to an open collaboration model, with initiatives like the Pistoia Alliance and the IMI projects? I think it's time for bioinformatics software to open up. And with The Hyve, we hope to play our modest part in it.
