A Real Use Case for Blockchains: A Global Data Commons

This week I am continuing our series of excerpts from our Convergence Ecosystem vision paper.

Today, I want to talk about a use case that only blockchain technology can deliver. Such use cases are few and far between, and they usually revolve around applications that demand censorship resistance. A public data commons demands exactly that.

The Convergence Ecosystem will lead to a global data commons. It’s not inevitable, but blockchains and distributed ledgers are disruptive technologies that change the structure of the data value chain. Yes, I know that word is overused, but in this case it is true. The point of value capture in the value chain will change. Instead of web companies capturing value and profit by controlling data, data could be stored on decentralised file systems and blockchains, making it accessible to all rather than just the select few platforms that collected it.

In the Convergence Ecosystem, a few different technologies interconnect to form an authentication, validation and security layer: blockchains and distributed ledgers, decentralised consensus protocols, self-sovereign identity & reputation, and decentralised storage and data integrity. We believe developments in these four areas are contributing to the emergence of a data commons. These decentralised technologies are critical to the creation of a true global public utility. A public utility must be citizen-owned and must not be controllable by a single entity, whether government or corporation. The transparency, usage provenance, tamper-evidence and censorship resistance of blockchain technology are a perfect fit for a global public utility.

Public blockchains are the ideal foundation for a data commons

Public blockchains are in many ways worse than existing databases. They are slower, have less storage, use more energy, and are less private. Sure, sharding, proof-of-stake and other proof-of-x schemes, and privacy-protecting tools like multi-party computation and zk-SNARKs are attempting to address some of these issues. But the key thing to remember is that the original Bitcoin blockchain was designed specifically as a peer-to-peer digital cash system; it is perfectly designed for that use case. The design choices were made to maximise one feature: censorship resistance. Public blockchains aren’t owned or managed by one government or company that can choose who views or uses them. This is what crypto people mean when they say blockchains cut out the middleman (although the so-called middleman will almost certainly integrate at another point in the value chain, so it’s more accurate to say blockchains will change where the middlemen make their money). Governments have traditionally had the ability to censor information and communication, but today Silicon Valley tech monopolies do so on a global scale. Twitter, Facebook and Google have all come under fire recently over their decisions to limit freedom of speech. If you control a network, you pick and choose who uses it. This is too much power for a single entity.
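The tamper-evidence property mentioned above is worth making concrete. Here is a minimal sketch (illustrative only, not a real blockchain implementation): each block commits to the hash of its predecessor, so altering any historical record invalidates every hash that follows it.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents deterministically (sort_keys makes the JSON stable)."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def make_block(data, prev_hash):
    """Create a block that commits to its predecessor's hash."""
    return {"data": data, "prev_hash": prev_hash}

def verify_chain(chain):
    """Valid only if every block's prev_hash matches the hash of the block before it."""
    for prev, curr in zip(chain, chain[1:]):
        if curr["prev_hash"] != block_hash(prev):
            return False
    return True

# Build a three-block chain.
genesis = make_block("genesis", "0" * 64)
b1 = make_block("alice pays bob", block_hash(genesis))
b2 = make_block("bob pays carol", block_hash(b1))
chain = [genesis, b1, b2]
assert verify_chain(chain)

# Rewriting any historical block breaks every later link.
chain[1]["data"] = "alice pays mallory"
assert not verify_chain(chain)
```

A real public blockchain adds consensus, proof-of-work or proof-of-stake, and a peer-to-peer network on top of this basic structure, but the hash chain is what makes tampering evident to every participant.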

We now have the tools to ensure no single entity controls data. With all communications, money, and health becoming digital, data infrastructure will be too valuable to be controlled by one nation or company. In fact, for individuals and society more broadly, global data infrastructure, just like the Internet, should be a public good. Never has so much data been available for collection and analysis, and everyone wants it. As sensors are embedded in everyday objects and we move to a world of ubiquitous computing, everybody is fighting over who ‘owns’ the data. This is yesterday’s war. Public blockchains offer an open-source, decentralised, shared database that anyone can view and interact with according to programmable rules.

We are seeing the emergence of this new data infrastructure. We aren’t there yet: we still need to process more transactions at faster speeds while using less energy. Data needs to be private, stored in an accessible way, and shared across different blockchain flavours. We also need a way for individuals, organisations and machines to buy and sell data in a marketplace. Storage of and access to data are important, but it is data marketplaces that will finally provide a business model for data creators. There will finally be a way for people and machines to make the most of the data they collect. A marketplace provides an economic incentive for the more efficient allocation of data: individuals can sell it instead of giving it away for free; organisations can monetise it instead of letting it sit idle in databases; and machines can buy and sell data automatically to increase their utility. In my view, a peer-to-peer marketplace for data is the second most important idea to come from the blockchain industry after peer-to-peer electronic cash.
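The marketplace mechanics described above can be sketched in a few lines. This is a toy model under stated assumptions (the class, account names and dataset identifiers are all hypothetical, and no real protocol or token standard is implied): sellers list datasets at a price, buyers pay, and a simple ledger records the transfer.

```python
class DataMarketplace:
    """Toy peer-to-peer data marketplace: listings, purchases, and a credit ledger."""

    def __init__(self):
        self.listings = {}   # dataset_id -> (seller, price)
        self.balances = {}   # account -> credits

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount

    def list_dataset(self, seller, dataset_id, price):
        self.listings[dataset_id] = (seller, price)

    def buy(self, buyer, dataset_id):
        seller, price = self.listings[dataset_id]
        if self.balances.get(buyer, 0) < price:
            raise ValueError("insufficient funds")
        self.balances[buyer] -= price
        self.deposit(seller, price)
        return dataset_id  # a real system would return an access grant or decryption key

market = DataMarketplace()
market.deposit("buyer", 100)
market.list_dataset("sensor_owner", "air-quality-2018", 30)
market.buy("buyer", "air-quality-2018")
assert market.balances == {"buyer": 70, "sensor_owner": 30}
```

In a decentralised version, the ledger would live on a blockchain and the access grant would be enforced cryptographically rather than by a trusted intermediary, but the economic shape is the same: data creators are paid per transaction instead of giving their data away.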

A data commons gives control back to users and limits monopoly control of the most valuable resource in the digital economy

2018 will see the beginnings of this global data sharing and monetisation network. Data creators will begin to earn money from uploads, likes and retweets. This is a far more profound change than it may seem. Disruption has typically come from startups offering seemingly inferior products that serve a niche underserved by the incumbent. Blockchain-based networks won’t just disrupt particular companies; they go much further and disrupt a digital norm: the assumption that we should give away personal data for free. Digital monopolies, including Facebook, Google and Amazon, get data from users for free. Every like, search and purchase feeds the learning system to further improve the algorithms, in turn bringing more customers and engagement. In value chain terms, data is supply and AI algorithms are demand. Digital monopolies are searching everywhere for more and more data to feed their algorithms: Facebook buying WhatsApp and Instagram; Google with self-driving cars and Google Home; Amazon with its Alexa-powered Echos and Dots.

Blockchains and decentralised public infrastructure change the game. Blockchains reduce the value of hoards of private data. They make proprietary datasets much less valuable because, as more machines, individuals and organisations use a public data infrastructure, a global data commons becomes more attractive to data sellers. As this data commons grows with more datasets, it will attract more data buyers, creating powerful network effects. In other words, data becomes more of a commodity; it is no longer a source of value in and of itself. Firms that control the supply of data will no longer dominate markets. The point of value capture in the value chain will shift from data to brand and trust.

As data becomes less valuable, the customer relationship becomes ever more important. Startups and incumbents alike will compete for customers’ data on the basis of trust. The global data commons will mean individuals choose where their data is sold or rented. It will at first attract individuals who care about privacy and self-sovereign data. Machines will soon follow as machine operators and owners look for new revenue streams. Some organisations, especially in the public sector, will be attracted by the non-corporate-controlled nature of the decentralised infrastructure, as well as the cost and liability reductions from not storing consumer data. Smaller organisations and startups will sign up to access standardised data that would otherwise take too long or cost too much to acquire. Today, data is siloed with no business model for creators to monetise it. Blockchain technology and other decentralised infrastructure are emerging as a new data layer that lets machines, individuals and organisations get paid for the data they generate. Blockchain-based data infrastructure, including data exchanges, will commoditise data and help realise the vision of a data commons: the first real global public utility.

The Only Thing That Matters in Machine Learning is…

The hot trend in machine learning is giving away stuff for free. Tech companies have always been advocates of the open-source community and are happy to release parts of their code as open-source. Over the last year, however, the big players in machine learning have given away complete codebases. Google made its TensorFlow open source and Facebook gave away its optimised deep learning modules for Torch, another open-source library. Then, Microsoft released its Distributed Machine Learning Toolkit (DMTK) for free and, not to be outdone, IBM open-sourced its SystemML platform.

These developments explicitly confirm what observers already know: tech companies no longer see software and algorithms as valuable assets to keep proprietary. The most valuable asset today is data. The second most valuable asset is the talent to use that data.

2015, the year of open source

Facebook — Deep learning modules for Torch

In January, Facebook was the first to open-source its machine learning code. Facebook’s artificial intelligence (AI) efforts are run out of its AI research lab, known as FAIR. In the lab, Facebook uses Torch, an open-source developer toolkit for machine learning tasks. Torch is used by numerous companies including Twitter, NVIDIA, AMD, and Intel, and has been applied most successfully to deep learning and convolutional neural nets, which excel at understanding images and video. Facebook made its optimised deep learning modules open-source; these modules are significantly faster than the default modules in Torch and allow developers to train larger neural nets in less time.

IBM — SystemML

In June, IBM, a company synonymous with AI thanks to its Deep Blue and Watson systems, contributed SystemML, its machine learning platform, to the fastest-growing open-source community, Apache Spark. IBM will offer Spark as part of its broader IBM Bluemix open cloud technology platform.

Google — TensorFlow

In November, Google released TensorFlow for free. TensorFlow is Google’s second-generation machine learning system, replacing DistBelief. The system represents computations as stateful dataflow graphs, making it easy to run networks across multiple machines with different hardware. Developed by the Google Brain team, including deep learning legend Geoffrey Hinton, it’s used in various Google products including Gmail and Photos. Its highest-profile use is in RankBrain, Google’s AI engine that handles a substantial share of search queries.
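The dataflow-graph idea at the heart of TensorFlow can be sketched in plain Python. This is a conceptual illustration only, not the TensorFlow API: nodes are operations, edges carry values, and the graph is built first and evaluated later.

```python
class Node:
    """A node in a tiny dataflow graph: an operation plus its input nodes."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

    def eval(self, feed):
        """Evaluate recursively; `feed` supplies values for placeholder nodes."""
        if self.op == "placeholder":
            return feed[self]
        args = [node.eval(feed) for node in self.inputs]
        if self.op == "add":
            return args[0] + args[1]
        if self.op == "mul":
            return args[0] * args[1]
        raise ValueError(f"unknown op: {self.op}")

# y = (a + b) * c, constructed as a graph first and executed later.
a = Node("placeholder")
b = Node("placeholder")
c = Node("placeholder")
y = Node("mul", (Node("add", (a, b)), c))
assert y.eval({a: 2, b: 3, c: 4}) == 20
```

Separating graph construction from execution is what lets a system like TensorFlow optimise the graph and place different subgraphs on different machines or devices, which is exactly the property the paragraph above describes.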

Microsoft — Distributed Machine Learning Toolkit (DMTK)

Finally, in November, just three days after Google, Microsoft open-sourced its framework and algorithms for distributed machine learning. The DMTK is designed to let machine learning tasks scale easily. The toolkit also includes LightLDA, an efficient algorithm for topic model training, and Distributed Word Embedding, a tool for natural language processing.

Software prices tend to zero as the value of data rises

Machine learning tools are making it easier to understand the abundance of data that is being collected. Deep learning techniques are enabling systems to learn from unstructured data. Much of the real world is messy, complex, and rarely fits nicely into the rows and columns that traditional approaches to intelligent machines, software, and databases require. Videos, unlabeled text, and voice are all being analysed by systems that can now infer context, making insights more accurate and valuable.

“While laggards in the industry debate the merits of on-premise servers versus cloud services and struggle to merge vast numbers of databases, technology leaders are pushing further ahead.”

Intellectual property is being handed over to the open-source community to use as it wishes. While most companies are just beginning to devise their Big Data strategies, Google, Facebook, Microsoft, and IBM have devised theirs, built Big Data and machine learning tools, and are now giving those tools away for free.

Most companies consider their proprietary software to be a competitive advantage and how they provide value to customers. As traditional hardware companies are slowly trying to become software- and services-based companies, the ground beneath them has shifted.

Telcos are trying to adapt to a world of software-defined networking rather than routers and switches, and manufacturers are moving from providing tools and widgets to usage analytics and predictive maintenance. As they arrive in this new dawn of software and services with the promise of fat margins, they will find it was a mirage. Software on the Internet has almost zero marginal costs. Prices will trend to zero. The real value is data.

Using machine learning tools is hard

Google, Facebook, Microsoft, and IBM have not given away all of their software. Google, Microsoft, and IBM also run machine learning platforms through which they offer machine learning APIs to paying customers. These companies want to attract developers to build on their platforms and make them more valuable. They are open-sourcing their tools largely so that developers can learn how to use them. This is great for future hiring and fosters a thriving developer ecosystem.

Valuable platforms attract users and developers. Developers have limited resources and will only allocate resources to platforms which generate the greatest revenues. This is why small developers build iOS apps first, Android apps second, and Windows Mobile never. Platform dynamics are winner-takes-almost-all. Companies can court developers, pay them to build for the platform, and take a lower cut of sales; but if the platform doesn’t have users, it doesn’t matter. See Windows Mobile.

“The challenge for non-software companies trying to build platforms for their own customers is that open-source is not part of their culture.”

Customer value is created by machine learning applications from third-party developers providing innovative new services. Open source will be the only way to get developers on board. Data will be the only sustainable competitive advantage.

Recent advice to the industry has been to move away from making physical things and to making digital things. However, charging for digital things on the Internet is harder than ever. With machine learning, making digital things is not even enough. Companies need to give away the digital things. This will be a bitter pill to swallow for the management and boards of many companies going through a digital transformation.

The only thing that matters now is data.