The Road to Data 3.0


If you are reading this, congratulations on being a part of the next generation of innovation we like to call “Web 3.0” (or simply “web3”). You could be an artist looking to leverage the decentralized economy via an NFT launch, a builder working to further on-chain protocols with smart contracts, a business looking to leverage blockchain technologies to level up your internal logistics, or even just an innocent bystander looking to learn more about the next big thing - whatever the reason, we are glad you are here!

As the 2022 State of Crypto Report by a16z summarizes, web3 adoption is growing, is here to stay, and is empowering the next wave of creators. And while all of that is extremely exciting, the road to web2 scale (i.e., billions of users) is far from paved. For instance, the estimated 7-50 million Ethereum users have all had to learn about concepts like private keys, wallets, and gas fees … and that’s just to get started in the ecosystem.

Creators in web3 don’t fare any better. Building decentralized applications (dApps) often requires a working knowledge of blockchain data types, smart contract deployment and management strategies (hint: it’s not like traditional software development), gas fees, tokenomics, pseudonymous “users”, and more. There is a silver lining for creators and users alike, though: the growth of web3 can largely be attributed to trailblazing individuals and companies *continuing* to work on solving everything from user identity to the transaction throughput of the underlying networks.

Data 3.0

One often overlooked aspect of the entire ecosystem, though, is the accessibility of data itself. As builders we’ve grown [relatively] accustomed to sending data out into the ether (🥁) without knowing how we will get it back in any sort of usable form. This leads us to rely on centralized sources that figure it out on our behalf and charge us for the service, at least partially defeating the original purpose of decentralizing the data to begin with. To truly enable the future of decentralized data, we must rethink our approach to Data 3.0.

An Example

Let’s take a look at one of the most common examples in the space today: tracking NFT project owners. NFT (Non-Fungible Token) projects have contributed significantly to the rise in web3 adoption over the last couple of years. OpenSea, for instance, the largest NFT marketplace, facilitated as much as $4.8 billion USD in sales in a single month at the height of the last crypto cycle (source).

What makes NFT projects particularly interesting for our Data 3.0 example, though, is how easy they are to develop (relative to web3 in general, that is) combined with how often they fall back on centralization. At their base layer, NFTs are smart contracts (permanent, unchangeable software) on a given blockchain (often Ethereum) that enable users to own and transfer a fixed number of items. NFT project creators can write and deploy their smart contracts, distribute the initial set of items (whether by giving them away or by letting web3 users buy them from the smart contract directly), and then leverage a marketplace like OpenSea to enable trades between web3 users.

An overly simplified lifecycle of an NFT project may go something like this:

  1. Creator A launches the highly successful NFT Project X
  2. Thousands of web3 users trade NFT Project X items as owners
  3. … time passes …
  4. Creator A now wants to give each of the current owners of Project X a gift for being such a great community
  5. Creator A realizes they can’t readily get a list of current owners from the blockchain directly and is therefore presented with the following choices:
    1. Scrap the gift idea
    2. Write a script to check the current owner of every item against the blockchain. This requires web3 development knowledge and would only generate point-in-time snapshots, making it difficult to keep up with items that may be rapidly changing hands.
    3. Learn how to build an indexer to crawl and subscribe to updates from the blockchain, replaying every trade sequentially, to determine the current owner of each item. This solves the problem, but requires even more web3 development know-how and the ongoing maintenance of the indexer itself.
    4. Use a readily available API from a centralized source (that has built their own indexer) to provide this data. This is by far the easiest solution, but requires reliance on a 3rd party provider that may or may not be generally reliable (looking at you OpenSea…) and often incurs a service cost.
  6. Creator A chooses a 3rd party API provider
  7. Owners of NFT Project X get their gifts and everyone is happy
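
To make the indexer option above a little more concrete, here is a minimal sketch of the "replay every trade sequentially" idea. It is not a real indexer: it assumes the transfer events have already been fetched from the chain, and the `Transfer` record, `current_owners` function, addresses, and sample data are all hypothetical names made up for illustration. The only thing borrowed from the real world is the convention (from the ERC-721 standard) that mints come from the zero address and burns go to it.

```python
from dataclasses import dataclass

# Convention from ERC-721: mints are transfers from the zero address,
# burns are transfers to it.
ZERO_ADDRESS = "0x" + "0" * 40

# Hypothetical wallet addresses, for illustration only.
ALICE = "0x" + "a" * 40
BOB = "0x" + "b" * 40
CAROL = "0x" + "c" * 40


@dataclass(frozen=True)
class Transfer:
    """A simplified, hypothetical record of one NFT transfer event."""
    block_number: int
    log_index: int  # position of the event within its block
    sender: str
    recipient: str
    token_id: int


def current_owners(transfers):
    """Replay transfers in chain order and return {token_id: owner}."""
    owners = {}
    ordered = sorted(transfers, key=lambda t: (t.block_number, t.log_index))
    for t in ordered:
        if t.recipient == ZERO_ADDRESS:
            owners.pop(t.token_id, None)  # burned items have no owner
        else:
            owners[t.token_id] = t.recipient  # latest transfer wins
    return owners


# Made-up history, deliberately out of order to show that replay sorts it.
SAMPLE_TRANSFERS = [
    Transfer(111, 5, CAROL, BOB, 1),
    Transfer(100, 0, ZERO_ADDRESS, ALICE, 1),  # mint item 1
    Transfer(100, 1, ZERO_ADDRESS, BOB, 2),    # mint item 2
    Transfer(105, 0, ALICE, CAROL, 1),
    Transfer(110, 0, BOB, ZERO_ADDRESS, 2),    # burn item 2
    Transfer(111, 2, ZERO_ADDRESS, BOB, 3),    # mint item 3
]

if __name__ == "__main__":
    owners = current_owners(SAMPLE_TRANSFERS)
    gift_recipients = set(owners.values())
    print(owners)
    print(gift_recipients)
```

Even in this toy form, the hard parts are the ones the sketch skips: fetching every event reliably, handling chain reorganizations, and keeping the result up to date as new trades arrive.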

What’s the catch?

This all sounds fine and dandy, right? And it mostly is. Generally the 3rd party providers can be trusted to provide real-time, consistent data. But what happens when that 3rd party isn’t reliable or changes the way it serves that data? Or more importantly, what happens when Creator A has a use case that isn’t supported by the standard offering (e.g. they want to give gifts only to owners that have held their NFTs for more than 90 days)? And all of this is just the tip of the iceberg.
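
As a rough illustration of why the 90-day example needs more than a list of current owners, the sketch below tracks when each item was last acquired. It assumes transfer events with timestamps are already in hand; `TimedTransfer`, `long_term_holders`, and the sample data are hypothetical. It also simplifies: events are ordered by timestamp alone, and any transfer (even back to the same wallet) restarts the clock.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class TimedTransfer(NamedTuple):
    """A simplified, hypothetical transfer record with a block timestamp."""
    timestamp: datetime
    recipient: str
    token_id: int


def long_term_holders(transfers, as_of, min_days=90):
    """Return owners who have held at least one item for more than min_days."""
    held_since = {}  # token_id -> (current owner, time acquired)
    for t in sorted(transfers, key=lambda t: t.timestamp):
        held_since[t.token_id] = (t.recipient, t.timestamp)
    cutoff = as_of - timedelta(days=min_days)
    # Strictly earlier than the cutoff means held for more than min_days.
    return {owner for owner, acquired in held_since.values() if acquired < cutoff}


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up history for three items.
SAMPLE = [
    TimedTransfer(utc(2022, 1, 1), "alice", 1),
    TimedTransfer(utc(2022, 6, 1), "bob", 2),
    TimedTransfer(utc(2022, 12, 1), "carol", 2),  # bob sells item 2 to carol
    TimedTransfer(utc(2022, 9, 1), "dave", 3),
]

if __name__ == "__main__":
    print(long_term_holders(SAMPLE, as_of=utc(2023, 1, 1)))
```

A current-owner API would list carol alongside alice and dave; only the transfer history shows that carol picked up her item a month ago.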

While providers are rapidly appearing to fill the holes in Data 3.0, most of them are doing so in use-case-specific ways: OpenSea for NFT data, Alchemy for raw blockchain data, etc. Ethereum’s developer docs - which are honestly great - don’t even mention “indexing” except to point at a 3rd party provider, The Graph. And The Graph, which describes itself as “an indexing protocol for querying networks like Ethereum and IPFS”, has limited support for customization and forces participation in something akin to an indexing marketplace.

There’s a missing link in the Data 3.0 lifecycle and the impacts are only just starting to be felt. Web3 creators should be empowered to own, leverage, and define their data as they see fit. Join us in shaping the future of Data 3.0.