FVM and the Future of Filecoin

IOSG
Mar 29, 2023

Protocol Labs has outlined a three-stage master plan to decentralize the internet:

  1. Build the world’s largest decentralized storage network.
  2. Onboard and safeguard humanity’s data.
  3. Bring retrieval and computing capabilities to the data to build scalable applications.

Filecoin has achieved step 1: it is the largest decentralized custodian of data, with 661.54 PiB currently stored on the network. Its storage market continued to grow in Q4 2022, with active deals up 117% QoQ and 1,798% YoY.

The next step for Filecoin is to help onboard and safeguard humanity’s data, which will be an ongoing process. Reaching step 3 requires new infrastructure for retrieving and computing over that data; without it, Filecoin would look like a collection of hard drives around the world that only store archival data.

Enter the Filecoin Virtual Machine (FVM). By integrating a VM into the Filecoin protocol, developers can create decentralized applications (dApps) that leverage Filecoin’s storage network, execute smart contracts in a secure and reliable manner, and add new layers of value on top of the data already stored on Filecoin.

The FVM is designed to execute smart contracts on the Filecoin network. A smart contract is compiled into WASM bytecode, which the FVM then executes. This allows developers to build applications that follow an immutable set of rules, unlocking use cases such as bringing all of Filecoin’s markets on-chain, perpetual storage, and DataDAOs. While all of this is promising, many chains have failed because they lacked a good developer experience and a pool of developers who could readily build on them.

Filecoin therefore decided to launch and support the FEVM, an Ethereum Virtual Machine (EVM) runtime built on top of the FVM. The EVM runtime itself is compiled to WASM; when an EVM contract is deployed to the FEVM, an actor instance is created that runs the contract’s EVM bytecode. This user-defined FEVM actor can then interact with the Filecoin network via built-in actors such as the Market and Miner actors.

This is a critical step for Filecoin, because EVM-based development has a proven developer experience and a large existing developer base.

Filecoin also has a network utilization problem. Even after growing throughout the year, only about 3% of Filecoin’s storage capacity is in use. Launching the FVM should increase the number of dApps and could, in turn, drive greater storage demand and utilization of the network.

By providing a secure and reliable environment for executing smart contracts, the Filecoin VM helps to unlock the full potential of the Filecoin protocol and enables new and innovative use cases for decentralized storage and computing.

While many different use cases can be built with the FVM, the Protocol Labs and FVM teams have published a “Request for Startups” (RFS) list of projects they consider important for the Filecoin ecosystem to thrive.

According to Protocol Labs, the stars in that list denote how important it is for each project to exist on the Filecoin network.

One cannot help but notice the repeated mention of DataDAOs. What are they, and why are they important?

The phrase “Data is the New Oil” has been making the rounds over the past couple of years, and with good reason. The largest internet companies in the world have always treated data as their most valuable resource, both internally (market and user insights) and externally (selling data). Why is data valuable? Data is only as valuable as the insights it can generate, and generating insights requires performing computations on the data itself.

In the current state of Filecoin, monetizing data on-chain is not possible, because storage deals are peer-to-peer: they are negotiated off-chain and only settled on-chain. Monetizing data requires basic infrastructure for access control, subscription payments, data augmentation, packaging, and so on.

This is what the FVM can unlock, all in a decentralized, community-centric model. DataDAOs are DAOs whose mission revolves around the preservation, curation, augmentation, and promotion of datasets considered valuable by their stakeholders.

Every stakeholder in a DataDAO can now be incentivized to contribute in a different way: SPs are responsible for storage and preservation; replication workers ensure local and quick availability; data providers can be compensated for selling the data they hold; data experts can participate in packaging data; and ML engineers can be compensated for providing models that run over the gated data, which in turn augments the value of that data.
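
As a rough illustration of how such incentives could be wired up, here is a minimal sketch of splitting a dataset's revenue across the stakeholder roles listed above. The role names, the weights, and the `split_revenue` helper are all hypothetical; a real DataDAO would set these through its own governance and implement the payout in an on-chain actor rather than in Python.

```python
# Hypothetical role weights (they sum to 100 here, but any positive integers work).
# These numbers are illustrative assumptions, not taken from any real DataDAO.
ROLE_WEIGHTS = {
    "storage_provider": 30,
    "replication_worker": 15,
    "data_provider": 30,
    "data_expert": 10,
    "ml_engineer": 15,
}


def split_revenue(revenue: int, weights: dict[str, int]) -> dict[str, int]:
    """Split an integer amount (e.g. in the token's smallest unit) by weight.

    Integer division is used so that no fractional units are created;
    any remainder is assigned to a hypothetical DAO treasury.
    """
    if revenue < 0:
        raise ValueError("revenue must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    payouts = {role: revenue * w // total_weight for role, w in weights.items()}
    payouts["treasury"] = revenue - sum(payouts.values())
    return payouts


if __name__ == "__main__":
    for role, amount in split_revenue(1001, ROLE_WEIGHTS).items():
        print(f"{role}: {amount}")
```

Using integer arithmetic with an explicit remainder bucket mirrors how token amounts are usually handled on-chain, where rounding must never create or destroy value.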

We can look at data as a value spectrum:

  1. Raw Data (Least Valuable)
  2. Packaged Data
  3. Computations over Data
  4. Verifiable Insights (Most Valuable)

Raw and packaged data could already be uploaded to Filecoin before the FVM, but access to it could not be gated on-chain.

Computations over data (e.g. AI/ML models, NFT creation) can now be performed using the FVM. In Filecoin’s current state, however, complex and heavy computations cannot be run on-chain, because Filecoin nodes do not have enough computation power.

This leaves clear scope for off-chain DCompute platforms that interoperate with Filecoin.

Solutions like Bacalhau are working towards a vision of Compute-over-Data with close Filecoin interoperability. Bacalhau is a fairly new project, but it has the potential to make the Filecoin network a lot more valuable by making the data stored on it more useful.

DataDAOs in their initial state need not cater to the full data value chain. While that could be the end goal of a DataDAO, there are many DataDAOs that have started work in particular verticals or use cases. Some notable mentions include:

  • Lagrange DAO: A DAO for data value realization and decentralized science (DeSci). It provides data sharing and analytic spaces for DeSci.
  • GlacierDAO: A DAO that opens up the replication of Git repositories containing code deemed to be of public interest. It will also allow users to pool funds to pay for the replication of these repositories on the Filecoin network.
  • SPN DAO: A DataDAO that enables consumers to turn credit card transaction data into assets, giving them direct control over the use and monetization of their data.

While DataDAOs are currently focused on the collection and curation of data, moving up the value chain will require Compute-over-Data and DCompute infrastructure.

Projects that are already working on this with a focus on Filecoin include:

Bacalhau: Bacalhau is a platform for fast, cost-efficient, and secure computation by running jobs where the data is generated and stored. With Bacalhau you can streamline your existing workflows without extensive rewriting by running arbitrary Docker containers and WebAssembly images as tasks.

Shale: Shale is working to bring cloud computing to Filecoin, enabling Storage Providers to leverage existing storage power for computation and directly compete with other cloud storage providers such as AWS and Google Cloud. Public users will have the opportunity to rent computing instances from Storage Providers and access Filecoin+ storage deals through a local network. This is a solution similar to Akash Network and StackOS but with a strong focus on computing over Filecoin data.

DataDAOs are a unique use case that can be unlocked end-to-end in a decentralized manner only through the Filecoin network.

In its earlier state, Filecoin offered only storage, and as an evolving protocol it had many open problems that needed to be addressed, such as:

  1. Even if my SP is slashed for not storing my data, how can I repair or retrieve my data?
  2. I have high-quality content that needs to be delivered through a caching layer for my decentralized website, but all my deals with the SP happen off-chain. How do I set that up?
  3. How do I gate my data on Filecoin if I have enterprise-level data?

And many more.

Filecoin is working on solving many of these problems with the FVM and Retrieval Markets. The FVM unlocks the possibility of creating and incentivizing replication of data on Filecoin, so that even if one SP defaults, the data can be retrieved or repaired from other nodes.
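
To make the replication idea concrete, here is a minimal sketch of the repair logic: drop SPs that have defaulted, then top the replica set back up to a target count from a list of candidate SPs. This is a toy model in plain Python, not FVM actor code; the `repair_replicas` function, the SP identifiers, and the notion of a `healthy` set are assumptions made for illustration.

```python
def repair_replicas(
    holders: set[str],
    healthy: set[str],
    candidates: list[str],
    target: int,
) -> set[str]:
    """Return the set of SPs that should hold the data after a repair round.

    holders:    SPs that were supposed to store a replica.
    healthy:    SPs currently known to be online and proving storage.
    candidates: SPs willing to take a new replica, in priority order.
    target:     desired number of replicas.
    """
    surviving = holders & healthy
    if not surviving:
        # With no surviving copy there is nothing to repair from.
        raise RuntimeError("all replicas lost; data cannot be repaired")

    new_holders = set(surviving)
    for sp in candidates:
        if len(new_holders) >= target:
            break
        if sp in healthy and sp not in new_holders:
            new_holders.add(sp)  # in practice: copy from a surviving replica
    return new_holders


if __name__ == "__main__":
    result = repair_replicas(
        holders={"sp1", "sp2", "sp3"},
        healthy={"sp1", "sp3", "sp4", "sp5"},  # sp2 has defaulted
        candidates=["sp4", "sp5"],
        target=3,
    )
    print(sorted(result))
```

The key property is that repair is only possible while at least one replica survives, which is why incentivizing multiple independent replicas matters.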

Retrieval Markets are going to help create decentralized CDNs that can benefit Web3 social and gaming projects.

Access control platforms with E2E encryption of the data can be built using the FVM. Projects like Medusa are already working on this, with an additional layer of interoperability with other chains.
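
A minimal sketch of the gating side of such a platform is shown below: a subscription ledger that records until which epoch each address has paid, and answers whether access should be granted. It deliberately leaves out encryption and key management, and it is plain Python rather than a smart contract; the `DatasetGate` class, its fields, and the example addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetGate:
    """Toy subscription gate for one dataset (illustrative only)."""

    price_per_epoch: int
    paid_until: dict[str, int] = field(default_factory=dict)

    def subscribe(self, address: str, payment: int, current_epoch: int) -> int:
        """Extend the subscription of `address` and return its new expiry epoch."""
        if payment < self.price_per_epoch:
            raise ValueError("payment does not cover a single epoch")
        epochs = payment // self.price_per_epoch
        # Extend from the existing expiry if it is still in the future.
        start = max(self.paid_until.get(address, current_epoch), current_epoch)
        self.paid_until[address] = start + epochs
        return self.paid_until[address]

    def can_access(self, address: str, current_epoch: int) -> bool:
        """Access is granted strictly before the expiry epoch."""
        return self.paid_until.get(address, 0) > current_epoch


if __name__ == "__main__":
    gate = DatasetGate(price_per_epoch=10)
    expiry = gate.subscribe("f1alice", payment=35, current_epoch=100)
    print(expiry, gate.can_access("f1alice", 102), gate.can_access("f1bob", 102))
```

In a real deployment the access decision would typically release a decryption key rather than the data itself, which is where E2E encryption comes in.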

In distributed storage systems like Filecoin, performing computations over data is difficult, because the data may reside on many different nodes that are far apart from each other. Aggregating the data in one place and then computing over it defeats the purpose of a decentralized storage system entirely.

With the world moving towards AI model training over large datasets, storing those datasets on Filecoin has great economic value (because it’s cheap), but training models on top of such systems is very hard at the moment.

ChatGPT (GPT-3) was trained on a cluster of 1,000 V100 GPUs (at ~147 TFLOPs/GPU), with an average cost of about $10 million.

Stable Diffusion was trained on 256 A100 GPUs, at a cost of close to $600k.

The computation power required for these is immense. With the current minimum requirements for Filecoin nodes (an 8-core CPU with 128 GB of RAM), Filecoin alone cannot cater to the computation-heavy internet era that we live in.
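
To put the gap in perspective, the short calculation below works out the aggregate throughput of the GPU cluster quoted above and how many nodes of a given per-node throughput would be needed to match it. The GPU count and per-GPU figure are the ones quoted in this article; the per-node throughput is a loudly hypothetical placeholder, since a minimum-spec storage node's usable compute is not stated here.

```python
import math

# Figures quoted in this article.
V100_COUNT = 1000
TFLOPS_PER_V100 = 147

# ASSUMPTION: placeholder value for illustration only, not a measured number.
HYPOTHETICAL_TFLOPS_PER_NODE = 1.0


def aggregate_tflops(gpu_count: int, tflops_per_gpu: float) -> float:
    """Peak aggregate throughput of a homogeneous GPU cluster, in TFLOPs."""
    return gpu_count * tflops_per_gpu


def nodes_needed(target_tflops: float, tflops_per_node: float) -> int:
    """Number of nodes needed to match a target peak throughput."""
    if tflops_per_node <= 0:
        raise ValueError("tflops_per_node must be positive")
    return math.ceil(target_tflops / tflops_per_node)


cluster_tflops = aggregate_tflops(V100_COUNT, TFLOPS_PER_V100)

if __name__ == "__main__":
    print(f"Cluster peak: {cluster_tflops / 1000:.0f} PFLOPs")
    print(f"Nodes needed: {nodes_needed(cluster_tflops, HYPOTHETICAL_TFLOPS_PER_NODE)}")
```

This compares peak numbers only; it ignores interconnect bandwidth and coordination overhead, which make matching a tightly coupled GPU cluster with many small nodes even harder in practice.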

There are two ways to solve this problem:

  1. Filecoin becomes a computation powerhouse by raising the minimum computation requirements for storage provision significantly. This is not a switch that can be flipped overnight; it has to be a gradual process.
  2. Outsource computation over Filecoin data to third-party DCompute platforms such as Bacalhau and Shale (Filecoin-centric) or other solutions like Akash Network, StackOS, etc.

Another problem is that many models have already been built. If their data is ported to Filecoin, the compute networks must also be able to support those existing models, which were built using TensorFlow, Ray, etc.

In the short term, I see decentralized cloud computing networks like Shale, Akash, etc. winning the market because of their ease of deployment and lower developer overhead. In the long term, Filecoin must also strive to become a computation powerhouse, either by upgrading existing resources or by finding a more efficient approach to decentralized computation.

A longer-term solution has been proposed by Rex St. John in his piece “Harvested Compute and Pervasive AI: The Protocol Labs Bull Thesis”, where he suggests that the Filecoin network should focus on supporting computation on Raspberry Pi and Nvidia Jetson devices. Given the sheer number of units already on the market, these devices could make bootstrapping nodes far more consistent.

The FVM makes Filecoin more useful than a purely archival storage network. With the arrival of the FVM, and subsequently DataDAOs and Compute-over-Data infrastructure, data residing on Filecoin can move up the value chain.

DataDAOs incentivize participants to add more data, package it, monetize it, and compute over it (given the necessary infrastructure).

Filecoin has the potential to become the premier storage network of the modern internet era. With the FVM and supporting infrastructure like Medusa and Bacalhau, I see a lot of scope for high-quality growth in the Filecoin ecosystem.

Originally published at https://medium.com on March 29, 2023.
