Grass wants to put AI training data on a layer 2 blockchain — and it’s using Solana to do it

Companies already using Grass’ network to train AI include a financial insights platform for hedge funds. Credit: Darren Joseph
  • AI training network Grass is building a layer 2 on Solana.
  • The project plans to record the data scraped through its network on the layer 2 to make it publicly available.
  • Doing so could help combat data poisoning and empower open-source AI.

As artificial intelligence hype gains momentum in crypto, Grass, an app that lets users monetise their internet connection, has adopted an unusual strategy: developing a Solana-based layer 2 blockchain network to support AI training data.

Training AI, such as large language models, requires the software to comb through huge amounts of data searching for patterns. Many websites restrict IP addresses associated with data centres, making it impossible for those training AI to use the data centres when scraping the web.

To solve that problem, Grass lets those training AI tap into its network of users to avoid restrictions. Grass users can earn points, and later cash rewards, for joining and offering their unused network resources.

Founded in 2022, the company, based in Toronto, Canada, raised $3.5 million in December from investors including Polychain Capital. It claims to have over 1 million users and multiple clients but didn’t disclose their names.


Chris Nguyen, chief technology officer of Wynd Network, the company behind Grass, told DL News the company chose Solana because it’s “the best execution environment available for digital assets with respect to both throughput and gas fees,” and that choosing a high-throughput, low-fee network like Solana was “especially important considering the scale of data that Grass is processing.”

He explained that using a blockchain instead of a public database to store training data has several benefits. It combats data poisoning, the malicious contamination of data to compromise the performance of AI, and empowers open-source AI, the practice of making the source code for AI algorithms freely available for inspection.

“It also allows for the level of transparency necessary to reward users adequately for their network and compute contributions to this database,” Nguyen said.

But creating a database of AI training data isn’t cheap. Paying transaction fees to publish that data to a blockchain only increases the costs of doing so.


The company intends Grass to be a decentralised network made up of users with residential internet connections. Users who have the Grass app installed help train AI by having web requests routed through their internet connection.

Training AI with Grass

But letting clients route requests through its users’ web connections raises questions about privacy and security.

“When you install the web app and run a node on Grass, all it’s doing is routing web requests through your internet connection,” Nguyen said. “The extension doesn’t have access to anything else, so it has no visibility into your device or browsing activity.”

There’s also the issue of whether the data Grass scrapes is publicly available. In December, the New York Times sued OpenAI and Microsoft after AI models created by the companies were trained using the publication’s news articles.

It’s possible that Grass could run into a similar problem.

Nguyen said Grass will never scrape a website with a login and will scrape only public data. “As long as it’s not login-gated, anything on the public internet is considered public data,” he said.

Grass’ layer 2 network

Currently, Grass’ data network doesn’t run on a blockchain.

With its upcoming layer 2 network, the project will start recording metadata every time data is scraped from the web. This metadata will record and verify the website the data was scraped from.

By encoding this data on a blockchain, Grass hopes to make publicly available the origin of data used to train AI, enabling developers to confirm its source.
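To make the idea of provenance metadata concrete, here is a minimal sketch in Python. It is not Grass’ actual schema, which the company has not published; the `ScrapeRecord` fields and the helper names are hypothetical, and the sketch assumes the metadata includes a hash of the scraped content so a developer can later check a dataset entry against the recorded source.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeRecord:
    """Hypothetical provenance metadata for one scraped page."""
    url: str             # website the data was scraped from
    scraped_at: int      # Unix timestamp of the scrape
    content_sha256: str  # fingerprint of the scraped bytes


def make_record(url: str, content: bytes, scraped_at: int) -> ScrapeRecord:
    """Build a record that commits to the content without storing it."""
    digest = hashlib.sha256(content).hexdigest()
    return ScrapeRecord(url=url, scraped_at=scraped_at, content_sha256=digest)


def matches(record: ScrapeRecord, content: bytes) -> bool:
    """Check whether some training data is the content the record describes."""
    return hashlib.sha256(content).hexdigest() == record.content_sha256
```

Under these assumptions, anyone holding a copy of the training data could recompute the hash and compare it with the published record; a single changed byte, as in a poisoning attempt, would cause the check to fail.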

Grass says it expects its app to eventually facilitate millions of web requests per minute. At those levels of activity, encoding data directly on Solana would become too expensive and clog the network. Instead, Grass will publish the data on a layer 2 blockchain.

The layer 2 will use zero-knowledge proofs — a type of cryptographic technique that can bundle up data while maintaining privacy and security by not revealing the actual data.

After the data is published on the layer 2 network, it will be grouped into batches to reduce its size and sent to Solana for validation.
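The batching step can be illustrated with a simplified stand-in. The sketch below is not a zero-knowledge proof and is not Grass’ implementation; it uses a Merkle tree, a simpler and widely used technique, to show how many records can be compressed into one short value that is cheap to post to a base chain such as Solana. The function name and the byte prefixes are assumptions made for illustration.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compress a batch of serialized records into one 32-byte commitment."""
    if not leaves:
        raise ValueError("batch must contain at least one record")
    # Prefix leaf and inner hashes differently so one cannot pose as the other.
    level = [_sha256(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        # Hash neighbouring pairs together to form the next level up.
        paired = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        # An unpaired last node is carried up unchanged.
        if len(level) % 2 == 1:
            paired.append(level[-1])
        level = paired
    return level[0]
```

In this simplified picture, only the 32-byte root would need to be written to the base chain for each batch, however many records the batch holds, and changing any single record changes the root.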

How do companies use Grass?

Nguyen said the companies already using Grass’ network to train AI include a financial insights platform for hedge funds, and a data repository that trains large language models using data scraped from forums and blog posts.

But there are also other ways networks like Grass could be used that users might not be comfortable with.

Similar apps that let their users sell unused network resources have previously had issues with customers using them for questionable activities, such as bulk registration of social media accounts, accessing potential click-fraud sites, and scraping government and personally identifiable information databases.

Nguyen previously told DL News that all buyers go through a rigorous vetting process, and Grass accepts only vetted organisations, such as nonprofit AI data repositories and incorporated companies that pass know-your-customer checks.

“The exciting thing about the network, in my opinion, is that literally any type of public data can be accessed using it, which means any kind of product can be produced,” Nguyen said.

Tim Craig is DL News’ Edinburgh-based DeFi Correspondent. Reach out with tips at tim@dlnews.com.