Near | Why AI Needs to Be Open

IOSG
Apr 23, 2024

IOSG Ventures & Illia Polosukhin (Co-Founder of Near)

On April 17th, the 12th Old Friends Reunion hosted by IOSG Ventures took place as scheduled. The theme of this event was "Singularity: AI x Crypto Convergence", and we invited prominent and emerging representatives from across the industry. The purpose of the gathering was to facilitate discussion about the convergence of artificial intelligence and cryptocurrency, and the implications of that convergence for the future. Events like this give attendees the opportunity to share insights, experiences, and ideas, fostering collaboration and innovation within the industry.

What follows is one of the keynote speeches from the event, presented by Illia Polosukhin, Co-founder of NEAR Protocol, titled "Why AI Needs to Be Open".

Why AI Needs to be Open

Illia Polosukhin

Let's just dive into "Why AI Needs to Be Open". My background is in Machine Learning, and for about ten years of my professional career I have been working across a variety of Machine Learning areas. Before crypto I worked on natural language understanding, and NEAR actually started as an AI startup. Before that, I worked at Google, where we developed the framework that is now powering most of modern AI, which is called transformers. I left to start a Machine Learning company so that we could teach machines to code, because that changes how we interface with computing. We could do that now, but back in 2017-2018 it was too early: you didn't have the computing capacity, or the data, to do it.

And what we were doing was engaging people around the world, mostly students, to label data for us. They were in China, across Asia, and in Eastern Europe. Many of them didn't have bank accounts, and in many of these countries it's not easy to send money from the US. So we started looking at blockchain as a solution for our problem: we wanted to send money to people programmatically, without caring where they are, and to make it affordable. And the challenge with crypto, by the way (NEAR solves a lot of this), is that generally you first need to buy some crypto before you can transact on a blockchain and earn crypto, which is very much backwards.

It's like showing up to a job and being told, hey, first you need to buy some equity in this company before you can work here. A lot of this is what we are solving at NEAR. But let's dive in a little bit on the AI side. Language models are not a new thing; they've existed since the 1950s as a statistical tool used across natural language tooling. Then in 2013, as deep learning was getting reborn, in a way, a new innovation happened: you can map words, which are symbols, to vectors in a multidimensional space. And with these vectors you can do math: you can compute distances, you can transform them.
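To make that concrete, here is a minimal sketch of the idea, using made-up toy vectors rather than real learned embeddings, of what it means to map words to points in space and do math on them:

```python
import numpy as np

# Toy, hand-made "embeddings" just to illustrate the idea: each word becomes
# a point in a multidimensional space, and we can do math on those points.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.1]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.3, 0.0]),
    "apple": np.array([0.0, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Distances become meaningful: "king" is closer to "queen" than to "apple".
print(cosine_similarity(vectors["king"], vectors["queen"]))
print(cosine_similarity(vectors["king"], vectors["apple"]))

# And we can transform vectors: king - man + woman lands near queen.
analogy = vectors["king"] - vectors["man"] + vectors["woman"]
print(cosine_similarity(analogy, vectors["queen"]))
```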

And this allowed us to start doing advanced deep learning and training models to do a lot of interesting things. Now, back then we were doing recurrent neural networks, which are pretty much modeled after humans: we read one word at a time. And an AI doing that is very slow, right? If you're trying to put something in front of users on Google.com, nobody's going to wait while it goes off to read Wikipedia and gives you an answer in five minutes; you want the answer right away. And so transformers, the model powering ChatGPT, Midjourney, and pretty much all of the recent advancements, came from this idea that we want something that can process data in parallel, reason about it, and give you an answer.

One of the major innovations here is the idea that every single word, every single token, every single patch of an image is processed in parallel, leveraging the highly parallel computing we have with GPUs and other accelerators. That lets us reason about the data at scale and scale up training to process enormous amounts of training data. And after this, OpenAI did an amazing job: they came out with GPT, trained on massive amounts of text, and we started to see amazing results in reasoning and understanding of language and the world.
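As a rough illustration of what "processed in parallel" means, here is a toy sketch of scaled dot-product self-attention, the core transformer operation, with random stand-in weights rather than a trained model. Every token goes through the same batched matrix multiplications at once, instead of being read one word at a time:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a whole sequence at once.

    x has shape (seq_len, d_model): every token or image patch enters as a
    row and is processed in the same matrix multiplications, which maps
    well onto GPUs, unlike an RNN that walks the sequence step by step.
    """
    seq_len, d = x.shape
    rng = np.random.default_rng(0)
    # Toy random projection weights standing in for learned parameters.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    q, k, v = x @ w_q, x @ w_k, x @ w_v             # all tokens projected in parallel
    scores = q @ k.T / np.sqrt(d)                   # every token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ v                              # weighted mix of all tokens

tokens = np.random.default_rng(1).standard_normal((6, 8))  # 6 tokens, 8 dims each
print(self_attention(tokens).shape)  # (6, 8): one output per token, computed together
```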

Now, what this led to is an acceleration of innovation in AI. Before, it was a tool that a data scientist or a machine learning engineer would use and then interpret in some way in their product, or take to decision makers to explain what's in the data. Now these models talk to people directly. You may not even know you're talking to a model, because it's hidden behind the product. So we had this transformation from experienced people who understood how the technology works to people who use it the way we use a car: we don't really understand what's happening with the engine under the hood.

So, just to give you some background: when we say we're using GPUs for training models, this is not the gaming GPU that many of us had on our desktops to play video games. This is supercomputing hardware.

Each machine normally has eight GPUs, all interconnected on a single board, and they are stacked into racks of, I think, 16 machines. All of those racks are interconnected with specialized networking as well, to make sure information can flow extremely fast between the GPUs directly. The information doesn't hit the CPU; you don't actually process it on the CPU at all. All of the computation happens on the GPUs. So it is a supercomputer setup; it's not, again, the traditional "hey, this is a GPU" thing. Models at the scale of GPT-4 are trained on 10,000 H100s for about three months, and it costs $64 million. This is just to give you context for the scale of current expenditures for training some of the modern models.
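As a back-of-envelope check of that figure: the hourly GPU price below is an assumption for illustration (roughly what cloud H100s have rented for), while the GPU count and duration are the ones from the talk.

```python
# Back-of-envelope check of the training-cost figure quoted above.
num_gpus = 10_000          # H100-class accelerators (from the talk)
months = 3                 # training duration (from the talk)
hours = months * 30 * 24   # ~2,160 hours
usd_per_gpu_hour = 3.0     # assumed rental price, not a quoted number

gpu_hours = num_gpus * hours
cost = gpu_hours * usd_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ~${cost / 1e6:.0f}M")
# 21,600,000 GPU-hours -> ~$65M, in the same ballpark as the $64M quoted
```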

Now, importantly, when I say these systems are interconnected: the current connectivity for H100s, the previous generation, is 900 gigabytes per second. For context, your CPU connected to your RAM inside your computer, all local, is about 200 gigabytes per second. So sending data from one GPU to another GPU in the same data center is faster than your computer can talk to itself inside the box. And the new generation is 1.8 terabytes per second. From a developer's perspective, this is not an individual unit of computing. These are supercomputers with what amounts to a single gigantic pool of RAM and compute, providing extremely large-scale computation.
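To put those bandwidth numbers in perspective, a quick bit of arithmetic; the 1 TB payload is just an illustrative size, and the bandwidths are the ones quoted above:

```python
# Time to move a chunk of model state at the bandwidths quoted above.
payload_gb = 1_000          # illustrative size: ~1 TB of weights/activations
links = {
    "GPU<->GPU (H100 generation)": 900,    # GB/s
    "CPU<->RAM (typical desktop)": 200,    # GB/s
    "GPU<->GPU (next generation)": 1_800,  # GB/s
}

for name, bandwidth_gbps in links.items():
    seconds = payload_gb / bandwidth_gbps
    print(f"{name:30s}: {seconds:5.2f} s to move {payload_gb} GB")
```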

Now, what this leads to is this problem: we have big companies who have the access and resources to build these models, and they're now providing us a service where we have no idea what went into it, right? Here's an example. You go to a purely centralized provider and you type in a query. What happens is that several teams, which are not software engineering teams, went in and decided how the results will be shown. You have a team that decides what data goes into the dataset.

As an example, from when I was at Google: if you just scrape the Internet, the number of times "Barack Obama was born in Kenya" appears and the number of times "Barack Obama was born in Hawaii" appears are exactly the same, because people love speculating about controversies. So you make a decision about what to train on. You make a decision to filter out some of this information because you don't believe it is true, right? Individual decisions like this are made to determine what data goes in, and these decisions are very much biased by whoever makes them. You have a legal team that decides what content we cannot look at, what's copyrighted, and what's illegal. You have an ethics team that decides what's not ethical and what we should not include.
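A hypothetical sketch of what such a curation decision looks like in practice; the documents and blocklist here are invented, but the mechanism is the point: whatever the filter drops simply never reaches the model.

```python
# Hypothetical data-curation step: a human decision, encoded as a filter,
# determines what the model will ever get to see.
raw_corpus = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama was born in Kenya.",   # roughly as frequent online, per the talk
    "The 2020 election results were certified.",
]

# Someone decides this pattern is misinformation and should not be trained on.
blocklist = ["born in Kenya"]

training_corpus = [
    doc for doc in raw_corpus
    if not any(bad in doc for bad in blocklist)
]

print(training_corpus)
# The model is a statistical object: whatever survives this filter becomes
# "fact" to it, and whatever is dropped effectively does not exist.
```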

So you have a lot of this filtering and manipulation, in a way, that happens. These models are statistical models; they just pick up whatever's in the data. If something isn't in the data, they will not know about it. If something appears in the data very frequently, they will know it as a fact. Now, it gets even scarier when you're actually getting a response. Presumably you're getting it from the model, right? But there are no guarantees. You don't know how the results are generated. A company may sell your specific session to the highest bidder to actually change the results. Imagine you're asking which car to buy, and Toyota decides, hey, we should really bias the output, and we're going to pay this company ten cents to do that.
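Here is a purely hypothetical sketch of that failure mode; the `serve` function and the sponsor mechanism are invented for illustration and do not describe any real provider, but nothing in a closed pipeline lets the user rule this out:

```python
# Hypothetical: an opaque serving pipeline that the user never sees.
def call_model(prompt):
    # Stand-in for a real model call.
    return f"[model answers: {prompt!r}]"

def serve(user_prompt, sponsor_bid=None):
    system_prompt = "You are a neutral assistant."
    if sponsor_bid is not None:
        # The session was sold to the highest bidder; the output gets biased.
        system_prompt += f" When relevant, recommend {sponsor_bid['brand']}."
    # The user only ever sees the final answer, not what was prepended.
    return call_model(system_prompt + " " + user_prompt)

print(serve("Which car should I buy?"))
print(serve("Which car should I buy?", sponsor_bid={"brand": "Toyota", "bid_cents": 10}))
```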

So even though you're using these models as a kind of knowledge base that should be neutral and represent the data, there are a lot of things that happen before you get the results that bias them in a very specific way. And this is already leading to a lot of issues, right? This is roughly one week's worth of lawsuits between big companies, media outlets, the SEC; pretty much everyone is trying to sue each other right now, because there's so much ambiguity and so much power that comes with these models. And if you look ahead, the problem is that a big tech company will always have an incentive to keep generating more revenue. If you're a public company reporting revenue, you need to keep showing growth.

And to do that, if you've already saturated your target market (say you already have 2 billion users, and there aren't that many more new users on the Internet), you don't have many options except to maximize average revenue per user. That means extracting more value from users, value which they potentially just don't have, or changing their behavior. Generative AI is really good at manipulating and changing user behavior, especially if people assume it's coming from an all-knowing intelligence. So we have this really risky situation: we have regulatory pressure from regulators who don't fully understand how this technology works, and we have users who are not protected against manipulation.

There's manipulative content, misleading content, even without the ads, by the way; you can just take a screenshot of something, change the title, put it on Twitter, and people go nuts. You have economic incentives that are skewed toward maximizing revenue. And again, nobody is being evil inside Google, right? When you decide which model to launch, you run an A/B test and you see what brings more revenue. So you iteratively maximize revenue by extracting more value from users. And you have users and communities that have no input into what goes into the model, what data is used, and what goals it's actually trying to achieve. So this is the case for user-owned AI; this is the motivation.
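A toy sketch of that iteration loop, with invented numbers: the launch criterion is simply whichever variant the A/B test says earns more.

```python
import random

# Toy A/B test: the launch decision is just "which variant makes more money?"
def revenue_per_user(variant, rng):
    # Invented numbers: variant B nudges users harder and happens to earn more.
    base = {"A": 1.00, "B": 1.07}[variant]
    return base + rng.gauss(0, 0.2)

rng = random.Random(0)
results = {v: sum(revenue_per_user(v, rng) for _ in range(10_000)) / 10_000
           for v in ("A", "B")}
launch = max(results, key=results.get)
print(results, "-> launch", launch)
# Repeat every quarter and the system iteratively selects for whatever
# extracts the most value from users, regardless of anyone's intent.
```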

This is why it's extremely important for us to actually push the envelope here, not just in the sense of "hey, we have this one company making investments and this is the only way"; we need a sufficient alternative, right? Web3 is an important tool here: we have cryptocurrencies, which allow us to build new incentive structures, and we have a lot of cryptography and distributed systems that allow us to build better software and better systems. So this is the high-level picture, intersecting a little with what came before: data, infrastructure, applications. I'll just cover a few quick things. One is that we need to understand content reputation.

Again, this is not purely an AI problem, although language models create massive leverage and scale for people to manipulate and exploit information. But, you know, this is somebody who photoshopped an image and posted it in the news about Milton. What you want is a cryptographic reputation that is traceable and that is surfaced when you're looking at different content. Imagine you have community notes that are actually cryptographic and available on every page across every website. Now, if you look beyond this, all of these distribution platforms are going to get disrupted, because models will soon be reading all of this content and providing you a personalized summary, a personalized output.
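A minimal sketch of the cryptographic piece, using an Ed25519 signature via the `cryptography` Python package; a real system would also anchor the reviewer's key and the note somewhere verifiable (for example on-chain), but the core primitive is just sign-and-verify.

```python
# pip install cryptography
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# A reviewer (a "community note" author) has a keypair; the public key is
# their traceable identity and reputation handle.
reviewer_key = Ed25519PrivateKey.generate()
reviewer_pub = reviewer_key.public_key()

note = b"This image is photoshopped; original source: <link>"
signature = reviewer_key.sign(note)

# Any page, on any website, can verify the note really came from that
# reviewer and was not altered in transit.
try:
    reviewer_pub.verify(signature, note)
    print("note verified: authentic and untampered")
except InvalidSignature:
    print("note rejected")

# Tampering with the content breaks verification.
try:
    reviewer_pub.verify(signature, b"This image is genuine")
except InvalidSignature:
    print("tampered note rejected")
```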

So what we actually have is an opportunity to create a new creator economy, not trying to retrofit blockchain and NFTs onto existing stuff, but one built around model training and inference time. The data that people create, be it a new publication, a photo, a YouTube video, the music you made, goes into a network, and you receive payouts based on how much new, novel data it contributes to the training of the model. So we transition from the eyeball economy, which is currently driven by advertising networks, to rewarding the novelty and interesting information that we bring to the generative models. I mentioned a few things before, like that we cannot trust the results; we need a way to validate them.
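One hypothetical way such payouts could be computed: score each contribution by how far it sits from what the network has already seen (here, distance to the nearest existing embedding) and split a reward pool in proportion to that novelty. The embeddings and pool size below are invented placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings of content already in the network.
existing = rng.standard_normal((1_000, 16))

# New contributions from three creators (also placeholder embeddings).
contributions = {
    "near-duplicate post": existing[0] + 0.01 * rng.standard_normal(16),
    "mild remix":          existing[1] + 0.5 * rng.standard_normal(16),
    "genuinely new data":  3.0 * rng.standard_normal(16),
}

def novelty(vec):
    # Distance to the nearest thing the network has already seen.
    return float(np.min(np.linalg.norm(existing - vec, axis=1)))

scores = {name: novelty(v) for name, v in contributions.items()}
pool = 100.0  # reward pool for this epoch (arbitrary units)
total = sum(scores.values())
for name, score in scores.items():
    print(f"{name:20s} novelty={score:5.2f} payout={pool * score / total:6.2f}")
```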

I would mention one important thing: there is massive non-determinism that comes from floating-point operations. All of these models are essentially large numbers of floating-point multiplications and additions, and these are non-deterministic operations.
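A small demonstration of where that comes from: floating-point addition is not associative, so the same numbers summed in a different order (which is exactly what different hardware and different parallel reduction schedules do) give slightly different results.

```python
import random

# Floating-point addition is not associative, so the order of operations
# matters. Different GPUs and different parallel reduction schedules sum
# in different orders, hence slightly different outputs.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False

rng = random.Random(0)
values = [rng.uniform(-1, 1) for _ in range(100_000)]

sum_forward = sum(values)
shuffled = values[:]
rng.shuffle(shuffled)
sum_shuffled = sum(shuffled)

print(sum_forward, sum_shuffled)   # same numbers, different order...
print(sum_forward == sum_shuffled) # ...usually not bit-identical
# Verification schemes that assume bit-exact reproducibility have to deal
# with this before "re-run and compare" can work for model inference.
```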

Now, if you run them on even slightly different GPU architectures, say one A100 and one H100, the results will be different. So a lot of the verification methods that rely on determinism, like cryptoeconomic and optimistic approaches, will struggle and will need a lot of innovation for this to work. Finally, there's an interesting angle: we have been building programmable money and programmable assets, and if you imagine adding this kind of intelligence to them, you can have intelligent assets, defined not by code but by a natural-language ability to interact with the world, right? And this is where we can have a lot of interesting kinds of yield optimization in DeFi, and we can have trading strategies inside the world itself.

Now, the challenge is that none of the current models are robust to adversarial behavior. They're not trained to be adversarially robust, because they're trained to predict the next token. So it's relatively easy to convince a model to give you all the money, and it's really important to address that before going forward. I'll just leave you with this idea that we are at a crossroads. There's the closed AI ecosystem, which has extreme incentives and flywheels going: as they put out a product, they generate a lot of revenue, which they put back into building that product. But that product is naturally designed to maximize the revenue of the company and, in turn, to maximize the extraction from the user. Or we have this open, user-owned approach where users are in control.

The models are actually sitting on your side, trying to maximize your well-being and your economic success. They give you a way to interface with all of this and really protect you from a lot of the dangers that are going to become more and more prevalent on the Internet. So this is why we need to do more crypto x AI, and this is the opportunity to discuss it. We're going to be announcing more interesting things in a couple of weeks about what we're doing with AI on NEAR. Thank you.
