

source link: https://devm.io/machine-learning/artificial-general-intelligence-data

AGI: The most exciting project on the planet - Part Three

How Much Data Does AGI Need?

Charles Simon

09. Nov 2022


Wikidata is a “knowledge graph”: a collection of nodes and edges representing the relationships between various pieces of information. Wikipedia uses this knowledge graph to create the summary boxes that appear on many of its pages.

In Wikidata, there is a node for “yellow” which contains data such as the word for yellow in numerous languages, along with what Wikidata calls statements, such as “yellow is a color.”

Fig. 3: Wikidata node for "yellow"
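
To make this concrete, here is a minimal sketch in Python of how a Wikidata-style node for yellow could be held as a plain data structure. The identifiers, properties, and field names are invented for illustration; they are not Wikidata's actual schema or item IDs.

```python
# Illustrative sketch of a Wikidata-style knowledge-graph node.
# The identifier, properties, and field names are made up for illustration.

yellow_node = {
    "id": "Q_YELLOW",                 # placeholder identifier, not a real Wikidata ID
    "labels": {                       # the word for yellow in numerous languages
        "en": "yellow",
        "es": "amarillo",
        "de": "gelb",
        "fr": "jaune",
    },
    "statements": [                   # property/value pairs such as "yellow is a color"
        ("instance of", "color"),
        ("color of", "banana"),
    ],
}

# Each statement is effectively an edge to another node, so much of the
# node's meaning lies in the things it is connected to.
for prop, value in yellow_node["statements"]:
    print(f'{yellow_node["labels"]["en"]} --{prop}--> {value}')
```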

Statements can go on for pages, and they highlight a distinction between a knowledge graph and the information in your brain. In a knowledge graph, when you put in some information, you assume it's correct and want to keep it for a long time. In your brain, there is a constant flood of incoming information. You've got to be able to store a lot of that information for a short period and then forget whatever proves not to be useful. So the brain has to handle information in real time.

Here's the real bugger in your brain: the nodes and edges cannot contain any data directly. If they did, our understanding of the brain would be much easier. We'd open up the skull, see labels on all the neurons, watch the neurons flash whenever they fire (or when they don't), and it would be easy to see how the brain was laid out. But in the brain there aren't any Unicode strings, no floating-point numbers, no images. It's a graph where you can only know what a node means by the other nodes it's connected to. Everything you know exists only in relation to everything else you know.
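
As a toy illustration of this context-only idea (my own sketch, not a model of real neurons), the nodes below carry no payload at all, only edges, so the only possible "description" of a node is the list of nodes it connects to.

```python
# Toy model of a context-only graph: nodes store no strings, numbers,
# or images, only which other nodes they connect to.

from collections import defaultdict

edges = defaultdict(set)        # node id -> set of connected node ids

def connect(a, b):
    edges[a].add(b)
    edges[b].add(a)

# Build a tiny anonymous graph. The comments are for the reader only;
# the graph itself has no idea that node 0 "is" yellow.
connect(0, 1)   # (yellow) -- (color)
connect(0, 2)   # (yellow) -- (banana)
connect(2, 3)   # (banana) -- (fruit)

def describe(node):
    """The only possible 'description' of a node is its context."""
    return sorted(edges[node])

print(describe(0))   # [1, 2]: meaningful only relative to other nodes
```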

Fig. 4: Abstract node representing "yellow"

Let's say the image above is the abstract node representing yellow, and that yellow is a color, so there is also some sort of abstract node representing color. If you see something yellow, it fires this abstract yellow node, and the yellow node has edges which connect it to things that are yellow. Those yellow things, such as a banana, will have numerous connections to other nodes which define the banana's other attributes.


You have a mental model in which you can keep track of a couple of bananas in your immediate surroundings. These are “instance nodes” that connect to the generic abstract banana node; they add specific detail to individual bananas while incorporating the characteristics of the abstract banana.
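
One way to sketch this abstract/instance split (an illustration of the idea only, not the author's Brain Simulator implementation) is to let an instance node hold a few edges of its own and fall back to its abstract node for everything else.

```python
# Illustrative sketch of abstract vs. instance nodes. The relation names
# ("is_a", "color", "location") are invented for this example.

class Node:
    def __init__(self, **edges):
        self.edges = dict(edges)          # named edges to other nodes

    def get(self, relation):
        """Follow a named edge, falling back to the abstract node."""
        if relation in self.edges:
            return self.edges[relation]
        abstract = self.edges.get("is_a")
        return abstract.get(relation) if abstract else None

# Abstract nodes
color  = Node()
yellow = Node(is_a=color)
banana = Node(color=yellow)               # the generic, abstract banana

# Instance nodes: the two particular bananas in my immediate surroundings
counter  = Node()
overripe = Node()
banana_1 = Node(is_a=banana, location=counter)
banana_2 = Node(is_a=banana, location=counter, ripeness=overripe)

print(banana_2.get("ripeness") is overripe)   # its own specific detail: True
print(banana_1.get("color") is yellow)        # inherited from the abstract banana: True
```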

Now let’s consider the words related to the abstract concept of yellow. You might use the word yellow or amarillo or golden. You can have multiple words associated with an individual node, but the word nodes contain no data so you (or your brain) don’t know what they are either. A word must connect to its pronunciation and spelling, so you can say it, hear it, read it, or write it.

When you read the word “yellow” on this page, the recognized letters fire their respective nodes. That information percolates up and eventually fires the abstract node representing yellow. There, you finally get an inkling of what the word means, because the abstract yellow node connects to yellow things like bananas, which you can imagine in your mental model. That is the “understanding” of what the word you just read actually means.
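
That percolation step can be sketched as simple spreading activation, which is a deliberate simplification of whatever the brain actually does: letter nodes activate the word node, which activates the abstract yellow node, which activates the yellow things connected to it. The graph below is invented for illustration.

```python
# Toy spreading-activation sketch: recognized letters fire the word node,
# which fires the abstract node, which fires the things connected to it.

from collections import defaultdict

targets = defaultdict(list)     # edges: source node -> nodes it feeds

for letter in "yellow":
    targets[f"letter:{letter}"].append("word:yellow")
targets["word:yellow"].append("abstract:yellow")
targets["abstract:yellow"] += ["abstract:banana", "abstract:lemon", "abstract:color"]

def propagate(active, steps=3):
    """Fire everything reachable from the initially active nodes."""
    fired = set(active)
    frontier = set(active)
    for _ in range(steps):
        frontier = {t for node in frontier for t in targets[node]} - fired
        fired |= frontier
    return fired

# Reading the letters y-e-l-l-o-w eventually lights up the abstract yellow
# node and, through it, the yellow things in your mental model.
letters = {f"letter:{c}" for c in "yellow"}
print(sorted(propagate(letters)))
```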


You've got to be able to run it the other way too, so that signals pipe out to your cerebellum to coordinate the hundreds of muscle contractions needed to say or write the word. And each letter has an abstract node in its own right, so you can also speak the names of the letters to spell something aloud.
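
The output direction can be sketched in the same toy style (the node names and functions below are invented): start at the word node, follow its spelling edge to an ordered sequence of letter nodes, and hand each one to whatever routine stands in for the cerebellum.

```python
# Illustrative sketch of the output direction: from a word node, follow its
# spelling edge to letter nodes, then hand each letter to a motor routine.

spelling = {
    "word:yellow": ["letter:y", "letter:e", "letter:l",
                    "letter:l", "letter:o", "letter:w"],
}

def speak_letter(letter_node):
    # Stand-in for the cerebellum coordinating hundreds of muscle contractions.
    print("saying", letter_node.split(":")[1])

def spell_aloud(word_node):
    for letter_node in spelling[word_node]:
        speak_letter(letter_node)

spell_aloud("word:yellow")
```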

Your brain is a massive network of connections and you don't know what any of it means except by context – the other nodes which are connected. So you've got all of this context, and context is everything because the nodes themselves don't contain data or have labels.

It’s easy to see that AGI doesn’t need to follow this context-only model. We can put labels on nodes to save not only the memory needed for all those contextual nodes, but also the processing power needed to follow all of that context. There are numerous other software shortcuts available – your 3D vision, for example, which relies on merging the data from your two eyes so you can look at something and estimate in your mind how far away it is. Your brain does this with millions of neurons, but a computer can do it with a couple of lines of trigonometry which runs faster than your brain.
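
The stereo-vision shortcut really is just a couple of lines of trigonometry. Here is a sketch using the standard pinhole-camera disparity relation, depth = focal length * baseline / disparity; the focal length, baseline, and pixel positions are made-up numbers rather than measurements.

```python
# Depth from two "eyes" (a stereo pair) via similar triangles:
# depth = focal_length * baseline / disparity.

def depth_from_stereo(x_left, x_right, focal_length_px, baseline_m):
    """Estimate distance to a point seen at x_left and x_right (in pixels)."""
    disparity = x_left - x_right          # how far the point shifts between eyes
    if disparity <= 0:
        raise ValueError("point must appear further left in the left image")
    return focal_length_px * baseline_m / disparity

# A banana seen at pixel 412 in the left image and 396 in the right,
# with a 700-pixel focal length and eyes 6.5 cm apart:
print(round(depth_from_stereo(412, 396, focal_length_px=700, baseline_m=0.065), 2), "m")
```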

Some robots already display this kind of fluid motion, with processing that essentially replaces the 56 billion neurons of your cerebellum. And they do so without modelling neurons, because the programmers know about forces, physics, and feedback. They can do that with a couple of microprocessors, so the idea that you need a supercomputer to emulate all of the brain's neurons and synapses simply doesn't hold up.

Our graph can be represented in software structures instead of neurons. That's more efficient because we've got lists and structures and hash tables. We can put labels and values in our nodes so we don't have to figure out what yellow means because we can just write yellow in the node’s header and know how to spell it, say it, etc. Machine learning represents another software shortcut that may or may not have anything to do with the way your brain works, but accomplishes similar tasks. Finally, while we have a maximum of 160 million nodes, we don't know how many nodes it really takes to be generally intelligent.
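
Here is what that labeled-node shortcut might look like (my own sketch, not Brain Simulator II's actual data structures): a hash table maps a label straight to its node, so "yellow" can be looked up directly instead of being inferred from context.

```python
# Labeled nodes backed by a hash table: the software shortcut of writing
# "yellow" in the node's header instead of inferring it from context.
# The field names are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # data the brain's nodes can't hold
    values: dict = field(default_factory=dict)   # e.g. spelling, pronunciation
    edges: list = field(default_factory=list)    # connections to other nodes

nodes = {}                                       # hash table: label -> node

def add(label, **values):
    nodes[label] = Node(label, values)
    return nodes[label]

yellow = add("yellow", spelling="y-e-l-l-o-w")
banana = add("banana")
yellow.edges.append(banana)

# O(1) lookup by label: no need to walk the context to know what this node is.
print(nodes["yellow"].values["spelling"])
```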

Where does all this lead me? Well, the amount of programming needed for AGI looks manageable. The size of the data is manageable, and the hardware already exists to handle graphs of hundreds of millions of nodes. We've got software shortcuts we can use as soon as we figure out what the AGI program really is. This is all to say that AGI is within our grasp as soon as we learn more about how the brain actually does its job. And with numerous scientists working on this question, the insight needed to comprehend the workings of the brain could come at any time.


AGI's emergence is going to be gradual. That's because many AGI capabilities are marketable in their own right. If I produce something that improves the way your Alexa understands you, everybody's going to love that. If somebody else produces better vision that can be used in a self-driving car, everybody's going to love that too. As we approach actual human-level intelligence, everybody's going to love it, because all of these little pieces are marketable. And the more we attach these pieces to each other, and the more they can interact and share their context, the better things are going to be.

Finally, as we approach human level intelligence, nobody's going to notice. At some point we're going to get close to the threshold, then equal the threshold, then exceed the threshold. At some point thereafter, we're going to have machines that are obviously superior to human intelligence and people will begin to agree that yes, maybe AGI exists. But it's not going to be a specific time that happens at a specific place. My overall conclusion is that AGI is inevitable and sooner than most people think.

Charles Simon

Charles Simon, BSEE, MSCS, is a nationally recognized entrepreneur and software developer with many years of industry computer experience, including pioneering work in AI. His technical experience includes the creation of two unique artificial intelligence systems along with software for successful neurological test equipment. Combining AI development with biomedical nerve-signal testing gives him a singular insight. He is also the author of two books, Will Computers Revolt?: Preparing for the Future of Artificial Intelligence and Brain Simulator II: The Guide for Creating Artificial General Intelligence, and the developer of Brain Simulator II.

