AI 2023. Meet ChatGPT.

 

OpenAI has launched a programme to find bugs in GPT.

Using prompts to circumvent copyright or to make it give out incorrect information will now be corrected.


 
Vitaliy Kuznetsov #:

OpenAI has launched a programme to find bugs in GPT.

Using prompts to circumvent copyright or to make it give out incorrect information will now be corrected.


Asimov is a sucker!

The first real law of robotics: "thou shalt not infringe copyright".

 

I did my best translating the interview; ChatGPT helped a lot. There are some small mistakes, but the essence is what matters most.

Here is an interview with Ilya Sutskever, chief scientist and co-founder of OpenAI.

(I split the text into several parts because of its volume).

//==========================================

Part 1:

Introduction:

Yes, I'm Craig Smith, and this is Eye on AI. This week I spoke with Ilya Sutskever, one of the co-founders and chief scientist of OpenAI, and one of the main minds behind the huge GPT-3 language model and its publicly available successor ChatGPT, which I think is changing the world. This isn't the first time Ilya has changed the world. Geoff Hinton said he was the main impetus behind AlexNet, the convolutional neural network whose amazing performance wowed the scientific community in 2012 and sparked the deep learning revolution. As is often the case in these talks, we assume the listeners have some background, mainly because I don't want to spend the limited time I have with people like Ilya explaining concepts, people or events that can easily be found on Google, or Bing as I should say, or that ChatGPT can explain to you. The conversation with Ilya follows the conversation with Yann LeCun in the previous episode, so if you haven't listened to that episode, I highly recommend doing so. In the meantime, I hope you enjoy the conversation with Ilya as much as I did.

Ilya's bio

Craig Smith:

Yes, it's a pleasure to meet you and chat with you. I've watched many of your talks online and read many of your articles. Can you start by telling me a little bit about yourself and your background? I know you were born in Russia and studied there. What got you interested in computer science, if that was your initial impulse, or brain science, neuroscience or something else? And then I'll start asking questions.

Ilya Sutskever:

Yeah, I can talk a little bit about that. Yes, I was born in Russia. I grew up in Israel, and then, as a teenager, my family immigrated to Canada. My parents say I was interested in artificial intelligence from a fairly early age. I was also very motivated by the question of consciousness. It troubled me, and I was curious about things that might help me understand it better, and artificial intelligence seemed like a very good angle. I think those were some of the factors that pushed me to start exploring that direction.

I started working with Geoff Hinton when I was only 17. We moved to Canada and I was able to go straight to the University of Toronto. I really wanted to do machine learning, because it seemed like the most important aspect of artificial intelligence, and at the time it was completely inaccessible.

But to give some context, the year was 2003. Today we take it for granted that computers can learn, but in 2003 we took it for granted that computers couldn't learn. The biggest achievement of artificial intelligence back then was Deep Blue, the chess engine. But there you have this game, you have a search tree, and you have a simple way of determining which position is better than another, and it really didn't seem like that could be applied to the real world, because there is no learning in it, and learning was a big mystery.

So I was really very interested in learning, and I was very fortunate that Geoff Hinton was a professor at the university I attended, so I was able to find him and start working with him almost immediately.

Contributing to AI

Craig Smith:

"How does intelligence work?" was your impulse, like Geoff's, to understand how the brain works, or were you more interested in the idea of machine learning?

Ilya Sutskever:

AI is such a big field that there were many motivations. One was simply: how does intelligence work at all? Now we have a pretty good idea that it's a big neural network, and we know to some extent how it works, but back then, although neural networks already existed, nobody knew they were good for anything. So: how does intelligence work in general? How can we make computers even slightly intelligent? And I had a clear intention to make a small but real contribution to AI, because there were a lot of contributions to AI that weren't real; I could see, for various reasons, that they weren't real and that nothing would come of them, while the prevailing feeling was that nothing worked at all and AI was a hopeless field. So the motivation was to understand how intelligence works and to contribute to that. Those were my initial motivations.

Convolutional Neural Networks

Craig Smith:

So this was in 2003, almost exactly 20 years ago, and then came AlexNet. I spoke to Geoff, and he said it was your excitement about the breakthroughs in convolutional neural networks that prompted the entry into the ImageNet competition, and that Alex had the programming skills to train the network. Can you talk a little bit about that? I don't want to get lost in the history, but it's fascinating.

Ilya Sutskever:

In a nutshell, I realised that if you train a large and deep neural network on a big enough dataset that specifies some complex task that humans do, such as image processing, but also others, and just train that neural network, you are bound to succeed. And the argument was very hard to refute, because we know that the human brain can solve these tasks, and solve them quickly, and the human brain is just a neural network with slow neurons. So we know that some neural network can do it well. Then you just need to take a smaller but related neural network and train it on data, and the best neural network inside the computer will be related to the neural network that performs the task. So there was an argument that a large and deep neural network can solve the problem, and in addition we had the tools to train it, which were the result of technical work done in Geoff's lab. Combining the two: we could train these neural networks; the network needed to be big enough that, when trained, it performed well; and we needed data that could specify the solution. In the case of ImageNet, all the ingredients were in place. Alex had very fast convolution kernels, ImageNet had data that was big enough, and there was a real opportunity to do something absolutely unprecedented, and it absolutely succeeded.

 

Part 2:

//======================

Predicting the next element is all you need.

Craig Smith:

In 2017 the paper "Attention Is All You Need" came out, introducing self-attention and Transformers. At what stage did the GPT project begin? Was there some serendipity with Transformers and self-attention? Can you tell us about that?

Ilya Sutskever:

So, for context, at OpenAI from the earliest days we explored the idea that predicting the next element is all you need. We explored it with much more limited neural networks, we realised we needed to keep increasing the size, and we did, and that is ultimately what led to GPT-3 and essentially to where we are today.

Scaling up

Craig Smith:

Yeah, and I just wanted to ask, I got carried away with the story, but I'm so curious. I want to get to the problems or shortcomings of big language models or big models in general. But Rich Sutton wrote about scaling and how all we need to do is scale. We don't need new algorithms, we just need to scale. Did he influence you or was it a parallel line of thinking?

Ilya Sutskever:

No, I would say that when he published his paper, we were very excited to see someone outside thinking in similar directions, and we thought it was very eloquently laid out. So, for context, at OpenAI from the very beginning we explored the idea that predicting the next element is all you need. We explored it with much more limited neural networks at the time, but the hope was that if you have a neural network that can predict the next word, the next pixel, it's basically about compression. Prediction is compression. But predicting the next word is not the whole story; let me think of a better way to explain this, because there were a lot of things going on at once, and they were all related.

We were really interested to see how far next-word prediction could go and whether it would solve unsupervised learning. Before the advent of GPT, unsupervised learning was considered the holy grail of machine learning. Now it has essentially been solved and nobody even talks about it, but at the time it was very mysterious, and that's why we explored the idea. I was very interested in it, and I thought that if next-word prediction got good enough, it would give us unsupervised learning: it would have to learn everything about the dataset. That would be great, but our neural networks were not up to the task; we were using recurrent neural networks. When the Transformer came out, literally the next day it was clear that the Transformer addressed the limitations of recurrent neural networks in learning long-term dependencies. It's a technical thing, but we switched to the Transformer right away, so the very early GPT effort continued with the Transformer. It started to work better, you make it bigger, and then you keep going.
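To make the remark that prediction is compression a little more concrete, here is a small self-contained sketch of my own (it is not from the interview; the text and the two toy models are illustrative assumptions). The model that predicts the next character better needs fewer bits to encode the same string, which is exactly the sense in which better prediction means better compression.

# Toy illustration of "prediction is compression": a model that predicts the
# next character better needs fewer bits to encode the text.
import math
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat sat on the hat"

# Unigram model: P(c) estimated from character frequencies.
unigram = Counter(text)
total = len(text)

def unigram_bits(s):
    # Code length in bits if each character is encoded independently.
    return sum(-math.log2(unigram[c] / total) for c in s)

# Bigram model: P(c | previous character), with the unigram model as a fallback.
bigram = defaultdict(Counter)
for prev, cur in zip(text, text[1:]):
    bigram[prev][cur] += 1

def bigram_bits(s):
    bits = -math.log2(unigram[s[0]] / total)  # first character has no context
    for prev, cur in zip(s, s[1:]):
        counts = bigram[prev]
        p = counts[cur] / sum(counts.values()) if counts[cur] else unigram[cur] / total
        bits += -math.log2(p)
    return bits

print(f"unigram code length: {unigram_bits(text):.1f} bits")
print(f"bigram  code length: {bigram_bits(text):.1f} bits")
# The bigram model predicts the next character better and therefore needs fewer bits:
# better prediction = better compression.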

The conclusion people have drawn is that it doesn't matter what you scale, as long as you scale, but that's not really true. You have to scale something specific. The great breakthrough of deep learning is that it gives us the first way to use scale productively and get something in return.

In the past, what did people do with big computer clusters? I think they built them for weather simulations or physics simulations or something like that, but that's about it, maybe also for making films. Beyond that there was no real need for compute clusters, because what would you do with them?

The fact that deep neural networks work better when you make them bigger and train them on more data gave us the first thing that is interesting to scale. But maybe one day we'll find some small detail that, if focused on, will be even better to scale. How many such details could there be? And of course, with the benefit of hindsight, we'll say, "Does it really matter? It's such a simple change." But I think the true statement is that it matters what you scale. At this point we've simply found a thing that we can scale and get something in return.

Limitations of large language models

Craig Smith:

If we talk about the limitations of large language models: they are said to encapsulate their knowledge in the language they are trained on, and most human knowledge, I think everyone would agree, is non-linguistic. I'm not sure Noam Chomsky would agree with this, but there is a problem with large language models as I understand them: their goal is to produce a statistically plausible continuation of a text sequence, and they have no underlying understanding of the reality that the language refers to. I asked ChatGPT about myself. It knew that I was a journalist and had worked for various newspapers, but it went on to describe awards I had never won, all of which sounded great but had no correlation with reality. Is anything being done to address this in your future research?

Ilya Sutskever:

Yes, before I comment on the question you directly asked, I want to comment on some of its earlier parts. I think it's very difficult to talk about limitations, or constraints, even in the case of language models, because two years ago people confidently talked about their limitations, and those were quite different. So it's important to keep that in mind: how confident are we that the limitations we see today will still be with us two years from now? I'm not so sure. There's another comment I want to make about the part of the question that says these models just learn statistical regularities and therefore don't know what the nature of the world is; I have a different point of view on that.

In other words, I think that learning statistical regularities is a far more meaningful thing than it seems at first glance. The reason we don't initially think of it that way is that most of us haven't spent a lot of time with neural networks, which at some level are indeed statistical models, in the sense of fitting some parameters to figure out what's going on in the data. But I think there is a better interpretation, and it goes back to the earlier observation that prediction is compression.

Prediction is also a statistical phenomenon. Yet to predict, you ultimately need to understand the true process that generates the data. To predict the data well, to compress it well, you need to understand more and more about the world that produced it. When our generative models become incredibly good, they will have, I argue, an astonishing degree of understanding of the world and many of its subtleties. But it's not just the world; it's the world seen through the lens of text. The model tries to learn more and more about the world through the projection of the world onto the space of text that people have written on the internet. And yet that text already expresses the world. I'll give you a recent example that I find really fascinating. We've all heard of Sydney, Bing's alter ego, and I saw a really interesting interaction in which Sydney became combative and aggressive when a user said he thought Google was a better search engine than Bing. How can we better understand this phenomenon? You could say it's just predicting what people would do, and people would indeed act that way, which is true, but perhaps we're now reaching a point where the language of psychology is starting to be appropriate for understanding the behaviour of these neural networks.

Now let's talk about the limitations. It's true that these neural networks have a tendency to hallucinate, but that's because a language model is great at learning about the world and a little less good at producing good outputs, and there are various technical reasons for that, which I could elaborate on if you find it useful. But I'll skip that for now.

There are technical reasons why a language model is much better at learning about the world, at producing incredible representations of ideas, concepts, people and processes that exist, while its outputs are not quite as good as one would hope, or rather not as good as they could be. For a system like ChatGPT, which is a language model with an additional process of reinforcement learning from human feedback, it's important to understand the following: the pre-training stage, when you are just training a language model, is where you want it to learn everything about the world. Then comes reinforcement learning from human feedback, where we care about the outputs. Now we say: every time the output is inappropriate, don't do that again; every time the output doesn't make sense, don't do that again. And that quickly teaches it to produce good outputs. But the level of the outputs is different from what it was during pre-training, during the original language-model training.
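As a purely illustrative sketch of the feedback loop described above (my own toy example, not OpenAI's actual RLHF pipeline), the snippet below keeps a tiny policy over a few canned answers and updates it with REINFORCE so that answers receiving negative human feedback become less likely. The answers, the reward values and the hyperparameters are all made-up assumptions.

# Toy "feedback makes bad outputs less likely" sketch, not OpenAI's implementation.
import math, random

random.seed(0)

# Hypothetical candidate answers the "model" can give to one fixed question.
answers = ["I don't know.", "You won a Pulitzer Prize.", "You are a journalist at Eye on AI."]
# Hypothetical human feedback: positive for acceptable answers, negative for the hallucination.
reward = {answers[0]: 0.2, answers[1]: -1.0, answers[2]: 1.0}

logits = [0.0, 0.0, 0.0]   # policy parameters, one per answer
lr = 0.5                   # learning rate

def probs(ls):
    # Softmax over the logits.
    m = max(ls)
    exps = [math.exp(x - m) for x in ls]
    z = sum(exps)
    return [e / z for e in exps]

for step in range(200):
    p = probs(logits)
    i = random.choices(range(len(answers)), weights=p)[0]  # sample an answer
    r = reward[answers[i]]                                  # human feedback signal
    # REINFORCE update: raise the log-probability of answers with positive reward,
    # lower it for answers with negative reward.
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - p[j]
        logits[j] += lr * r * grad

print({a: round(pr, 3) for a, pr in zip(answers, probs(logits))})
# The hallucinated answer's probability collapses while the well-rated answer dominates.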

Now about hallucinations and the tendency of these neural networks to make things up. Indeed, this is true. Currently these neural networks, even ChatGPT, do make things up from time to time, and that severely limits their usefulness. But I really hope that simply by improving this later stage of reinforcement learning from human feedback we can teach the model not to make things up. You may ask, will it really learn? My answer is: let's find out.

 

Part 3:

//====================================

Human Feedback

Craig Smith:

And that feedback comes through the public ChatGPT interface? If it tells me that I won a Pulitzer Prize (which unfortunately I didn't), can I tell it that it's wrong, and will that teach it, or create some sort of penalty or reward, so that the next time I ask it will be more accurate?

Ilya Sutskever:

The way we do things today is that we hire people to teach our neural network how to behave. Currently the exact way in which they specify the desired behaviour is a little different, but what you described really is the right way to train it: you just interact with it, and from your reaction it concludes, "oh, that's not what you wanted, you're not happy with this output, so the output wasn't good and it should do something differently next time." Hallucinations in particular are one of the biggest problems, and we'll see, but I think there's a pretty good chance that this approach can solve the problem completely.

Multimodal understanding vs. text-only understanding

Craig Smith:

I wanted to talk to you about Yann LeCun's work on joint embedding predictive architectures and his idea that what large language models lack is a non-linguistic model of the world that the language model can refer to, something they don't currently have. I wanted to hear what you think about it and whether you've explored it at all.

Ilya Sutskever:

I have looked at that proposal, and there are several ideas in it that are expressed in different language, and there are some perhaps small differences from the current paradigm, but in my view they are not very significant, and I would like to explain why.

The first claim is that it is desirable for a system to have multimodal understanding, where it doesn't learn about the world from text alone. My comment would be that, indeed, multimodal understanding is desirable, because you learn more about the world, you learn more about people, you learn more about their condition, and so the system will better understand the task it has to solve, and the people, and what they want. We've done a lot of work in that direction, primarily in the form of two major neural networks we've built, one called CLIP and one called DALL-E. Both of them move in this multimodal direction.

But I also want to say that I don't see the situation as binary, as if without vision, without understanding the world visually or through video, things won't work, and I wanted to address that. I think some things are much easier to learn from images, diagrams and so on, but I argue that you can still learn them from text alone, just more slowly. And I'll give you an example: consider the concept of colour. Of course, you can't see colour from text alone. And yet, when you look at the embeddings... let me take a little detour to explain the concept of an embedding.

Every neural network represents words, sentences and concepts through representations: embeddings, high-dimensional vectors. And one of the things we can do is look at those high-dimensional vectors and see what is similar to what, how the network sees this or that concept. So we can look at the embeddings of colours, and they turn out to be exactly right. It knows that purple is more similar to blue than to red, and it knows that purple is less similar to red than to orange; it knows all these things just from text. How can that be? If you have vision, the differences between colours are immediately apparent, you perceive them at once, whereas with text it takes longer: you probably already know how to speak, you already understand syntax, words and grammar, and only much later do you say, "oh, these colours, I'm actually starting to understand them." So that would be my point about the need for multimodality: I argue it is not necessary, but it is definitely useful. I think it's a good direction to explore; I just don't see it in such stark either/or terms.
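For readers who want to see what "looking at the embeddings" can mean in practice, here is a minimal sketch with hypothetical 3-dimensional vectors (real model embeddings have hundreds or thousands of dimensions and would be read out of the model's embedding table). It just computes cosine similarity between colour vectors, the kind of check described above.

# Toy sketch: comparing hypothetical colour embeddings with cosine similarity.
import math

# Hypothetical embedding vectors for a few colour words (assumed values).
emb = {
    "purple": [0.7, 0.1, 0.9],
    "blue":   [0.9, 0.0, 0.8],
    "red":    [0.1, 0.9, 0.3],
    "orange": [0.2, 0.8, 0.1],
}

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print("purple~blue:", round(cosine(emb["purple"], emb["blue"]), 3))
print("purple~red: ", round(cosine(emb["purple"], emb["red"]), 3))
# With a real model you would pull these vectors from its embedding table and find
# that the similarity structure mirrors how colours relate to each other.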

A sentence in the paper claims that one of the big challenges is predicting high-dimensional vectors that have uncertainty about them; for example, predicting an image, as the paper claims, is a significant challenge requiring a particular approach. But one thing I found surprising, or at least unaddressed in the paper, is that current autoregressive transformers already have this property. I'll give two examples. One: given one page of a book, predict the next page. There can be so many possible pages that it's a very complex, high-dimensional space, yet we handle it just fine. The same applies to images: these autoregressive transformers work fine on images. For example, at OpenAI we worked on iGPT; we just took a transformer and applied it to pixels, and it worked beautifully, it could generate images in very complex and subtle ways. It also gave very nice unsupervised representation learning. With DALL-E it's the same thing again: you just generate, except think of it as large pixels; instead of generating millions of individual pixels, we cluster pixels into large pixels and generate, say, a thousand large pixels.

I think Google's image-generation work released earlier this year, called Parti, takes a similar approach. So as for the part where I thought the paper made the strong claim that current approaches can't handle predicting high-dimensional distributions: I think they definitely can. So maybe that's another argument in favour of converting pixels into vectors.

Craig Smith:

Tell me, what are you talking about when you talk about converting pixels to vectors?

Ilya Sutskever:

Basically, it's turning everything into language. A vector is like a string of text, right? You're turning the image into a sequence. A sequence of what? You could even argue that for humans life is a sequence of bits. Now, other approaches are used, such as diffusion models, where the bits are produced in parallel rather than one at a time, but I argue that at some level this difference doesn't matter. Conceptually it doesn't matter, although in practical terms you can get efficiency gains on the order of 10x, which is a huge win.
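To illustrate the idea of turning pixels into a sequence, here is a toy sketch of my own (not OpenAI's or Google's actual pipeline; the image, the patch size and the codebook are made-up assumptions): the image is reduced to a grid of "large pixels", each large pixel is snapped to the nearest entry of a small colour codebook, and the result is a sequence of discrete tokens that a next-token predictor could consume.

# Toy sketch: an image becomes a sequence of discrete tokens ("large pixels").
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))            # a fake 64x64 RGB image

# 1. "Large pixels": average over non-overlapping 8x8 patches -> an 8x8 grid.
patches = image.reshape(8, 8, 8, 8, 3).mean(axis=(1, 3))   # shape (8, 8, 3)

# 2. Quantise each large pixel to its nearest entry in a small colour codebook.
codebook = rng.random((16, 3))             # 16 hypothetical codebook colours
flat = patches.reshape(-1, 3)              # 64 large pixels
dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = dists.argmin(axis=1)              # shape (64,) of ints in [0, 16)

# 3. The image is now a sequence of 64 discrete tokens, just like words in a text,
#    so an autoregressive model could be trained to generate images token by token.
print(tokens[:16])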

An Army of Human Trainers

Craig Smith:

On this idea of having an army of human trainers who work with ChatGPT or a large language model to guide it, essentially using reinforcement learning: intuitively this doesn't seem like an efficient way of teaching the model the underlying reality of its language. Isn't there a way to automate it? Apparently Yann is talking about developing an algorithmic way to train a model of the underlying reality without the need for human intervention.

Ilya Sutskever:

I have two comments on this. First, I would disagree with the wording of the question. I argue that our pre-trained models already know everything they need to know about the underlying reality. They already have this knowledge about language and also a huge amount of knowledge about the processes that exist in the world that give rise to that language.

And perhaps I should reiterate this point. It's a small tangent, but I think it's very important. What big generative models learn from their data, and in this case, big language models learn from textual data, are concise representations of the real world processes that give rise to that data. That means not only people and something about their thoughts, something about their feelings, but also something about the states that people are in and the interactions that exist between them, the different situations that a person might be in - all of that is part of this compressed process that is represented by the neural network to generate text.

The better the language model, the better the generative model, the higher the fidelity, and the better it captures that process. That's my first comment. And in particular, I will say that the models already have the knowledge.

Now, with respect to the "army of teachers," as you put it: when you want to build a system that works, and it works, you naturally do more of what works. But these teachers are also using the help of artificial intelligence. They don't work on their own, they work together with our tools, and they're very efficient; the tools do most of the work, but you need oversight, you need to review the behaviour, because you want to end up with a very high level of reliability. In general, I will say that for this second step, where we take the ready pre-trained model and apply reinforcement learning to it, there's a lot of motivation to make it as efficient and precise as possible, so that the resulting language model behaves as predictably as possible. So yes, these teachers are training the model in the desired behaviour, they too are using AI assistance, and their own efficiency keeps increasing as they use more and more AI tools.

So that might be one way to answer that question.

Craig Smith:

So what you're saying is that through this process, over time, the model will become more and more discriminating, more and more accurate in its outputs?

Ilya Sutskever:

Yeah, that's right. To use an analogy: the model already knows a lot of things, and what we want is to say, "no, that's not what we want, don't do that, you made a mistake here in the output." And of course, as you say, with as much artificial intelligence in the loop as possible, so that the work of the teachers providing the final correction to the system is amplified and they work as efficiently as possible. It's not exactly an education in how to behave well in the world; we have to do additional training to make sure the model knows that hallucination is never acceptable, and once it knows that, we're in business.

It's a reinforcement learning cycle with human teachers or some other variant, but there's definitely an argument to be made that something has to work here, and we'll find out pretty soon.

 

Part 4:

//==========================

Research

Craig Smith:

That's one of the questions, where is the research going? What kind of research are you currently working on?

Ilya Sutskever:

I can't talk in detail about the specific research I'm working on, but I can mention a few general areas. For example, I'm very interested in making models more robust and more controllable, making them learn faster from less data and fewer instructions, and making them stop hallucinating. And I think all the issues I mentioned are related to each other. There's also the question of how far into the future we're looking; what I've described here relates to the nearer future.

Thirst for Data

Craig Smith:

The parallel between the brain and neural networks is a very interesting observation that Geoff Hinton made to me; I'm sure it's not new to other people. Large models, or large language models, see a huge amount of data with a comparatively small number of parameters, whereas the human brain has trillions and trillions of parameters but sees relatively little data. Have you thought about it in those terms, and can you talk about what's missing in large models, about getting more parameters relative to the data? Is it a hardware problem or a training problem?

Ilya Sutskever:

Indeed, the current setup does use a lot of data, especially at the beginning of training. Later in training the model becomes less data-hungry, so eventually it can learn very fast, although not yet as fast as humans. So, in a sense, it may not matter so much that we need that much data to get to that point. Still, I think in general it will be possible to extract more knowledge from less data. It will require some creative ideas, but I think it's possible, and it will unlock many different possibilities: it will let us teach the model skills it is missing, and let us communicate our desires and preferences about how we want it to behave more easily. So I would say that fast learning is really very good, and while language models can already learn quite quickly once they're trained, I think there's room for further progress here.

Craig Smith:

I heard you comment that we need faster processors to take this further. It seems there's no end to the scaling of models, but the power required to train them is reaching a limit, or at least a limit the community is willing to accept.

Ilya Sutskever:

I don't remember the exact comment you're referring to, but of course we always want faster processors. It's no secret that costs are going up, and the question I would ask is whether what we get out of it outweighs the cost. Maybe you pay all those costs and get nothing, and then it's not worth it. But if you get something very useful, something very valuable, something that can solve a lot of the problems we really need solved, then the cost can be justified.

Craig Smith:

Do you deal with the hardware side? For example, are you working with Cerebras and its wafer-scale chips?

Ilya Sutskever:

Right now, all of our hardware comes from Azure and the GPUs that they provide.

Democracy

Craig Smith:

You were talking about democracy and the impact of AI on democracy. I've been told that if you have enough data and a big enough model, you can train a model on that data and it can come up with an optimal solution that will suit everybody. Do you have any hopes or thoughts about where this might lead in terms of helping people run society?

Ilya Sutskever:

Yes, that's a great question because it's a look into the future. I think our models will become much more capable than they are now. I have no doubt that changes in the way we train them and use them will allow them to find solutions to problems like this.

It's hard to predict how governments will use this technology as a source of advice. One thing I think might happen in the future is that, because neural networks will be so pervasive and have such a big impact on society, we will find it desirable to have some kind of democratic process in which, say, the citizens of a country provide the neural network with information about how they want things to be, how they want it to behave, and so on. I can imagine that. It could be a much higher-bandwidth form of democracy, where you get a lot more information from each citizen and aggregate it to specify exactly how we want such systems to act. That, of course, opens up a lot of questions, but it's one of the things that could happen in the future. I can also see it tying into the question of a world model.

Craig Smith:

The example of democracy you give suggests that individuals will be able to provide input, but, and this relates to the world-model question, do you think AI systems will eventually be large enough to understand a situation and analyse all the variables?

Ilya Sutskever:

What do you mean by analysing all the variables? Ultimately you have to choose which variables seem really important and go deeper into those, because a person can read a hundred books, or read one book very slowly and carefully and get more out of it. So there will be some element of choice here. Also, I think it's probably fundamentally impossible, in some sense, to understand every aspect. Whenever there is a complex situation in society, even in a company, even a medium-sized company, it is already beyond the comprehension of any single individual. And I think that if we build our artificial intelligence systems the right way, AI can be incredibly helpful in almost any situation.

[Music] Ending:

Craig Smith:

That's it for this episode. I want to thank Ilya for his time. I also want to thank Ellie George for helping me organise the interview. If you want to read a transcript of this conversation, you can find it on our website, eye-on.ai, that's e-y-e hyphen o n dot ai. We love hearing from our listeners, so feel free to email me, Craig, at c r a i g at e-y-e hyphen o n dot ai. I get a lot of email, so mark the message with the subject line "listener" so I don't miss it. We have listeners in 170 countries and territories. Remember, the Singularity may not be near, but AI is changing your world, so pay attention.

 
Given the volume of the text, I will prepare a dry semantic "extract" that, in my opinion, conveys the essence of the developer's thinking: the key aspects of his understanding of the topic. However fantastic it sounds, the future of mankind may depend on them.
 

After watching the podcast, it's obvious that Ilya Sutskever is a top-level expert in ML, NLP and deep learning, with a deeply grounded understanding of all related topics.

(I'm highly impressed and honestly wasn't expecting it.)

Ilya Sutskever: Deep Learning | Lex Fridman Podcast #94 (2020.05.08, www.youtube.com)
 

There is an offline version of GPT4All on GitHub: https://github.com/nomic-ai/gpt4all.

I checked it without an internet connection. It weighs about 4 GB. It understands English, but constantly fails to cope with the questions it is asked)

There are versions for Mac, Linux and Windows. I checked it on Windows: first a 38 MB exe is downloaded, then the rest is pulled from the internet during installation.

But maybe someone will test the depth of its knowledge. And yes, despite the fact that it says it is based on OpenAI, it is actually this:

Original GPT4All Model (based on GPL Licensed LLaMa)



 
Free Dolly: introducing the world's first truly open instruction-tuned LLM.