AI 2023. Meet ChatGPT.

 

I had another unpleasant mishap while working with ChatGPT.

I was going to translate a webinar by Stanford Professor Christopher Potts, chair of the Department of Linguistics, on recent breakthroughs in natural language processing and language-model technology. I took the transcript of the hour-long webinar video from YouTube and fed it into Google Translate in chunks. The result was 19 pages of unedited, unreadable translation.

Realising that editing it myself would take a whole day, I decided to "feed" ChatGPT pieces of the text and ask it to retell them in the first person, which the AI enthusiastically did. The result was quite decent. However, ChatGPT "hung" several times; I told it to "keep going", and it kept going.

Towards the end I had some doubts and decided to check whether the original text matched the AI's paraphrase, and to my dismay I realised that after the word "continue" ChatGPT had been writing nonsense. Fortunately, I only told it to "continue" a few times, and I think it retold the main text correctly.

Anyway, I will finish and publish the professor's lecture soon. It's a very interesting speech.

 

And so, here is Stanford University Professor Christopher Potts' talk, "GPT-3 & Beyond."

(Translated by Google and edited by ChatGPT)

(Original here: Stanford Webinar - GPT-3 & Beyond - YouTube)


Presenter:

Chris Potts is Professor and Chair of the Department of Linguistics, with an appointment in the Department of Computer Science. He is an expert in natural language and teaches a graduate course on natural language understanding, which has evolved into a professional course. Chris also hosts an interesting podcast and has led many research papers and projects. Links to more information about him are available on the platform. Chris is being interviewed in this context, and the presenter thanks him for his participation in the programme.

Chris Potts:

I believe that we are living in a golden age of natural language, a time of much innovation and great change. In 2012, when I started teaching a course on natural language understanding at Stanford, I could not have imagined how effective and widespread the models and technologies we see today would become. I admire models like DALL-E 2 and Stable Diffusion, which provide superior text-to-image conversion, and GitHub Copilot, which relies on the Codex model to create code. I also really like new search technologies like You.com that are changing the experience of searching the web. I believe that the social impact that the development of NLU brings is one of the main factors that unites us in this golden age.

The incredible language models created by OpenAI, including GPT-3 and the new Davinci-003 model, have the ability to convert speech to text and answer complex questions. While these models don't always have a deep understanding of the world, they can still provide complete and accurate answers to many questions. Free open source models are also available that can be downloaded and used if sufficient computing resources are available. While these models are not always flawless, the speed of their progress towards reliability and trustworthy use is impressive.

In the paper by Kiela et al. (2021), we talk about benchmark saturation, which is being reached faster than ever before. In the graph I presented, the x-axis is time, going back to the 1990s, and the y-axis is a normalised measure of performance relative to human-level estimates. MNIST digit recognition and the Switchboard speech-to-text benchmark were launched in the 1990s and took about 20 years to reach the human-performance level. ImageNet was launched in 2009 and saturation was reached within 10 years. But with the GLUE and SuperGLUE benchmarks, we've seen the pace of change accelerate. Even if we are cynical about this measure of human performance, we have seen a rapid increase in the rate of change in this area. I'm sure this carries over to the present with our largest language models. This was described in a good post by Jason Wei: we are seeing the emergence of new capabilities in large language models, which is an impressive development.

Take a look at the graphs on the screen. The x-axis on the graphs shows the size of the model, and the y-axis shows the accuracy. At a certain point, very large models can achieve such accuracy that they can even handle very complex tasks. The graphs show this emergent ability across 137 tasks, including tasks that were explicitly created to stress-test language models. One of the main factors behind the success of language models is their sheer size. To develop the technology further, it is important to understand how the size of models has changed over time. In 2018, the largest model had about 100 million parameters; by the end of 2020 there was the Megatron model with 8.3 billion parameters and the GPT-3 model with 175 billion parameters. Now there is already the PaLM model from Google, which exceeds 500 billion parameters. The scale of these models is absolutely enormous, and even back when models had only about 100 million parameters, I foresaw that they would become gigantic. All of this has the potential to affect people's lives, both positively and negatively, and I believe we are living in a golden age of language-model development.
(Stanford Webinar - GPT-3 & Beyond, www.youtube.com, 31 January 2023)

Each of us can contribute to NLU in this era of giant models. I realise that not all of us can answer in the affirmative when asked whether we have $50 million and a love for deep learning infrastructure. But that doesn't mean we can't help.

There are many things we can do to contribute. We can work on building smaller models. We can help create better tests to help us measure the performance of our systems. We can also help solve the last mile problem for productivity applications.

But if I have to single out one topic that we can all get involved in, it's retrieval-augmented in-context learning. That's something we can get involved in, in a lot of innovative ways. And, of course, we can help create accurate, human-understandable explanations of how these models work.

But to realise the potential of these technologies, a huge number of things need to be done. It requires dual development: innovation from subject-matter experts well versed in human-computer interaction, and from experts in artificial intelligence.

I'd like to discuss all these things in more detail, but we're running out of time, so I suggest we focus on in-context learning augmented with retrieval.


 

In-context learning became popular thanks to the GPT-3 paper, which carried out a thorough initial study and showed promising results. The way it works is this: we have a large language model that we prompt with a chunk of text, such as a passage and a title from the GPT-3 paper. We give the model a context and a question-and-answer demonstration to help it learn in context. Our goal is to coax the model into finding the answer as a substring of the passage we gave it. When we have a new question that we want the model to answer, we feed it that prompt, it processes the prompt into some internal state, and it then generates an answer that we evaluate for success. In in-context learning, the model learns from the context we give it, as opposed to the standard supervision paradigm, where we create a dataset of positive and negative examples and train the model on that dataset.
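To make that concrete, here is a minimal sketch of how such a prompt might be assembled. The passage, demonstrations and question are invented placeholders, and the commented-out complete() call stands in for whatever text-completion model or API is available; this is not the exact prompt format used in the talk.

```python
# Minimal sketch of few-shot in-context learning for extractive QA.
# The demonstrations and passage are invented; `complete` is a stand-in
# for any text-completion model (an API call or a local model).

def build_prompt(title, passage, demonstrations, question):
    """Assemble a prompt: context, worked Q/A pairs, then the real question."""
    lines = [f"Title: {title}", f"Passage: {passage}", ""]
    for q, a in demonstrations:            # in-context "training" examples
        lines += [f"Q: {q}", f"A: {a}", ""]
    lines += [f"Q: {question}", "A:"]      # the model continues from here
    return "\n".join(lines)

demos = [
    ("Where was the webinar hosted?", "Stanford"),
    ("What model family is discussed?", "GPT-3"),
]
prompt = build_prompt(
    title="GPT-3 & Beyond",
    passage="Professor Christopher Potts discusses recent progress in NLU ...",
    demonstrations=demos,
    question="Who gave the talk?",
)
print(prompt)
# answer = complete(prompt)   # hypothetical completion call; the model is
#                             # never fine-tuned, it only ever sees this text
```

The point is that the "learning" happens entirely inside the prompt: there are no gradient updates, only conditioning on the demonstrations.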

Perhaps surprisingly, you can already see that this standard supervised approach will not scale to the complexity of human experience. For different emotions, such as optimism and sadness, we would need separate datasets and possibly separate models, and that's just the beginning of all the problems we might want to solve with our models. But we can use a single, big, frozen language model to pursue all of these goals. We give the model examples of positive and negative instances, expressed as flat text, and hopefully that is enough for it to pick up the context and establish the distinctions. The model has to learn the meanings of all the terms and our intentions, and work out how to make the distinctions on the new examples in the prompt. What are the mechanisms behind this? I'm going to identify some of them for you.

 

There is a lot of material in our course to help you master Transformer representations and the maths behind them. I'll skip the details and just say that if you study this model in depth, you'll follow the same path we all do. First you ask: how does this work? The architecture looks complicated, but then you realise that it's just a set of mechanisms. Then you face a deeper question: why does it work so well? That question is still not fully resolved, and many people are working on explaining why the Transformer is so effective.

The second important aspect is self-supervision, which is a powerful mechanism for generating rich representations of form and meaning. The goal of a model under self-supervision is to learn from patterns of co-occurrence in the sequences on which it is trained. This is purely distributional learning. The model learns to assign high probability to attested sequences. This is a fundamental mechanism for learning associations between streams of symbols, including language, computer code, sensor readings, and images.

Self-supervision differs from standard supervised learning in that the objective does not mention specific labels or relations between them. This allows the model to be trained on a huge amount of data without human annotation effort, which has led to the rise of large-scale pre-training. As a result, we get rich representations, and this extends the capabilities of the models.
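As a rough illustration of this purely distributional objective, here is a minimal sketch of the next-token-prediction loss that underlies language-model pre-training. The toy vocabulary and the uniform "model" are invented just to make the example runnable; a real system would use a neural network over tokenised text.

```python
import math

# Sketch of the self-supervised (language-modelling) objective: the "label"
# for each position is simply the next token of the raw text, so no human
# annotation is needed.

def nll_loss(model_probs, tokens):
    """Average negative log-likelihood of a token sequence under
    next-token prediction. `model_probs(prefix)` returns a dict mapping
    each vocabulary item to P(next token | prefix); it stands in for any
    autoregressive model."""
    loss = 0.0
    for i in range(1, len(tokens)):
        p = model_probs(tokens[:i]).get(tokens[i], 1e-12)
        loss += -math.log(p)   # attested continuations should get high probability
    return loss / (len(tokens) - 1)

# Toy uniform "model" over a four-word vocabulary, purely illustrative.
vocab = ["the", "model", "learns", "associations"]
uniform = lambda prefix: {w: 1.0 / len(vocab) for w in vocab}
print(nll_loss(uniform, ["the", "model", "learns", "associations"]))
```

Minimising this loss over huge corpora is what "assigning high probability to attested sequences" amounts to in practice.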

 

It was incredible in terms of creating effective systems. We got ELMo, the first model capable of creating contextualised word representations, and then came the really big language models: BERT, GPT, and finally GPT-3, at a scale previously unimaginable. But we must not forget the role of human feedback in all this. The best OpenAI models are the Instruct models, and they are trained to do much more than just self-supervision: in the first stage the model is fine-tuned on human demonstrations, and in the second stage the model generates outputs that people rank, and this feedback feeds a reinforcement-learning mechanism. This human input makes the models more effective and takes us beyond the self-supervision step, which is important to bear in mind. It must also be recognised that most revolutionary steps in AI are the result of a huge amount of human effort that is often hidden from the public. Human feedback plays an important role in building better models, and OpenAI's teams go into great detail to provide feedback for different tasks. Overall, the process is one of sequential, incremental improvement, and human feedback plays an important role in it.
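To make the second stage a little more tangible, here is a sketch of the pairwise ranking loss commonly used to train a reward model from human preference rankings. This is the generic published formulation, not OpenAI's actual code, and the toy reward function is invented purely so the example runs.

```python
import math

# Sketch of learning from human rankings: for a prompt, a human prefers
# output A over output B, and a reward model is trained so that
# reward(A) > reward(B). The reward model is later used to steer the
# language model via reinforcement learning.

def pairwise_ranking_loss(reward, prompt, preferred, rejected):
    """-log sigmoid(r(preferred) - r(rejected)): small when the reward
    model agrees with the human ranking."""
    margin = reward(prompt, preferred) - reward(prompt, rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy reward model: longer answers score higher (purely illustrative).
toy_reward = lambda prompt, completion: 0.1 * len(completion)
print(pairwise_ranking_loss(toy_reward, "Explain RLHF.", "A careful answer ...", "No."))
```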

Now, in 2023, we know so much that helps us develop new ways of working with artificial intelligence. We've discovered a technique called step-by-step reasoning, which lets models reason in a more deliberate way. For example, Omar Khattab showed me a prompt from a logic and common-sense exam, with instructions using special markup to elicit the kind of reasoning I wanted to see. This lets the model produce its own reasoning, which in turn helps it generate answers to the questions. This is interesting because it makes it much easier to program an AI system using only prompts, rather than writing complex deep-learning code. That gives us more opportunities to experiment and to develop new systems.
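The talk only gestures at the prompt format, so what follows is an invented illustration of the general idea: a demonstration that spells out its reasoning under an explicit "Reasoning:" marker, so the model imitates the step-by-step pattern before committing to an answer. The questions and markup are placeholders, not the exam prompt Omar Khattab showed.

```python
# Invented illustration of step-by-step ("chain of thought") prompting:
# the demonstration shows marked-up reasoning, and the model is expected
# to imitate that pattern for the new question before giving its answer.

prompt = """Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 more
than the ball. How much does the ball cost?
Reasoning: Let the ball cost x. Then the bat costs x + 1.00, and
x + (x + 1.00) = 1.10, so 2x = 0.10 and x = 0.05.
Answer: 5 cents.

Q: If there are 3 cars and each car has 4 wheels, how many wheels are there?
Reasoning:"""

print(prompt)
# completion = complete(prompt)   # hypothetical completion call; the model
#                                 # writes its own "Reasoning:" and "Answer:"
```

Here the whole "program" is the text itself, which is what makes prompt-level programming so much lighter than writing deep-learning code.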

I want to move on to retrieval-augmented in-context learning, which combines language models with retriever models. Big language models like the Transformer and BERT have revolutionised search. In 2018, Google and Microsoft started using aspects of BERT in their search technologies. We are now combining language models and retriever models to improve search. This gives us the ability to produce more accurate search results by utilising context and natural language understanding.

I think these are just two well-known examples among the many major search technologies of that era that incorporated elements of BERT. And then, of course, in the current era we have startups like You.com that have made big language models quite central to the whole search process, both in providing results and in interactive search with dialogue agents. So all of this is exciting, but I'm an NLPer at heart, and so for me, in some ways, the more exciting direction is that search has finally revolutionised NLP, helping us bridge the gap to much more relevant, knowledge-intensive tasks.

To give you an idea of how this is happening, let's take question answering as an example. Before this work, we used to pose question answering, or QA, in NLP in the following way (you saw this already in the GPT-3 example): at test time you would have a title and a context passage, and then a question, and the task of the model is to find the answer to that question as a literal substring of the context passage, which was guaranteed by the nature of the dataset. As you can see, models are really good at this task; they are definitely up to superhuman performance. But it's also a very artificial task. It's not a natural form of question answering in the world, and it's certainly different from the scenario of, for example, doing a web search.

 

So the promise of open-ended formulations of this task is that we're going to connect more directly to the real world. In this formulation, we've just been asked a question during testing, and the standard strategy is to rely on some kind of search mechanism to find relevant evidence in a large corpus or maybe even on the Internet.

Now I'm biased in describing things by assuming that we're retrieving a passage, but there's actually another approach - LLM for everything. In this approach there is no explicit retriever, just a big opaque model that processes the question and produces the answer. It's a very inspiring vision, but there are a lot of danger zones.

The first concern is efficiency. In this approach, we are asking the model to play the role of a repository of knowledge and language features, which leads to an explosion in the size of the models. If we could isolate these functions, we could use smaller models. Another problem is updateability. If something changes in the world, the model needs to be updated to reflect those changes. This is a very interesting problem, but we are still far from being able to guarantee that changes in the world will be reflected in the behaviour of the model, which affects the validity and explainability of its behaviour.

There is also the problem of the origin of the answer. We don't know if the model can be trusted and where it got its answer from. In a standard search we usually get several pages to verify information, but here we only get one answer. If the model were to tell us where it got its information from, that would be useful, but we still couldn't trust that information. This violates the contract between the user and the search technology.

Despite all these problems, LLM models for everything are very effective and can synthesise information, which makes them very interesting. But it pays to be cautious and consider alternatives such as search-based approaches.

I was very impressed with Davinci-003's response to the question of whether professional baseball players are allowed to glue little wings onto their caps. This question occurred to me after reading a great article by Hector Levesque about stress-testing our models: asking them questions that seemingly defeat any simple distributional or statistical learning model, to really find out whether they have a model of the world. Davinci-002 gave a Levesque-style answer: that there is no rule against it, but it is not common, which seems true. However, when I asked Davinci-003 the same question, it replied that professional baseball players are not allowed to glue little wings onto their caps because Major League Baseball has strict rules about the appearance of players' uniforms and caps, and any modifications to the cap are not allowed. I was disappointed and wished I had a web page with evidence that this was true.

I realised that we had broken the implicit contract with the user that we expect from search, so I decided to consider the alternative: retrieval-augmented, NLP-based search. Suppose I type a very complex question into a standard search box. We use a language model to encode the query into a dense numerical representation that reflects aspects of its form and meaning. We then use the same language model to process all the documents in our collection, so that each one also gets a deep-learning numerical representation.

Now, based on these representations, we can score documents against queries, just as in the good old days of information retrieval. We get some results and can offer them to the user as a ranked list. However, we can go further and use this rich semantic space for much more than ranking.
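Here is a minimal sketch of that scoring step, assuming an embed() function that maps text to a dense vector (in practice a frozen language model); documents are then ranked by cosine similarity to the query. The bag-of-characters embedder is a toy stand-in used only so the example runs.

```python
import math

# Sketch of dense retrieval: encode the query and the documents into vectors
# with the same (frozen) encoder, then rank documents by cosine similarity.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_documents(embed, query, documents):
    """Return (score, document) pairs sorted by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in documents]
    return sorted(scored, reverse=True)

# Toy embedder: bag-of-characters counts, purely to make the sketch runnable.
toy_embed = lambda text: [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
docs = ["rules for baseball caps", "history of cricket", "uniform regulations"]
print(rank_documents(toy_embed, "baseball cap rules", docs))
```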

 

Further, we can use this model for augmented search. This means that we can use information about the user's query to find additional sources of information that may be useful to them. For example, if a user is looking for information about professional baseball players, we can offer them a news summary of recent games, player statistics, or interviews with coaches.

In addition, we can use this model to create more accurate answers to complex questions. For example, if a user asks the question "Which baseball players won the World Series in 2010?", we can use the model to analyse a large amount of data and provide an accurate answer.

However, as with any machine learning model, this model is not perfect. It may have problems understanding complex queries, or it may not estimate document relevance accurately enough. Therefore, we must constantly improve it by experimenting and training it on new data.

Thus, augmented search based on a language model can be a powerful tool for providing more accurate and useful information to the user. It can be used to find answers to complex questions, to augment search, and to provide the user with breaking news and other useful information.

We can also use another language model, which we call the generator or reader. We can extract information from different sources and synthesise it into an answer that directly addresses the user's needs. This is very efficient, because our system can have far fewer parameters than a fully integrated approach, and updating it is quite easy: when pages in the document repository change, we re-process them with a frozen language model, and we can ensure that the changes in information are reflected in the results. In addition, we can track the origin of the information, because we have all the documents that are used to generate the answers. We use a retrieval-based approach that is superior to fully integrated language models in these respects, but we have retained the advantages of language models, because we have a reader/generator that can synthesise information into answers that meet user needs.
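Putting the pieces together, here is a sketch of the retrieve-then-read pattern described above: a retriever selects passages, and a frozen language model (the reader/generator) is prompted with them to synthesise an answer while keeping the passages as provenance. The retrieve() and complete() functions and the toy lambdas are invented stand-ins, not any particular library's API.

```python
# Sketch of retrieve-then-read: a retriever picks relevant passages and a
# frozen language model is prompted with them to synthesise an answer.
# `retrieve` and `complete` are stand-ins for real components.

def answer(question, retrieve, complete, k=3):
    passages = retrieve(question, k)                  # top-k passages, kept for provenance
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the passages below, and cite them.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return complete(prompt), passages

# Toy components, purely illustrative; updating the system just means
# re-indexing documents, while the language model stays frozen.
toy_retrieve = lambda q, k: ["MLB uniform regulations ...", "Cap modification policy ..."][:k]
toy_complete = lambda prompt: "(model output would appear here)"
print(answer("Can players glue wings onto their caps?", toy_retrieve, toy_complete))
```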

When we design systems nowadays, we use pre-trained components, such as our index and retriever, and language models such as the reader/generator. The question is how to combine all these components into a complete solution. The standard deep-learning answer is to define a set of task-specific parameters that tie all these components together. In practice, this can be very challenging for researchers and system designers, because such parameters can be very opaque and the design space is large. But maybe we are moving out of the era in which we should be doing this at all.

So we have a system built from components that communicate in natural language: a language model, and a retriever that takes in text and returns text together with relevance scores. We can have the retriever and the language model pass messages between them. This enables the development of systems with a wide design space and a lot of leverage over what they are intended to do.

To give an example, we can use such a system to find the answer to a question asked by a user. We start with a prompt containing the question and retrieve a context passage using the retriever. We can also add to the prompt training examples demonstrating the intended behaviour of the system, for it to learn from in context.

We can use the retriever to find an appropriate context passage for each of the training examples, helping the system understand how to reason in terms of evidence. This opens up a huge design space and allows for integrated packages of information from which the model can benefit.

You can probably start to see the pattern here. We can rewrite the text a bit, interleaving the retriever and the language model. We can also think about how we choose the background passage. I assumed that we would simply take the most relevant passage according to our query, but we can also rewrite the user's query using the demonstrations we built and get a new query that helps the model. This is especially useful if we have an interactive mode where the demonstrations are actually part of a dialogue history or something like that.

And finally, we can pay attention to how we generate the response. I assumed we would just take the language model's top generation, but we can do a lot more than that. We can filter generations to just those that match a substring of a retrieved passage, which reproduces the old way of answering a question, but now in this completely open formulation. That can be incredibly powerful if we know our model can retrieve good background passages. Those are two simple steps. We can also use a full Retrieval-Augmented Generation (RAG) model, which defines a full probability model and allows us to marginalise over the contributions of passages. This can be incredibly powerful for getting the most out of the model and generating text conditioned on all the work we've done here.
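To illustrate the two decoding ideas mentioned here, a small toy sketch: first, filtering candidate generations to those that occur verbatim in a retrieved passage; second, a RAG-style score that marginalises over passages by summing p(passage | query) times p(answer | query, passage). All the strings and probabilities are invented; a real system would obtain them from a retriever and a language model.

```python
# Two decoding ideas, in toy form:
# (1) keep only generations that occur as a substring of a retrieved passage;
# (2) RAG-style marginalisation: p(answer | query) is the sum over passages z
#     of p(z | query) * p(answer | query, z). All numbers below are invented.

def filter_extractive(candidates, passages):
    """Keep candidate answers that literally appear in some retrieved passage."""
    return [c for c in candidates if any(c in p for p in passages)]

def marginal_answer_prob(answer, passage_probs, answer_given_passage):
    """passage_probs: {passage: p(z | query)};
    answer_given_passage: {(answer, passage): p(answer | query, z)}."""
    return sum(p_z * answer_given_passage.get((answer, z), 0.0)
               for z, p_z in passage_probs.items())

passages = {"Caps may not be modified.": 0.7, "Wings are a uniform accessory.": 0.3}
print(filter_extractive(["may not be modified", "wings are mandatory"], list(passages)))

likelihoods = {("not allowed", "Caps may not be modified."): 0.9,
               ("not allowed", "Wings are a uniform accessory."): 0.2}
print(marginal_answer_prob("not allowed", passages, likelihoods))  # 0.7*0.9 + 0.3*0.2
```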

I hope this has given you an idea of how much can happen here. I think a new mode of programming is emerging that involves using large pre-trained components and developing prompts in code that are essentially full AI systems, built entirely on message passing between these frozen components. We have a new paper called Demonstrate-Search-Predict, or DSP, which presents a lightweight programming environment in which we explore new approaches and experiment with different combinations of models and learning methods. Experience shows that such new ideas and methods can lead to better results. I think we are only at the beginning of the journey in developing these systems, but we can already see great potential in what we are doing. I'm excited to be a part of this process and of creating these innovative technologies.

Our team continues to work on improving our system and researching new methods. We believe that our research and development will make a significant contribution to the field of artificial intelligence and will lead to new, more efficient tools for working with data and automating processes.

Overall, I think we are on the cusp of a new era in AI development. The technologies we are developing will have broad applications in a variety of fields, including science, business, medicine and others. I am confident that we can achieve incredible results if we continue to work together and are not afraid to experiment with new ideas and methods.

We've only explored a tiny fraction of this space, and everything we do is suboptimal. And even under those conditions you get a huge jump in performance on these tasks. So I suspect the bolded line we have here won't hold for long, given how much innovation is going on in this space. And I want to make a pitch for our course here.

Essentially it works like this: you have an assignment that helps you build some baseline systems, and then an original system that you enter into an informal data and modelling competition. Our newest one is called few-shot OpenQA with ColBERT retrieval. It's a version of the problems I've just described for you, a problem that even five years ago could not have been meaningfully posed. And now we're seeing students doing incredible, cutting-edge things in this mode.

This is exactly what I just described for you. And we're at a point where a student project can lead to a paper that genuinely delivers cutting-edge results, again because there's so much research that needs to be done. I don't have a lot of time, so I'll just briefly name the other important areas that I haven't covered today, starting with datasets. I talked about system design and task performance, but it is now, and always will be, the case that contributing new benchmark datasets is basically the most important thing you can do.

I like this analogy. Jacques Cousteau said: "Water and air are the two basic fluids on which all life depends." I would extend that to NLP: our datasets are the resource on which all progress depends. Cousteau went on to say that they have become global dustbins. I'm not so cynical about our datasets; I think we've learnt a lot about how to build effective datasets.

 

We're getting better at it, but we need to be wary of this metaphorical pollution. And we always need to push our systems with more complex tasks that get closer to the human capabilities we're actually trying to get them to achieve. Without new datasets, we could be fooling ourselves into thinking we're making great progress.

The second thing I wanted to call out relates to the explainability of the model. We now live in an age of incredible influence, and this has rightly driven researchers to questions of system reliability, security, trust, approved use, and pernicious social bias. We need to take all of these issues seriously if we are going to responsibly exert all of the influence that we are currently achieving.

It's all incredibly complex, because the systems we're talking about are huge, opaque devices that resist analytical understanding. For me, that highlights the importance of achieving analytical guarantees about model behaviour; I think that's a necessary condition for taking any of these topics seriously. The goal, in our terms, is to achieve correct, human-readable explanations of model behaviour. We have great coverage of these methods in the course: hands-on materials, screencasts, and other things that will help you participate in this research, and, as a side effect, write absolutely outstanding discussion and analysis sections for your papers.

I think the last-mile problem is extremely important. We can reach 95% of our AI goal, but the last 5% is no less challenging than the first 95%. My group and I have been thinking a lot about the accessibility of images and understanding their context for blind and visually impaired users. This is a critical social problem that needs to be addressed. Generating text from images has become incredibly good over the last 10 years, but we need to improve the descriptions of these images so that visually impaired users can get all the information they need. This will require more HCI and linguistic research, as well as fundamental advances in AI.

I made two predictions about artificial intelligence and its impact on our lives. The first prediction was that AI will be used more and more in various fields, including customer service. This will lead to people often being unsure whether they are communicating with a human or an AI.

The second prediction was that the negative effects of NLP and AI would increase along with the positive effects. I worried about things like the spread of misinformation, market disruption, and systemic bias.

However, I failed to predict many important things, such as the progress in text-to-image models like DALL-E 2 and Stable Diffusion. I thought this area would languish for a long time, but we've seen an incredible set of advances.

I have to admit that my predictions were made in 2020 and they were supposed to last 10 years, but most of them have already come true. However, I believe that predicting over a longer period of time is impossible now as we are in a constant state of progress and change.

I am very interested in your predictions for the future, but I think I will stop here and not make any further predictions until the end of 2024.

 

Having reread the "lecture" I regretfully realised that the work was in vain and the text does not convey the professor's real speech. Everything is distorted beyond recognition.

Tomorrow I will try to write my generalisation of the content of the webinar.
