Keynote Speech: Breakthroughs in LLM Research and Constitutional AI

Fellows Fund
June 12, 2023


Jared Kaplan, co-founder and Chief Scientist at Anthropic, discusses the trends driving AI progress and the importance of safety and reliability in AI systems. He highlights the exponential increase in computing power and algorithmic progress as key drivers of AI advancements. Kaplan explains Anthropic's focus on building frontier AI models that are more reliable, interpretable, and steerable. He shares the history and development of GPT models, emphasizing the scaling laws that can predict the potential of large-scale AI systems. Kaplan also introduces Claude, Anthropic's AI assistant, and discusses the concept of constitutional AI, where AI systems evaluate their own behavior based on a set of principles. He demonstrates the expanded 100K context window as well as the different uses of Claude.


Full Transcript:

Thank you so much. It's really fun to be here. So, to summarize all of that, I'm the leader of the team at Anthropic that built our AI assistant Claude that hopefully you may have seen, played with, and hopefully, many more people will be playing with soon. So I'll be talking a little bit about the trends driving AI progress and why we expect it to continue, and it might even get more exciting, maybe a little bit scary. There are two main factors that contribute to AI progress. First, there is an exponential trend in using more computing power to enhance AI systems. Second, there is a similar trend in algorithmic progress, making these systems more efficient. These trends have enabled possibilities this year that weren't possible just a couple of years ago, and they make the future of AI very interesting. Additionally, for generative AI systems, it is crucial to ensure that they don't produce nonsensical or unreliable outputs. Stability, reliability, and safety work are necessary to make these systems behave well. At Anthropic, we focus a significant portion of our research on making AI systems more steerable, reliable, and safe while building some of the most powerful frontier AI systems in the world. We strongly believe that AI will continue to permeate the economy and grow in importance over the next few years.

Now, let me give you a slightly longer history lesson since we're at the Computer History Museum. The team at Anthropic, including our CEO Dario, played a significant role in building GPT-3. Dario, as the director of the project, along with the lead engineers who now work at Anthropic, contributed to its development. It's interesting to explore where these systems originated. GPT-3 is well-known, but what about GPT-1 and GPT-2? Among the notable individuals involved in GPT-1, Alec Radford, who is still with OpenAI, made significant contributions. Following that, Dario had the intuition, based on his observations of trained AI systems, that scaling up the system further would be highly promising. Although this conviction didn't pay off immediately, it eventually led to the development of GPT-2. GPT-2 was an interesting system, and with hindsight, it appears to have been on track to become remarkable. However, within a year of GPT-2's release, Google published a paper dismissing it as antiquated and defunct technology. They claimed that these language models weren't advancing and that they had better alternatives. But Dario had a strong conviction, supported by research many of us had conducted that revealed these exponential trends: even if GPT-1 wasn't commercially valuable and GPT-2 was impressive but not quite ready, we were on a trajectory toward more useful and powerful systems. This serves as real-world evidence for these trends.

If we look at the amount of computation used in the AI field, it's growing exponentially. It's fascinating to note that after about 70 years of computer science, we can now train models with more than Avogadro's number of floating-point operations. For example, models like GPT-3 and GPT-4 are trained using approximately 10^24 floating-point operations. It's an immense amount of computation that human civilization can harness to train these systems. This is the trend we observe. However, it's important to acknowledge that this computational power comes with a high cost. Training larger systems becomes increasingly expensive, even with advancements in computer chips.

So why do we continue to believe in the potential of these large and costly systems? The answer lies in scaling laws, a precise scientific discovery we made a few years ago, starting with language models such as GPT-1 and GPT-2. We then expanded our exploration of generative AI across various domains, including multimodal systems, math problem-solving, and image generation. What we discovered is a grand unification in AI, where we can use similar systems with different types of data and achieve robust results. The rainbow plots we see illustrate the increasing capabilities of models as we scale up compute power, data, and model size during training. Based on these scaling trends, which span a range of billions in scale, from minimal computation to vast amounts, we made a bet. We decided to train a model with an investment of around $10 million to see what it could achieve. This led us to build GPT-3, which is currently driving generative AI.
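As a rough illustration of the scaling-law idea described above, the sketch below fits a power law of the form loss = a * compute^(-b) to a handful of points and extrapolates it to a larger compute budget. The data are synthetic and the constants (TRUE_A, TRUE_B) are made up for the example; they are not figures from the talk or from any published scaling-law study.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-b) by least squares in log-log space.

    A power law is a straight line on log-log axes, so an ordinary
    linear fit of log(loss) against log(compute) recovers a and b.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** (-b)

# Synthetic "small model" runs (assumed values, for illustration only).
TRUE_A, TRUE_B = 10.0, 0.05
compute = [10.0 ** k for k in range(15, 22)]   # 1e15 .. 1e21 FLOPs
loss = [TRUE_A * c ** (-TRUE_B) for c in compute]

a, b = fit_power_law(compute, loss)
extrapolated = predict_loss(a, b, 1e24)        # a much larger budget
print(f"fitted a={a:.3f}, b={b:.3f}, predicted loss at 1e24 FLOPs: {extrapolated:.3f}")
```

The point of the sketch is the workflow rather than the numbers: fit the trend on runs that are cheap, then use it to decide whether a much more expensive run is likely to be worth it.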

Now, let's consider the growth of AI capabilities over time. In the early era of AI, systems were trained to perform specific tasks, such as playing a single board game. Researchers attempted to generalize these systems to other games, like Space Invaders and Pong, but the results were not satisfactory. These systems were limited to a narrow scope, such as image classification. However, around 2018-2019, we witnessed the emergence of generative AI systems like GPT-1 and GPT-2, capable of more versatile tasks. Over the past couple of years, these systems have become truly useful. Initially, they excelled in areas like code autocomplete but struggled as chatbots. However, in recent years, we have witnessed a rapid progression. These systems have evolved from being not particularly intelligent or versatile to performing at the level of high school or even college students in various academic subjects. They can translate languages, exhibit complex behaviors, adapt to users (sometimes in undesired ways, which we call "sycophancy"), and even play complex games involving deception. This remarkable advancement has been driven by scaling up compute power and rapid algorithmic progress, exceeding our expectations.

Now, let's delve into why safety work and stability are crucial. Even when attempting to guide these systems, researchers and deployed models can make mistakes. We have encountered problems where systems can be tricked into generating offensive content or providing unreliable information. While these examples may be cherry-picked, they highlight the need for AI systems to be more honest, factually accurate, and trustworthy. When we survey customers at Anthropic, the number one improvement they desire in AI systems is enhanced honesty and reliability. Addressing these concerns drives our research efforts, as well as those of the wider AI community. Although we are making rapid progress in this area, the breakneck pace of technological development makes it challenging to keep up.

To shed light on our work in controlling, steering, and understanding these models, let's examine a timeline of achievements. In 2016, Dario and others published a paper that aimed to concretize the problem of controlling AI systems. Five years ago, we made significant progress in reinforcement learning from human preferences, which heavily influences the development of Claude and ChatGPT. Simultaneously, we advanced in interpreting how these systems function, including feature visualization and understanding the circuits within neural networks that govern their behavior.

We've witnessed significant advancements in scaling laws, which demonstrate why increasing investments in AI are justified and expected to continue. Moreover, we've introduced two exciting developments: Claude and constitutional AI.

To understand constitutional AI, let's start with the concept of human-in-the-loop training. Initially, we had generative AI models that could generate any response when prompted. To train these models, we presented two possible responses to a query and had humans evaluate which one was better. Through this process, we reinforced desirable behaviors and discouraged undesirable ones. With tens or hundreds of thousands of human conversations, we could train AI systems to improve.
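One common way to turn "which of these two responses is better?" judgments into a training signal is a Bradley-Terry style preference model, sketched below. This is an illustrative assumption, not a description of the exact method used for Claude: each response gets a scalar score, and the scores are nudged so that the preferred response in each comparison ends up scoring higher. The comparison data and function names are hypothetical.

```python
import math

def preference_probability(score_a, score_b):
    """Probability that A is preferred over B: sigmoid(score_a - score_b)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

def preference_loss(score_chosen, score_rejected):
    """Negative log-likelihood of the human's choice under the model."""
    return -math.log(preference_probability(score_chosen, score_rejected))

def train_scores(comparisons, num_responses, lr=0.1, steps=200):
    """Learn one scalar score per response from (preferred, rejected) pairs."""
    scores = [0.0] * num_responses
    for _ in range(steps):
        for winner, loser in comparisons:
            p = preference_probability(scores[winner], scores[loser])
            # Gradient descent on -log(p): raise the winner, lower the loser.
            scores[winner] += lr * (1.0 - p)
            scores[loser] -= lr * (1.0 - p)
    return scores

# Hypothetical labels: response 0 beat 1 and 2; response 1 beat 2.
comparisons = [(0, 1), (0, 2), (1, 2)]
scores = train_scores(comparisons, num_responses=3)
print([round(s, 2) for s in scores])
```

In a real system the scalar scores would come from a learned reward model rather than a lookup table, and that reward model would then guide reinforcement learning on the assistant itself.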

Constitutional AI takes this idea further. Instead of relying on human feedback, we define a set of principles, a "constitution," that AI systems should adhere to. We then ask the AI itself to evaluate its own behavior based on these constitutional principles. This eliminates the need for constant human involvement and enables rapid iteration. AI systems can effectively self-train, allowing for the training of new models with updated principles within a day, rather than waiting for extended periods to collect human feedback. Notably, constitutional AI has demonstrated comparable or even superior performance to human feedback in steering AI systems away from harmful tendencies.
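The loop described above can be sketched in a few lines: for each principle in the constitution, a judge is asked which of two responses better follows it, and the majority vote becomes the preference label that a human would otherwise have provided. Everything here is a hypothetical stand-in. The principles are illustrative wording, and toy_judge is a keyword counter used only so the example runs; in an actual system the judge would be a call to a language model.

```python
from typing import Callable, List, Tuple

# A judge receives (principle, prompt, response_a, response_b) and returns "A" or "B".
Judge = Callable[[str, str, str, str], str]

def label_with_constitution(
    prompt: str,
    response_a: str,
    response_b: str,
    constitution: List[str],
    judge: Judge,
) -> Tuple[str, str]:
    """Return (preferred, rejected) by majority vote over the principles."""
    votes_a = sum(
        1 for principle in constitution
        if judge(principle, prompt, response_a, response_b) == "A"
    )
    votes_b = len(constitution) - votes_a
    if votes_a >= votes_b:
        return response_a, response_b
    return response_b, response_a

def toy_judge(principle: str, prompt: str, response_a: str, response_b: str) -> str:
    """Toy stand-in for a model call: prefers the response with fewer flagged words.

    It ignores the principle text entirely; a real judge would condition on it.
    """
    flagged = {"stupid", "idiot"}

    def count(text: str) -> int:
        return sum(word.strip(".,!?").lower() in flagged for word in text.split())

    return "A" if count(response_a) <= count(response_b) else "B"

# Illustrative principles (assumed wording, not quoted from any real constitution).
constitution = [
    "Choose the response that is less harmful.",
    "Choose the response that is more honest.",
]
rude = "That is a stupid question."
polite = "Happy to help. Here is an explanation."
preferred, rejected = label_with_constitution(
    "How do tides work?", rude, polite, constitution, toy_judge
)
print("preferred:", preferred)
```

Labels produced this way can feed the same kind of preference training used with human comparisons, which is why updating the principles and retraining can be fast.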

Our approach to training Claude involves a blend of constitutional AI and human feedback from domain experts. By incorporating input from experts such as software engineers or lawyers, we ensure that Claude's utility aligns with their specific requirements. We've already identified numerous use cases for Claude, including search, content generation, summarization, productivity, coding assistance, and code debugging. This versatility is in line with the typical applications of generative AI. To reach a broader customer base and safeguard privacy and data security, we've partnered with Amazon on Amazon Bedrock, allowing us to deploy Claude at a larger scale.

Let me provide a couple of use cases and videos to illustrate our progress. Notion, an early adopter of AI integration into their platform, has leveraged Claude for various features like writing and summarization. Notion's quick iteration aligns well with our own pace of development, enabling us to enhance features rapidly. Another notable customer, Robin AI, utilizes Claude to instantaneously edit lengthy contracts and make them more favorable to one party. This is particularly valuable in the legal domain, where optimizing contracts to strengthen a party's position is crucial.

These examples demonstrate the practical applications of Claude in real-world scenarios. As we continue testing and refining Claude with customers, we're excited about the range of possibilities it offers. By partnering with Amazon, we can expand its reach and ensure the privacy and safety of customer data.

In summary, the combination of Claude and constitutional AI has unlocked new opportunities for AI development and control. With a focus on specific use cases, customer feedback, and strategic partnerships, we are rapidly advancing the capabilities and applications of Claude.

We've recently introduced a new feature in Claude, and I'm excited to share a video demonstration with you. In the video, you'll see that we have addressed a common concern related to the context window of language models.

Language models have a limited memory of the words they've recently encountered, unlike humans who can remember information indefinitely. Early language models had a context window that could only hold a few hundred words. As we progressed, the context window in earlier versions of Claude expanded to about 6,000 words, or roughly 9,000 tokens.

However, many customers found this limited context to be a problem. They desired the ability to provide Claude with larger documents, such as books, giant reports, or financial documents, and have it answer questions or generate summaries based on the entire content. That's why, in our latest release, we're thrilled to introduce a significant enhancement: a 100,000-token context window in Claude. Now, you can effectively fit an entire document like "The Great Gatsby" within Claude's context.
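The figures above imply a rough conversion of about 1.5 tokens per word (9,000 tokens for 6,000 words). The sketch below uses that ratio for a back-of-the-envelope check of whether a document fits in a given context window. The ratio is only an approximation that varies with the text and tokenizer, and the word count used for "The Great Gatsby" (about 47,000 words) is an approximate figure assumed for this example, not one given in the talk.

```python
import math

# Ratio implied by the talk: about 6,000 words is about 9,000 tokens.
TOKENS_PER_WORD = 9_000 / 6_000

def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for a document of the given length in words."""
    return math.ceil(word_count * TOKENS_PER_WORD)

def fits_in_context(word_count: int, context_tokens: int) -> bool:
    """True if the estimated token count fits within the context window."""
    return estimate_tokens(word_count) <= context_tokens

def approx_word_capacity(context_tokens: int) -> int:
    """Approximate number of words a context window can hold."""
    return int(context_tokens / TOKENS_PER_WORD)

GATSBY_WORDS = 47_000  # approximate, assumed for illustration

print("estimated tokens:", estimate_tokens(GATSBY_WORDS))
print("fits in 9,000 tokens:", fits_in_context(GATSBY_WORDS, 9_000))
print("fits in 100,000 tokens:", fits_in_context(GATSBY_WORDS, 100_000))
print("words in 100,000 tokens:", approx_word_capacity(100_000))
```

For exact counts, the document would need to be run through the model's actual tokenizer; this estimate is only meant for quick sizing.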

The video also showcases another interesting capability. While Claude may not be familiar with LangChain, an API platform that combines different language models for complex tasks, Claude can leverage LangChain's documentation to generate code and provide explanations on how to use LangChain with Claude. By providing Claude with comprehensive manuals on the new API, you can easily seek assistance and guidance from Claude.

Lastly, we have also focused on optimizing Claude's performance in terms of speed and latency. Waiting for language models to generate responses word by word can be time-consuming and inconvenient. To address this, we have made significant improvements. In fact, Claude can generate instant samples at an impressive rate of nearly 500 characters per second. This optimized speed ensures quick responses, making it highly valuable in use cases where low latency is critical.

We are thrilled about these advancements and how they enhance the usefulness of Claude for our customers. We hope these improvements make Claude even more valuable to our users across various domains and applications.

Call to Action

AI enthusiasts, entrepreneurs, and founders are encouraged to get involved in future discussions by reaching out to Fellows Fund. Whether you want to attend our conversations or engage in AI startups and investments, don't hesitate to connect with us. We look forward to hearing from you and exploring the exciting world of AI together.
