How Chatbots and Large Language Models, or LLMs, Actually Work - The New York Times





In the second of our five-part series, I’m going to explain how the technology actually works.

The artificial intelligence systems that power ChatGPT, Microsoft’s Bing chatbot and Google’s Bard can carry out humanlike conversations and write natural, fluid prose on an endless variety of topics. They can also perform complex tasks, from writing code to planning a kid’s birthday party.

But how does it all work? To answer that, we need to peek under the hood of something called a large language model — the type of A.I. that drives these systems.

Large language models, or L.L.M.s, are relatively new on the A.I. scene. The first ones appeared only about five years ago, and they weren’t very good. But today they can draft emails, presentations and memos and tutor you in a foreign language. Even more capabilities are sure to surface in the coming months and years, as the technology improves and Silicon Valley scrambles to cash in.

I’m going to walk you through setting up a large language model from scratch, simplifying things and leaving out a lot of hard math. Let’s pretend that we’re trying to build an L.L.M. to help you with replying to your emails. We’ll call it MailBot.

Step 1: Set a goal

Every A.I. system needs a goal. Researchers call this an objective function. It can be simple — for example, “win as many chess games as possible” — or complicated, like “predict the three-dimensional shapes of proteins, using only their amino acid sequences.”

Most large language models have the same basic objective function: Given a sequence of text, guess what comes next. We’ll give MailBot more specific goals later on, but let’s stick to that one for now.
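
To make "guess what comes next" concrete, here is a minimal sketch in Python. It is not how a real large language model works: it swaps the neural network for simple word-pair counts, and the toy corpus and the function names (train_bigram_counts, predict_next) are made up for illustration. It only shows the shape of the objective: given some text, pick the most likely next word.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus standing in for MailBot's training emails.
corpus = (
    "thank you for your email . "
    "thank you for your patience . "
    "thank you so much ."
)
counts = train_bigram_counts(corpus)

print(predict_next(counts, "thank"))  # "you" follows "thank" every time
print(predict_next(counts, "you"))    # "for" follows "you" twice, "so" once
```

A real model looks at far more than the single previous word and learns from vastly more text, but the goal it is scored against is the same kind of guess.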
