In-Depth
The Proper Care and Feeding of LLMs: Conversation with Programmer Michael Washington
Why context windows are important, the AGI lie, mastering the RAG pattern, the three steps to LLM nirvana, and why you should never, ever skip step two.
There's been plenty of hand-wringing about the use of AI to generate creative works, from photography to music to novels. But few exercises better measure how far large language models (LLMs) like OpenAI's GPT have come in just the past year -- and how far they still have to go before AI milestones like artificial general intelligence (AGI) can graduate from theoretical to inevitable.
Michael Washington, a Microsoft MVP and Blazor expert, has written plenty of technical books for enterprise IT pros, but now he's working on a novel and using AI to help him write it. He developed an application dubbed "AIStoryBuilders" for just this purpose; it's free, open source and available to download at AIStoryBuilders.com. The process of using AI in an inherently creative project has illuminated for him the potential and progress of generative AI technologies, but also their limitations (which, at least today, are manifold).
In a recent interview with Pure AI, Washington shared some of the insights he's gleaned about LLMs since creating AIStoryBuilders. What follows is our conversation, edited for brevity and clarity.
Pure AI: How far have LLMs come in the past year?
Washington: The two biggest things are that the capabilities of GPT-4 are better, much better than, say, GPT-3.5. Its ability to understand -- well, it doesn't really understand things -- but its ability to generate output that makes it seem as if it understands is a lot better. And also the context windows, the fact that they're so much bigger now. Those are the two things that have truly changed.
Everything else is really just window dressing. For example, OpenAI allows people to create their own GPTs and, of course, [Microsoft] is also pushing you to create your own Copilots. Those are not really new steps. They're just ... allowing you to package solutions, essentially creating your own macros. The two true advancements are the fact that the GPT-4 class of model does perform better and is capable of doing more than previous GPT models -- that is a real thing -- and the fact that OpenAI has made the context window much larger.
How have these advances affected the work you're doing with your novel project?
The models that I'm working with now are pretty much the 32,000-token models, which allow you to do a lot more. For example, this application that I have, a free, open source application called AIStoryBuilders.com, allows people to create novels. The reason I created this program was that I had just finished writing my last book, which was about how to use LLMs and the RAG -- retrieval-augmented generation -- pattern, which is very important. This was a way to show how to use that pattern to do something I felt was useful.
After I developed that program -- because you can switch between running it through GPT-3, 3.5 or 4 -- that's when I realized that the performance, the quality you get, is vastly different. That's when I realized, "Oh, yeah, there is a difference." And because the context window is bigger, when you're using the RAG pattern you can provide more information to the LLM, and that also means you're going to get better output.
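To make the context-window point concrete, here is a minimal sketch in Python. AIStoryBuilders' own code isn't shown here; the word-count "tokenizer," the sample passages and the budget figures are all illustrative assumptions, but the idea is the same: a bigger token budget means more background fits into a single prompt.

```python
# Crude illustration of why a bigger context window matters for RAG:
# more retrieved background fits into one prompt.
# The word-count "tokenizer" and the budget numbers are stand-ins, not real figures.

def rough_token_count(text: str) -> int:
    # Very rough stand-in for a real tokenizer.
    return len(text.split())

def build_context(passages: list[str], budget: int) -> str:
    """Add background passages until the token budget is spent."""
    selected, used = [], 0
    for passage in passages:
        cost = rough_token_count(passage)
        if used + cost > budget:
            break
        selected.append(passage)
        used += cost
    return "\n\n".join(selected)

passages = [
    "Character sheet: Sam, 40, saving for retirement, owns a car.",
    "Chapter 1 summary: Sam returns to his childhood home.",
    "Location notes: a small coastal town, foggy most mornings.",
]

# With a small window only a little background fits;
# with a 32,000-token window, far more of the novel does.
small = build_context(passages, budget=16)
large = build_context(passages, budget=30_000)
print(len(small.split()), len(large.split()))
```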
Before we get too far ahead, what is the RAG pattern? Can you do a little explainer on that?
The best way to explain what RAG is about is that it's the process of providing the LLM with the information, all the background information. We then just shove it into the model, and we allow it to sort through that information and produce a response.
"People think that the LLM is smart, they think it understands. It doesn't. It's just a mathematical calculation."
For example, if I go to [the Microsoft Copilot-integrated] Bing.com, that's what's happening. Say I am searching for how to use C# and OpenAI. I type that in and what Bing does is, it then searches the Web and pulls up a bunch of Web pages and even the content of those Web pages and feeds that to the LLM. The LLM then produces an output that says, "Here are a couple of Web sites, and basically those Web sites say that you should download Visual Studio," because it read all the articles and summarized them. That's really what the RAG pattern is. It's basically the pattern that says, "Go out, get the information. Then just dump that information into the LLM and let it sort through it, understand it" -- but it's not really understanding, it's just a completion engine -- "but let it produce the output." That's the RAG pattern.
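As an illustration of that retrieve-then-generate flow, here's a minimal Python sketch. The search_web and ask_llm functions are hypothetical placeholders, not real APIs; in a real application they would call a search service and a completion endpoint such as OpenAI's.

```python
# Sketch of the RAG flow described above: retrieve, then generate.
# search_web() and ask_llm() are hypothetical placeholders, not real APIs.

def search_web(query: str) -> list[str]:
    """Pretend web search: returns the text of a few relevant pages."""
    return [
        "To call OpenAI from C#, download Visual Studio and install the client library.",
        "Walkthrough: a C# console app that sends prompts to GPT-4.",
    ]

def ask_llm(prompt: str) -> str:
    """Stand-in for the call to the model (OpenAI, Azure OpenAI, etc.)."""
    return "(model output would appear here)"

def answer_with_rag(question: str) -> str:
    pages = search_web(question)          # go out, get the information
    context = "\n\n".join(pages)          # dump that information into the prompt
    prompt = (
        "Using only the web pages below, answer the question.\n\n"
        f"Web pages:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)                # let the completion engine produce the output

print(answer_with_rag("How do I use C# with OpenAI?"))
```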
So, is this what's giving the impression that the LLM is "understanding"?
People think that the LLM is smart, they think it understands. It doesn't. It's just a mathematical calculation. For example, Microsoft's Copilot. Say I'm typing an e-mail and I start mentioning this project, and suddenly Copilot suggests that I should have a link to something in my OneDrive. That, again, is the RAG pattern, meaning the only way for the LLM to make these really cool suggestions is for Microsoft to use the RAG pattern, feeding to the LLM: "Hey, there are all these documents related to all these projects." Then, when the person starts typing this e-mail talking about their project, it goes, "Hey, do you want to insert a link to one of these documents?"
That's the RAG pattern. That's the dominant pattern that pretty much everyone's using because it works well. It really works well. And other people may not be aware of that, but that's what's going on. ... If you don't feed to the LLM the information [the right way], it then is not helpful. The RAG pattern is what makes it helpful.
OK, back to your AIStoryBuilders application. How does all of this fit into that?
I found that with my application, if I don't carefully track everything important about the novel that someone's working on and carefully feed to the LLM -- "This is the part of the novel that you need to concentrate on" -- it'll start going off on a tangent. It won't be helpful. So that's when I realized that the RAG pattern, and how we feed things into it, is what makes an app good or not.
For example, there are a lot of other story-builder products out on the market. The problem that I found with the other ones is they don't track every single element of the story, so when people use their product, [they find that] it's good for Chapter 1, but when they get to Chapter 2, it doesn't [account for] all the stuff that happened in Chapter 1, which you need to refer back to. So that's why with my application I made sure I track all the elements and I'm constantly feeding them to the API, so that it keeps the story on track. And that's why I developed this understanding that the LLM is only as good as what we give it.
The No. 1 thing that separates my version from the other ones out there is the concept of timelines. Every single thing that is in the program, using the RAG pattern, is attached to a specific timeline. If I have a character who's 3 years old, his goal is to go outside and play. He's three feet tall. In another timeline, this character is 40 years old. Now his goals are to save for retirement. He has a car. What my program does is separate those two timelines, so that whenever you're doing a paragraph, you have to tell the LLM which timeline to draw on. That keeps the kid in a paragraph about playing with blocks from suddenly standing up, walking out, getting in his car and driving off.
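Here is a hypothetical Python sketch of the timeline idea Washington describes (it is not AIStoryBuilders' actual code; the character, facts and function names are invented for illustration): every story element carries a timeline tag, and only the facts for the paragraph's timeline make it into the prompt.

```python
from dataclasses import dataclass

# Hypothetical illustration of timeline-scoped context, not AIStoryBuilders' actual code.

@dataclass
class StoryElement:
    timeline: str        # e.g. "childhood" or "age 40"
    character: str
    detail: str          # attribute, goal, possession, etc.

elements = [
    StoryElement("childhood", "Sam", "is 3 years old and three feet tall"),
    StoryElement("childhood", "Sam", "wants to go outside and play"),
    StoryElement("age 40", "Sam", "is saving for retirement"),
    StoryElement("age 40", "Sam", "owns a car"),
]

def context_for(timeline: str) -> str:
    """Only the facts attached to the chosen timeline go into the prompt."""
    facts = [f"{e.character} {e.detail}" for e in elements if e.timeline == timeline]
    return "Known facts for this scene:\n" + "\n".join(f"- {fact}" for fact in facts)

# A paragraph set in the "childhood" timeline never sees the car,
# so the toddler can't suddenly drive off.
print(context_for("childhood"))
```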
So where does prompt engineering come in? Crafting a good prompt -- does RAG supersede that?
Let's say it's a staircase. Step No. 1 is get the information. Step No. 2 is get the right information. You've got a bunch of information, but we only want to feed the LLM the right information. ... Step No. 3, that's where you put in the prompt engineering. Construct the prompt in a way that allows the LLM to understand what you're trying to do. That's where the prompt engineering is important. But the first two steps, if you don't do a good job on them, you're done. And OpenAI, they realize this. That's why, with their GPTs, a lot of what they're doing is trying to get people to follow the RAG pattern.
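A minimal sketch of step No. 2, the filtering step, under the assumption that a simple keyword-overlap score stands in for whatever ranking a real application would use (embeddings, search relevance and so on); the sample passages are invented.

```python
# Sketch of step No. 2 of the "staircase": from everything gathered in step 1,
# keep only the right information before any prompt engineering happens.
# The keyword-overlap score is an illustrative stand-in for a real ranking method.

def score(query: str, passage: str) -> int:
    """Count how many query words appear in the passage."""
    query_words = set(query.lower().split())
    return sum(1 for word in set(passage.lower().split()) if word in query_words)

def right_information(query: str, gathered: list[str], top_k: int = 3) -> list[str]:
    """Step 2: rank everything from step 1 and keep only the best matches.
    Step 3 (prompt engineering) then works with this smaller, relevant set."""
    return sorted(gathered, key=lambda p: score(query, p), reverse=True)[:top_k]

gathered = [
    "Chapter 1 notes: Sam is afraid of dogs after an incident at the park.",
    "Chapter 1 notes: the story is set in a small coastal town.",
    "Author's shopping list: milk, eggs, coffee.",
]
print(right_information("What is Sam afraid of and why?", gathered, top_k=2))
```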
All this reliance on getting these three steps exactly right -- is this why LLMs aren't yet capable of AGI?
We're nowhere close. So that's why whenever I hear about OpenAI saying we're on the cusp of AGI, I'm sitting there going, "You guys must have something that I haven't seen." Because this thing isn't anywhere close, and to try to act like it is, that's the part that's disingenuous.
I mean, it's a calculator. It's almost like someone saying that a simple calculator is brilliant. It's just a calculator. It's a mathematical thing. It doesn't understand what it's doing. The LLM is using human language, and we perceive that if something is able to manipulate human language, it therefore has human understanding. I'm saying no, it's manipulation of the human language; it is a mathematical algorithm. That's all it is. ... It [also] doesn't grow. That's the other thing I would say to people if they say that this thing is close to artificial general intelligence. It doesn't grow. When they turn on the new version, the capabilities of that version on Day 1 are the same capabilities that version will have on Day 5,000. There was no growth.
"Whenever I hear about OpenAI saying we're on the cusp of AGI, I'm sitting there going, 'You guys must have something that I haven't seen.' Because this thing isn't anywhere close."
Do you think we will ever get there? And what would that look like?
Show me something that is capable of growth. ... It would have to reprogram itself. It would have to come up with new algorithms and then train itself on those new algorithms. Then you've got AGI. Well, do we really want that? That's the other thing -- do we really want AGI?
That's why Microsoft did such a good job when it coined "Copilot," because that is, in essence, what it is. It is always a copilot -- it is never the pilot. And that's what I think the problem is if you had AGI. It would be the pilot. I don't want that. I want it to be the copilot. So that's why Microsoft branded and stamped it "Copilot," and that's it.
Well, presumably, we're not going to see AGI in GPT-5 or 4.5, but what do you expect we will see in the next version of GPT?
What I expect to see in GPT-5 is a continuation of 4, because what 4 does [well] is its manipulation of the human language. Meaning, it knows the subtleties. I expect that to continue to get better. For example, with the AIStoryBuilders application ... the characters will say things to each other and it sounds more natural because there are certain subtleties.
For example, if I said, "Meet me at the park," but let's say we have some history and I feed into the GPT that you don't like going to the park with me because you were mugged at the park, GPT-4 would actually generate the dialog to say, "Meet me at the park, if you feel OK about that." That extra part there, "if you feel OK about that," is because GPT-4 has been trained on so much more data. That is what a person would naturally say. Again, it doesn't really understand what it's doing; it is just well-trained.
That sort of subtlety is what I expect to get out of GPT-5. GPT-5 would be able to say things like, "Meet me at the park if you feel OK about that. But trust me, I understand if you don't, and perhaps we could, you know, go to the zoo instead."
Final thoughts?
Remember how I was talking about the three steps? The first step is get the information, but Step 2 is give it the right information. Do not skip Step 2. ... I don't think a lot of programmers realize that we have to be careful what we feed to the LLM using the RAG pattern. That is how we get it to do what we want. A lot of programmers out there in this field don't realize that, and that's why the products they're pushing out are not as good as I think they could be. So that's what I would like to share with people.