
WAG (Web-Augmented Generation) for Not Quite Dummies

WAG (web-augmented generation) is quickly becoming an essential part of modern AI systems. WAG allows large language models, such as GPT and Llama, to supplement their core knowledge with additional information by searching the web. This is especially useful when a large language model (LLM) needs recent information, such as a company's stock price or a sports score.

What is WAG?
LLMs such as GPT from OpenAI, Llama from Meta, Gemini from Google, Claude from Anthropic, R1 from DeepSeek, Grok from xAI, and Mistral from Mistral AI are quite remarkable. Because the models have been trained on many data sources, including Wikipedia, they understand English grammar and know a lot of static facts. But if you want to query for recent information, you need to supply the LLM with the necessary data. One way to do this is to search the web for new information.

All of these LLMs can do WAG in roughly the same way, but the details differ from model to model. The diagram in Figure 1 illustrates a generic WAG system.

A user query such as, "What was the closing stock price of Acme Corp yesterday? If you don't know, augment your response using a web search" is submitted to the WAG system. The system determines that the closing stock price is not in the LLM's core knowledge base, which triggers a web search.

Figure 1: A Generic Web-Augmented Generation System

Assuming the closing stock price is found, that information is added to the query context, which is then sent to the LLM. The LLM uses its knowledge of English grammar to construct a response to the user.
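To make the flow in Figure 1 concrete, here is a minimal sketch of the pipeline in Python. All three helper functions are hypothetical placeholders (stubbed out so the script runs); a production system would replace them with a real relevance check, a real search API call, and a real LLM API call.

def needs_web_search(query):
    # Placeholder: a real system asks the model, or uses a classifier,
    # to decide whether the answer is already in core knowledge
    return "yesterday" in query.lower()

def web_search(query):
    # Placeholder: a real system calls a search API and prunes
    # the results down to a few relevant snippets
    return ["(dummy snippet with the closing stock price)"]

def call_llm(prompt):
    # Placeholder: a real system sends the prompt to an LLM API
    return "(model response)"

def answer(query):
    context = ""
    if needs_web_search(query):       # recent info not in core knowledge?
        snippets = web_search(query)  # fetch relevant web content
        context = "\n".join(snippets) # add it to the query context
    prompt = "Context:\n" + context + "\n\nQuestion: " + query
    return call_llm(prompt)           # LLM composes the final response

print(answer("What was the closing stock price of Acme Corp yesterday?"))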

What Does a WAG System Look Like?
Before the availability of integrated WAG systems, it was possible to perform WAG queries, but doing so was quite difficult. Engineers had to write code to crawl and query the web, fetch results, prune the results to relevant information, and supplement the user query context. Modern LLM APIs make implementing WAG much easier.

The essential parts (with some details omitted) of a Python language WAG program look like this:

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

query = "What was the closing stock price of Acme Corp yesterday?"
info = {
  "model": "gpt-4.1",
  "tools": [{"type": "web_search_preview"}],  # enable the built-in web search tool
  "input": [
    { "role": "developer",
      "content": "You use the Web when necessary." },
    { "role": "user", "content": query },
  ],
  "temperature": 0.3,  # and use default top_p
  "max_output_tokens": 100,
}
response = client.responses.create(**info)
print(response.output_text)  # just the generated text, not the full response object

This example uses the OpenAI GPT-4.1 LLM API, plus the "web_search_preview" tool to find recent information that's not in the model's core knowledge base.
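A practical follow-up is showing users where the augmented information came from. When the web search tool fires, the Responses API attaches citation annotations to the message output. The sketch below assumes the url_citation annotation shape described in the current OpenAI documentation and may need adjusting if the API changes.

# Sketch: list the web sources cited in the response. Assumes message
# items whose text parts carry annotations of type "url_citation".
for item in response.output:
    if item.type == "message":
        for part in item.content:
            for ann in (getattr(part, "annotations", None) or []):
                if ann.type == "url_citation":
                    print(ann.title, "-", ann.url)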

Behind the scenes, the program executes a web search. The OpenAI documentation does not describe exactly how this works, but the web search functionality likely uses a combination of SERPs (search engine results pages) from the Google and Bing search indexes (more likely Bing, given the business relationship between OpenAI and Microsoft). The web search may also draw on results from independent indexes such as Mojeek.

What Are the Implications of WAG?
The Pure AI editors asked for comments from Dr. James McCaffrey, one of the original members of the Microsoft Research Deep Learning team. "Web-augmented generation makes LLM applications much more powerful, but WAG introduces potential problems," he said. "Because WAG uses web content, and virtually anyone can post information on the web, WAG systems are susceptible to unintended incorrect information and deliberately poisoned data.

"The openness of the web, and the growing reliance of LLMs for factual content from the web, means that systems designed to verify the correctness of web data will become increasingly important. Keeping humans-in-the-loop to monitor LLM responses for critical systems such as medical and military will also increase in importance."
