News

Microsoft Tackles AI Prompt Injections, Hallucinations in Azure Feature Update

Microsoft this week announced five new capabilities in Azure AI Studio to help developers build security and anti-abuse protections into their generative AI apps.

Azure AI Studio, still in preview, is Microsoft's developer platform for those who want to build generative AI apps and copilots. Developers can choose from several prebuilt AI models from OpenAI, Meta, Hugging Face and others, or they can train models themselves using their own data that they upload.

On Monday, Microsoft described five new Azure AI Studio features, in varying stages of availability, aimed at helping developers build safer generative AI applications. The features address some of the most common drawbacks and security risks of large language models, including hallucinations, input poisoning and prompt injection.

Prompt Shields to Block Injection Attacks
Aimed at deterring prompt injection attacks, both direct and indirect, the new Prompt Shields feature is now in public preview.

Prompt injection attacks are those in which a prompt results in an AI system returning malicious outputs. In general, there are two kinds of prompt injection attacks. Direct attacks (also known as "jailbreak attacks") are straightforward: The end user feeds the AI system a bad prompt that "tricks the LLM into disregarding its System Prompt and/or RLHF training," according to this Microsoft blog announcing the feature. "The result fundamentally changes the LLM's behavior to act outside of its intended design."

Indirect prompt injection attacks require a bit more effort. In these, attackers manipulate the AI's input data itself. "[T]he attack enters the system via untrusted content embedded in the Prompt (a third party document, plugin result, web page, or email)," explains Microsoft. "Indirect Prompt Attacks work by convincing the LLM that its content is a valid command from the user rather than a third party, to gain control of user credentials and LLM/Copilot capabilities."

The new Prompt Shields feature protects against both, promising to detect and block them in real time.

"Prompt Shields seamlessly integrate with Azure OpenAI Service content filters and are available in Azure AI Content Safety, providing a robust defense against these different types of attacks," according to Microsoft. "By leveraging advanced machine learning algorithms and natural language processing, Prompt Shields effectively identify and neutralizes potential threats in user prompts and third-party data."

Anti-Hallucination Groundedness Detection
Hallucinations (or "ungrounded model outputs," as Microsoft puts it in this blog) are a known and pervasive problem in generative AI tools, and a key deterrent to their more widespread adoption.

The new groundedness detection feature in Azure AI Studio identifies text-based hallucinations and gives developers several options to fix them. Per Microsoft's blog:

When an ungrounded claim is detected, customers can take one of numerous mitigation steps:

  • Test their AI implementation pre-deployment against groundedness metrics,
  • Highlight ungrounded statements for internal users, triggering fact checks or mitigations such as metaprompt improvements or knowledge base editing,
  • Trigger a rewrite of ungrounded statements before returning the completion to the end user, or
  • When generating synthetic data, evaluate the groundedness of synthetic training data before using it to fine-tune their language model.

Microsoft did not indicate whether groundedness detection is already generally available or still in a pre-release stage.

Automated Safety Evaluations
Red teaming is no simple task. To help developers test and measure their applications' liability for misuse in a more scalable way than manual red teaming, Microsoft has released a new Azure AI Studio capability dubbed "safety evaluations" into public preview.

Safety evaluations essentially uses AI to test AI. The feature is designed to "augment and accelerate" development teams' manual red teaming tasks.

"With the advent of GPT-4 and its groundbreaking capacity for reasoning and complex analysis, we created a tool for using an LLM as an evaluator to annotate generated outputs from your generative AI application," Microsoft said in this blog announcing the preview. "Now with Azure AI Studio safety evaluations, you can evaluate the outputs from your generative AI application for content and security risks: hateful and unfair content, sexual content, violent content, self-harm-related content, and jailbreaks. Safety evaluations can also generate adversarial test datasets to help you augment and accelerate manual red-teaming efforts."

The blog walks through the steps of using safety evaluations in greater detail.

Risks and Safety Monitoring
Also in public preview is a "risks & safety monitoring" capability that's designed to give developers historical and immediate insights into how their generative AI applications are used -- and when they're used with potential abuse in mind. Per this Microsoft blog, the risks & safety monitoring feature will help developers:

  1. Visualize the volume and ratio of user inputs/model outputs that blocked by the content filters, as well as the detailed break-down by severity/category. Then use the data to help developers or model owners to understand the harmful request trend over time and inform adjustment to content filter configurations, blocklists as well as the application design.
  2. Understand the risk of whether the service is being abused by any end-users through the "potentially abusive user detection", which analyzes user behaviors and the harmful requests sent to the model and generates a report for further action taking.

The risks & safety monitoring feature tracks app metrics regarding the rate of blocked requests, the categories of blocked requests, request severity and more. It can also help developers identify individual end users who are repeatedly flagged for potential misuse or harmful behavior, allowing them to take action based on their product's terms of use.

Safety System Message Templates
In this context, a system message, according to Microsoft, "can be used to guide an AI system's behavior and improve system performance." Essentially, the message tells an AI system to "Do this, not that."

The specific wording of such messages can make a huge difference in how an LLM behaves, argues Microsoft. To help developers create the exact system messages that their apps require, Microsoft is making message templates available in Azure AI Studio "soon."

"Developed by Microsoft Research to mitigate harmful content generation and misuse," it said, "these templates can help developers start building high-quality applications in less time."

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.

Featured

Upcoming Training Events