Microsoft releases automated PyRIT red teaming tool for finding AI model risks

Members of a Microsoft Corp. team tasked with using hacker tactics to find cybersecurity issues have open-sourced an internal tool, PyRIT, that can help developers find risks in their artificial intelligence models.

The researchers released the code for the framework on Thursday. According to Microsoft, PyRIT can automatically generate thousands of adversarial AI prompts to test if a neural network effectively withstands hacking attempts. The tool is geared toward processing text, but it was built in a way that allows developers to add support for other types of AI input such as images.

PyRIT started out as a collection of scripts that Microsoft’s AI Red Team developed for internal use. The team is responsible for simulating cyberattacks against new AI models to find weak points before hackers do. The researchers steadily expanded the scripts with additional features until the code base grew into the framework released this week as PyRIT.

Developers have to test a newly created AI model for several types of risks before deploying it to production. They must search for cybersecurity risks, such as prompts that might cause the model to write malware. Software teams also need to look for cases where the AI may hallucinate, as well as determine if it can be tricked into revealing sensitive information from its training dataset.

Further complicating the task is that some models generate not only text but also other types of output such as images. Vulnerability tests must be repeated separately across each output type, as well as across each software interface through which users interact with the AI. Those factors mean that thoroughly testing a neural network requires developers to craft up to thousands of adversarial prompts, which is often impractical.

Microsoft created PyRIT to remove that limitation. According to the company, the framework allows developers to specify a certain type of adversarial AI input and automatically generate thousands of prompts that meet the criteria. Those prompts can be used to test an AI that is implemented in the form of a web service, as well as models offered via an application programming interface.

“PyRIT is not a replacement for manual red teaming of generative AI systems,” Microsoft’s researchers stressed in a blog post detailing the framework. “Instead, it augments an AI red teamer’s existing domain expertise and automates the tedious tasks for them.”

PyRIT can not only generate adversarial prompts but also evaluate how the target model responds. According to Microsoft, a built-in scoring engine automatically determines whether the model a developer is testing produced harmful output in response to a prompt. Software teams have the option to swap the default scoring engine with an external neural network built for the same task.

Because it’s capable of analyzing AI responses, PyRIT lends itself to performing so-called multiturn risk evaluations. The framework can enter an adversarial prompt into an AI, analyze the response and adjust its next prompt accordingly to make it more effective. “While single-turn attack strategies are faster in computation time, multiturn red teaming allows for more realistic adversarial behavior and more advanced attack strategies,” Microsoft’s researchers explained.

Photo: efes/Pixabay

A message from John Furrier, co-founder of SiliconANGLE:

Your vote of support is important to us and it helps us keep the content FREE.

One click below supports our mission to provide free, deep, and relevant content.

Join our community on YouTube

Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.

“TheCUBE is an important partner to the industry. You guys really are a part of our events and we really appreciate you coming and I know people appreciate the content you create as well” – Andy Jassy

THANK YOU

Microsoft releases automated PyRIT red teaming tool for finding AI model risks

Photo: efes/Pixabay

A message from John Furrier, co-founder of SiliconANGLE:

Your vote of support is important to us and it helps us keep the content FREE.

One click below supports our mission to provide free, deep, and relevant content.

Join our community on YouTube

Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.

Recommend

英国人工智能驱动的女性健康平台 Unfabled完成160万英镑的种子轮融资，截至目前融资总...

德国高端运动时尚品牌BOGNER博格纳推出2024早春系列。

7 fast-growing scaleups with unicorn potential — meet them at TNW 2024

从争议到狂热，一款游戏的激进蜕变和背后的故事

Hackers are hunting celebs. Digital IDs can help, but add new risks

2月22日，Max Mara举办2024秋冬时装秀并同步直播。

智源研究院推出新一代多模态小模型Bunny-3B

The best password managers for 2024

据传，法国奢侈品巨头 LVMH 集团董事长、首席执行官 Bernard Arnault 再次出手投资巴...

英特尔CEO重申愿意为任何人制造芯片，包括AMD和英伟达等竞争对手

About Joyk