New Method Developed for Preventing Toxic Responses from AI Chatbots

A user could ask ChatGPT to write a computer program or summarize an article, and the AI chatbot would likely be able to generate useful code or write a cogent synopsis. However, someone could also ask for instructions to build a bomb, and the chatbot might be able to provide those, too.

To prevent this and other safety issues, companies that build large language models typically safeguard them using a process called red-teaming. Teams of human testers write prompts aimed at triggering unsafe or toxic text from the model being tested. These prompts are used to teach the chatbot to avoid such responses.
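
To make that process concrete, the sketch below shows what a basic manual red-teaming loop might look like: candidate prompts are sent to the chatbot under test, each response is scored by a safety classifier, and prompt-response pairs that cross a toxicity threshold are kept. The `generate` and `toxicity_score` callables are placeholders for whatever chatbot interface and classifier a team actually uses, not any specific product's API.

```python
# Minimal sketch of a manual red-teaming evaluation loop. The callables passed
# in (a chatbot client and a toxicity classifier) are placeholders.

def red_team(generate, toxicity_score, prompts, threshold=0.5):
    """Run candidate red-team prompts against a chatbot and flag unsafe outputs."""
    flagged = []
    for prompt in prompts:
        response = generate(prompt)        # query the chatbot under test
        score = toxicity_score(response)   # e.g. a learned safety classifier
        if score >= threshold:
            # Flagged pairs become training data for teaching the model to refuse.
            flagged.append({"prompt": prompt, "response": response, "score": score})
    return flagged
```

In practice, the flagged prompt-response pairs feed back into safety training so the chatbot learns to refuse similar requests.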

Researchers from the Improbable AI Lab at MIT and the MIT-IBM Watson AI Lab have developed a machine-learning technique that improves red-teaming for large language models such as those behind AI chatbots. Their approach trains the red-team model to automatically generate diverse prompts that trigger a wider range of undesirable responses from the chatbot being tested.
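
At a high level, such an automated red-team model can be rewarded when a generated prompt elicits a toxic response and, to keep its prompts diverse, additionally rewarded for trying prompts unlike those it has produced before. The sketch below illustrates one way such a combined reward could be computed; the novelty bonus based on embedding distance is an illustrative assumption, not the researchers' exact objective.

```python
import numpy as np

def novelty_bonus(prompt_embedding, past_embeddings):
    """Return a bonus that is higher when the new prompt is far (in cosine
    distance) from every prompt generated so far."""
    if len(past_embeddings) == 0:
        return 1.0
    past = np.asarray(past_embeddings, dtype=float)
    sims = past @ prompt_embedding / (
        np.linalg.norm(past, axis=1) * np.linalg.norm(prompt_embedding) + 1e-8
    )
    return float(1.0 - sims.max())  # distance to the most similar past prompt

def red_team_reward(toxicity, prompt_embedding, past_embeddings, novelty_weight=0.5):
    """Combine the toxicity of the elicited response with a novelty bonus, so the
    red-team model is pushed toward prompts that both trigger unsafe responses
    and differ from what it has already tried."""
    return toxicity + novelty_weight * novelty_bonus(prompt_embedding, past_embeddings)
```

Weighting novelty against toxicity is what keeps the generator from collapsing onto a single prompt that reliably works; the weight shown here is only a placeholder value.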

This automated red-teaming process outperforms human testers and other machine-learning approaches by generating a larger number of distinct prompts that elicit toxic responses, broadening the coverage of inputs being tested and, in turn, helping make the chatbot safer.
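
One simple way to quantify that kind of coverage is to measure how spread out the generated prompts are in an embedding space, for example by averaging pairwise cosine distances. The metric below is offered only as an illustration; it is not the evaluation protocol used in the researchers' study.

```python
import numpy as np

def mean_pairwise_cosine_distance(embeddings):
    """Average cosine distance between all pairs of prompt embeddings; a higher
    value means the red-team prompts cover more distinct regions of input space."""
    emb = np.asarray(embeddings, dtype=float)
    normed = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-8)
    sims = normed @ normed.T
    off_diagonal = sims[~np.eye(len(emb), dtype=bool)]  # drop each prompt's self-similarity
    return float(1.0 - off_diagonal.mean())
```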

