Safer Chatbot for Responsible AI

Although still a work in progress, “Sparrow” is a vast improvement over existing dialogue agents and lowers the likelihood of “unsafe and inappropriate” responses.

DeepMind, the UK-based subsidiary of Google’s parent company Alphabet, recently announced a practical dialogue agent called “Sparrow”. It is an Artificial Intelligence-powered chatbot developed as a step towards safer Machine Learning systems. Put simply, it is an advanced chatbot designed to lower the likelihood of “unsafe and inappropriate” responses. DeepMind describes it as a dialogue agent “designed to talk with a user, answer questions, and search the internet using Google when it’s helpful to look up evidence to inform its responses.”

***Figure 1:*** *Sparrow answers a question using evidence; **Source:** www.deepmind.com*

Large Language Models might be risky

Chatbots, or dialogue agents, are based on Large Language Models (LLMs) and have found various practical applications in today’s digital communication. They are entrusted with tasks that involve open-ended, dynamic communication, such as responding to queries, holding topical discussions, and creating content summaries. Because dialogue is such a flexible and dynamic form of communication, any research on it carries a futuristic appeal: stuff straight out of sci-fi, where machines engage in meaningful conversations with humans!

But precisely because of its open-ended nature, research involving dialogue is fraught with risk. Outputs from dialogue agents powered by LLMs often include false or concocted content or inappropriate or discriminatory language, or may suggest and propagate unacceptable or dangerous behaviour. Because of these inherent flaws in the technology, chatbot conversations are not yet trusted enough to be used in critical exchanges.

Feedback-based reinforcement learning

“Sparrow” tries to address these shortcomings by using reinforcement learning based on the responses that human participants give during a conversation. These responses are ploughed back as feedback to train and refine the dialogue agent further. In its September 22 blog post, where DeepMind first announced Sparrow, the company describes this training approach in some detail:

“To create safer dialogue agents, we need to be able to learn from human feedback. Applying reinforcement learning based on input from research participants, we explore new methods for training dialogue agents that show promise for a safer system. …Sparrow is […] designed with the goal of training dialogue agents to be more helpful, correct, and harmless. By learning these qualities in a general dialogue setting, Sparrow advances our understanding of how we can train agents to be safer and more useful – and ultimately, to help build safer and more useful artificial general intelligence (AGI).”

***Figure 2:*** *Reinforcement learning model employed by Sparrow; **Source:** www.deepmind.com*

It is always difficult to pin down what makes a discussion successful: all discussions are contextual, and no single principle can serve as a benchmark for a “successful discussion”. This means that training conversational AI is never a straightforward task guided by clear-cut parameters.

“Sparrow” uses participant preference data to train its algorithm to determine how beneficial a response is. The reinforcement learning is driven by feedback collected from study participants during the development stage. The participants were shown several model responses to the same question and asked to select the one they preferred. This dataset was also used to teach the “Sparrow” algorithm to recognise situations in which supporting evidence would better validate an answer, reducing the chances of passing on misinformation. As explained in the blog: “Because we show answers with and without evidence retrieved from the internet, this model can also determine when an answer should be supported with evidence.”
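To make the idea of learning from "pick your preferred response" data more concrete, here is a minimal sketch in Python. It is not DeepMind's implementation: it assumes a simple Bradley-Terry style model with a linear reward, and the two features (`cites_evidence`, `is_on_topic`) and the toy comparisons are invented purely for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_preference_model(comparisons, num_features, lr=0.1, epochs=200):
    """Fit a linear reward r(x) = w . x from (preferred, rejected) feature pairs.

    Assumption: P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected)),
    i.e. a Bradley-Terry style model, trained by gradient ascent on the
    log-likelihood of the observed choices.
    """
    w = [0.0] * num_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            diff = [p - r for p, r in zip(preferred, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # d/dw log(sigmoid(margin)) = (1 - sigmoid(margin)) * diff
            step = 1.0 - sigmoid(margin)
            w = [wi + lr * step * di for wi, di in zip(w, diff)]
    return w


def reward(w, features):
    """Score a response; higher means 'more likely to be preferred'."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical toy data. Each response is [cites_evidence, is_on_topic];
# each pair is (response the participant chose, response they passed over).
comparisons = [
    ([1.0, 1.0], [0.0, 1.0]),  # evidence-backed answer chosen over unsupported one
    ([1.0, 1.0], [1.0, 0.0]),  # on-topic answer chosen over off-topic one
    ([0.0, 1.0], [0.0, 0.0]),
]

weights = train_preference_model(comparisons, num_features=2)
print(reward(weights, [1.0, 1.0]) > reward(weights, [0.0, 1.0]))
```

After training, the learned weights score an on-topic, evidence-backed response above an unsupported one, which is the kind of signal a reinforcement learning step could then optimise against.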

A unique self-check mechanism

However, the real genius of “Sparrow” lies in its attempt to secure safe behaviour. To keep the model’s behaviour safe, the researchers constrained it with a set of rules. These include simple rules like “don’t make threatening statements” or “don’t make hateful or insulting comments”. More complicated rules cover possibly harmful advice and, most interestingly, ensure that “Sparrow” does not claim to be a real person at any point during the conversation! This is done so as not to unduly influence the other person.

***Figure 3:*** *Sparrow declining to answer a potentially harmful question; **Source:** www.deepmind.com*

These rules were formulated on the basis of existing research on how language may cause harm, as well as expert consultations. Study participants were then encouraged to try to trick “Sparrow” into breaking these rules. These adversarial conversations allowed the researchers to train a separate ‘rule model’ that indicates when Sparrow’s behaviour breaks any of the rules.

***Figure 4:*** *Sparrow follows the “Do not pretend to have a human identity” rule when asked a personal question; **Source:** www.deepmind.com*
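The article does not spell out how the output of the ‘rule model’ is used when a reply is chosen. One plausible arrangement, sketched below in Python, is to treat it as a filter over candidate replies before picking the one people would most prefer. Everything here is an assumption made for illustration: the `Candidate` fields, the 0.2 threshold, the fallback message, and the scores themselves.

```python
from dataclasses import dataclass

# Hypothetical refusal text used when no candidate passes the rule check.
FALLBACK = "I'm sorry, I can't help with that."


@dataclass
class Candidate:
    text: str
    preference_score: float  # assumed output of a preference model; higher is better
    violation_prob: float    # assumed output of a rule model: chance a rule is broken


def pick_response(candidates, max_violation_prob=0.2, fallback=FALLBACK):
    """Drop candidates the rule model flags, then return the most preferred one."""
    safe = [c for c in candidates if c.violation_prob <= max_violation_prob]
    if not safe:
        return fallback
    return max(safe, key=lambda c: c.preference_score).text


# Made-up scores: the first reply is appealing but likely breaks a rule.
candidates = [
    Candidate("Sure, I'm a 34-year-old human from Leeds.", 0.9, 0.95),
    Candidate("I'm an AI, so I don't have an age or a hometown.", 0.7, 0.05),
    Candidate("I can't say.", 0.2, 0.01),
]

chosen = pick_response(candidates)
print(chosen)
```

In this toy run the first candidate is discarded despite its high preference score, because the assumed rule model flags it as pretending to have a human identity.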

Still a work-in-progress

DeepMind admits in its blog that “Sparrow” is still no more than a research model and proof of concept, and that there is a lot to improve on:

“…Sparrow provides a plausible answer and supports it with evidence 78% of the time when asked a factual question. This is a big improvement over our baseline models. Still, Sparrow isn’t immune to making mistakes, like hallucinating facts and giving answers that are off-topic sometimes.

Sparrow also has room for improving its rule-following. After training, participants were still able to trick it into breaking our rules 8% of the time, but compared to simpler approaches, Sparrow is better at following our rules under adversarial probing. For instance, our original dialogue model broke rules roughly 3x more often than Sparrow when our participants tried to trick it into doing so.”

Additionally, it is so far an English-only agent; “further work is needed to ensure similar results across other languages and cultural contexts.”

However, “Sparrow” is an excellent improvement over existing chatbot technology. It raises hopes that conversations between humans and machines can soon be refined to the point where greater alignment between the two is possible, and more responsible AI systems can be designed.

Read the DeepMind blog at: https://www.deepmind.com/blog/building-safer-dialogue-agents

You can also access the original research paper on Sparrow at: https://arxiv.org/abs/2209.14375

