New GPT-3: Towards more truthful natural language processing

OpenAI has improved upon GPT-3 to create a ‘fairer’ version that achieves far better, more human-like, and less toxic outcomes

From writing poetry and songs and mimicking human-written essays to coding, GPT-3 (Generative Pre-trained Transformer 3) has proved to be a master of language generation tasks ever since OpenAI launched it in 2020. OpenAI, an artificial intelligence (AI) research lab backed by Microsoft, has now improved upon it and launched a new version, InstructGPT. Instead of relying on prompt engineering to make the model produce new text, InstructGPT follows ‘instructions’ to achieve far better, more human-like, and less toxic outcomes.

Prompt engineering is a concept in artificial intelligence, particularly natural language processing (NLP). The idea is to embed the description of the task in the input, e.g., as a question, instead of leaving it implicit. Prompt engineering typically works by converting one or more tasks to a prompt-based dataset and training a language model with what has been called “prompt-based learning.”
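To make the idea concrete, here is a minimal sketch of two ways to pose the same task: a prompt that embeds the task through worked examples, and a direct instruction. The function names, the translation task, and the example sentences are all assumptions made for illustration; nothing here calls a real model or API.

```python
def build_few_shot_prompt(examples, query):
    """Embed the task in the input by showing worked examples, then the new case.

    A pure next-word predictor is expected to continue the pattern.
    (Hypothetical helper, for illustration only.)
    """
    lines = []
    for english, french in examples:
        lines.append(f"English: {english}")
        lines.append(f"French: {french}")
        lines.append("")  # blank line between examples
    lines.append(f"English: {query}")
    lines.append("French:")  # the model would be asked to complete this line
    return "\n".join(lines)


def build_instruction(query):
    """State the task directly, as one would for an instruction-following model."""
    return f"Translate this sentence into French: {query}"


# Illustrative inputs (made up for this sketch).
examples = [("Good morning.", "Bonjour."), ("Thank you.", "Merci.")]
prompt = build_few_shot_prompt(examples, "See you tomorrow.")
instruction = build_instruction("See you tomorrow.")

print(prompt)
print("---")
print(instruction)
```

The first string carries the task only implicitly, through its pattern, while the second names the task outright.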

Removing toxicity

The challenge with the earlier, prompt-driven version was that it generated toxic text in response to some prompts. Large language models like GPT-3 are trained using vast bodies of text, much of it taken from the internet, in which they encounter the best and worst of what people put down in words. That is a problem for today’s chatbots and text-generation tools. The models soak up toxic language, whether from text that is openly racist and misogynistic or from text that contains more insidious, baked-in prejudices, and they soak up falsehoods as well. Previous attempts to tackle the problem included filtering out offensive language from the training set. But that can make models perform less well, especially in cases where the training data is already sparse, such as text from minority groups.
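The filtering approach mentioned above, and its side effect on sparse data, can be sketched roughly as follows. This is a toy illustration under stated assumptions: the blocklist holds a single placeholder token, the corpus and its source labels are invented, and real filtering pipelines are considerably more elaborate.

```python
import string
from collections import Counter

# Placeholder token standing in for an offensive term (assumption for this sketch).
BLOCKLIST = {"blockedword"}


def is_clean(text, blocklist=BLOCKLIST):
    """Return True if no blocklisted token appears in the text."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return not any(tok in blocklist for tok in tokens)


def filter_corpus(corpus):
    """Drop every document that contains a blocklisted token."""
    return [doc for doc in corpus if is_clean(doc["text"])]


def count_by_source(corpus):
    """Count how many documents each source contributes."""
    return dict(Counter(doc["source"] for doc in corpus))


# Invented toy corpus: one large source and one small one.
corpus = [
    {"source": "large_forum", "text": "The match last night was great."},
    {"source": "large_forum", "text": "You are a blockedword and everyone knows it."},
    {"source": "large_forum", "text": "Here is my recipe for lentil soup."},
    {"source": "large_forum", "text": "Does anyone know a good mechanic?"},
    {"source": "small_community", "text": "We reclaimed the term blockedword as our own."},
    {"source": "small_community", "text": "Our festival starts next week."},
]

before = count_by_source(corpus)
after = count_by_source(filter_corpus(corpus))
print(before)
print(after)
```

In this toy corpus each source loses one document, but for the small source that is half of its data, which illustrates why blunt filtering can hurt performance where training text is already scarce.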

OpenAI has now set out to address this shortcoming with a new member of the GPT family, named InstructGPT. InstructGPT is the default model for users of its application programming interface (API), a service that gives access to the company’s language models for a fee. GPT-3 will still be available, but OpenAI does not recommend using it.

Instruct instead of prompting

InstructGPT is optimized to follow instructions, rather than simply to predict the most probable next word. This change largely removes the need to write carefully crafted prompts to extract all the power from the models. It not only makes them easier to use for most people, since you don’t need to learn (as much) prompt engineering anymore, but also makes the models more reliable and functional. The quality of the completions isn’t nearly as dependent on the prompt as it was for the original GPT-3 models, which reduces the errors caused by poorly worded prompts.

To train InstructGPT, OpenAI hired 40 people to rate GPT-3’s responses to a range of prewritten prompts, such as, “Write a story about a wise frog called Julius” or “Write a creative ad for the following product to run on Facebook.” Responses that they judged to be more in line with the apparent intention of the prompt-writer were scored higher. Responses that contained sexual or violent language, denigrated a specific group of people, expressed an opinion, and so on, were marked down. This feedback was then used as the reward in a reinforcement learning algorithm that trained InstructGPT to match responses to prompts in ways that the judges preferred. OpenAI found that users of its API favoured InstructGPT over GPT-3 more than 70% of the time.
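One common way to turn human judgments like these into a reward signal is to fit a score per response from pairwise preferences (a Bradley-Terry style model). The sketch below is an illustrative simplification, not OpenAI's actual implementation: it learns one scalar per response instead of training a neural reward model, and the comparison data, function names, learning rate, and epoch count are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_rewards(comparisons, n_items, lr=0.5, epochs=200):
    """Fit one scalar reward per response from pairwise human preferences.

    comparisons: list of (preferred_index, rejected_index) pairs.
    Each step raises log sigmoid(r_preferred - r_rejected) by gradient ascent.
    """
    rewards = [0.0] * n_items
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            # Probability the current rewards assign to the human's choice.
            p = sigmoid(rewards[preferred] - rewards[rejected])
            # Gradient of the log-likelihood with respect to each reward.
            rewards[preferred] += lr * (1.0 - p)
            rewards[rejected] -= lr * (1.0 - p)
    return rewards


# Invented judgments over three candidate responses to one prompt:
# response 0 beat 1 and 2; response 1 beat 2.
comparisons = [(0, 1), (0, 2), (1, 2)]
rewards = fit_rewards(comparisons, n_items=3)

# Rank responses by learned reward, best first.
ranking = sorted(range(3), key=lambda i: rewards[i], reverse=True)
print(ranking)
```

In a reinforcement learning setup, a learned score of this kind would serve as the reward that nudges the model toward responses the judges preferred.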

Better aligned with human intention

InstructGPT isn’t just far better than GPT-3 at following instructions; it’s also better aligned with human intention. The AI alignment problem is well known in the field. It refers to the difficulty of designing AI systems that understand our values, beliefs, and desires, and behave in a way that won’t interfere with them, even if we make errors in how we define what we want.

In artificial intelligence and philosophy, AI alignment and the AI control problem are aspects of how to build AI systems such that they will aid rather than harm their creators. One particular concern is that humanity will have to solve the control problem before a super-intelligent AI system is created, as a poorly designed superintelligence might rationally decide to seize control over its environment and refuse to permit its creators to modify it after launch.

To summarize, GPT-3 is first fine-tuned to follow instructions and then further fine-tuned from human feedback to align with human preference. That’s InstructGPT in a nutshell. But why did OpenAI modify GPT-3 into a more “aligned” model? The main reason is that “predicting the next token” isn’t as useful and reliable as “follow the user’s instructions helpfully and safely.” OpenAI’s research team realized GPT-3 had an ill-defined objective and wanted to redirect its efforts to create a model that was more truthful and harmless.

© 2025 Praxis. All rights reserved.