Six Times Better Than the Best

Google Brain team ready to revolutionise the world, again.

June 2020: Elon Musk-backed OpenAI launches the GPT-3 transformer model, a landmark deep learning system with a mind-boggling 175 billion parameters, making it the biggest NLP (Natural Language Processing) model in the world at the time.

January 2021: A trio of researchers from the Google Brain team unveils a model with over a trillion parameters. Even a round trillion would make it almost 6 times larger than GPT-3; at its full 1.6 trillion parameters, it is roughly 9 times larger.
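
The "almost 6 times" figure comes from dividing a round trillion by GPT-3's 175 billion; the 1.6 trillion parameters reported for Google's largest model give a bigger ratio. The short Python sketch below checks both divisions. The parameter counts are the publicly reported figures; the constant and function names are illustrative only.

```python
# Publicly reported parameter counts.
GPT3_PARAMS = 175e9            # GPT-3: 175 billion
ONE_TRILLION = 1e12            # a round "over a trillion"
SWITCH_LARGEST_PARAMS = 1.6e12 # largest Switch Transformer: 1.6 trillion

def size_ratio(params: float, baseline: float = GPT3_PARAMS) -> float:
    """Return how many times larger `params` is than `baseline`."""
    return params / baseline

print(f"1 trillion vs GPT-3:   {size_ratio(ONE_TRILLION):.1f}x")
print(f"1.6 trillion vs GPT-3: {size_ratio(SWITCH_LARGEST_PARAMS):.1f}x")
```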

As far as good omens for the next decade go, there couldn’t have been a better one.

“Are you Human?”

The architecture behind an AI that ‘understands’ humans, performing a range of tasks that mimic human behaviour, such as answering questions or even writing its own poetry or ‘news’ articles, has been in development (and use) for a long time now. The roots of such an engine, a Natural Language Processing system, go back to 1950, when an erudite gentleman named Alan Turing published a paper titled ‘Computing Machinery and Intelligence’.

This paper formed the basis for the widely acclaimed ‘Turing Test’: a test of whether a machine can hold a conversation that a human cannot tell apart from another human’s. The first time a computer was widely reported to have passed the Turing Test was in 2014, when, at an event organised by the University of Reading, a chatbot convinced one-third of the judges that it was, in fact, a thirteen-year-old Ukrainian boy.

(It should be made clear, however, that even the most advanced AI models built on the most cutting-edge machine learning techniques do not really understand language: they are simply trained, using billions of parameters, to make it seem like they do.)

Today, although the Turing Test carries more weight in philosophy and the arts than in AI development (especially in the debate over whether AI could ever become self-aware), its ramifications are widespread. We are slowly inching towards a reality where AI systems can pass the Turing Test on a regular basis, a feat that even the most advanced NLP models cannot yet manage. The chief AI evangelist at US-based AI firm DataRobot opines: “If Alan Turing was alive, he might be shocked that given 175 billion neurons from GPT-3 we are still unable to pass his test, but we will soon.”

Perhaps the model using over a trillion parameters will be the one to kickstart the process.

The name’s Brain, Google Brain

Recent research (still in pre-print) from Google Brain has essentially found a way to keep the computation a model performs on each input as simple as possible whilst ‘squeezing in’ as many parameters as it possibly can. This is what allows more than a trillion parameters to fit into the model, making it, potentially, the most advanced AI system in the world.
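
To make the idea concrete, here is a rough Python sketch of why a sparsely-activated layer can hold far more parameters without doing much more work per input. It assumes hypothetical layer sizes (`D_MODEL`, `D_FF`) and that each token is sent to exactly one expert feed-forward network plus a small router; these are illustrative assumptions, not the configuration Google Brain used.

```python
# Hypothetical sizes, chosen only for illustration (not the paper's configuration).
D_MODEL = 1024  # width of each token's representation
D_FF = 4096     # hidden width of each feed-forward "expert"

def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in one two-layer feed-forward network (biases ignored)."""
    return 2 * d_model * d_ff

def switch_layer_stats(num_experts: int) -> tuple[int, int]:
    """Return (total parameters, parameters used per token) for a layer with
    `num_experts` expert FFNs and a router, assuming each token is sent to
    exactly one expert (top-1 routing)."""
    router = D_MODEL * num_experts  # router projection: one score per expert
    total = num_experts * ffn_params(D_MODEL, D_FF) + router
    per_token = ffn_params(D_MODEL, D_FF) + router  # one expert + the router
    return total, per_token

for experts in (1, 8, 64, 128):
    total, per_token = switch_layer_stats(experts)
    print(f"{experts:>4} experts: {total:>13,} params total, {per_token:>11,} used per token")
```

Under these assumptions, total parameters grow roughly in proportion to the number of experts, while the parameters touched by any single token stay almost flat; only the small router grows.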

According to the research paper by the scientists at Google Brain, Switch Transformers may just be the way forward: “Switch Transformers are scalable and effective natural language learners. We simplify Mixture of Experts to produce an architecture that is easy to understand, stable to train and vastly more sample efficient than equivalently-sized dense models. We find that these models excel across a diverse set of natural language tasks and in different training regimes, including pre-training, fine-tuning and multi-task training. These advances make it possible to train models with hundreds of billion to trillion parameters and which achieve substantial speedups relative to dense T5 baselines.”
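
The simplification of Mixture of Experts mentioned in the quote can be written down compactly. As we read the paper's formulation, a router with weights W_r scores each of the N experts for an input x, and E_i(x) denotes the output of expert i. A classic Mixture of Experts combines the top-k experts, whereas a Switch layer keeps only the single highest-scoring one:

```latex
\begin{aligned}
h(x) &= W_r\,x, \qquad p_i(x) = \frac{e^{h(x)_i}}{\sum_{j=1}^{N} e^{h(x)_j}} \\
y_{\text{MoE}} &= \sum_{i \in \mathcal{T}} p_i(x)\,E_i(x), \qquad \mathcal{T} = \text{the top-}k\text{ experts by } p_i(x) \\
y_{\text{Switch}} &= p_{i^*}(x)\,E_{i^*}(x), \qquad i^* = \arg\max_i \, p_i(x)
\end{aligned}
```

Routing each input to one expert (k = 1) means only one expert's computation runs per input, however many experts the layer holds.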

Developments over the past few years have shown that large-scale training is often the most effective way to produce the most powerful models. According to the research, simple architectures backed by large datasets and parameter counts far surpass more complicated algorithms. However, training at that scale is highly computationally intensive. This is why the researchers lay stress on what they call the ‘Switch’ Transformer: a ‘sparsely-activated’ architecture that uses only a subset of the model’s parameters to process each piece of input data.
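
The routing step can be sketched in plain Python. This is a toy illustration under stated assumptions, not Google Brain's implementation: the sizes, the random weights and the single-matrix "experts" are hypothetical stand-ins for the much larger feed-forward experts in a real Switch Transformer. What it does show is the sparse part: every expert is scored, but only the chosen expert's weights are used for a given token.

```python
import math
import random

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def make_expert(d_model: int, rng: random.Random):
    """A toy, hypothetical 'expert': one random linear map followed by ReLU."""
    weights = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_model)]
    return lambda x: [max(0.0, v) for v in matvec(weights, x)]

def switch_layer(x, router_weights, experts):
    """Top-1 routing: score every expert, run only the best one,
    and scale its output by that expert's router probability."""
    probs = softmax(matvec(router_weights, x))  # one probability per expert
    best = max(range(len(experts)), key=lambda i: probs[i])
    out = experts[best](x)                      # only this expert's weights are used
    return [probs[best] * v for v in out], best

rng = random.Random(0)
D_MODEL, NUM_EXPERTS = 8, 4  # toy sizes, purely illustrative
router = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(D_MODEL, rng) for _ in range(NUM_EXPERTS)]

tokens = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(5)]
for t, token in enumerate(tokens):
    y, chosen = switch_layer(token, router, experts)
    print(f"token {t} -> expert {chosen}")
```

Because each token activates only one expert, adding experts increases the model's capacity without increasing the work done per token, apart from the router's scoring step.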

Tech website VentureBeat writes: “In what might be one of the most comprehensive tests of this correlation to date, Google researchers developed and benchmarked techniques they claim enabled them to train a language model containing more than a trillion parameters. They say their 1.6-trillion-parameter model, which appears to be the largest of its size to date, achieved an up to 4 times speedup over the previously largest Google-developed language model (T5-XXL).”

Google has consistently pushed the limits of what AI can do over the years, and this is no different. Sparse models may well become the standard architecture for natural language tasks in the future. Taken by itself, this is a monumental step in the right direction, and it could have far-reaching ramifications for years to come.

© 2024 Praxis. All rights reserved.