How NVIDIA’s Maxine is set to change the landscape of video calls
Since the onset of the COVID-19 pandemic, a significant proportion of the global workforce has been forced to work online, without the convenience of in-person interaction or other familiar means of communication. This has driven massive growth in video teleconferencing platforms such as Google’s platforms, Skype and Zoom, with the latter even becoming more valuable than the world’s seven largest airline companies combined by May 2020. In fact, the largest source of internet traffic over the last few months has been video streaming, with almost 30 million video meetings taking place globally every day.
Although video conferencing for work is still a rather novel practice, it has its origins in the early 1960s, with methods and means undergoing rapid development in the 1990s. Its accessibility and affordability have made it one of the world’s most widely used tools today, and efforts to optimise it have been the centre of much recent research.
The NVIDIA AI Suite
High-quality video conferencing is bandwidth-intensive and demands good equipment and conditions, such as a high-quality camera and a consistent network connection. This is why global tech giant NVIDIA has recently come up with an AI-powered suite meant to streamline video conferencing by improving video quality while reducing bandwidth usage.
Called NVIDIA Maxine, it is a platform that gives the developers of video-conferencing software much leeway in adding and tuning a whole host of features, such as resolution upscaling, automatic camera positioning (which always keeps the participants’ faces in the centre of the frame) and an adaptive AI that creates a model of the speaker’s face and only updates expressions as they happen. NVIDIA does so by using artificial intelligence to analyse key facial features of a call’s participants and then intelligently re-animate them in the cloud, thereby removing the need to stream an entire screen’s worth of pixels continuously.
Using AI, Maxine is also capable of scaling lower-resolution video up to a higher resolution whilst reducing the bandwidth the feed requires by almost 90% relative to standard H.264 video compression. The result is a much-reduced volume of data transmission while maintaining high image quality.
Figure 1: Transferring only key points over the internet slashes bandwidth. Source: NVIDIA Website
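To get a feel for why sending key points instead of pixels saves so much data, the rough arithmetic can be sketched in a few lines of Python. This is only an illustration: the resolution, the number of key points, the bytes per coordinate and the frame rate are all assumptions chosen for the example, not figures published by NVIDIA, and the function names are hypothetical. The comparison is also against uncompressed frames, which real calls never send, so it shows the order of magnitude rather than the saving over H.264.

```python
# Illustrative arithmetic only: every constant below is an assumption
# made for this example, not a figure published by NVIDIA.

FRAME_WIDTH = 1280          # assumed 720p video
FRAME_HEIGHT = 720
BYTES_PER_PIXEL = 3         # assumed uncompressed 8-bit RGB
NUM_KEYPOINTS = 68          # assumed number of facial key points
COORDS_PER_KEYPOINT = 2     # x and y
BYTES_PER_COORD = 4         # assumed 32-bit float per coordinate
FPS = 30                    # assumed frame rate


def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def keypoint_frame_bytes(num_keypoints: int, coords: int, bytes_per_coord: int) -> int:
    """Size of one frame's worth of key point coordinates in bytes."""
    return num_keypoints * coords * bytes_per_coord


def bitrate_kbps(bytes_per_frame: int, fps: int) -> float:
    """Bitrate in kilobits per second for a given per-frame payload."""
    return bytes_per_frame * 8 * fps / 1000


if __name__ == "__main__":
    raw = raw_frame_bytes(FRAME_WIDTH, FRAME_HEIGHT, BYTES_PER_PIXEL)
    points = keypoint_frame_bytes(NUM_KEYPOINTS, COORDS_PER_KEYPOINT, BYTES_PER_COORD)
    print(f"Uncompressed frame:  {raw} bytes")
    print(f"Key points only:     {points} bytes")
    print(f"Key point bitrate:   {bitrate_kbps(points, FPS):.2f} kbps")
    print(f"Raw-to-points ratio: {raw / points:.0f}x")
```

Under these assumptions a single frame shrinks from a few megabytes to a few hundred bytes. A real system would still need to send a reference image of the speaker at least once and spend compute on reconstructing each frame, so the practical saving is smaller than this raw ratio suggests.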
There are several other novel features in the software as well, with NVIDIA promising further developments in subsequent updates. These include ‘gaze correction’, which gives participants the impression that the speaker is addressing them directly instead of looking at the screen; the ‘denoise’ feature, which isolates background noise and mutes it so that only the speaker’s voice comes through clearly; and even a conversational AI named Jarvis. Jarvis is a fully accelerated conversational AI framework that runs on the Maxine system, allowing for the integration of ‘virtual assistants to take notes, set action items and answer questions in human-like voices’, along with other services such as translation, transcription and closed captioning.
Figure 2: Conversational services with Jarvis. Source: NVIDIA Website
The Maxine AI system has also been trained to anticipate and combat common issues that arise during video calls. And because its NVIDIA GPU acceleration runs in the cloud, users do not need an NVIDIA graphics card of their own to enable Maxine’s features.
Essentially, this allows video-conferencing developers to save money on the back end, whilst also helping workers to better contend with home settings that may not always be conducive to professional-grade teleconferencing. It is a major step towards affirming that the work-from-home phenomenon is here to stay.