As Artificial Intelligence (AI) grows ever better at mimicking human intelligence, writing text, creating images, and conversing like real people, questions about the trustworthiness of these algorithms have begun to surface.
Generative Adversarial Networks (GANs) can create fakes so convincing that people believe the most bizarre videos. Generative Pre-trained Transformer 3 (GPT-3), an autoregressive language model that uses deep learning to produce human-like text, is so impressive that its output often looks genuinely written by a human. Unfortunately, although these GPT models excel at writing natural-sounding text, they are not equipped to write text that is factual.
Advanced language models pick the words that best match the context at hand and string them together into fluent sentences. However, the models will not, and cannot, guarantee that the meaning the generated text conveys is true. Now a start-up named Diffbot is attempting to address this issue. It collects massive amounts of text from the internet and pieces together information to construct factual statements, extracting as many facts as it can dig up. In the process, Diffbot is building the biggest-ever knowledge graph by applying image recognition and natural-language processing to billions of web pages.
Diffbot stitches together all of the factoids it digs up from the web to create knowledge graphs. These are nothing but webs of relationships between concepts. Interlinking data in this way gives machine learning algorithms a structured view of a knowledge domain. The concept is not new, though. Early AI research conceptualized such knowledge graphs as a way to teach algorithms to understand the human world.
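To make the idea concrete, a knowledge graph can be thought of as a collection of fact triples linking a subject to an object through a relationship. The sketch below is a minimal illustration of that data structure; the entities and relation names are invented examples, not Diffbot's actual schema.

```python
# Minimal sketch of a knowledge graph stored as subject-predicate-object
# triples, indexed by subject. Entity and relation names are illustrative.
from collections import defaultdict


class KnowledgeGraph:
    def __init__(self):
        # Map each subject to the list of (predicate, object) pairs known about it.
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        """Record one fact as a triple."""
        self._by_subject[subject].append((predicate, obj))

    def facts_about(self, subject):
        """Return all (predicate, object) pairs stored for a subject."""
        return list(self._by_subject.get(subject, []))


kg = KnowledgeGraph()
kg.add("Diffbot", "is_a", "company")
kg.add("Diffbot", "builds", "knowledge graph")
kg.add("knowledge graph", "consists_of", "triples")

print(kg.facts_about("Diffbot"))
# [('is_a', 'company'), ('builds', 'knowledge graph')]
```

Because facts are plain triples, the web of relationships grows simply by adding more triples, which is what makes automated, internet-scale accumulation feasible.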
But those early graphs were typically handcrafted, a complex, meticulous and limited process. Automating the entire exercise means AI can gain a much wider, contextual understanding of concepts by accessing everything available on the internet, resulting in text output that is more firmly grounded in fact, and consequently more natural.
Google has been using knowledge graphs for some years now, and we are all familiar with them: the short summaries of information that turn up whenever we search for a popular topic. Its knowledge graph extracts the most relevant factoids to present those summaries. That is what Diffbot aims to achieve, but not only for frequently searched topics. It intends to do this for every available topic, leading to a mind-bogglingly massive knowledge graph accumulated by scanning the entire web of publicly available information. Diffbot updates this knowledge graph with new information every four or five days, and over the course of a month it produces 100 million to 150 million entries.
Of course, such a gigantic scale cannot be achieved by simple web-crawling of individual pages. Instead, Diffbot uses computer vision algorithms. Computer vision systems work much like human vision: they first identify the edges of an object and then connect those edges to infer the object's final shape. Diffbot leverages this technology to process the raw pixels of a web page and pull video, image, article, and discussion data from it. It identifies the key elements of the page and then extracts facts in a variety of languages, in adherence to the three-part factoid schema.
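The "three-part factoid schema" refers to facts expressed as subject, predicate, and object. The toy extractor below illustrates the shape of that output for one simple sentence pattern; the regex is a deliberately naive stand-in for Diffbot's actual NLP pipeline, which is not public.

```python
# Toy illustration of the three-part factoid schema: every extracted fact
# is a (subject, predicate, object) triple. The single "X is a Y" pattern
# below is an invented placeholder for a real extraction pipeline.
import re
from typing import NamedTuple, Optional


class Factoid(NamedTuple):
    subject: str
    predicate: str
    obj: str


def extract_is_a(sentence: str) -> Optional[Factoid]:
    """Extract a factoid from sentences of the form 'X is a/an Y.'"""
    m = re.match(r"(.+?) is an? (.+?)\.?$", sentence.strip())
    if m:
        return Factoid(m.group(1), "is_a", m.group(2))
    return None  # No factoid recognized in this sentence.


print(extract_is_a("Diffbot is a start-up."))
# Factoid(subject='Diffbot', predicate='is_a', obj='start-up')
```

A production system would of course handle far richer sentence structures and many languages, but each extracted fact still reduces to this same triple form before entering the graph.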
Diffbot plans to add a natural-language interface that can answer almost any question posed in plain human language, and it will provide supporting sources for each answer. Currently, it offers both paid and free access to its knowledge graph; researchers can access it for free.
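One way such an interface could work is by mapping a question onto a triple lookup and returning the stored fact together with the page it came from. The sketch below assumes this design; the question pattern, the fact, and the source URL are all invented for illustration and do not describe Diffbot's actual implementation.

```python
# Hedged sketch of a natural-language interface over a triple store:
# match a question template, look up the corresponding triple, and
# return the answer with its supporting source. All data is invented.
import re

# Each fact is a (subject, predicate, object) triple plus a source URL.
facts = [
    ("Diffbot", "headquartered_in", "California", "https://example.com/about"),
]


def answer(question):
    """Answer 'Where is X headquartered?' questions from the fact list."""
    m = re.match(r"Where is (.+) headquartered\?", question)
    if m:
        for subject, predicate, obj, source in facts:
            if subject == m.group(1) and predicate == "headquartered_in":
                return obj, source
    return None  # Question not understood or fact not in the graph.


print(answer("Where is Diffbot headquartered?"))
# ('California', 'https://example.com/about')
```

Returning the source alongside the answer is what distinguishes this approach from free-form text generation: every claim can be traced back to the page it was extracted from.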