Ever since the first programmers wrote code by hand and transferred it onto stacks of punched cards for computers to process, the idea of machines that can write code, find bugs, and fix them has been the holy grail of programming. Now researchers from Intel, the Massachusetts Institute of Technology (MIT) and the Georgia Institute of Technology have announced a new machine programming system designed to detect code similarity. Called Machine Inferred Code Similarity (MISIM), it is an automated engine that determines whether two pieces of code, data structures, or algorithms perform the same or similar tasks, that is, whether they produce the same outcome.
The end goal – Democratizing coding: MISIM's end goal is to democratize software development. Explaining this to the media, Justin Gottschlich, principal scientist and director/founder of machine programming research at Intel, sketched the future he has in mind: "Intel's ultimate goal for machine programming (MP) is to democratize the creation of software. When fully realized, MP will enable everyone to create software by expressing their intention in whatever fashion that's best for them, whether that's code, natural language or something else. That's an audacious goal, and while there's much more work to be done, MISIM is a solid step toward it."
What MISIM does: MISIM can extract the meaning of a piece of code, that is, what the code tells the computer to do, in much the same way as natural-language processing (NLP) systems read a paragraph written in English. MISIM differs from other code-similarity systems because it uses a context-aware semantic structure (CASS), which gives more insight into what the code does, not just how it does it. Whereas other code-similarity systems look for similar characteristics or identical goals, MISIM can identify code that performs similar computations.
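The gap between how code is written and what it computes can be made concrete with a small Python sketch. The two functions below (hypothetical names) share almost no syntax: one loops, the other applies a closed-form formula. Yet they return the same value for every non-negative n. The helper that compares them simply runs both on sample inputs; that input/output testing is a swapped-in illustration chosen for simplicity, not CASS and not a description of how MISIM works.

```python
def sum_of_squares_loop(n: int) -> int:
    # Imperative style: accumulate i*i for i = 1..n with an explicit loop.
    total = 0
    i = 1
    while i <= n:
        total += i * i
        i += 1
    return total


def sum_of_squares_formula(n: int) -> int:
    # Closed form n(n+1)(2n+1)/6: no loop at all, yet the same computation.
    # The product is always divisible by 6, so integer division is exact.
    return n * (n + 1) * (2 * n + 1) // 6


def agree_on_samples(f, g, inputs) -> bool:
    # True if f and g give identical outputs on every input tried.
    # Agreement on samples is evidence of equivalence, not proof.
    return all(f(x) == g(x) for x in inputs)


samples = list(range(0, 200, 7))
print(agree_on_samples(sum_of_squares_loop, sum_of_squares_formula, samples))
```

A comparison based purely on syntax would likely rate these two functions as unrelated, even though they compute the same thing; recognizing that kind of equivalence is what a semantics-oriented system is meant to do.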
What problems can MISIM solve: According to the researchers, hardware and software systems are becoming increasingly complex. That, coupled with a shortage of the programmers needed to develop such systems, has created the need for a new development approach. Machine programming aims to improve development productivity through automated tools.
Existing code automation tools: Automated code generation has been a hot research topic for a number of years. Microsoft is building basic code generation into its widely used software development tools. Facebook, too, has developed a system called Aroma that autocompletes small programs, and DeepMind has created a neural network that can come up with more efficient versions of simple algorithms than those devised by humans. Even OpenAI's GPT-3 language model can churn out simple pieces of code, such as web page layouts, from natural-language prompts.
How MISIM works: MISIM works by comparing snippets of code with millions of other programs it has already seen, drawn from a large number of online repositories. It can then suggest other ways the same code might be written, offering corrections and ways to make it faster or more efficient. The tool's ability to understand what a program is trying to do allows it to identify other programs that do similar things. In theory, machines that write their own software could use this approach, drawing on a patchwork of pre-existing programs with minimal human oversight or input.
MISIM first translates the code into a form that captures what it does but ignores how it is written, because two programs written in very different ways can sometimes do the same thing. Next, MISIM uses a neural network to find other code with a similar purpose. In a preprint, Gottschlich and his colleagues reported that MISIM is 40 times more accurate than previous systems that try to do this, including Aroma.
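The two steps above (build a representation that ignores surface details, then search a corpus for similar entries) can be illustrated with a toy Python sketch. It assumes Python source as input: the representation is a count of abstract-syntax-tree node types, which drops identifier names and literal values, and ranking uses cosine similarity. This is a hypothetical stand-in; a hand-built feature count and a cosine score replace CASS and MISIM's neural network, and the function names and tiny corpus are invented for illustration.

```python
from __future__ import annotations

import ast
import math
from collections import Counter


def structural_features(source: str) -> Counter:
    # Count AST node types only; variable names and literal values are dropped,
    # so two snippets that differ only in naming get identical features.
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (1.0 = same direction).
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, corpus: dict[str, str]) -> list[tuple[str, float]]:
    # Score every corpus entry against the query and return best match first.
    q = structural_features(query)
    scores = [(name, cosine(q, structural_features(src))) for name, src in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-corpus standing in for "millions of programs".
corpus = {
    "sum_loop": "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n",
    "max_loop": (
        "def g(items):\n"
        "    best = items[0]\n"
        "    for it in items:\n"
        "        if it > best:\n"
        "            best = it\n"
        "    return best\n"
    ),
}

# Same structure as "sum_loop", but every identifier has been renamed.
query = "def total(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"

for name, score in most_similar(query, corpus):
    print(f"{name}: {score:.2f}")
```

Because the query differs from `sum_loop` only in its names, it ranks first with a score of 1.00. The sketch's limitation is also instructive: rewriting the query as `return sum(values)` would change the node counts and lower the score even though the computation is unchanged. Abstracting away *how* code is written more thoroughly than this, so that such rewrites still match, is the problem MISIM's learned approach targets.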
The next steps – Coding with natural language: Intel plans to use the tool in-house as a code recommendation system for its developers, suggesting alternative ways to write code that are faster or more efficient. But because MISIM is not tied to the syntax of a specific programming language, it could potentially do much more. For example, it could be used to translate code written in an old language like COBOL into a more modern language like Python. This matters because many institutions, including the US government, still rely on software written in languages that few programmers today know how to maintain or update. Combined with NLP, the ability to work with the meaning of code separately from its textual representation could one day let people write software simply by describing in words what they want it to do.