
Facebook Uses Deep Learning to Translate One Computer Language to Another

Facebook has developed an open-source tool for translating legacy codebases into modern programming languages. Called TransCoder, it's an entirely self-supervised neural transcompiler system designed to make code migration easier and more efficient.

This is the first artificial intelligence (AI) system able to translate code from one programming language to another without requiring parallel data for training, Facebook researchers said in a blog post. "We've demonstrated that TransCoder can successfully translate functions between C++, Java, and Python 3," they wrote. "TransCoder outperforms open source and commercial rule-based translation programs. In our evaluations, the model correctly translates more than 90 percent of Java functions to C++, 74.8 percent of C++ functions to Java, and 68.7 percent of functions from Java to Python. In comparison, a commercially available tool translates only 61.0 percent of functions correctly from C++ to Java, and an open source translator is accurate for only 38.3 percent of Java functions translated into C++."

A "transcompiler" (also known as a "transpiler" or "source-to-source compiler") is a translator that converts between programming languages that operate at a similar level of abstraction, explained the authors of a Facebook research paper, "Unsupervised Translation of Programming Languages." "Transcompilers differ from traditional compilers that translate source code from a high-level to a lower-level programming language (e.g. assembly language) to create an executable," the paper explains.

"Initially, transcompilers were developed to port source code between different platforms (e.g. convert source code designed for the Intel 8080 processor to make it compatible with the Intel 8086). More recently, new languages have been developed (e.g. CoffeeScript, TypeScript, Dart, Haxe) along with dedicated transcompilers that convert them into a popular or omnipresent language (e.g. JavaScript)…. In this paper, we are more interested in the traditional type of transcompilers, where typical use cases are to translate an existing codebase written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a recent one, or to integrate code written in a different language to an existing codebase."

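To make this concrete, here is a toy, hand-written illustration of the kind of function-level translation a transcompiler performs. The example is hypothetical and is not actual TransCoder output: a small C++ function, shown in a comment, alongside an equivalent Python 3 version that preserves its behavior.

    # Hypothetical C++ original (shown as a comment for reference):
    #   int fib(int n) {
    #       if (n <= 1) return n;
    #       return fib(n - 1) + fib(n - 2);
    #   }
    #
    # An equivalent Python 3 function, the kind of output a
    # source-to-source translator aims to produce:
    def fib(n):
        if n <= 1:
            return n
        return fib(n - 1) + fib(n - 2)

The translator's goal is to preserve the original function's behavior while adopting the target language's syntax and idioms, rather than lowering the code to machine instructions the way a traditional compiler does.
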
"Self-supervised" training is especially important when it comes to translating between programming languages, they said, because traditional supervised-learning approaches rely on large-scale parallel data sets for training, but such data sets simply don't exist for COBOL to C++ or C++ to Python, for example.

The TransCoder tool relies exclusively on source code written in one programming language and does not require examples of the same code in both the source and target languages. Also, it does not require the user to possess expertise in the programming languages involved in the translation.
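
As a rough sketch of what "no parallel data" means in practice, consider the difference in training inputs. The snippets below are hypothetical illustrations, not Facebook's actual training corpus or pipeline: a supervised translator would need aligned source/target pairs, while TransCoder learns from separate, unaligned collections of code in each language.

    # Hypothetical illustration of data requirements only; this is not
    # TransCoder's training pipeline or corpus.

    # A supervised approach would need aligned pairs like this:
    parallel_examples = [
        ("int add(int a, int b) { return a + b; }",   # C++ source
         "def add(a, b):\n    return a + b"),         # matching Python target
    ]

    # Self-supervised training instead draws on independent, unaligned
    # collections of code in each language:
    cpp_functions = [
        "int add(int a, int b) { return a + b; }",
        "int square(int x) { return x * x; }",
    ]
    python_functions = [
        "def greet(name):\n    return 'Hello, ' + name",
        "def is_even(n):\n    return n % 2 == 0",
    ]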

"TransCoder could be useful for updating legacy codebases to modern programming languages, which are typically more efficient and easier to maintain," the authors said. "It also shows how neural machine translation techniques can be applied to new domains. As with Facebook AI's previous work using neural networks to solve advanced mathematics equations, we believe NMT can help with other tasks not typically associated with translation or pattern recognition tasks."

 

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at jwaters@converge360.com.
