Google Makes Internal-Use Neural Net for NLP Available via Open Source

Google Research has open sourced the code for its internally developed TAPAS (TAble PArSing) neural network for querying tables using natural language processing (NLP).

TAPAS is based on BERT, Google's neural-network-based technique for NLP pre-training. (BERT stands for Bidirectional Encoder Representations from Transformers.) "Transformers" are models that process words in relation to all the other words in a sentence, rather than one-by-one in order. "BERT models can therefore consider the full context of a word by looking at the words that come before and after it—particularly useful for understanding the intent behind search queries," explained Pandu Nayak, Google Fellow and VP of the company's Search group, in a blog post.
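To make "in relation to all the other words" concrete, here is a minimal sketch of the self-attention step at the heart of a transformer. It is an illustration under simplifying assumptions, not Google's implementation: the function names and the toy two-dimensional word vectors are invented for this example, and the learned query, key and value projections of a real model are omitted.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Return one context-mixed vector per input word vector.

    Assumption: queries, keys and values are the input vectors themselves.
    A real transformer learns a separate projection for each role.
    """
    dim = len(vectors[0])
    mixed = []
    for query in vectors:
        # Score this word against every word in the sentence, left and right.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # The new representation is a weighted average of all word vectors.
        mixed.append([sum(w * v[i] for w, v in zip(weights, vectors))
                      for i in range(dim)])
    return mixed

# Made-up 2-dimensional vectors standing in for a three-word sentence.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(sentence)
```

Each output vector blends information from the whole sentence at once, which is what lets a model weigh the words on both sides of a given word.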

BERT, created and open-sourced by Google in 2018, was designed to pre-train deep bidirectional representations from unlabeled text, the BERT research team explained in a research paper, by jointly conditioning on both left and right context. The pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks.
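The "one additional output layer" idea can be sketched as a single linear layer plus softmax placed on top of the vector a pre-trained encoder produces for a sentence. Everything below is hypothetical: the `classify` function, the three-dimensional pooled vector and the weight values are made up for illustration and do not come from BERT's code.

```python
import math

def classify(pooled, weights, biases):
    """Apply one linear layer plus softmax to a pooled sentence vector."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-dimensional output of a pre-trained encoder for one sentence.
pooled = [0.5, -1.0, 2.0]

# Hypothetical task-specific layer: 2 classes x 3 inputs. These are the only
# parameters added for the new task; the values here are invented.
weights = [[1.0, 0.0, 0.5], [-1.0, 0.5, 0.0]]
biases = [0.0, 0.1]

probs = classify(pooled, weights, biases)  # one probability per class
```

In this sketch only the small layer is new; the expensive language understanding is assumed to come from the pre-trained encoder underneath it.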

BERT helps the Google search engine understand the context of queries that include common words like "to" and "for" in a way it couldn't previously. The payoff is more relevant results that reflect search intent: the real meaning behind a query, the "why" behind its keywords.

"Interacting with tabular data as natural language is one of those scenarios that looks conceptually trivial and results in a nightmare in the real world," said Jesus Rodriguez, IntoTheBlock co-founder and chief scientist and managing partner at Invector Labs, in a blog post. "Most attempts to solve this issue have been based on semantic parsing methods that process a natural language sentence and generate the corresponding SQL. That approach works in very constrained scenarios but is hardly scalable to real natural language interactions."

TAPAS extends the BERT model in four fundamental areas, Rodriguez explained. First, it uses extra embeddings to encode the textual input, adding learned embeddings for the row and column indexes. Second, it extends BERT with a classification layer that selects a subset of the table cells and scores the probability that each of those cells will be in the final answer. Third, it includes an aggregation operator as part of the output to indicate which mathematical operation (such as SUM, COUNT or AVERAGE), if any, needs to be applied to the target cells. Fourth, it extends BERT with an inference layer that predicts the most likely combination of operator and selected cells.
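The pieces Rodriguez describes can be illustrated with a toy sketch: table cells tagged with row and column indexes, a per-cell probability, and an aggregation operator applied to the selected cells. This is not the TAPAS code. The function names, the 0.5 selection threshold, the choice of index 0 for the header row, the sample table and the hard-coded probabilities are all assumptions made for the example; in the real model the probabilities and the operator are predicted by the network.

```python
def flatten_table(header, rows):
    """Turn a table into a token list tagged with row and column indexes.

    Assumption of this sketch: the header is row 0, data rows start at 1,
    and columns start at 1.
    """
    tokens = [{"text": name, "row": 0, "col": col}
              for col, name in enumerate(header, start=1)]
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            tokens.append({"text": str(cell), "row": r, "col": c})
    return tokens

# The three operators named in the article.
AGGREGATIONS = {
    "SUM": sum,
    "COUNT": len,
    "AVERAGE": lambda values: sum(values) / len(values),
}

def answer(rows, cell_probs, operator, threshold=0.5):
    """Keep cells scored above the threshold, then apply the operator."""
    selected = [float(rows[r - 1][c - 1])
                for (r, c), p in cell_probs.items() if p > threshold]
    return AGGREGATIONS[operator](selected)

# Made-up table and made-up model outputs for "What is the total price?"
header = ["Item", "Price"]
rows = [["apple", 3], ["pear", 5], ["plum", 4]]
tokens = flatten_table(header, rows)

# Hypothetical per-cell probabilities keyed by (row, column).
cell_probs = {(1, 2): 0.9, (2, 2): 0.8, (3, 2): 0.95, (1, 1): 0.1}
total = answer(rows, cell_probs, "SUM")  # 3 + 5 + 4 = 12.0
```

Because the answer comes from selecting cells and choosing an operator, no intermediate SQL query has to be generated, which is the contrast with the semantic parsing approach Rodriguez describes.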

More information about Google TAPAS is available on GitHub.

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at