AI Giants Tasked with Developing Tools To Mine Coronavirus Data

A group of tech leaders and academic researchers have joined forces to prepare and distribute a massive dataset of machine-readable coronavirus literature, and to issue a call for text and data-mining tools that can help the medical community find answers to high-priority scientific questions.

Dubbed the COVID-19 Open Research Dataset (CORD-19), the freely available resource comprises more than 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2 and related coronaviruses.

The group includes Microsoft, the Allen Institute for AI (AI2), the Chan Zuckerberg Initiative (CZI), Georgetown University's Center for Security and Emerging Technology (CSET), and the National Library of Medicine (NLM) at the National Institutes of Health.

"It's all-hands on deck as we face the COVID-19 pandemic," said Dr. Eric Horvitz, chief scientific officer at Microsoft, in a statement. "We need to come together as companies, governments, and scientists and work to bring our best technologies to bear across biomedicine, epidemiology, AI, and other sciences. The COVID-19 literature resource and challenge will stimulate efforts that can accelerate the path to solutions on COVID-19."

CORD-19 was assembled in response to a request from the White House's Office of Science and Technology Policy. "Decisive action from America's science and technology enterprise is critical to prevent, detect, treat, and develop solutions to COVID-19," said Michael Kratsios, U.S. Chief Technology Officer, in a statement. "We thank each institution for voluntarily lending its expertise and innovation to this collaborative effort, and call on the United States research community to put artificial intelligence technologies to work in answering key scientific questions about the novel coronavirus."

The World Health Organization (WHO) and the U.S. Centers for Disease Control and Prevention (CDC) have said they want the AI community's help in better understanding the origins and transmission of the coronavirus so that a vaccine and treatments can be developed.

"One of the most immediate and impactful applications of AI is in the ability to help scientists, academics, and technologists find the right information in a sea of scientific papers to move research faster. We applaud the OSTP, WHO, NIH and all organizations that are taking a proactive approach to use the most advanced technology in the fight against COVID-19," said Dr. Oren Etzioni, CEO of AI2. "The Allen Institute for AI, and particularly the Semantic Scholar team, is committed to updating and improving this important resource and the associated AI methods the community will be using to tackle this crucial problem."

The CORD-19 dataset is available on AI2's Web site and will be updated as new research is published in archival services and peer-reviewed publications, the group said. Researchers should submit the text and data mining tools and insights they develop via the Kaggle platform, a machine learning and data science community owned by Google Cloud.

Kaggle is hosting the COVID-19 Open Research Dataset Challenge, a series of questions designed to inspire the community to use CORD-19 to find new insights about the COVID-19 pandemic, including the natural history, transmission and diagnostics for the virus; management measures at the human-animal interface; and lessons from previous epidemiological studies.

The questions were developed in coordination with the National Academies of Sciences, Engineering, and Medicine's Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats, and the WHO.

"It's difficult for people to manually go through more than 20,000 articles and synthesize their findings," said Anthony Goldbloom, Kaggle's co-founder and CEO. "Recent advances in technology can be helpful here. We're putting machine readable versions of these articles in front of our community of more than 4 million data scientists. Our hope is that AI can be used to help find answers to a key set of questions about COVID-19."

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.