Machine Learning Top Topic on the Minds of GitHub Users
- By John K. Waters
Machine learning (ML) and data science are becoming hot topics on GitHub, the organization reported last week. A recent analysis of the “Octoverse,” the nickname for the community of users of the popular code repository and social coding platform recently acquired by Microsoft, revealed that AI/ML development tools, such as TensorFlow and PyTorch, are among its fastest-growing projects. And Python, one of the most popular languages for AI/ML development, was the third-most popular language on GitHub.
“We looked at contributors to repositories tagged with the 'machine-learning' topic, and ranked the most common primary languages of the repositories,” explained Thomas Elliot, a data scientist in charge of product analytics at GitHub, in a blog post.
Four of the top contributed ML projects on GitHub focus on image processing (CMU-Perceptual-Computing-Lab/openpose, thtrieu/darkflow, ageitgey/face_recognition and tesseract-ocr/tesseract). TensorFlow, the open source library for numerical computation and large-scale machine learning, remained the most popular ML project on GitHub in 2018, with five times as many contributors as scikit-learn, the second-most popular project.
For this “State of the Octoverse” report, GitHub data scientists pulled data on contributions between Jan. 1, 2018, and Dec. 31, 2018, Elliot said. Contributions could include pushing code, opening an issue or pull request, commenting on an issue or pull request, or reviewing a pull request. “For the most imported packages, we used data from the dependency graph,” he said, “which includes all public repositories and any private repositories that have opted in to the dependency graph.”
The data scientists also found:
- NumPy, a package with support for mathematical operations on multidimensional data, was the most imported package, used in nearly three-quarters of machine learning and data science projects.
- SciPy, a package for scientific computation; pandas, a package for managing datasets; and matplotlib, a visualization library, are each used in more than 40 percent of machine learning and data science projects.
- Scikit-learn, a popular machine learning package containing implementations of a large number of machine learning algorithms, is used by nearly 40 percent of projects.
- TensorFlow, a package for working with neural nets, is used in nearly a quarter of projects.
The top 10 list also included utility packages: six, a Python 2 and 3 compatibility library, and python-dateutil and pytz, packages for working with dates.
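For readers curious how a "used in X percent of projects" figure might be derived, here is a minimal Python sketch. It is not GitHub's method or its dependency graph API. The `project_imports` data and the `import_share` helper are hypothetical, made up purely for illustration, and the sketch assumes each project counts a package once no matter how many files import it.

```python
from collections import Counter

# Hypothetical sample data (not GitHub's): each project maps to the
# set of packages it imports.
project_imports = {
    "project-a": {"numpy", "pandas", "matplotlib"},
    "project-b": {"numpy", "tensorflow"},
    "project-c": {"numpy", "scipy", "scikit-learn"},
    "project-d": {"six", "pytz"},
}


def import_share(projects):
    """Return {package: fraction of projects importing it}, most common first."""
    counts = Counter()
    for packages in projects.values():
        # Sets hold each package once, so a project adds at most 1 per package.
        counts.update(packages)
    total = len(projects)
    return {pkg: n / total for pkg, n in counts.most_common()}


shares = import_share(project_imports)
for package, share in shares.items():
    print(f"{package}: {share:.0%}")
```

In this toy sample, numpy appears in three of the four projects, so its share is 75 percent. A real analysis would need to decide which repositories count as machine learning or data science projects before computing any share.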
About the Author
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at email@example.com.