Microsoft's Open Source, Cross-Platform ML.NET Machine Learning Framework Gets Update

ML.NET, the open source cross-platform machine learning framework introduced by Microsoft several months ago, has been updated to version 0.4.

The project is still in preview stages, and with the new update Microsoft is seeking developer feedback on a planned API revamp. In the meantime, several bug fixes and new functionality were highlighted in the new update.

"We are happy to announce the latest version: ML.NET 0.4," the company said in a blog post published Aug. 7. "In this release we’ve improved support for natural language processing (NLP) scenarios by adding the Word Embedding Transform, improved the speed of linear learners like binary classification and linear regression by adding support for the SymSGD learner, made improvements to the F# API and samples for ML.NET, bug fixes and more."

ML.NET was introduced in May at Microsoft's Build developer conference to help .NET coders get in on cutting-edge machine learning programming without having to learn the underlying technical details associated with creating and tuning ML models.

In announcing ML.NET 0.4, the company detailed new functionality and sought feedback to further improve the framework.

"We are working on a new API which improves flexibility and ease of use," Microsoft's ML.NET team said. "When the new API is ready and good enough, we plan to deprecate the current 'pipeline' API. Because this will be a significant change we want to share our proposals for the multiple API options and comparisons in a future blog post and start an open discussion with you where you can provide your feedback and help shape the long-term API for ML.NET."

[Figure: ML.NET APIs at the Time of the Original Announcement (source: Microsoft).]

In the blog post, Microsoft detailed new features, including:

  • Word Embeddings Transform for Text Scenarios. This allows the use of existing word embeddings, which map words in text to numeric vectors to help capture the meanings of words for visualization or model training. These existing -- or pretrained -- models relieve developers of the burden of creating and training their own models.
  • SymSGD Learner for Binary Classification. This improves on the SGD algorithm. SGD, according to Wikipedia, stands for stochastic gradient descent, described as "a popular algorithm for training a wide range of models in machine learning, including (linear) support vector machines, logistic regression (see, e.g., Vowpal Wabbit) and graphical models." Microsoft said that while SGD is well-known and effective for various machine learning problems, it suffers from scalability problems, and SymSGD addresses that performance issue by leveraging multithreading.
  • Improvements to F# API and samples for ML.NET. This continues the effort to improve the F# story for ML.NET, which has been lacking in certain respects, such as support for F# records. As part of that ongoing work, v0.4 allows the use of property-based row classes in F#, and the .NET machine learning samples source code repos have been updated to reflect it.
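
To make the first two items more concrete, here is a minimal sketch in plain Python. It is not ML.NET code and does not use the ML.NET API. It assumes a tiny, made-up table of three-dimensional "pretrained" embeddings (real ones typically have hundreds of dimensions), turns a sentence into a feature vector by averaging its word vectors, and trains a binary classifier with ordinary single-threaded SGD. The multithreaded SymSGD approach is not reproduced here, and all names (EMBEDDINGS, featurize, train_sgd, predict) are hypothetical.

```python
import math
import random

# Hypothetical "pretrained" embeddings, invented for illustration only.
EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "good": [0.8, 0.2, 0.1],
    "love": [0.7, 0.0, 0.2],
    "bad": [-0.8, 0.1, 0.1],
    "awful": [-0.9, 0.2, 0.0],
    "hate": [-0.7, 0.0, 0.3],
    "movie": [0.0, 0.9, 0.1],
    "film": [0.0, 0.8, 0.2],
}
DIM = 3


def featurize(text):
    """Average the embedding vectors of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(samples, epochs=200, lr=0.5, seed=0):
    """Plain single-threaded SGD for logistic regression, one example per update."""
    rng = random.Random(seed)
    weights = [0.0] * DIM
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # the "stochastic" part: visit examples in random order
        for x, y in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the linear score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, text):
    """Return the model's estimated probability that the text is positive."""
    weights, bias = model
    x = featurize(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy labeled data: 1 = positive sentiment, 0 = negative sentiment.
TRAIN = [
    ("great movie", 1), ("good film", 1), ("love movie", 1),
    ("bad movie", 0), ("awful film", 0), ("hate film", 0),
]
MODEL = train_sgd([(featurize(t), y) for t, y in TRAIN])
```

Calling predict(MODEL, "love film") should return a probability above 0.5 and predict(MODEL, "awful movie") one below 0.5, even though neither phrase appears in the training data, because the embeddings place related words near one another. Each SGD update depends on the weights left by the previous one, which makes the loop inherently sequential; that dependency is the scalability limit the article says SymSGD targets.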

Other planned work on the project, according to its Web site, includes adding support for machine learning tools and frameworks including LightGBM, Accord.NET, CNTK and TensorFlow.

Developers can get started with the open source ML.NET preview -- which works on Windows, Linux and macOS -- with this 10-minute tutorial.

About the Author

David Ramel is an editor and writer for Converge360.