A Look at the Technology Behind Microsoft's AI Surge

Microsoft is pushing artificial intelligence for the enterprise in a huge way, but where does its technology stand? Joey walks you through the technical foundations behind Microsoft's pivot to AI.

Microsoft's most recent reorganization in March highlights the company's focus on artificial intelligence (AI) and the "intelligent edge" -- the notion of bridging the digital world with the real world. For instance, Microsoft has made big investments aimed at enterprise Internet of Things (IoT) applications, such as ways to collect sensor data from manufacturing devices.

This strategy shift is powered by the Azure cloud. Azure offers a set of tools built on both open source and Microsoft technologies, available as both Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings. Additionally, Microsoft has contributed to the open source community with the Microsoft Cognitive Toolkit, a collection of algorithms and tools to ease machine learning development.

Lambda Architecture: The Heart of Azure IoT
Lambda architecture, while a general computing concept, is built into the design of Microsoft's IoT platform. The design pattern here focuses on managing large volumes of data by splitting it into two paths -- the speed path and the batch path. The speed path offers real-time querying and alerting, while the batch path is designed for larger data analysis. While not all AI scenarios use both of these paths, this is a very common edge computing pattern.
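To make the two-path idea concrete, here is a minimal Python sketch of the pattern. It is an illustration only and does not use any Azure service: the names Reading and LambdaPipeline, the temperature field and the alert threshold are all hypothetical. Each incoming reading is checked immediately on the speed path and also appended to a raw store that the batch path aggregates later.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """A hypothetical sensor reading; the field names are illustrative."""
    device_id: str
    timestamp: float
    temperature: float


class LambdaPipeline:
    """Toy lambda-style pipeline: every reading goes down both paths."""

    def __init__(self, alert_threshold):
        self.alert_threshold = alert_threshold
        self.alerts = []       # speed path output: immediate alerts
        self.batch_store = []  # batch path input: append-only raw data

    def ingest(self, reading):
        # Speed path: evaluate the reading as it arrives.
        if reading.temperature > self.alert_threshold:
            self.alerts.append(reading)
        # Batch path: keep the raw reading for later, larger analysis.
        self.batch_store.append(reading)

    def batch_average_by_device(self):
        # Batch path: recompute an aggregate over all stored readings.
        grouped = defaultdict(list)
        for reading in self.batch_store:
            grouped[reading.device_id].append(reading.temperature)
        return {device: mean(temps) for device, temps in grouped.items()}


pipeline = LambdaPipeline(alert_threshold=80.0)
for r in [Reading("a", 0.0, 70.0), Reading("a", 1.0, 90.0), Reading("b", 2.0, 60.0)]:
    pipeline.ingest(r)
```

In a real deployment the speed path would be a streaming engine and the batch store a data lake or warehouse; the point of the sketch is only that the same event feeds both.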

At the speed layer, Azure offers two main options -- Microsoft's own Azure Stream Analytics and the open source Apache Kafka, which can be run through the HDInsight Hadoop as a Service (HDaaS) offering or on customers' own virtual machines (VMs). Both Stream Analytics and Kafka offer their own streaming query engines (the Stream Analytics engine is based on T-SQL). Additionally, Microsoft offers Azure IoT Hub and Azure Event Hubs, which connect edge devices (such as sensors) to the rest of the architecture. IoT Hub offers a more robust solution with better security; Event Hubs is designed specifically for streaming Big Data from system to system.
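A typical streaming query groups events into fixed time windows and aggregates each one. The Python sketch below shows that idea with a tumbling (non-overlapping) window. It is not the Stream Analytics or Kafka API: the function name and the (timestamp, device, value) event shape are assumptions made for illustration, and it works on a finite list rather than a live stream.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Average values per (window start, device) over fixed-size windows.

    events: iterable of (timestamp_seconds, device_id, value) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for timestamp, device_id, value in events:
        # Every event belongs to exactly one non-overlapping window.
        window_start = int(timestamp // window_seconds) * window_seconds
        acc = totals[(window_start, device_id)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}


events = [(0, "s1", 10.0), (5, "s1", 20.0), (12, "s1", 30.0), (3, "s2", 4.0)]
averages = tumbling_window_averages(events, window_seconds=10)
```

A production engine would emit each window's result as soon as the window closes and would have to deal with late or out-of-order events, which this sketch ignores.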

The batch layer has many more options. In addition to the Hadoop offerings Microsoft has with HDInsight, customers can build their own Hadoop solution on top of Azure Blob Storage or use Azure SQL Data Warehouse (Microsoft's massively parallel data warehousing solution), as well as take advantage of Microsoft's recent integration with Databricks and Spark. Spark is somewhat unique in that it can be used for streaming queries, machine learning and other places where Hadoop has been used. All of these solutions allow for connections to Microsoft's machine learning offerings.

Microsoft Machine Learning
Microsoft's machine learning offerings range from beginner- to expert-level tools. Most of these tools require an Azure subscription, though every service they depend on has a free tier.

Azure Machine Learning Studio is a graphical integrated development environment (IDE) for building, prototyping and deploying predictive analytics solutions based on your data. Machine Learning Studio was Microsoft's entry into machine learning tools in 2013. You can deploy the models as Web services that can then be consumed from BI tools or Excel for further analysis. Machine Learning Studio does not require any programming but does allow for the use of pre-built R or Python scripts, in addition to the pre-supplied algorithms that are built into the tool. This tool is limited to data sets that are smaller than 10GB. This data may be stored in external files or in various Azure data sources; the latter is recommended as data volumes increase.

Azure also offers two machine learning services called "Experimentation Service" and "Model Management." The Experimentation Service allows data scientists to build models on their desktop (using a cross-platform tool called the Azure Machine Learning Workbench) while offering an easy path to scale up and out to other environments such as GPU-based VMs or HDInsight Spark clusters. Workbench supports running scripts in Python or PySpark; allows for container-based deployment using Docker; and can be used with a variety of computing targets, including local Python or Docker, Linux VMs, Docker on Linux VMs, and HDInsight for Spark. The goal of the Experimentation Service is to provide isolated, reproducible and consistent runs of machine learning models. This service, combined with container-based deployment, reduces friction as the containers are fully self-contained with all necessary dependencies.

The other service that is required to use Azure Machine Learning Workbench is Azure Machine Learning Model Management, which acts as a robust source-control mechanism and enables automated container creation and model retraining, as well as model performance telemetry. The container step is important because it enables scale-out analysis directly on the Kubernetes-based Azure Container Service.

Microsoft made a big acquisition in 2015 of a firm called Revolution Analytics, a builder of custom R and Python packages that offered better performance than their open source equivalents. The first step in bringing that technology into Microsoft's products was integrating R into SQL Server, which happened in SQL Server 2016, followed by a name change in SQL Server 2017 to "Machine Learning Services" and support for Python. Included in this product portfolio is a standalone Machine Learning Server that can run on Linux, including GPU-based machines, and offers scale-out options.

Cognitive Toolkit
Microsoft's biggest contribution to open source machine learning (and its answer to Google's TensorFlow) is the Cognitive Toolkit (CNTK). Microsoft developed these tools in-house after years of work on projects like Xbox Live and Bing Search. It also offers semantic tools, computer vision and others. CNTK can be included as a library in Python, C# or C++ programs, or it can be used as a standalone tool.

And More To Come
Microsoft has made big investments across the machine learning space -- both in Azure (with its partnership with Databricks for Spark and the various Azure offerings to support edge computing) and in general with the release of the CNTK and Machine Learning Server. Based on the recent reorg and comments from CEO Satya Nadella, I would only expect Microsoft's offerings in this space to continue to grow.

About the Author

Joseph D'Antoni is an Architect and SQL Server MVP with over a decade of experience working in both Fortune 500 and smaller firms. He is currently Principal Consultant for Denny Cherry and Associates Consulting. He holds a BS in Computer Information Systems from Louisiana Tech University and an MBA from North Carolina State University. Joey is the co-president of the Philadelphia SQL Server Users Group. He is a frequent speaker at PASS Summit, TechEd, Code Camps, and SQLSaturday events.