Sergii Baidachnyi

Blog about technologies

Archive for June 6th, 2018

Windows ML: using ONNX models on the edge

leave a comment »

Source Code for this example is available on GitHub:

Open Neural Network Exchange (ONNX) is a new open-source project that should help people to use neural network models across different machine learning frameworks. Microsoft started to talk about ONNX just last October, but frameworks like CNTK, Caffe2, PyTorch already support it and there are lots of converters for existing models including a converter for TensorFlow. Additionally, Microsoft included ONNX support to Universal Windows Platform, and it’s exactly that what I am going to discuss today.

To create a basic UWP application I need a model first. It’s possible to download an existing ONNX model from the following repository, but I decided to train my own model. Right now, we are doing a series of Machine Learning OpenHacks where people have to pass some challenges and learn lots of stuff about different machine learnings frameworks and algorithms. The simplest challenge in the list is creating a classification model using Microsoft Custom Vision Cognitive Service. Usually cognitive services don’t allow you to download your trained models, but Custom Vision does!

If you want to create an exportable model the Custom Vision service provides three options that are marked as compact. For my experiment I selected just General one:

Our OpenHack dataset contains thousand images, but Custom Vision can be trained even if you have just 5 images per category. I picked two categories from the dataset: hardshell jackets and insulated jackets and uploaded 60 images per category. If you want to create a model yourself, you will be able to find all initial images on github.

To train a custom vision model you just need to click the Train button. I got good enough results even on the first iteration:

Now, it’s possible to export model. The Custom Vision service supports several formats for the model including ONNX one:

Once model is exported we can start developing a new UWP application. Don’t forget to make sure that you use the latest SDK (Build 17134) to create the application. Exactly this release contains Windows ML implementation. It’s still in preview, but you should not install anything to see integration with ONNX.

Once the application is created, you can add the generated ONNX model to the project. On this step Visual Studio will recognize ONNX and call mlgen tool to generate three proxy classes for your model: two classes to describe input and output data and one more to create the model itself. Probably, you will need to rename class names because mine were too long and contained lots of garbage, but it’s not a problem.

Let’s look at Input and output classes:

public sealed class JacketModelInput
    public VideoFrame data { get; set; }

public sealed class JacketModelOutput
    public IList<string> classLabel { get; set; }
    public IDictionary<string, float> loss { get; set; }
    public JacketModelOutput()
        this.classLabel = new List<string>();
        this.loss = new Dictionary<string, float>()
            { "hardshell", float.NaN },
            { "insulated", float.NaN },

You can see that to send an image to the model I need to convert it to a video frame. It’s possible to do using BitmapDecoder and SoftwareBitmap classes. The class for output data contains a Dictionary that already has two my tags from Custom Vision service. So, I can look at the dictionary and get my probabilities once a new image is evaluated.
The model class is pretty simple as well:

public sealed class JacketModel
    private LearningModelPreview learningModel;
    public static async Task<JacketModel> CreateModel(StorageFile file)
    . . . . .
    public async Task<JacketModelOutput> EvaluateAsync(JacketModelInput input) 
        . . . . .

You can see that there are just two methods. The first method is static and allows you to get a file with ONNX model and create an instance of the class. The second method evaluates our data and generates result utilizing classes for input and output data.

To show how to use all these classes above I created a very basic interface with a button, image element and couple text fields, and very basic logic:

JacketModel model;

protected async override void OnNavigatedTo(NavigationEventArgs e)
var appInstalledFolder = 
       var modelFolder=await appInstalledFolder.GetFolderAsync("Models");
       var modelFile = await modelFolder.GetFileAsync("jacket_model.onnx");
       model = await JacketModel.CreateModel(modelFile);

private async void Button_Click(object sender, RoutedEventArgs e)
       var picker=new FileOpenPicker();
       var file = await picker.PickSingleFileAsync();
       if (file!=null)
           using (IRandomAccessStream fileStream = 
              await file.OpenAsync(Windows.Storage.FileAccessMode.Read))
              var decoder = await BitmapDecoder.CreateAsync(fileStream);
              var software_bitmap = await decoder.GetSoftwareBitmapAsync();

              WriteableBitmap bm =
                new WriteableBitmap((int)decoder.PixelWidth,(int)decoder.PixelHeight);

              imgFile.Source = bm;

              var frame = VideoFrame.CreateWithSoftwareBitmap(software_bitmap);
              var evalData = await model.EvaluateAsync(new JacketModelInput() 
                { data = frame });
              insulated_prob.Text = (100*evalData.loss["insulated"]).ToString("N2");
              hardshell_prob.Text = (100*evalData.loss["hardshell"]).ToString("N2");

There are two methods only. OnNavigatedTo just create an instance of the model using the ONNX model file that I included to the project (I don’t know why I didn’t use StorageFile.GetFileFromApplicationUriAsync method rather than three lines of code, but this approach works as well). The second method is an event handler. Once you select an image, it will be read to a SoftwareBitmap instance and the last one will be converted to a VideoFrame.


The final interface looks like this one:

So, thanks to ONNX and Windows ML we can create applications that use Machine Learning power on the edge.


Written by Sergiy Baydachnyy

06/06/2018 at 7:38 AM

Posted in Machine Learning

Detecting objects using Custom Vision service

with one comment

Last year, I had a chance to participate in a project that we developed together with InDro Robotics team. Our primary goal was to build a machine learning solution that could be used to rescue people on the water. Just imagine an army of drones that can fly under the ocean and notify the rescue service once any issue is recognized. You can find more details about the project in the following articles:

Working at this project we could build our own model using [CNTK] ( as well as use a private preview of Custom Vision cognitive service that was moved to public preview shortly after we finish our project.

At that time the Custom Vision service could just classify images rather than detect or identify a particular object. Additionally, there were some requirements regarding to image background and the object appearance, but we found that the service worked pretty well in a “closed” environment like ocean where we have the same background. It could classify images even better than our custom model.

The most important challenge there was inability to build custom rules using the service. For example, the service could return that there is a life-vest and a boat on an image, but it was not clear if we have a life-vest on the boat or in the water. In the first case there is nothing strange, but the second one requires an operator attention (probably, something happened to the boat and people are floating around – you might watch Titanic to encourage your imagination there😊).

But it was exactly one year ago. Since that time Microsoft added some cool features to the service including object detection features. So, I decided to find some images from our InDro project and see how it works.
To start working with Custom Vision service you need to login to portal and sign-in using any Microsoft ID account. No Azure Trial or credit card are required – just login and start create a new project.

Custom Vision supports great API and some SDKs for different programming languages. For example, there is C# and Python tutorials that you can use to create a project, upload images, train your model and make predictions programmatically, but for this short experiment I decided to use just the portal. So, to create a project just click New Project button and activate the dialog below:

If you already have experience with Custom Vision, you will be able to see a new option Object Detection there. It’s exactly what we need to use: just choose Object Detection and upload some images using interface.
Because we are not classifying images anymore but detecting particular objects we need to provide more information about our objects tagging them. It’s possible to do directly on the portal. You can filter all untagged images and provide tags one by one:

This is the most complex step, but the Custom Vision service can start building a model once you have at least 15 tagged objects per tag. So, you can use few images only.

In my case, I uploaded 28 images with 25 kayaks and 17 life-vests there. I had much more images (of course, more is better), but I wanted to see results for a small subset. So, I clicked Train button and got my model in 60 seconds. The result is very surprised me:

I even could find a life-vest that I could not recognize like a life-vest myself:

Of course, there are some room for improvement. For example, you can upload more images per tag as well as answer some questions like if you want to recognize a life-vest that is on a person who is sitting in a kayak. But even on the first iteration the service produces good results and you should not be an expert in Machine Learning.

One more thing that I want to note is how to start working with Custom Vision programmatically. In the beginning I mentioned couple tutorials, but all of them assume that you already have prepared images with marked boxes. So, you need a tool that helps you generate your dataset in the right format. And there is a tool: Initially this tool was built for CNTK, but now it supports YOLO format, TensorFlow Pascal VOC and Custom Vision. In the case of Custom Vision, the tool will create a new project and upload all images, tags and boxes there.

Pay attention that for Custom Vision export you need to provide a training key rather than a directory:

Your training and prediction keys are available on Just click the Settings button in top-right corner of the portal.

If you want to use existing project some JavaScript coding skills are required. The Visual Object Tagging Tool is open source. So, you can simply find the following file lib/detection_algorithms/custom_vision/exporter.js and modify it a little bit (check vott_export name and tag creation procedure).

Written by Sergiy Baydachnyy

06/06/2018 at 1:42 AM

Posted in Machine Learning