Tag Archives: AI

Announcing Microsoft Build Tour 2017

Figure 1: Sign up at http://buildtour.microsoft.com

On the heels of the Build conferences over the last few years, we have had the pleasure of meeting thousands of developers around the world. Their feedback and technical insight have helped us continue the tradition and explore greater technical depth.

Today, I’m excited to announce the Microsoft Build Tour 2017, coming to cities around the globe this June! The Microsoft Build Tour is an excellent way to experience Microsoft Build news first-hand, and also work directly with Microsoft teams from Redmond and your local area.

This year, we’re bringing the Microsoft Build Tour to these locations:

  • June 5-6: Shanghai, China
  • June 8-9: Beijing, China
  • June 12: Munich, Germany *
  • June 13-14: Seoul, Republic of Korea
  • June 14-15: Helsinki, Finland
  • June 19-20: Warsaw, Poland
  • June 21-22: Hyderabad, India
  • June 29-30: Sydney, Australia

The Microsoft Build Tour is for all developers using the Microsoft platform and tools. We will cover a breadth of topics across Windows, Cloud, AI, and cross-platform development. We will look at the latest news around .NET, web apps, the Universal Windows Platform, Win32 apps, Mixed Reality, Visual Studio, Xamarin, Microsoft Azure, Cognitive Services, and much more.

We also want to go deeper into code, so this year we’re bringing the tour as a two-day* event. You can decide to attend just the sessions on the first day, or sign up for a deep (and fun!) hands-on experience on the second day.

  • Day 1: Full day of fast-paced, demo-driven sessions, focusing primarily on new technology that you can start using immediately in your projects, with a bit of forward-looking awesomeness for inspiration.
  • Day 2: Full-day hackathon where you’ll use the latest technologies to build a fun client, cloud and mobile solution that meets the requirements of a business case given at the beginning of the day. Seating is limited for Day 2, so be sure to register soon!

In most locations, on both days, we’ll also have a Mixed Reality experience booth where you’ll be able to sign up for scheduled hands-on time with Microsoft HoloLens and our latest Windows Mixed Reality devices.

To learn more and register, visit http://buildtour.microsoft.com. Can’t make it in person? Select cities will be live-streamed regionally to give you a chance to view the sessions and participate in the Q&A.

We can’t wait to see you on the tour!

*Munich is a single day, session-focused event.


Building the Terminator Vision HUD in HoloLens

James Cameron’s 1984 film The Terminator introduced many science-fiction idioms we now take for granted. One of the most persistent is the thermal head-up display (HUD) shot that allows the audience to see the world through the eyes of Arnold Schwarzenegger’s T-800 character. In design circles, it is one of the classic user interfaces that fans frequently try to recreate, both as a learning exercise and as a challenge.

In today’s post, you’ll learn how to recreate this iconic interface for the HoloLens. To sweeten the task, you’ll also hook up this interface to Microsoft Cognitive Services to perform an analysis of objects in the room, face detection and even some Optical Character Recognition (OCR).

While on the surface this exercise is intended to just be fun, there is a deeper level. Today, most computing is done in 2D. We sit fixed at our desks and stare at rectangular screens. All of our input devices, our furniture and even our office spaces are designed to help us work around 2D computing. All of this will change over the next decade.

Modern computing will eventually be overtaken by both 3D interfaces and 1-dimensional interfaces. 3D interfaces are the next generation of mixed reality devices that we are all so excited about. 1D interfaces, driven by advances in AI research, are overtaking our standard forms of computing more quietly, but just as certainly.

By speaking or looking in a certain direction, we provide inputs to AI systems in the cloud that can quickly analyze our world and provide useful information. When 1D and 3D are combined—as you are going to do in this walkthrough—a profoundly new type of experience is created that may one day lead to virtual personal assistants that will help us to navigate our world and our lives.

The first step happens to be figuring out how to recreate the T-800 thermal HUD display.

Recreating the UI

Start by creating a new 3D project in Unity and call it “Terminator Vision.” Create a new scene called “main.” Add the HoloToolkit Unity package to your app. You can download the package from the HoloToolkit project’s GitHub repository; this guide uses HoloToolkit-Unity-v1.5.5.0.unitypackage. In the Unity IDE, select the Assets tab, then click Import Package -> Custom Package and find the download location of the HoloToolkit to import it into the scene. In the menu for your Unity IDE, click HoloToolkit -> Configure to set up your project to target HoloLens.

Once your project and your scene are properly configured, the first thing to add is a Canvas object to the scene to use as a surface to write on. In the hierarchy window, right-click on your “main” scene and select GameObject -> UI -> Canvas from the context menu to add it. Name your Canvas “HUD.”

The HUD also needs some text, so the next step is to add a few text regions to the HUD. In the hierarchy view, right-click on your HUD and add four Text objects by selecting UI -> Text. Call them BottomCenterText, MiddleRightText, MiddleLeftText and MiddleCenterText. Add some text to help you match the UI to the UI from the Terminator movie. For the MiddleRightText add:

SCAN MODE 43984

SIZE ASSESSMENT

ASSESSMENT COMPLETE

 

FIT PROBABILITY 0.99

 

RESET TO ACQUISITION

MODE SPEECH LEVEL 78

PRIORITY OVERRIDE

DEFENSE SYSTEMS SET

ACTIVE STATUS

LEVEL 2347923 MAX

For the MiddleLeftText object, add:

ANALYSIS:

***************

234654 453 38

654334 450 16

245261 856 26

453665 766 46

382856 863 09

356878 544 04

664217 985 89

For the BottomCenterText, just write “MATCH.” In the scene panel, adjust these Text objects around your HUD until they match the screenshots from the Terminator movie. MiddleCenterText can be left blank for now; you’re going to use it later for surfacing debug messages.

Getting the fonts and colors right is also important – and there are lots of online discussions about identifying exactly what they are. Most of the text in the HUD is probably Helvetica. By default, Unity on Windows assigns Arial, which is close enough. Set the font color to an off-white (236, 236, 236, 255), the font style to bold, and the font size to 20.

The font used for the “MATCH” caption at the bottom of the HUD is apparently known as Heinlein; it was also used for the movie titles. Since this font isn’t easy to find, you can use another font created to emulate it called Modern Vision, which you can find by searching for it on the internet. To use this font in your project, create a new folder called Fonts under your Assets folder. Download the custom font you want to use and drag the TTF file into your Fonts folder. Once this is done, you can simply drag your custom font into the Font field of BottomCenterText, or click on the target symbol next to the font’s value field to bring up a selection window. Also, increase the font size for “MATCH” to 32, since this text is a bit bigger than the other text in the HUD.

In the screenshots, the word “MATCH” has a white square placed to its right. To emulate this square, create a new InputField (UI -> Input Field) under the HUD object and name it “Square.” Remove the default text, resize it and position it until it matches the screenshots.

Locking the HUD into place

By default, the Canvas will be locked to your world space. You want it to be locked to the screen, however, as it is in the Terminator movies.

To configure a camera-locked view, select the Canvas and examine its properties in the Inspector window. Go to the Render Mode field of your HUD Canvas and select Screen Space – Camera in the drop down menu. Next, drag the Main Camera from your hierarchy view into the Render Camera field of the Canvas. This tells the canvas which camera perspective it is locked to.

The Plane Distance for your HUD is initially set to one meter. This is how far away the HUD will be from your face in the Terminator Vision mixed reality app. Because HoloLens is stereoscopic, adjusting the view for each eye, this is actually a bit close for comfort. The current focal distance for HoloLens is two meters, so we should set the plane distance at least that far away.

For convenience, set Plane Distance to 100. All of the content associated with your HUD object will automatically scale so it fills up the same amount of your visual field.
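If you’d rather set these Canvas properties from code instead of the Inspector, a minimal sketch is shown below. The HudSetup script name is just an assumption for illustration; attach it to the HUD Canvas and it applies the same render mode, camera and plane distance described above.


using UnityEngine;

// Hypothetical helper: configures the HUD Canvas from code instead of the Inspector.
public class HudSetup : MonoBehaviour
{
    void Awake()
    {
        var canvas = GetComponent<Canvas>();
        canvas.renderMode = RenderMode.ScreenSpaceCamera; // lock the canvas to a camera
        canvas.worldCamera = Camera.main;                 // render through the main (HoloLens) camera
        canvas.planeDistance = 100f;                      // well past the 2 m focal distance; content scales to fill the same visual field
    }
}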

It should be noted that locking visual content to the camera, known as head-locking, is generally discouraged in mixed reality design because it can cause visual discomfort. Instead, body-locked content that tags along with the player is the recommended way to create mixed reality HUDs and menus. For the sake of verisimilitude, however, you’re going to break that rule this time.

La vie en rose

Terminator vision is supposed to use heat vision, which places a red hue on everything in the scene. To create this effect, you are going to play a bit with shaders.

A shader is a highly optimized algorithm that you apply to an image to change it. If you’ve ever worked with any sort of photo-imaging software, then you are already familiar with shader effects like blurring. To create the heat vision colorization effect, you would configure a shader that adds a transparent red distortion to your scene.

If this were a virtual reality experience, in which the world is occluded, you would apply your shader to the camera using the RenderWithShader method. This method takes a shader and applies it to any game object you look at. In a holographic experience, however, this wouldn’t work since you also want to apply the distortion to real-life objects.
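For reference, the camera-shader alternative mentioned above would look roughly like the sketch below in a fully virtual scene. This is only an illustration of that approach, not something used in this walkthrough; the ThermalCameraEffect name and thermalShader field are assumptions for the example.


using UnityEngine;

// Sketch of the VR-only alternative: render everything the camera sees with a replacement shader.
public class ThermalCameraEffect : MonoBehaviour
{
    public Shader thermalShader; // a red-tinted shader assigned in the Inspector

    void OnEnable()
    {
        // SetReplacementShader keeps the replacement active every frame;
        // Camera.RenderWithShader would render a single frame with it instead.
        GetComponent<Camera>().SetReplacementShader(thermalShader, "");
    }

    void OnDisable()
    {
        GetComponent<Camera>().ResetReplacementShader();
    }
}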

In the Unity toolbar, select Assets -> Create -> Material to make a new material object. In the Shader field, click on the drop-down menu and find HoloToolkit -> Lambertian Configurable Transparent. The shaders that come with the HoloToolkit are typically much more performant in HoloLens apps and should be preferred. The Lambertian Configurable Transparent shader will let you select a red to apply; (200, 43, 38) seems to work well, but you should choose the color values that look good to you.

Add a new plane (3D Object -> Plane) to your HUD object and call it “Thermal.” Then drag your new material with the configured Lambertian shader onto the Thermal plane. Set the Rotation of your plane to 270 and set the Scale to 100, 1, 100 so it fills up the view.

Finally, because you don’t want the red colorization to affect your text, set the Z position of each of your Text objects to -10. This will pull the text out in front of your HUD a little so it stands out from the heat vision effect.

Deploy your project to a device or the emulator to see how your Terminator Vision is looking.

Making the text dynamic

To hook up the HUD to Cognitive Services, first orchestrate a way to make the text dynamic. Select your HUD object. Then, in the Inspector window, click on Add Component -> New Script and name your script “Hud.”

Double-click Hud.cs to edit your script in Visual Studio. At the top of your script, create four public fields that will hold references to the Text objects in your project. Save your changes.


    // Requires "using UnityEngine.UI;" at the top of Hud.cs for the Text type.
    public Text InfoPanel;
    public Text AnalysisPanel;
    public Text ThreatAssessmentPanel;
    public Text DiagnosticPanel;

If you look at the Hud component in the Inspector, you should now see four new fields that you can set. Drag the corresponding HUD Text objects into these fields.

In the Start method, add some default text so you know the dynamic text is working.


    void Start()
    {
        AnalysisPanel.text = "ANALYSIS:\n**************\ntest\ntest\ntest";
        ThreatAssessmentPanel.text = "SCAN MODE XXXXX\nINITIALIZE";
        InfoPanel.text = "CONNECTING";
        //...
    }

When you deploy and run the Terminator Vision app, the default text should be overwritten with the new text you assign in Start. Now set up a System.Threading.Timer to determine how often you will scan the room for analysis. The Timer class measures time in milliseconds. The first parameter you pass to it is a callback method. In the code shown below, you will call the Tick method every 30 seconds. The Tick method, in turn, will call a new method named AnalyzeScene, which will be responsible for taking a photo of whatever the Terminator sees in front of him using the built-in color camera, known as the locatable camera, and sending it to Cognitive Services for further analysis.


    System.Threading.Timer _timer;
    void Start()
    {
        //...

        int secondsInterval = 30;
        _timer = new System.Threading.Timer(Tick, null, 0, secondsInterval * 1000);

    }

    private void Tick(object state)
    {
        AnalyzeScene();
    }

Unity accesses the locatable camera in the same way it would normally access any webcam. This involves a series of calls to create the photo capture instance, configure it, take a picture and save it to the device. Along the way, you can also add Terminator-style messages to send to the HUD in order to indicate progress.


    void AnalyzeScene()
    {
        InfoPanel.text = "CALCULATION PENDING";
        PhotoCapture.CreateAsync(false, OnPhotoCaptureCreated);
    }

    PhotoCapture _photoCaptureObject = null;
    void OnPhotoCaptureCreated(PhotoCapture captureObject)
    {
        _photoCaptureObject = captureObject;

        Resolution cameraResolution = PhotoCapture.SupportedResolutions.OrderByDescending((res) => res.width * res.height).First();

        CameraParameters c = new CameraParameters();
        c.hologramOpacity = 0.0f;
        c.cameraResolutionWidth = cameraResolution.width;
        c.cameraResolutionHeight = cameraResolution.height;
        c.pixelFormat = CapturePixelFormat.BGRA32;

        captureObject.StartPhotoModeAsync(c, OnPhotoModeStarted);
    }

    private void OnPhotoModeStarted(PhotoCapture.PhotoCaptureResult result)
    {
        if (result.success)
        {
            string filename = string.Format(@"terminator_analysis.jpg");
            string filePath = System.IO.Path.Combine(Application.persistentDataPath, filename);
            _photoCaptureObject.TakePhotoAsync(filePath, PhotoCaptureFileOutputFormat.JPG, OnCapturedPhotoToDisk);
        }
        else
        {
            DiagnosticPanel.text = "DIAGNOSTICn**************nnUnable to start photo mode.";
            InfoPanel.text = "ABORT";
        }
    } 

If the photo is successfully taken and saved, you will read it from disk, serialize it as an array of bytes and send it to Cognitive Services to retrieve an array of tags that describe the room. Finally, you will dispose of the photo capture object.


    void OnCapturedPhotoToDisk(PhotoCapture.PhotoCaptureResult result)
    {
        if (result.success)
        {
            string filename = string.Format(@"terminator_analysis.jpg");
            string filePath = System.IO.Path.Combine(Application.persistentDataPath, filename);

            byte[] image = File.ReadAllBytes(filePath);
            GetTagsAndFaces(image);
            ReadWords(image);
        }
        else
        {
            DiagnosticPanel.text = "DIAGNOSTICn**************nnFailed to save Photo to disk.";
            InfoPanel.text = "ABORT";
        }
        _photoCaptureObject.StopPhotoModeAsync(OnStoppedPhotoMode);
    }

    void OnStoppedPhotoMode(PhotoCapture.PhotoCaptureResult result)
    {
        _photoCaptureObject.Dispose();
        _photoCaptureObject = null;
    }

In order to make a REST call, you will need to use the Unity WWW object. You also need to wrap the call in a Unity coroutine in order to make the call non-blocking. You can also get a free Subscription Key to use the Microsoft Cognitive Services APIs just by signing up.


    string _subscriptionKey = "b1e514eYourKeyGoesHere718c5";
    string _computerVisionEndpoint = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze?visualFeatures=Tags,Faces";
    public void GetTagsAndFaces(byte[] image)
    {
        // Run the request as a coroutine so the REST call doesn't block the UI thread.
        IEnumerator coroutine = RunComputerVision(image);
        StartCoroutine(coroutine);
    }

    IEnumerator RunComputerVision(byte[] image)
    {
        var headers = new Dictionary<string, string>() {
            { "Ocp-Apim-Subscription-Key", _subscriptionKey },
            { "Content-Type", "application/octet-stream" }
        };

        WWW www = new WWW(_computerVisionEndpoint, image, headers);
        yield return www;

        List<string> tags = new List<string>();
        var jsonResults = www.text;
        var myObject = JsonUtility.FromJson<AnalysisResult>(jsonResults);
        foreach (var tag in myObject.tags)
        {
            tags.Add(tag.name);
        }
        AnalysisPanel.text = "ANALYSIS:n***************nn" + string.Join("n", tags.ToArray());

        List<string> faces = new List<string>();
        foreach (var face in myObject.faces)
        {
            faces.Add(string.Format("{0} scanned: age {1}.", face.gender, face.age));
        }
        if (faces.Count > 0)
        {
            InfoPanel.text = "MATCH";
        }
        else
        {
            InfoPanel.text = "ACTIVE SPATIAL MAPPING";
        }
        ThreatAssessmentPanel.text = "SCAN MODE 43984\nTHREAT ASSESSMENT\n\n" + string.Join("\n", faces.ToArray());
    }

The Computer Vision tagging feature is a way to detect objects in a photo. It can also be used in an application like this one to do on-the-fly object recognition.

When the JSON data is returned from the call to Cognitive Services, you can use JsonUtility to deserialize the data into an object called AnalysisResult, shown below.


    [Serializable]
    public class AnalysisResult
    {
        public Tag[] tags;
        public Face[] faces;
    }

    [Serializable]
    public class Tag
    {
        public double confidence;
        public string hint;
        public string name;
    }

    [Serializable]
    public class Face
    {
        public int age;
        public FaceRectangle faceRectangle; // must match the JSON property name exactly - JsonUtility is case-sensitive
        public string gender;
    }

    [Serializable]
    public class FaceRectangle
    {
        public int height;
        public int left;
        public int top;
        public int width;
    }

One thing to be aware of when you use JsonUtility is that it only works with fields and not with properties. If your object classes have getters and setters, JsonUtility won’t know what to do with them.
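Here is a quick, hypothetical illustration of that limitation, separate from the project code: the field-based class below deserializes correctly, while the property-based one comes back empty.


using System;
using UnityEngine;

[Serializable]
public class TagWithField { public string name; }                   // field: JsonUtility populates this

[Serializable]
public class TagWithProperty { public string name { get; set; } }   // property: JsonUtility ignores this

public class JsonUtilityDemo : MonoBehaviour
{
    void Start()
    {
        var good = JsonUtility.FromJson<TagWithField>("{\"name\":\"person\"}");
        var bad = JsonUtility.FromJson<TagWithProperty>("{\"name\":\"person\"}");
        Debug.Log(good.name);         // "person"
        Debug.Log(bad.name == null);  // True - the property was never populated
    }
}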

When you run the app now, it should update the HUD every 30 seconds with information about your room.

To make the app even more functional, you can add OCR capabilities.


    string _ocrEndpoint = "https://westus.api.cognitive.microsoft.com/vision/v1.0/ocr";
    public void ReadWords(byte[] image)
    {
        // Run the OCR request as a coroutine so the REST call doesn't block the UI thread.
        IEnumerator coroutine = Read(image);
        StartCoroutine(coroutine);
    }

    IEnumerator Read(byte[] image)
    {
        var headers = new Dictionary<string, string>() {
            { "Ocp-Apim-Subscription-Key", _subscriptionKey },
            { "Content-Type", "application/octet-stream" }
        };

        WWW www = new WWW(_ocrEndpoint, image, headers);
        yield return www;

        List<string> words = new List<string>();
        var jsonResults = www.text;
        var myObject = JsonUtility.FromJson<OcrResults>(jsonResults);
        foreach (var region in myObject.regions)
            foreach (var line in region.lines)
                foreach (var word in line.words)
                {
                    words.Add(word.text);
                }

        string textToRead = string.Join(" ", words.ToArray());
        if (myObject.language != "unk")
        {
            DiagnosticPanel.text = "(language=" + myObject.language + ")\n" + textToRead;
        }
    }

This service will pick up any words it finds and redisplay them for the Terminator.

It will also attempt to determine the original language of any words that it finds, which in turn can be used for further analysis.

Conclusion

In this post, you discovered how to recreate a cool visual effect from an iconic sci-fi movie. You also found out how to call Microsoft Cognitive Services from Unity in order to make a richer recreation.

You can extend the capabilities of the Terminator Vision app even further by taking the text you find through OCR and calling Cognitive Services to translate it into another language using the Translator API. You could then use the Bing Speech API to read the text back to you in both the original language and the translated language. This, however, goes beyond the original goal of recreating the Terminator Vision scenario from the 1984 James Cameron film and starts sliding into the world of personal assistants, which is another topic for another time.

View the source code for Terminator Vision on GitHub here.


Cognitive Services APIs: Vision

What exactly are Cognitive Services and what are they for? Cognitive Services are a set of machine learning algorithms that Microsoft has developed to solve problems in the field of Artificial Intelligence (AI). The goal of Cognitive Services is to democratize AI by packaging it into discrete components that are easy for developers to use in their own apps. Web and Universal Windows Platform developers can consume these algorithms through standard REST calls over the Internet to the Cognitive Services APIs.

The Cognitive Services APIs are grouped into five categories…

  • Vision—analyze images and videos for content and other useful information.
  • Speech—tools to improve speech recognition and identify the speaker.
  • Language—understanding sentences and intent rather than just words.
  • Knowledge—tracks down research from scientific journals for you.
  • Search—applies machine learning to web searches.

So why is it worthwhile to provide easy access to AI? Anyone watching tech trends realizes we are in the middle of a period of huge AI breakthroughs, with computers beating chess champions and Go masters and passing Turing tests. All the major technology companies are in an arms race to hire the top AI researchers.

Alongside the high-profile AI problems that researchers focus on, like how to pass the Turing test and how to model computer neural networks on human brains, there are everyday problems that developers care about, like tagging our family photos and finding an even lazier way to order our favorite pizza on a smartphone. The Cognitive Services APIs are a bridge that lets web and UWP developers use the resources of major AI research to solve those developer problems. Let’s get started by looking at the Vision APIs.

Cognitive Services Vision APIs

The Vision APIs are broken out into five groups of tasks…

  • Computer Vision—Distill actionable information from images.
  • Content Moderator—Automatically moderate text, images and videos for profanity and inappropriate content.
  • Emotion—Analyze faces to detect a range of moods.
  • Face—Identify faces and similarities between faces.
  • Video—Analyze, edit and process videos within your app.

Because the Computer Vision API is a huge topic on its own, this post will mainly cover its capabilities as an entry point to the others. The description of how to use it, however, will give you a good sense of how to work with the other Vision APIs.

Note: Many of the Cognitive Services APIs are currently in preview and are undergoing improvement and change based on user feedback.

One of the biggest things that the Computer Vision API does is tag and categorize an image based on what it can identify inside that image. This is closely related to a computer vision problem known as object recognition. In its current state, the API recognizes about 2000 distinct objects and groups them into 87 classifications.

Using the Computer Vision API is pretty easy. There are even samples available for using it on a variety of development platforms including NodeJS, the Android SDK and the Swift SDK. Let’s do a walkthrough of building a UWP app with C#, though, since that’s the focus of this blog.

The first thing you need to do is register at the Cognitive Services site and request a key for the Computer Vision Preview (by clicking on one of the “Get Started for Free” buttons).

Next, create a new UWP project in Visual Studio and add the Microsoft.ProjectOxford.Vision NuGet package by opening Tools | NuGet Package Manager | Manage Packages for Solution and selecting it. (Project Oxford was an earlier name for the Cognitive Services APIs.)

For a simple user interface, you just need an Image control to preview the image, a Button to send the image to the Computer Vision REST Services and a TextBlock to hold the results. The workflow for this app is to select an image -> display the image -> send the image to the cloud -> display the results of the Computer Vision analysis.


<Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <Grid.RowDefinitions>
        <RowDefinition Height="9*"/>
        <RowDefinition Height="*"/>
    </Grid.RowDefinitions>
    <Grid.ColumnDefinitions>
        <ColumnDefinition/>
        <ColumnDefinition/>
    </Grid.ColumnDefinitions>
    <Border BorderBrush="Black" BorderThickness="2">
    <Image x:Name="ImageToAnalyze" />
    </Border>
    <Button x:Name="AnalyzeButton" Content="Analyze" Grid.Row="1" Click="AnalyzeButton_Click"/>
    <TextBlock x:Name="ResultsTextBlock" TextWrapping="Wrap" Grid.Column="1" Margin="30,5"/>
</Grid>

When the Analyze Button gets clicked, the handler in the Page’s code-behind will open a FileOpenPicker so the user can select an image. In the ShowPreviewAndAnalyzeImage method, the returned image is used as the image source for the Image control.


readonly string _subscriptionKey;

public MainPage()
{
    //set your key here
    _subscriptionKey = "b1e514ef0f5b493xxxxx56a509xxxxxx";
    this.InitializeComponent();
}

private async void AnalyzeButton_Click(object sender, RoutedEventArgs e)
{
    var openPicker = new FileOpenPicker
    {
        ViewMode = PickerViewMode.Thumbnail,
        SuggestedStartLocation = PickerLocationId.PicturesLibrary
    };
    openPicker.FileTypeFilter.Add(".jpg");
    openPicker.FileTypeFilter.Add(".jpeg");
    openPicker.FileTypeFilter.Add(".png");
    openPicker.FileTypeFilter.Add(".gif");
    openPicker.FileTypeFilter.Add(".bmp");
    var file = await openPicker.PickSingleFileAsync();

    if (file != null)
    {
        await ShowPreviewAndAnalyzeImage(file);
    }
}

private async Task ShowPreviewAndAnalyzeImage(StorageFile file)
{
    //preview image
    var bitmap = await LoadImage(file);
    ImageToAnalyze.Source = bitmap;

    //analyze image
    var results = await AnalyzeImage(file);

    //"fr", "ru", "it", "hu", "ja", etc...
    var ocrResults = await AnalyzeImageForText(file, "en");

    //parse result
    ResultsTextBlock.Text = ParseResult(results) + "\n\n " + ParseOCRResults(ocrResults);
}

The real action happens when the returned image then gets passed to the VisionServiceClient class included in the Project Oxford NuGet package you imported. The Computer Vision API will try to recognize objects in the image you pass to it and recommend tags for your image. It will also analyze the image properties, color scheme, look for human faces and attempt to create a caption, among other things.


private async Task<AnalysisResult> AnalyzeImage(StorageFile file)
{

    VisionServiceClient VisionServiceClient = new VisionServiceClient(_subscriptionKey);

    using (Stream imageFileStream = await file.OpenStreamForReadAsync())
    {
        // Analyze the image for all visual features
        VisualFeature[] visualFeatures = new VisualFeature[] { VisualFeature.Adult, VisualFeature.Categories
            , VisualFeature.Color, VisualFeature.Description, VisualFeature.Faces, VisualFeature.ImageType
            , VisualFeature.Tags };
        AnalysisResult analysisResult = await VisionServiceClient.AnalyzeImageAsync(imageFileStream, visualFeatures);
        return analysisResult;
    }
}

And it doesn’t stop there. With a few lines of code, you can also use the VisionServiceClient class to look for text in the image and then return anything that the Computer Vision API finds. This OCR functionality currently recognizes about 26 different languages.


private async Task<OcrResults> AnalyzeImageForText(StorageFile file, string language)
{
    //language = "fr", "ru", "it", "hu", "ja", etc...
    VisionServiceClient VisionServiceClient = new VisionServiceClient(_subscriptionKey);
    using (Stream imageFileStream = await file.OpenStreamForReadAsync())
    {
        OcrResults ocrResult = await VisionServiceClient.RecognizeTextAsync(imageFileStream, language);
        return ocrResult;
    }
}

Combining the image analysis and text recognition features of the Computer Vision API will return results like those shown below.
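The ParseResult and ParseOCRResults helpers called in ShowPreviewAndAnalyzeImage aren’t shown above. A minimal sketch of what they might look like, assuming the Project Oxford AnalysisResult and OcrResults contract types and a "using System.Linq;" directive, is below; it simply flattens the returned objects into display text for the TextBlock.


private string ParseResult(AnalysisResult result)
{
    var sb = new System.Text.StringBuilder();
    // The generated caption, if the Description feature was requested
    if (result.Description?.Captions?.Length > 0)
    {
        sb.AppendLine("Caption: " + result.Description.Captions[0].Text);
    }
    // The tags recognized in the image
    if (result.Tags != null)
    {
        sb.AppendLine("Tags: " + string.Join(", ", result.Tags.Select(t => t.Name)));
    }
    // The number of faces detected
    sb.AppendLine("Faces: " + (result.Faces?.Length ?? 0));
    return sb.ToString();
}

private string ParseOCRResults(OcrResults ocrResult)
{
    // Flatten regions -> lines -> words into a single string
    var words = ocrResult.Regions
        .SelectMany(r => r.Lines)
        .SelectMany(l => l.Words)
        .Select(w => w.Text);
    return "Text found: " + string.Join(" ", words);
}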

The power of this particular Cognitive Services API is that it will allow you to scan your device folders for family photos and automatically start tagging them for you. If you add in the Face API, you can also tag your photos with the names of family members and friends. Throw in the Emotion API and you can even start tagging the moods of the people in your photos. With Cognitive Services, you can take a task that normally requires human judgement and combine it with the indefatigability of a machine (in this case a machine that learns) in order to perform this activity quickly and indefinitely on as many photos as you own.

Wrapping Up

In this first post in the Cognitive Services API series, you received an overview of Cognitive Services and what it offers you as a developer. You also got a closer look at the Vision APIs and a walkthrough of using one of them. In the next post, we’ll take a closer look at the Speech APIs. If you want to dig deeper on your own, here are some links to help you on your way…


Cortana to open up to new devices and developers with Cortana Skills Kit and Cortana Devices SDK

We believe that everyone deserves a personal assistant. One to help you cope as you battle to stay on top of everything, from work to your home life. Calendars, communications and commitments. An assistant that is available everywhere you need it, working in concert with the experts you rely on to get things done.

We’re at the beginning of a technological revolution in artificial intelligence. The personal digital assistant is the interface where all the powers of that intelligence can become an extension of each one of us. Delivering on this promise will take a community that is equally invested in the outcome and able to share in the benefits.

Today we are inviting you to join us in this vision for Cortana with the announcement of the Cortana Skills Kit and Cortana Devices SDK.

The Cortana Skills Kit is designed to help developers reach the growing audience of 145 million Cortana users, helping users get things done while driving discovery and engagement across platforms: Windows, Android, iOS, Xbox and new Cortana-powered devices.

The Cortana Devices SDK will allow OEMs and ODMs to create a new generation of smart, personal devices, whether they have no screen or a big screen, in homes and on wheels.

Developers and device manufacturers can sign up today to receive updates as we move out of private preview.

Cortana Skills Kit Preview

The Cortana Skills Kit will allow developers to leverage bots created with the Microsoft Bot Framework and publish them to Cortana as a new skill, to integrate their web services as skills and to repurpose code from their existing Alexa skills to create Cortana skills. It will connect users to skills when users ask, and proactively present skills to users in the appropriate context. And it will help developers personalize their experiences by leveraging Cortana’s understanding of users’ preferences and context, based on user permissions.

In today’s San Francisco event, we showed how early development partners are working with the private preview of the Cortana Skills Kit ahead of broader availability in February 2017.

  • Knowmail is applying AI to the problem of email overload and used the Bot Framework to build a bot which they’ve published to Cortana. Their intelligent solution works in Outlook and Office 365, learning your email habits in order to prioritize which emails to focus on while on-the-go in the time you have available.
  • We showed how Capital One, the first financial services company to sign on to the platform, leveraged existing investments in voice technology to enable customers to efficiently manage their money through a hands-free, natural language conversation with Cortana.
  • Expedia has published a bot to Skype using the Microsoft Bot Framework, and they demonstrated how the bot, as a new Cortana skill, will help users book hotels.
  • We demonstrated TalkLocal’s Cortana skill, which allows people to find local services using natural language. For example, “Hey Cortana, there’s a leak in my ceiling and it’s an emergency” gets TalkLocal looking for a plumber.

Developers can sign up today to stay up to date with news about the Cortana Skills Kit.

Cortana Devices SDK for device manufacturers

We believe that your personal assistant needs to help across your day wherever you are: home, at work and everywhere in between. We refer to this as Cortana being “unbound” – tied to you, not to any one platform or device. That’s why Cortana is available on Windows 10, on Android and iOS, on Xbox and across mobile platforms.

We shared last week that Cortana will be included in the IoT Core edition of the Windows 10 Creators Update, which powers IoT devices.

The next step in this journey is the Cortana Devices SDK, which makes Cortana available to all OEMs and ODMs to build smarter devices on all platforms.

It will carry Cortana’s promise of personal productivity everywhere and deliver real-time, two-way audio communications with Skype, email, calendar and list integration – all helping Cortana make life easier, everywhere. And, of course, it will carry Cortana expert skills across devices.

We are working with partners across a range of industries and hardware categories, including some exciting work with connected cars. The devices SDK is designed for diversity, supporting cross platforms including Windows IoT, Linux, Android and more through open-source protocols and libraries.

One early device partner, Harman Kardon, a leader in premium audio, will have more news to share next year about their plans, but today provided a sneak peek at their new device coming in 2017.

The Cortana Devices SDK is currently in private preview and will be available more broadly in 2017. If you are an OEM or ODM interested in including Cortana in your device, please contact us using this form to receive updates on the latest news about the Cortana Devices SDK and to be considered for access to the early preview.


Welcome to the invisible revolution

Think of your favorite pieces of technology. These are the things that you use every day for work and play, and pretty much can’t live without.

Chances are, at least one of them is a gadget – your phone, maybe, or your gaming console.

But if you really think about it, chances also are good that many of your most beloved technologies are no longer made of plastic, metal and glass.

Maybe it’s a streaming video service you use to binge-watch “Game of Thrones,” or an app that lets you track your steps and calories so you can fit into those jeans you wore back in high school. Maybe it’s a virtual assistant that helps you remember where your meetings are and when you need to take your medicine, or an e-reader that lets you get lost in your favorite book via your phone, tablet or even car speakers.

Perhaps, quietly and without even realizing it, your most beloved technologies have gone from being things you hold to services you rely on, and that exist everywhere and nowhere. Instead of the gadgets themselves, they are tools that you expect to be able to use on any type of gadget: Your phone, your PC, maybe even your TV.

They are part of what Harry Shum, executive vice president in charge of Microsoft’s Technology and Research division, refers to as an “invisible revolution.”

“We are on the cusp of creating a world in which technology is increasingly pervasive but is also increasingly invisible,” Shum said.

Read the full story.



Microsoft expands IT training for active-duty US service members, ‘Halo 5: Guardians’ breaks records – Weekend Reading: Nov. 6 edition

It was a good week for Master Chief, and for U.S. service members seeking to master IT skills to help them transition from military to civilian life. Let’s get to it!

The Microsoft Software & Systems Academy (MSSA) is expanding from three locations to nine, and will be servicing 12 military installations. The MSSA program uses a service member’s time prior to transitioning out of the service to train him or her in specialized technology management areas like server cloud/database, business intelligence and software development. After successfully completing the program, participants have an interview for a full-time job at Microsoft or one of its hiring partners. “On this Veterans Day 2015, it’s the responsibility of the IT industry to honor those who have served with more than an artillery salute and a brief word of thanks,” says Chris Cortez, vice president of Military Affairs at Microsoft, and retired U.S. Marine Corps major general. “We are compelled to set an example of what it can look like to dig in with our transitioning service members as they prepare to cross the bridge to the civilian world.”

A week after launching worldwide, “Halo 5: Guardians” broke records as the biggest Halo launch ever and the fastest-selling Xbox One exclusive game to date, with more than $400 million in global sales of “Halo 5: Guardians” games and hardware. The “Halo 5: Live” launch celebration also earned a Guinness World Records title for the most-watched video game launch broadcast, with more than 330,000 unique streams on the evening of the broadcast.

“Halo 5: Guardians” launch in New York City

In China, millions of people are carrying on casual conversations with a Microsoft technology called XiaoIce. Hsiao-Wuen Hon, corporate vice president in charge of Microsoft Research Asia, sees XiaoIce as an example of the vast potential that artificial intelligence holds — not to replace human tasks and experiences, but rather to augment them, writes Allison Linn. Hon recently joined some of the world’s leading computer scientists at the 21st Century Computing Conference in Beijing, an annual meeting of researchers and computer science students, to discuss some emerging trends.


Microsoft and Red Hat announced a partnership that will help customers embrace hybrid cloud computing by providing greater choice and flexibility deploying Red Hat solutions on Microsoft Azure. Also announced: Microsoft acquired Mobile Data Labs, creator of the popular MileIQ app, which takes advantage of sensors in modern mobile devices to automatically and contextually capture, log and calculate business miles, allowing users to confidently claim tax deductions. The acquisition is the latest example of Microsoft’s ambition to reinvent productivity and business process in a mobile-first, cloud-first world, says Rajesh Jha, corporate vice president for Outlook and Office 365.


We got to know some pretty cool people doing really cool things. Among them: The team members of Loop who created the Arrow and Next Lock Screen apps through the Microsoft Garage. We also were introduced to Scott McBride, a Navy vet whose internship at Microsoft led to a full-time job; he’s now a business program manager for Microsoft’s Cloud and Enterprise group. McBride will be helping Microsoft recruit new hires this fall.


Microsoft Loop team photographed in their new workspace, under construction in Bellevue, Washington. (Photography by Scott Eklund/Red Box Pictures)

A game with a deceptively simple, one-word title, “Prune,” is the App of the Week. In it, you give life to a forgotten landscape, and uncover a story that’s hidden deep beneath the soil. You’ll cultivate a sapling into a full-grown tree, and watch it evolve in an elegant but sparse environment. It’s up to you to bring the tree toward the sunlight, or shield it from the dangers of a hostile world. You can install “Prune” for $3.99 from the Windows Store.


“Prune”

This week on the Microsoft Instagram channel, we met Thavius Beck. Beyond being a musician, Thavius is a performer, producer and teacher. He uses his Surface Book to spread his love of music and perform in completely new ways.


Thanks for reading! Have a good weekend, and we’ll see you back here next Friday!

Posted by Suzanne Choney
Microsoft News Center Staff
