Tag Archives: artificial intelligence

Cognitive Services APIs: Speech

Speech recognition is in many ways at the heart of Artificial Intelligence. The 18th Century essayist Samuel Johnson captured this beautifully when he wrote, “Language is the dress of thought.” If the ultimate goal of AI research is a machine that thinks like a human, a reasonable starting point would be to create a machine that understands how humans think. To understand how humans think, in turn, requires an understanding of what humans say.

In the previous post in this series, you learned about the Cognitive Services Vision APIs. In this post, we’re going to complement that with an overview of the Speech APIs. The Cognitive Services Speech APIs are grouped into three categories:

  • Bing Speech—convert spoken audio to text and, conversely, text to speech.
  • Speaker Recognition—identify speakers and use speech recognition for authentication.
  • Custom Speech Service (formerly CRIS)—overcome speech recognition barriers like background noise and specialized vocabulary.

A good way to understand the relationship between the Bing Speech API and the other APIs is this: Bing Speech takes raw speech and turns it into text without knowing anything about the speaker, while Custom Speech Service and Speaker Recognition go further, applying extra processing to clean up the raw speech or to compare it against other speech samples.

Bing Speech for UWP

As a UWP developer, you have several options for accessing speech-to-text capabilities. You can access the UWP Speech APIs found in the Windows.Media.SpeechRecognition namespace. You can also integrate Cortana into your UWP app. Alternatively, you can go straight to the Bing Speech API which underlies both of these technologies.

Bing Speech lets you do text-to-speech and speech-to-text through REST calls to Cognitive Services. The Cognitive Services website provides samples for iOS, Android and JavaScript. There’s also a client library NuGet package if you are working in WPF. For UWP, however, you will use the REST APIs.

As with the other Cognitive Services offerings, you first need to pick up a subscription key for Bing Speech in order to make calls to the API. In UWP, you then need to record microphone input using the MediaCapture class and encode it before sending it to Bing Speech. (Gotcha warning: be sure to check the Microphone capability in your project’s app manifest file so the mic can be accessed; otherwise you may spend hours wondering why the code doesn’t work for you.)

// set up audio-only capture from the microphone
var CaptureMedia = new MediaCapture();
var captureInitSettings = new MediaCaptureInitializationSettings();
captureInitSettings.StreamingCaptureMode = StreamingCaptureMode.Audio;
await CaptureMedia.InitializeAsync(captureInitSettings);
// record WAV-encoded audio into an in-memory stream
MediaEncodingProfile encodingProfile = MediaEncodingProfile.CreateWav(AudioEncodingQuality.Medium);
AudioStream = new InMemoryRandomAccessStream();
await CaptureMedia.StartRecordToStreamAsync(encodingProfile, AudioStream);
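
The REST call that follows needs the recorded audio as a byte array, and the post does not show that step. Here is a minimal sketch of one way to do it, assuming the recording was started as above. The RecordingHelper class and StopAndReadAsync method are hypothetical names introduced for illustration, not part of any sample.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.Capture;
using Windows.Storage.Streams;

public static class RecordingHelper
{
    // Hypothetical helper: stops the recording and copies the captured
    // audio out of the in-memory stream into a byte array.
    public static async Task<byte[]> StopAndReadAsync(
        MediaCapture capture, InMemoryRandomAccessStream audioStream)
    {
        // finish writing the WAV data to the stream
        await capture.StopRecordAsync();

        var bytes = new byte[audioStream.Size];
        // read from the start of the stream, regardless of its current position
        using (var reader = new DataReader(audioStream.GetInputStreamAt(0)))
        {
            await reader.LoadAsync((uint)audioStream.Size);
            reader.ReadBytes(bytes);
        }
        return bytes;
    }
}
```

The returned array can then be used wherever the audio bytes are needed, for example as the payload of an HTTP request.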

Once you are done recording, you can use the standard HttpClient class to send the audio stream to Cognitive Services for processing, like so…

// build REST message
var cookieContainer = new CookieContainer();
var handler = new HttpClientHandler() { CookieContainer = cookieContainer };
var client = new HttpClient(handler);
// authenticate the REST call (check the API reference for whether the service
// expects the raw key here or a "Bearer <token>" value)
client.DefaultRequestHeaders.TryAddWithoutValidation("Authorization", _subscriptionKey);
// pass in the Bing Speech endpoint
var request = new HttpRequestMessage(HttpMethod.Post, uri);
// pass in the audio stream
request.Content = new ByteArrayContent(fileBytes);
// Content-Type is a content header, so set it on the content, not on the client
request.Content.Headers.TryAddWithoutValidation("Content-Type", "audio/wav; samplerate=16000");
// make REST call to CogSrv
var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, cancellationToken);

Getting these calls right may seem a bit hairy at first. To make integrating Bing Speech easier, Microsoft MVP Gian Paolo Santopaolo has created a UWP reference app on GitHub with several useful helper classes you can incorporate into your own speech recognition project. This reference app also includes a sample for reversing the process and doing text-to-speech.

Speaker Recognition

While the Bing Speech API can figure out what you are saying without knowing anything about who you are as a speaker, the Speaker Recognition API in Cognitive Services is all about figuring out who you are without caring about what you are specifically saying. There’s a nice symmetry to this. Using machine learning, the Speaker Recognition API finds qualities in your voice that identify you almost as well as your fingerprints or retinal pattern do.

This API is typically used for two purposes: identification and verification. Identification allows a voice to be compared to a group of voices in order to find the best match. This is the auditory equivalent to how the Cognitive Services Face API matches up faces that resemble each other.

Speaker verification allows you to use a person’s voice as part of a two-factor login mechanism. For verification to work, the speaker must say a specific, pre-selected passphrase like “apple juice tastes funny after toothpaste” or “I am going to make him an offer he cannot refuse.” The initial recording of a passphrase to compare against is called enrollment. (It hardly needs to be said but—please don’t use “password” for your speaker verification passphrase.)

There is a client library that supports speaker enrollment, speaker verification and speaker identification. Per usual, you need to sign up for a Speaker Recognition subscription key to use it. You can add the client library to your UWP project in Visual Studio by installing the Microsoft.ProjectOxford.SpeakerRecognition NuGet package.

Using the media capture code from the Bing Speech sample above to record from the microphone, and assuming that the passphrase has already been enrolled for the user through her Speaker Id (a Guid), verification is as easy as calling the client library’s VerifyAsync method and passing the audio stream and Speaker Id as parameters.

string _subscriptionKey;
Guid _speakerId;
Stream audioStream;

public async void VerifySpeaker()
{
    var serviceClient = new SpeakerVerificationServiceClient(_subscriptionKey);
    Verification response = await serviceClient.VerifyAsync(audioStream, _speakerId);

    if (response.Result == Result.Accept)
    {
        // verification successful
    }
}


Sample projects are available showing how to use Speaker Recognition with Android, Python and WPF. Because of the close similarities between UWP and WPF, you will probably find the last sample useful as a reference for using this Cognitive Service in your UWP app.
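
Verification presupposes the enrollment step described earlier. The following is a rough sketch of what enrollment might look like with the same client library. It assumes the library exposes CreateProfileAsync and EnrollAsync with the shapes used here, and that "en-us" is an accepted locale, so check both against the package you install. EnrollmentSketch and CreateAndEnrollAsync are hypothetical names.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.SpeakerRecognition;

public static class EnrollmentSketch
{
    // Hypothetical helper: creates a verification profile and submits one
    // recording of the passphrase. Returns the new Speaker Id.
    public static async Task<Guid> CreateAndEnrollAsync(
        string subscriptionKey, Stream audioStream)
    {
        var serviceClient = new SpeakerVerificationServiceClient(subscriptionKey);

        // assumption: "en-us" is a supported locale for the profile
        var profile = await serviceClient.CreateProfileAsync("en-us");

        // assumption: the result reports how many more recordings are needed
        var enrollment = await serviceClient.EnrollAsync(audioStream, profile.ProfileId);
        System.Diagnostics.Debug.WriteLine(
            "Remaining enrollments: " + enrollment.RemainingEnrollments);

        return profile.ProfileId;
    }
}
```

The returned Guid is the Speaker Id you would later pass to VerifyAsync. The service may ask for more than one recording of the passphrase before the profile counts as enrolled.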

Custom Speech Service

You already know how to use the Bing Speech speech-to-text capability introduced at the top of this post. That Cognitive Service is built around generalized language models to work for most people most of the time. But what if you want to do speech recognition involving specialized jargon or vocabulary? To handle these situations, you might need a custom language model rather than the one used by the speech-to-text engine in Bing Speech.

Along the same lines, the generalized acoustic model used to train Bing Speech may not work well for you if your app is likely to be used in an atypical acoustic environment like an air hangar or a factory floor.

Custom Speech Service lets you build custom language models and custom acoustic models for your speech-to-text engine. You can then set these up as custom REST endpoints for doing calls to Cognitive Services from your app. These RESTful endpoints can also be used from any device and from any software platform that can make REST calls. It’s basically a really powerful machine learning tool that lets you take the speech recognition capabilities of your app to a whole new level. Additionally, since all that changes is the endpoint you call, any previous code you have written to use the Bing Speech API should work without any alteration other than the Uri you are targeting.
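
To illustrate the point that only the Uri changes, here is a small sketch that treats the endpoint as a parameter. It reuses the header conventions from the Bing Speech snippet earlier in this post. SpeechRestSketch, BuildRequest and RecognizeAsync are hypothetical names, and any Uri you pass in would come from your own subscription or custom deployment.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SpeechRestSketch
{
    // Builds the POST request. Only the endpoint differs between
    // Bing Speech and a Custom Speech Service deployment.
    public static HttpRequestMessage BuildRequest(
        Uri endpoint, byte[] wavBytes, string authorizationValue)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint);
        request.Headers.TryAddWithoutValidation("Authorization", authorizationValue);
        request.Content = new ByteArrayContent(wavBytes);
        request.Content.Headers.TryAddWithoutValidation(
            "Content-Type", "audio/wav; samplerate=16000");
        return request;
    }

    // Sends the audio and returns the raw response body as text.
    public static async Task<string> RecognizeAsync(
        HttpClient client, Uri endpoint, byte[] wavBytes, string authorizationValue)
    {
        using (var request = BuildRequest(endpoint, wavBytes, authorizationValue))
        using (var response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

With this shape, switching from the general model to a custom one means passing a different Uri to RecognizeAsync; the recording and request-building code stays the same.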

Wrapping Up

In this post, we went over the Bing Speech APIs for speech-to-text and text-to-speech as well as the extra APIs for cleaning up raw speech input and doing comparisons and verification using speech input. In the next post in the Cognitive APIs Series, we’ll take a look at using the Language Understanding Intelligent Service (LUIS) to derive meaning from speech in order to figure out what people really want when they ask for something. In the meantime, here are some additional resources so you can learn more about the Speech APIs on your own.

Cortana to open up to new devices and developers with Cortana Skills Kit and Cortana Devices SDK

We believe that everyone deserves a personal assistant. One to help you cope as you battle to stay on top of everything, from work to your home life. Calendars, communications and commitments. An assistant that is available everywhere you need it, working in concert with the experts you rely on to get things done.

We’re at the beginning of a technological revolution in artificial intelligence. The personal digital assistant is the interface where all the powers of that intelligence can become an extension of each one of us. Delivering on this promise will take a community that is equally invested in the outcome and able to share in the benefits.

Today we are inviting you to join us in this vision for Cortana with the announcement of the Cortana Skills Kit and Cortana Devices SDK.

The Cortana Skills Kit is designed to help developers reach the growing audience of 145 million Cortana users, helping users get things done while driving discovery and engagement across platforms: Windows, Android, iOS, Xbox and new Cortana-powered devices.

The Cortana Devices SDK will allow OEMs and ODMs to create a new generation of smart, personal devices – on no screen or the big screen, in homes and on wheels.

Developers and device manufacturers can sign up today to receive updates as we move out of private preview.

Cortana Skills Kit Preview

The Cortana Skills Kit will allow developers to leverage bots created with the Microsoft Bot Framework and publish them to Cortana as a new skill, to integrate their web services as skills and to repurpose code from their existing Alexa skills to create Cortana skills. It will connect users to skills when users ask, and proactively present skills to users in the appropriate context. And it will help developers personalize their experiences by leveraging Cortana’s understanding of users’ preferences and context, based on user permissions.

At today’s San Francisco event, we showed how early development partners are working with the private preview of the Cortana Skills Kit ahead of broader availability in February 2017.

  • Knowmail is applying AI to the problem of email overload and used the Bot Framework to build a bot which they’ve published to Cortana. Their intelligent solution works in Outlook and Office 365, learning your email habits in order to prioritize which emails to focus on while on-the-go in the time you have available.
  • We showed how Capital One, the first financial services company to sign on to the platform, leveraged existing investments in voice technology to enable customers to efficiently manage their money through a hands-free, natural language conversation with Cortana.
  • Expedia has published a bot to Skype using the Microsoft Bot Framework, and they demonstrated how the bot, as a new Cortana skill, will help users book hotels.
  • We demonstrated TalkLocal’s Cortana skill, which allows people to find local services using natural language. For example, “Hey Cortana, there’s a leak in my ceiling and it’s an emergency” gets TalkLocal looking for a plumber.

Developers can sign up today to stay up to date with news about the Cortana Skills Kit.

Cortana Devices SDK for device manufacturers

We believe that your personal assistant needs to help across your day wherever you are: at home, at work and everywhere in between. We refer to this as Cortana being “unbound” – tied to you, not to any one platform or device. That’s why Cortana is available on Windows 10, on Android and iOS, on Xbox and across mobile platforms.

We shared last week that Cortana will be included in the IoT Core edition of the Windows 10 Creators Update, which powers IoT devices.

The next step in this journey is the Cortana Devices SDK, which makes Cortana available to all OEMs and ODMs to build smarter devices on all platforms.

It will carry Cortana’s promise in personal productivity everywhere and deliver real-time, two-way audio communications with Skype, email, calendar and list integration – all helping Cortana make life easier, everywhere. And, of course, it will carry Cortana expert skills across devices.

We are working with partners across a range of industries and hardware categories, including some exciting work with connected cars. The Devices SDK is designed for diversity, supporting multiple platforms, including Windows IoT, Linux, Android and more, through open-source protocols and libraries.

One early device partner, Harman Kardon, a leader in premium audio, will have more news to share next year about their plans, but today provided a sneak peek at their new device coming in 2017.

The Cortana Devices SDK is currently in private preview and will be available more broadly in 2017. If you are an OEM or ODM interested in including Cortana in your device, please contact us using this form to receive updates on the latest news about the Cortana Devices SDK and to be considered for access to the early preview.

Microsoft grants help kids learn computer science, Earth Day is celebrated and influential engineer is honored — Weekend Reading: April 22 edition

From a huge effort to help kids realize their potential to a celebration of our dear old planet, this week brought plenty of interesting and inspiring news around Microsoft. We’ve rounded up some of the highlights in this latest edition of Weekend Reading.

Earlier this week, Microsoft announced grants to 100 nonprofit partners in 55 countries as part of YouthSpark, a global initiative to increase access for young people to learn computer science. In turn, these nonprofit partners — such as Laboratoria, CoderDojo and City Year — will use the power of local schools, businesses and community organizations to empower students to achieve more for themselves, their families and their communities.

The nonprofits will build upon the work that Microsoft already has underway through programs like Hour of Code with Code.org, BBC micro:bit and TEALS.

“Every young person should have an opportunity, a spark, to realize a more promising future,” Mary Snapp, corporate vice president and head of Microsoft Philanthropies, wrote in a blog post on Wednesday. “Together with our nonprofit partners, we are excited to take a bold step toward that goal today.”

Wondering what the next wave of breakthrough technology will be? Harry Shum, executive vice president of Microsoft Technology and Research, calls it an “invisible revolution,” and it’s transforming farming, allowing people from different cultures to communicate, helping people breathe healthier air, preventing disease outbreaks and much more.

“We are on the cusp of creating a world in which technology is increasingly pervasive but is also increasingly invisible,” Shum said.

This week on the Microsoft Facebook page, we joined the invisible revolution to preview the latest, most cutting-edge developments in artificial intelligence, machine learning and cloud computing. The possibilities are endless.

Computer industry luminaries honored Dave Cutler, a Microsoft senior technical fellow whose impressive body of work spans five decades, as a Computer History Museum Fellow. The 74-year-old has shaped entire eras. He worked to develop the VMS operating system for Digital Equipment Corporation in the late 1970s, had a central role in the development of Windows NT — the basis for all major versions of Windows since 1993 — and helped develop the Microsoft Azure cloud operating system and the hypervisor for Xbox One that allows the console to be more than just for gaming.

“The Fellow awards recognize people who’ve had a tremendous impact on our lives, on our culture, on the way we work, exchange information and live,” said John Hollar, the museum’s president and CEO. “People like Dave Cutler, who probably influences the computing experiences of more than 2 billion people, yet isn’t known in a way he deserves to be, in proportion to the impact he’s had on the world.”

Microsoft Philanthropies sponsored the annual We Day, supporting exciting events Wednesday in Seattle and earlier this month in Los Angeles. Nearly 30,000 attended the shows, which celebrate young people who are making a difference.

In supporting We Day, Microsoft aims to help young people drive the change they would like to see in their neighborhoods, schools and communities. Our photo gallery captures the highlights, famous faces and young people who were involved in this year’s events.

In advance of Earth Day on Friday, Microsoft kicked off this week with inspiration and information about the company’s sustainability programs and initiatives, including ways you can take part in the efforts. The brand new Environmental Sustainability at Microsoft website details how Microsoft’s company-wide carbon fee has financed significant investments in renewable energy to power its data centers, improved building efficiency and reached more than 6 million people through the purchase of carbon offsets from community projects around the world.

Microsoft, which has been a carbon-neutral company since 2012, is continually finding ways to make its products and their lifecycles more earth-friendly. Learn more about how Microsoft is commemorating Earth Day on the Microsoft Green Blog.

Microsoft is also constantly working to help students achieve more. Some all-new education features coming in the Windows 10 Anniversary Update are specifically inspired by teachers and focused on students. A “Set Up School PCs” app lets teachers set up a device themselves in mere minutes, and a new “Take a Test” app provides simple and secure standardized testing for classrooms or entire schools.

Learning will also get a big boost with Microsoft Classroom and Microsoft Forms, a OneNote Class Notebook that now has Learning Management System (LMS) integration and — perhaps most exciting to students — the dawn of “Minecraft: Education Edition.” Educators will be able to give it a test run in the summer months and provide feedback and suggestions.

In apps this week, the powerful mobile photo-editing app PicsArt is marking Earth Day by offering a series of green- and outdoorsy-themed photo frame and clip art packages. Several are exclusive to Windows customers. The PicsArt app is free in the Windows Store.

Need a little help juggling projects, priorities and other moving parts in your busy life? The Todoist Windows 10 app can help you stay organized, collaborate with colleagues and even empty your inbox by turning important emails into tasks.

Or for a little fun this weekend, go way beyond retro to prehistoric days in “Age of Cavemen.” In this multiplayer strategy game, you’re the village chief in a dangerous world, and you need to keep your people safe. Build an army, create alliances and destroy your opponents in a wild and wooly free-for-all.

And that’s a wrap for this edition of Weekend Reading. See you here next week for the latest roundup.

Posted by Tracy Ith
Microsoft News Center Staff

The post Microsoft grants help kids learn computer science, Earth Day is celebrated and influential engineer is honored — Weekend Reading: April 22 edition appeared first on The Official Microsoft Blog.

Future of Work, advancing AI on GitHub and Bing’s predictions for upcoming primaries and caucuses — Weekend Reading: Jan. 29 edition

This installment of Weekend Reading leads off with a new website that focuses on workplace culture, a research toolkit tied to artificial intelligence and Bing’s Election 2016 experience.

The Future of Work explores how the role of an organization’s leader may be changing, but is more important than ever. In order to create organizations that can adapt to change, leaders need to actively shape an open workplace culture that fosters collaboration and builds trust. On this site, PopTech and Microsoft Office Envisioning speak to experts on the future of work.


By releasing its Computational Network Toolkit on GitHub, Microsoft is making the tools that its own researchers use to speed up advances in artificial intelligence available to a broader group of developers. Next at Microsoft tells you all about CNTK.


On Feb. 1, the Iowa caucus marks the beginning of the next phase in the 2016 election. Bing’s Election 2016 experience includes the Bing Political Index (BPI), which shows where each candidate stands on the key issues, while predictions will appear alongside live results. Bing Predicts also calls for Hillary Clinton and Donald Trump to emerge as the big winners of the upcoming primaries and caucuses.

Peggy Johnson, executive vice president of Business Development at Microsoft, and executive sponsor of Hack for Her

Hack for Her, which Microsoft hosted Jan. 12, is bringing together corporations, universities and other organizations to form a community that demonstrates a new approach to strengthen the value proposition of products from a female perspective.

“Building products for everyone starts with gender-inclusive design,” said Peggy Johnson, executive vice president of Business Development at Microsoft, and executive sponsor of Hack for Her. “The biggest emerging market today isn’t China or India; it’s women, who have $18 trillion of spending power.” Read more about it on The Official Microsoft Blog.

We saw the debut of “Rise of the Tomb Raider” on Windows 10, a new Microsoft Garage app called News Pro and the Power BI Windows 10 universal app. The latest Red Stripe Deals also rolled out Thursday, and on Wednesday our App of the Week was TripAdvisor, which gives travelers using Windows 10 devices a valuable tool in planning vacations. Mulder and Scully fans can also shop in the Windows Store for the return of “The X-Files.”

This week on the Microsoft Instagram channel, we followed the players of Real Madrid as they prepare for the fútbol season ahead. For these athletes, a winning season comes from passion, purpose and preparation.

That’s it for this edition. See you next Friday for another Weekend Reading.

Posted by Athima Chansanchai
Microsoft News Center Staff

Microsoft expands IT training for active-duty US service members, ‘Halo 5: Guardians’ breaks records – Weekend Reading: Nov. 6 edition

It was a good week for Master Chief, and for U.S. service members seeking to master IT skills to help them transition from military to civilian life. Let’s get to it!

The Microsoft Software & Systems Academy (MSSA) is expanding from three locations to nine, and will be servicing 12 military installations. The MSSA program uses a service member’s time prior to transitioning out of the service to train him or her in specialized technology management areas like server cloud/database, business intelligence and software development. After successfully completing the program, participants have an interview for a full-time job at Microsoft or one of its hiring partners. “On this Veterans Day 2015, it’s the responsibility of the IT industry to honor those who have served with more than an artillery salute and a brief word of thanks,” says Chris Cortez, vice president of Military Affairs at Microsoft, and retired U.S. Marine Corps major general. “We are compelled to set an example of what it can look like to dig in with our transitioning service members as they prepare to cross the bridge to the civilian world.”

A week after launching worldwide, “Halo 5: Guardians” broke records as the biggest Halo launch ever and the fastest-selling Xbox One exclusive game to date, with more than $400 million in global sales of “Halo 5: Guardians” games and hardware. The “Halo 5: Live” launch celebration also earned a Guinness World Records title for the most-watched video game launch broadcast, with more than 330,000 unique streams on the evening of the broadcast.

In China, millions of people are carrying on casual conversations with a Microsoft technology called XiaoIce. Hsiao-Wuen Hon, corporate vice president in charge of Microsoft Research Asia, sees XiaoIce as an example of the vast potential that artificial intelligence holds — not to replace human tasks and experiences, but rather to augment them, writes Allison Linn. Hon recently joined some of the world’s leading computer scientists at the 21st Century Computing Conference in Beijing, an annual meeting of researchers and computer science students, to discuss some emerging trends.

Microsoft and Red Hat announced a partnership that will help customers embrace hybrid cloud computing by providing greater choice and flexibility deploying Red Hat solutions on Microsoft Azure. Also announced: Microsoft acquired Mobile Data Labs, creator of the popular MileIQ app, which takes advantage of sensors in modern mobile devices to automatically and contextually capture, log and calculate business miles, allowing users to confidently claim tax deductions. The acquisition is the latest example of Microsoft’s ambition to reinvent productivity and business process in a mobile-first, cloud-first world, says Rajesh Jha, corporate vice president for Outlook and Office 365.


We got to know some pretty cool people doing really cool things. Among them: The team members of Loop who created the Arrow and Next Lock Screen apps through the Microsoft Garage. We also were introduced to Scott McBride, a Navy vet whose internship at Microsoft led to a full-time job; he’s now a business program manager for Microsoft’s Cloud and Enterprise group. McBride will be helping Microsoft recruit new hires this fall.

Microsoft Loop team photographed in their new workspace, under construction in Bellevue, Washington. (Photography by Scott Eklund/Red Box Pictures)

A game with a deceptively simple, one-word title, “Prune,” is the App of the Week. In it, you give life to a forgotten landscape, and uncover a story that’s hidden deep beneath the soil. You’ll cultivate a sapling into a full-grown tree, and watch it evolve in an elegant but sparse environment. It’s up to you to bring the tree toward the sunlight, or shield it from the dangers of a hostile world. You can install “Prune” for $3.99 from the Windows Store.

This week on the Microsoft Instagram channel, we met Thavius Beck. Beyond being a musician, Thavius is a performer, producer and teacher. He uses his Surface Book to spread his love of music and perform in completely new ways.

Thanks for reading! Have a good weekend, and we’ll see you back here next Friday!

Posted by Suzanne Choney
Microsoft News Center Staff