Tag Archives: researcher

Accenture: Intelligent operations goal requires data backbone

A newly released report co-authored by Accenture and market researcher HfS reveals 80% of the global enterprises surveyed worry about digital disruption, but many of those companies lack the data backbone that could help them compete.

The report stated that large organizations are “concerned with disruption and competitive threats, especially from new digital-savvy entrants.” Indeed, digital disrupters such as Uber and Lyft in personal transportation, Airbnb in travel and hospitality, and various fintech startups have upset the established order in those industries. The Accenture-HfS report views “intelligent operations” as the remedy for the digital challenge and the key to bolstering customer experience. But the task of improving operations calls for organizations to pursue more than a few mild course corrections, according to Debbie Polishook, group chief executive at Accenture Operations, a business segment that includes business process and cloud services.

In the past, enterprises that encountered friction in their operations would tweak the errant process, add a few more people and take on a Lean Six Sigma project, she noted. Those steps, however, won’t suffice in the current business climate, Polishook said.

“Given what is happening today with the multichannel, with the various ways customers and employees can interact with you, making tiny tweaks is not going to get it done and meet the expectations of your stakeholders,” she said.

Organizations struggle to leverage their data

Hard work ahead

The report, which surveyed 460 technology and services decision-makers in organizations with more than $3 billion in revenue, suggested professional services firms such as Accenture will have their work cut out for them as they prepare clients for the digital era.

The survey noted most enterprises struggle to harness data with an eye toward improving operations and achieving competitive advantage. The report stated “nearly 80% of respondents estimate that 50% [to] 90% of their data is unstructured” and largely inaccessible. A 2017 Accenture report also pointed to a data backbone deficit among corporations: More than 90% of the respondents to that survey said they struggle with data access.

In addition, half of the respondents to the Accenture-HfS survey acknowledged that their back office isn’t keeping pace with front office demands to support digital capabilities.

“Eighty percent of the organizations we talked to are concerned with digital disruption and are starting to note that their back office is not quite keeping up with their front office,” Polishook said. “The entire back office is the boat anchor holding them back.”

That lagging back office is at odds with enterprises’ desire to rapidly roll out products and services. An organization’s operations must be able to accommodate the demand for speed in the context of a digital, online and mobile world, Polishook said.

Enterprises need a “set of operations that can respond to these pressures,” she added. “Most companies are not there yet.”

One reason for the lag: Organizations tend to prioritize new product development and front office concerns when facing digital disruption. Back office systems such as procurement tend to languish.

“Naturally, as clients … are becoming disrupted in the market, they pay attention first to products and services,” Polishook said. “They are finding that is not enough.”

The report’s emphasis on revamped operations as critical to fending off digital disruption mirrors research from MIT Sloan’s Center for Information Systems Research. In a presentation in 2017, Jeanne Ross, principal research scientist at the center, identified a solid operational backbone as one of four keys to digital transformation. The other elements were strategic vision, a focus on customer engagement or digitized solutions and a plan for rearchitecting the business.

The path to intelligent operations

The Accenture-HfS report identified five essential components necessary for intelligent operations: innovative talent, a data backbone, applied intelligence, cloud computing and a “smart partnership ecosystem.”

As for innovative talent, the report cited “entrepreneurial drive, creativity and partnering ability” as enterprises’ top areas of talent focus.

“One of the most important pieces getting to intelligent operations is the talent,” Polishook said. She said organizations in the past looked to ERP or business process management to boost operations, but contended there is no technology silver bullet.

The data-driven backbone is becoming an important focus for large organizations. The report stated more than 85% of enterprises “are developing a data strategy around data aggregation, data lakes, or data curation, as well as mechanisms to turn data into insights and then actions.” Big data consulting is already a growing market for channel partners.

In the area of applied intelligence, about 90% of the enterprises surveyed identified automation, analytics and AI as technologies that will emerge as the cornerstone of business and process transformation. Channel partners also rank AI and the expanded use of automation tools such as robotic process automation among the top anticipated trends of 2018.

Meanwhile, more than 90% of large enterprises expect to realize “plug-and-play digital services, coupled with enterprise-grade security, via the cloud,” according to the Accenture-HfS report. And a like percentage of respondents viewed partnering with an ecosystem as important for exploiting market opportunities. The report said enterprises of the future will create “symbiotic relationships with startups, academia, technology providers and platform players.”

The path to achieving intelligent operations calls for considerable effort among all partners involved in the transformation.

“There is a lot of heavy lifting to be done,” Polishook said.

Researchers use AI to improve accuracy of gene editing with CRISPR

From left, Nicolo Fusi, a researcher at Microsoft, Jennifer Listgarten, who recently joined the faculty at UC Berkeley, and John Doench, an associate director at the Broad Institute, collaborated on a method of using AI to improve gene editing results. Photo by Dana J. Quigley.

A collaboration between computer scientists and biologists from research institutions across the United States is yielding a set of computational tools that increase efficiency and accuracy when deploying CRISPR, a gene-editing technology that is transforming industries from healthcare to agriculture.

CRISPR is a nano-sized sewing kit that can be designed to cut and alter DNA at a specific point in a specific gene.

The technology, for example, may lead to breakthrough applications such as modifying cells to combat cancer or producing high-yielding, drought-tolerant crops such as wheat and corn.

Elevation, the newest tool released by the team, uses a branch of artificial intelligence known as machine learning to predict so-called off-target effects when editing genes with the CRISPR system.

Although CRISPR shows great promise in a number of fields, one challenge is that lots of genomic regions are similar, which means the nano-sized sewing kit can accidentally go to work on the wrong gene and cause unintended consequences – the so-called off-target effects.

“Off-target effects are something that you really want to avoid,” said Nicolo Fusi, a researcher at Microsoft’s research lab in Cambridge, Massachusetts. “You want to make sure that your experiment doesn’t mess up something else.”

Fusi and former Microsoft colleague Jennifer Listgarten, together with collaborators at the Broad Institute of MIT and Harvard, University of California Los Angeles, Massachusetts General Hospital and Harvard Medical School, describe Elevation in a paper published Jan. 10 in the journal Nature Biomedical Engineering.

Elevation and a complementary tool for predicting on-target effects called Azimuth are publicly available for free as a cloud-based end-to-end guide-design service running on Microsoft Azure as well as via open-source code.

Using the computational tools, researchers can input the name of the gene they want to modify and the cloud-based search engine will return a list of guides that researchers can sort by predicted on-target or off-target effects.

Nature as engineer

The CRISPR gene-editing system is adapted from a natural virus-fighting mechanism. Scientists discovered it in the DNA of bacteria in the late 1980s and figured out how it works over the course of the next several decades.

“The CRISPR system was not designed, it evolved,” said John Doench, an associate director at the Broad Institute who leads the biological portions of the research collaboration with Microsoft.

CRISPR stands for “clustered regularly interspaced short palindromic repeats,” which describes a pattern of repeating DNA sequences in the genomes of bacteria separated by short, non-repeating spacer DNA sequences.

The non-repeating spacers are copies of DNA from invading viruses, which molecular messengers known as RNA use as a template to recognize subsequent viral invasions. When an invader is detected, the RNA guides the CRISPR complex to the virus and dispatches CRISPR-associated (Cas) proteins to snip and disable the viral gene.

Modern adaptations

In 2012, molecular biologists figured out how to adapt the bacterial virus-fighting system to edit genes in organisms ranging from plants to mice and humans. The result is the CRISPR-Cas9 gene editing technique.

The basic system works like this: Scientists design synthetic guide RNA to match a DNA sequence in the gene they want to cut or edit and set it loose in a cell with the CRISPR-associated protein scissors, Cas9.

Today, the technique is widely used as an efficient and precise way to understand the role of individual genes in everything from people to poplar trees as well as how to change genes to do everything from fight diseases to grow more food.

“If you want to understand how gene dysfunction leads to disease, for example, you need to know how the gene normally functions,” said Doench. “CRISPR has been a complete game changer for that.”

An overarching challenge for researchers is to decide what guide RNA to choose for a given experiment. Each guide is roughly 20 nucleotides; hundreds of potential guides exist for each target gene in a knockout experiment.

In general, each guide has a different on-target efficiency and a different degree of off-target activity.
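To give a feel for why hundreds of candidate guides can exist for one gene, here is a minimal Python sketch that lists every 20-nucleotide window followed by an "NGG" motif. That motif requirement is an assumption brought in for illustration (it is the commonly cited targeting rule for the Cas9 variant most often used) and is not stated in the article. The function name `candidate_guides` is hypothetical, and the sketch scans only one strand of a toy sequence; it is not how Azimuth or Elevation select guides.

```python
def candidate_guides(sequence, guide_len=20, pam_suffix="GG"):
    """List (position, guide) pairs for each guide-length window that is
    immediately followed by a 3-base N-G-G motif on the given strand.

    The N-G-G requirement is an illustrative assumption, not from the article.
    """
    seq = sequence.upper()
    guides = []
    # Each candidate needs guide_len bases plus 3 more bases for the motif.
    for start in range(len(seq) - guide_len - 2):
        motif = seq[start + guide_len : start + guide_len + 3]
        if motif[1:] == pam_suffix:  # first motif base can be anything
            guides.append((start, seq[start : start + guide_len]))
    return guides


# Toy sequence: 20 bases, then "AGG", then one trailing base.
toy_gene = "ACGT" * 5 + "AGG" + "C"
for position, guide in candidate_guides(toy_gene):
    print(position, guide)
```

On a real gene thousands of bases long, a scan like this returns many windows, which is why ranking them by predicted on-target and off-target behavior matters.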

The collaboration between the computer scientists and biologists is focused on building tools that help researchers search through the guide choices and find the best one for their experiments.

Several research teams have designed rules to determine where off-targets are for any given gene-editing experiment and how to avoid them. “The rules are very hand-made and very hand-tailored,” said Fusi. “We decided to tackle this problem with machine learning.”

Training models

To tackle the problem, Fusi and Listgarten trained a so-called first-layer machine-learning model on data generated by Doench and colleagues. These data reported on the activity for all possible target regions with just one nucleotide mismatch with the guide.

Then, using publicly available data that was previously generated by the team’s Harvard Medical School and Massachusetts General Hospital collaborators, the machine-learning experts trained a second-layer model that refines and generalizes the first-layer model to cases where there is more than one mismatched nucleotide.

The second-layer model is important because off-target activity can occur with far more than just one mismatch between guide and target, noted Listgarten, who joined the faculty at the University of California at Berkeley on Jan. 1.

Finally, the team validated their two-layer model on several other publicly available datasets as well as a new dataset generated by collaborators affiliated with Harvard Medical School and Massachusetts General Hospital.

Some model features are intuitive, such as a mismatch between the guide and nucleotide sequence, noted Listgarten. Others reflect unknown properties encoded in DNA that are discovered through machine learning.

“Part of the beauty of machine learning is if you give it enough things it can latch onto, it can tease these things out,” she said.

Off target scores

Elevation provides researchers with two kinds of off-target scores for every guide: individual scores, one for each potential target region, and a single overall summary score for that guide.

The individual scores are machine learning-based probabilities, provided for every region of the genome where something bad could happen. For every guide, Elevation returns hundreds to thousands of these off-target scores.

For researchers trying to determine which of potentially hundreds of guides to use for a given experiment, these individual off-target scores alone can be cumbersome, noted Listgarten.

The summary score is a single number that lumps the off-target scores together to provide an overview of how likely the guide is to disrupt the cell over all its potential off-targets.

“Instead of a probability for each point in the genome, it is what’s the probability I am going to mess up this cell because of all of the off-target activities of the guide?” said Listgarten.
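The idea of collapsing many per-site probabilities into one number per guide can be sketched in a few lines of Python. This is only an illustration under a loud assumption: it treats off-target sites as independent and computes the chance that at least one of them is hit. The article does not say how Elevation actually aggregates its scores, and the function names and toy probabilities below are hypothetical.

```python
import math


def summary_score(off_target_probs):
    """Probability that at least one off-target site is affected.

    Assumes the sites are independent, which is an illustrative
    simplification and not a description of Elevation's model.
    """
    no_hit_anywhere = math.prod(1.0 - p for p in off_target_probs)
    return 1.0 - no_hit_anywhere


def rank_guides(guides):
    """Sort guides so the lowest summary score (least risky) comes first."""
    return sorted(guides, key=lambda g: summary_score(g["off_target_probs"]))


# Toy data: each guide carries a short list of made-up per-site probabilities.
guides = [
    {"name": "guide_a", "off_target_probs": [0.1, 0.2]},
    {"name": "guide_b", "off_target_probs": [0.05]},
]
for guide in rank_guides(guides):
    print(guide["name"], round(summary_score(guide["off_target_probs"]), 3))
```

A single number per guide makes it practical to sort hundreds of candidates, while the individual scores remain available for anyone who wants to inspect a specific region.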

End-to-end guide design

Writing in Nature Biomedical Engineering, the collaborators describe how Elevation works in concert with a tool they released in 2016 called Azimuth that predicts on-target effects.

The complementary tools provide researchers with an end-to-end system for designing experiments with the CRISPR-Cas9 system – helping researchers select a guide that achieves the intended effect – disabling a gene, for example – and reduce mistakes such as cutting the wrong gene.

“Our job,” said Fusi, “is to get people who work in molecular biology the best tools that we can.”

In addition to Listgarten, Fusi and Doench, project collaborators include Michael Weinstein from the University of California Los Angeles, Benjamin Kleinstiver, Keith Joung and Alexander A. Sousa from Harvard Medical School and Massachusetts General Hospital, and Melih Elibol, Luong Hoang, Jake Crawford and Kevin Gao from Microsoft Research.


John Roach writes about Microsoft research and innovation. Follow him on Twitter.

Debugging data: Microsoft researchers look at ways to train AI systems to reflect the real world

Hanna Wallach is a senior researcher in Microsoft’s New York City research lab. Photo by John Brecher.

Artificial intelligence is already helping people do things like type faster texts and take better pictures, and it’s increasingly being used to make even bigger decisions, such as who gets a new job and who goes to jail. That’s prompting researchers across Microsoft and throughout the machine learning community to ensure that the data used to develop AI systems reflect the real world, are safeguarded against unintended bias and handled in ways that are transparent and respectful of privacy and security.

Data is the food that fuels machine learning. It’s the representation of the world that is used to train machine learning models, explained Hanna Wallach, a senior researcher in Microsoft’s New York research lab. Wallach is a program co-chair of the Annual Conference on Neural Information Processing Systems, which runs from Dec. 4 to Dec. 9 in Long Beach, California. The conference, better known as “NIPS,” is expected to draw thousands of computer scientists from industry and academia to discuss machine learning – the branch of AI that focuses on systems that learn from data.

“We often talk about datasets as if they are these well-defined things with clear boundaries, but the reality is that as machine learning becomes more prevalent in society, datasets are increasingly taken from real-world scenarios, such as social processes, that don’t have clear boundaries,” said Wallach, who together with the other program co-chairs introduced a new subject area at NIPS on fairness, accountability and transparency. “When you are constructing or choosing a dataset, you have to ask, ‘Is this dataset representative of the population that I am trying to model?’”

Kate Crawford, a principal researcher at Microsoft’s New York research lab, calls it “the trouble with bias,” and it’s the central focus of an invited talk she will be giving at NIPS.

“The people who are collecting the datasets decide that, ‘Oh this represents what men and women do, or this represents all human actions or human faces.’ These are types of decisions that are made when we create what are called datasets,” she said. “What is interesting about training datasets is that they will always bear the marks of history, that history will be human, and it will always have the same kind of frailties and biases that humans have.”

Researchers are also looking at the separate but related issue of whether there is enough diversity among AI researchers. Research has shown that more diverse teams choose more diverse problems to work on and produce more innovative solutions. Two events co-located with NIPS will address this issue: The 12th Women in Machine Learning Workshop, where Wallach, who co-founded Women in Machine Learning, will give an invited talk on the merger of machine learning with the social sciences, and the Black in AI workshop, which was co-founded by Timnit Gebru, a post-doctoral researcher at Microsoft’s New York lab.

“In some types of scientific disciplines, it doesn’t matter who finds the truth, there is just a particular truth to be found. AI is not exactly like that,” said Gebru. “We define what kinds of problems we want to solve as researchers. If we don’t have diversity in our set of researchers, we are at risk of solving a narrow set of problems that a few homogeneous groups of people think are important, and we are at risk of not addressing the problems that are faced by many people in the world.”

Timnit Gebru is a post-doctoral researcher at Microsoft’s New York City research lab. Photo by Peter DaSilva.

Machine learning core

At its core, NIPS is an academic conference with hundreds of papers that describe the development of machine learning models and the data used to train them.

Microsoft researchers authored or co-authored 43 accepted conference papers. They describe everything from the latest advances in retrieving data stored in synthetic DNA to a method for repeatedly collecting telemetry data from user devices without compromising user privacy.

Nearly every paper presented at NIPS over the past three decades considers data in some way, noted Wallach. “The difference in recent years, though,” she added, “is that machine learning no longer exists in a purely academic context, where people use synthetic or standard datasets. Rather, it’s something that affects all kinds of aspects of our lives.”

The application of machine-learning models to real-world problems and challenges is, in turn, bringing into focus issues of fairness, accountability and transparency.

“People are becoming more aware of the influence that algorithms have on their lives, determining everything from what news they read to what products they buy to whether or not they get a loan. It’s natural that as people become more aware, they grow more concerned about what these algorithms are actually doing and where they get their data,” said Jenn Wortman Vaughan, a senior researcher at Microsoft’s New York lab.

The trouble with bias

Data is not something that exists in the world as an object that everyone can see and recognize, explained Crawford. Rather, data is made. When scientists first began to catalog the history of the natural world, they recognized types of information as data, she noted. Today, scientists also see data as a construct of human history.

Crawford’s invited talk at NIPS will highlight examples of machine learning bias such as news organization ProPublica’s investigation that exposed bias against African-Americans in an algorithm used by courts and law enforcement to predict the tendency of convicted criminals to reoffend, and then discuss how to address such bias.

“We can’t simply boost a signal or tweak a convolutional neural network to resolve this issue,” she said. “We need to have a deeper sense of what is the history of structural inequity and bias in these systems.”

One method to address bias, according to Crawford, is to take what she calls a social system analysis approach to the conception, design, deployment and regulation of AI systems to think through all the possible effects of AI systems. She recently described the approach in a commentary for the journal Nature.

Crawford noted that this isn’t a challenge that computer scientists will solve alone. She is also a co-founder of the AI Now Institute, a first-of-its-kind interdisciplinary research institute based at New York University that was launched in November to bring together social scientists, computer scientists, lawyers, economists and engineers to study the social implications of AI, machine learning and algorithmic decision making.

Jenn Wortman Vaughan is a senior researcher at Microsoft’s New York City research lab. Photo by John Brecher.

Interpretable machine learning

One way to address concerns about AI and machine learning is to prioritize transparency by making AI systems easier for humans to interpret. At NIPS, Vaughan, one of the New York lab’s researchers, will give a talk describing a large-scale experiment that she and colleagues are running to learn what factors make machine learning models interpretable and understandable for non-machine learning experts.

“The idea here is to add more transparency to algorithmic predictions so that decision makers understand why a particular prediction is made,” said Vaughan.

For example, does the number of features or inputs to a model impact a person’s ability to catch instances where the model makes a mistake? Do people trust a model more when they can see how a model makes its prediction as opposed to when the model is a black box?

The research, said Vaughan, is a first step toward the development of “tools aimed at helping decision makers understand the data used to train their models and the inherent uncertainty in their models’ predictions.”

Patrice Simard, a distinguished engineer at Microsoft’s Redmond, Washington, research lab who is a co-organizer of the symposium, said the field of interpretable machine learning should take a cue from computer programming, where the art of decomposing problems into smaller problems with simple, understandable steps has been learned. “But in machine learning, we are completely behind. We don’t have the infrastructure,” he said.

To catch up, Simard advocates a shift to what he calls machine teaching – giving machines features to look for when solving a problem, rather than looking for patterns in mountains of data. Instead of training a machine learning model for car buying with millions of images of cars labeled as good or bad, teach a model about features such as fuel economy and crash-test safety, he explained.

The teaching strategy is deliberate, he added, and results in an interpretable hierarchy of concepts used to train machine learning models.

Researcher diversity

One step to safeguard against unintended bias creeping into AI systems is to encourage diversity in the field, noted Gebru, the co-organizer of the Black in AI workshop co-located with NIPS. “You want to make sure that the knowledge that people have of AI training is distributed around the world and across genders and ethnicities,” she said.

The importance of researcher diversity struck Wallach, the NIPS program co-chair, at her fourth NIPS conference in 2005. For the first time, she was sharing a hotel room with three roommates, all of them women. One of them was Vaughan, and the two of them, along with one of their roommates, co-founded the Women in Machine Learning group, which is now in its 12th year and has held a workshop co-located with NIPS since 2008. This year, more than 650 women are expected to attend.

Wallach will give an invited talk at the Women in Machine Learning Workshop about how she applies machine learning in the context of social science to measure unobservable theoretical constructs such as community membership or topics of discussion.

“Whenever you are working with data that is situated within society contexts,” she said, “necessarily it is important to think about questions of ethics, fairness, accountability, transparency and privacy.”


John Roach writes about Microsoft research and innovation. Follow him on Twitter.

Google bug bounty pays $100,000 for Chrome OS exploit

A pseudonymous security researcher has struck it big for the second time, earning the top Google bug bounty in the Chrome Reward Program.

The researcher, who goes by the handle Gzob Qq, notified Google of a Chrome OS exploit on Sept. 18, 2017, that took advantage of five separate vulnerabilities in order to gain root access for persistent code execution.

Google patched the issues in Chrome OS version 62, which was released on Nov. 15. The details of the exploit chain were then released, showing Gzob Qq used five flaws to complete the system takeover.

As part of the exploit chain, Gzob Qq used a memory access flaw in the V8 JavaScript engine (CVE-2017-15401), a privilege escalation bug in PageState (CVE-2017-15402), a command injection flaw in the network_diag component (CVE-2017-15403) and symlink traversal issues in both the crash_reporter (CVE-2017-15404) and cryptohomed (CVE-2017-15405).

Gzob Qq earned a Google bug bounty of $100,000 for the find, which is the top prize awarded as part of the Chrome Reward Program. Google first increased the Chrome bug bounty reward from $50,000 to $100,000 in March 2015 and since then, this is the second time Gzob Qq has earned that prize.

In September 2016, Gzob Qq notified Google of a Chrome OS exploit chain using an overflow vulnerability in the DNS client library used by the Chrome OS network manager.

In addition to the Google bug bounty, Gzob Qq has also received credit for disclosing flaws in Ubuntu Linux.

AI wave rolls through Microsoft’s language translation technologies

Microsoft researcher Arul Menezes. (Photo by Dan DeLong.)

A fresh wave of artificial intelligence rolling through Microsoft’s language translation technologies is bringing more accurate speech recognition to more of the world’s languages and higher quality machine-powered translations to all 60 languages supported by Microsoft’s translation technologies.

The advances were announced at Microsoft Tech Summit Sydney in Australia on November 16.

“We’ve got a complex machine, and we’re innovating on all fronts,” said Olivier Fontana, the director of product strategy for Microsoft Translator, a platform for text and speech translation services. As the wave spreads, he added, these machine translation tools are allowing more people to grow businesses, build relationships and experience different cultures.

Microsoft’s research labs around the world are also building on top of these technologies to help people learn how to speak new languages, including a language learning application for non-native speakers of Chinese that also was announced at this week’s tech summit.

Neural networks

The new Microsoft Translator advances build on last year’s switch to deep neural network-powered machine translations, which offer more fluent, human-sounding translations than the predecessor technology known as statistical machine translation.

Both methods involve training algorithms using professionally translated documents, so the system can learn how words and phrases in one language are represented in another language. The statistical method, however, is limited to translating a word within the local context of a few surrounding words, which can lead to clunky and stilted translations.

Neural networks are inspired by people’s theories about how the pattern-recognition process works in the brains of multilingual humans, leading to more natural-sounding translations.

Microsoft recently switched 10 more languages to neural network-based models for machine translation, for a total of 21. The neural network-powered translations show between 6 percent and 43 percent improvement in accuracy depending on language pairs, according to an automated evaluation metric for machine translation known as the bilingual evaluation understudy, or BLEU, score.
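For readers unfamiliar with BLEU, the following Python sketch shows the general shape of the metric: it counts how many word sequences (n-grams) in a machine translation also appear in a human reference, then applies a penalty for translations that are too short. This is a simplified, single-sentence, single-reference version without smoothing, written for illustration; the function names are hypothetical, and the article does not describe the exact evaluation setup Microsoft used.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[gram]) for gram, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any empty match zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are scaled down.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, times the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat today"
print(simple_bleu("the cat sat on the mat today", reference))
print(simple_bleu("the cat sat on the mat now", reference))
```

A reported percentage improvement compares scores like these between two systems on the same test set. Published evaluations typically compute BLEU over a whole corpus rather than sentence by sentence.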

“Over the last year, we have been rolling out to more languages, we have been making the models more complex and deeper, so we have much better quality,” said Arul Menezes, general manager of the Microsoft AI and Research machine translation team. He added that the neural network-powered translations for Hindi and Chinese, two of the world’s most popular languages, are available by default to all developers using Microsoft’s translation services.

From left to right, Microsoft researchers Yan Xia, Jonathan Tien and Frank Soong. (Photo courtesy of Microsoft.)

Steps of the translation process

For a machine, the process of translating from one language to the next is broken down into several steps; each step has a stake in the quality of the translation. In the case of translating what a person speaks in one language, the first step is speech recognition, which is the process of converting spoken words into text.

All languages supported by Microsoft speech translation technologies now use a type of AI called long short-term memory for speech recognition, which, together with additional data, has led to an increase in quality of up to 29 percent over deep neural network models for conversational speech.

“When you do speech translation, you first do speech recognition and then you do translation,” explained Menezes. “So, if you have an error in speech recognition, then that effect is going to be amplified at the next step because if you misrecognize a word, then the translation is going to be incomprehensible.”

The second step of machine translation converts the text from one language to the next, which Microsoft does with neural network-based models for 21 languages. The improvement in quality of translations is apparent even when only one of the languages is supported by a neural network-based model due to an approach that translates both languages through English.

Consider, for example, a person who wants to translate from Dutch to Catalan. Dutch is newly supported by neural networks; engineers are still working on the neural network support infrastructure for Catalan. End users will notice an improvement in the Dutch to Catalan translation using this hybrid approach because half of it is better, noted Menezes.
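The two-hop approach described above can be sketched in Python. The dictionaries below are tiny, made-up stand-ins for real translation models, and the function names are hypothetical; the point is only to show the structure, in which a Dutch-to-Catalan request is served by chaining a Dutch-to-English step and an English-to-Catalan step. Improving either step improves the overall result.

```python
def translate(text, table):
    """Toy word-by-word lookup; unknown words pass through unchanged.

    A real system would use a trained model here, not a dictionary.
    """
    return " ".join(table.get(word, word) for word in text.split())


def pivot_translate(text, source_to_english, english_to_target):
    """Translate via English: source -> English -> target."""
    english = translate(text, source_to_english)
    return translate(english, english_to_target)


# Illustrative stand-ins for the two translation steps (assumed data).
dutch_to_english = {"goede": "good", "morgen": "morning"}
english_to_catalan = {"good": "bon", "morning": "dia"}

print(pivot_translate("goede morgen", dutch_to_english, english_to_catalan))
```

The design means each language needs only a pair of models to and from English, rather than a dedicated model for every possible language pair.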

In the final step of speech translation, the translated text is synthesized into voice via text-to-speech synthesis technology. Here, too, speech and language researchers are making advances that produce more accurate and human-sounding synthetic voices. These improvements translate to higher quality experiences across Microsoft’s existing translation services as well as open the door to new language learning features.

Learn Chinese

For example, anyone who really wants to learn to speak a foreign language knows that practice is essential. The challenge is to find someone with the time, patience and skill to help you practice pronunciation, vocabulary and grammar.

For people learning Chinese, Microsoft is aiming to fill that void with a new smartphone app that can act as an always available, artificially intelligent language-learning assistant. The free Learn Chinese app is launching soon on Apple’s iOS platform.

The app aims to solve a problem that is familiar to any language learner who has spent countless hours in crowded classrooms listening to teachers, watching language-learning videos at home or flipping through stacks of flashcards to master vocabulary and grammar, only to feel woefully underprepared for real-world conversations with native speakers.

“You think you know Chinese, but if you meet a Chinese person and you want to speak Chinese, there is no way you can do it if you have not practiced,” explained Yan Xia, a senior development lead at Microsoft Research Asia in Beijing. “Our application addresses this issue by leveraging our speech technology.”

The application is akin to a teacher’s assistant, noted Frank Soong, principal researcher and research manager of the Beijing lab’s speech group, which developed the machine-learning models that power Learn Chinese as well as Xiaoying, a chatbot for learning English that the lab deployed in 2016 on the WeChat platform in China.

“Our application isn’t a replacement for good human teachers,” said Soong. “But it can assist by being available any time an individual has the desire or the time to practice.”

The language learning technology relies on a suite of AI tools such as deep neural networks that have been tuned by Soong’s group to recognize what the language learners are trying to say and evaluate the speakers’ pronunciation, rhythm and tone. These evaluations are based on a comparison with models trained on data from native speakers as well as the lab’s state-of-the-art text-to-speech synthesis technology.

When individuals use the app, they get feedback in the form of scores, along with highlighted words that need improvement and links to sample audio to hear the proper pronunciation. “The app will work with you as a language learning partner,” said Xia. “It will respond to you and give you feedback based on what you are saying.”

Microsoft research Olivier Fontana. (Photo by Dan DeLong.)

Reaching more places

The Learn Chinese application and Microsoft’s core language translation services are powered by machine intelligence running in the cloud. This allows people the flexibility and convenience to access these services anywhere they have an internet connection, such as a bus stop, restaurant or conference center.

For clients with highly sensitive translation needs or who require translation services where internet connections are unavailable, Microsoft is now offering neural network powered translations for its on-premise servers. The development, Fontana noted, is one more example of how “the AI wave is advancing and reaching more and more places and more and more languages.”


John Roach writes about Microsoft research and innovation.

Google Buganizer flaw reveals unpatched vulnerability details

A security researcher uncovered several flaws in Google’s Issue Tracker that exposed data regarding unpatched vulnerabilities listed in the database.

Google describes the Issue Tracker, more commonly known as the Buganizer, as a tool used internally to track bugs and feature requests in Google products. However, Alex Birsan, software developer and researcher, found three flaws in the Buganizer, the most severe of which allowed an elevation of privileges and exposed data on unpatched vulnerabilities.

The less critical issues Birsan found allowed him to essentially use a Buganizer issue ID as an official @Google.com email address, although he could not use this email to log in to Google systems, and to get notifications for internal tickets to which he shouldn’t have had access. Those two flaws alone took Birsan about 16 hours of work and netted him a little more than $8,000 in bug bounty rewards, but then came the major issue.

Revealing Buganizer data

“When you visit the Issue Tracker as an external user, most of its functionality is stripped away, leaving you with extremely limited privileges,” Birsan wrote in a blog post. “If you want to see all the cool stuff Google employees can do, you can look for API endpoints in the JavaScript files. Some of these functions are disabled completely, others are simply hidden in the interface.”

Birsan found that Google’s Buganizer had a few issues in handling POST requests through the API.

“There was no explicit check that the current user actually had access to the issues specified in issueIds before attempting to perform the given action,” Birsan wrote. “If no errors occurred during the action, another part of the system assumed that the user had proper permissions. Thus, every single detail about the given issue ID would be returned in the HTTP response body.”

Birsan claimed he checked the issue a few times and “could see details about vulnerability reports, along with everything else hosted on the Buganizer. Even worse, I could exfiltrate data about multiple tickets in a single request, so monitoring all the internal activity in real time probably wouldn’t have triggered any rate limiters.”
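
The kind of explicit check Birsan describes as missing can be illustrated with a short sketch. Everything here is hypothetical: the `acl` and `issues` dictionaries and the `fetch_issues` function are invented for illustration and say nothing about how Google's Issue Tracker is actually built.

```python
from typing import Dict, Iterable, List, Set


class PermissionDenied(Exception):
    """Raised when a user requests an issue they may not see."""


def fetch_issues(user: str, issue_ids: Iterable[int],
                 acl: Dict[int, Set[str]],
                 issues: Dict[int, dict]) -> List[dict]:
    """Return issue details only if the user may access every requested ID."""
    ids = list(issue_ids)
    # Check each requested ID explicitly before doing anything else,
    # rather than inferring permission from the absence of errors.
    denied = [i for i in ids if user not in acl.get(i, set())]
    if denied:
        raise PermissionDenied(f"no access to issues: {denied}")
    return [issues[i] for i in ids]
```

Rejecting the whole request when any single ID fails the check also avoids leaking details about several tickets through one batched call.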

Finding this flaw only took Birsan one hour, but it netted him $7,500 in reward. Birsan said he initially expected more because he thought the Buganizer issue was more severe, but he said the “impact would be minimized, because all the dangerous vulnerabilities get neutralized within the hour anyway.”

Proof-of-concept iOS exploit released by Google’s Project Zero

A security researcher for Google’s Project Zero team has released a proof-of-concept iOS exploit that takes advantage of another Broadcom Wi-Fi issue.

The vulnerability abused by Gal Beniamini, a security researcher for Google Project Zero based in Israel, was found in the same Broadcom BCM4355C0 Wi-Fi chips affected by the Broadpwn flaw, but is separate. Beniamini confirmed the Broadcom flaw (CVE-2017-11120) affects a range of devices, including the Samsung Galaxy S7 Edge and various Wi-Fi routers, but the exploit he released was specifically for the iPhone 7.

Beniamini wrote in his disclosure that the BCM4355C0 SoC with firmware version did not validate a specific field properly and an iOS exploit could allow code execution and more.

“The exploit gains code execution on the Wi-Fi firmware on the iPhone 7,” Beniamini wrote. “Upon successful execution of the exploit, a backdoor is inserted into the firmware, allowing remote read/write commands to be issued to the firmware via crafted action frames (thus allowing easy remote control over the Wi-Fi chip).”

However, Beniamini’s proof-of-concept iOS exploit requires knowledge of the MAC address of the target device, which may make using this attack in the wild more difficult.

Beniamini said his iOS exploit was tested against the Wi-Fi firmware in iOS 10.2 “but should work on all versions of iOS up to 10.3.3.”

Apple has patched against this iOS exploit in iOS 11 and Google patched the same Broadcom flaw in its September Security Update for Android. Users are urged to update if possible.

Network lateral movement from an attacker’s perspective

LOUISVILLE, KY. — A security researcher at DerbyCon 7.0 showed how an attacker will infiltrate, compromise and move laterally on an enterprise network, and why it benefits IT professionals to look at infosec from a threat actor’s perspective.

Ryan Nolette, security technologist at Sqrrl, based in Cambridge, Mass., said there are a number of different definitions for network lateral movement, but he prefers the MITRE definition, which says network lateral movement is “a step in the process” of getting to the end goal of profit.

Nolette said there are a lot of different attacks that can all be part of network lateral movement, including compromising a shared web root (things running with the same permissions as the web server), using SQL injection, remote access tools and pass-the-hash attacks.

According to Nolette there are five key stages to the network lateral movement process: infection, compromise, reconnaissance, credential theft and lateral movement. This process will then repeat from the recon stage for each system as needed, but the network lateral movement stage is “where the attack gets really freaking exciting,” Nolette told the crowd.

“You’ve already mapped out where you want to go next. You have credentials that you can possibly use to log in to use other systems,” Nolette said. “Now, it’s time to make an engineer or IT admin cry because now you’re going to start moving across their environment.”

Demonstrating network lateral movement

Nolette walked through a demo attack and made sure he had some roadblocks to overcome. First, he ran a Meterpreter payload in Metasploit which would allow him to “run plugins, scripts, payloads, or start a local shell session against the victim” and used it to determine the user privileges of the victim machine.

Finding the privileges were limited, Nolette loaded a generic Windows User Account Control (UAC) bypass, which he noted was patched in the current version of Windows, to escalate privileges to admin level.

In a blog post expanding on the attack, Nolette said that once the attacker has access to a system with these privileges, the aim is to map the network and processes, learn naming conventions to identify targets and plan the next move, which is to recover hashes in order to steal login credentials.

With credentials, Nolette said he targets local users and domain users.

“The reason I want the local users is because in every single large corporation, IT has a backdoor local admin account that uses the same password across 10,000 systems,” Nolette told the DerbyCon audience. “For the record, [Group Policy Object] allows you to randomize that password for every system and stores it in [Active Directory], so there’s really no excuse anymore for this practice.”

Another way Nolette said attackers can find more privileged users is by looking at accounts that break the normal naming convention of the organization. For example, Nolette said if a username is initial.lastname but an attacker sees a name like master_a, that could be an indication it is a domain user with higher privileges.
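
Defenders can run the same check against their own directory to see which accounts would stand out. The sketch below assumes the convention is a single initial, a dot and a lowercase surname; the regular expression and the function name are illustrative assumptions, not something Nolette presented.

```python
import re
from typing import Iterable, List

# Assumed convention: one letter, a dot, then a surname (e.g. "j.smith"),
# optionally hyphenated (e.g. "a.smith-jones").
CONVENTION = re.compile(r"[a-z]\.[a-z]+(?:-[a-z]+)*")


def unusual_accounts(usernames: Iterable[str]) -> List[str]:
    """Return the usernames that do not follow the assumed naming convention."""
    return [u for u in usernames if not CONVENTION.fullmatch(u.lower())]
```

Given `["j.smith", "master_a", "a.jones", "svc-backup"]`, the function returns `["master_a", "svc-backup"]`, the two names an attacker would likely inspect first.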

When mapping the potential paths for network lateral movement, Nolette said attackers will look for specific open ports and use PsExec to run commands on remote systems — both tactics used in the recent WannaCry and NotPetya ransomware attacks.

“If you use PsExec, SpecOps hates you because that’s a legitimate tool used by IT and is constantly run throughout environments and being abused,” Nolette said. He suggested one good security practice was to use whitelisting software to only allow PsExec to be run by very specific IT user accounts. 

Understanding attacker network lateral movement

“In a lot of presentations you don’t get to see the offense side. All you get to see are the after-effects of what they did. They move laterally, great, now I have a new process on this system. But, what did they actually do in order to do that?” Nolette said. “If I figure out what the attacker is doing, I can try to move further up the attack chain and stop them there.”

Nolette said the value of threat hunting to him was not about finding a specific attack or method, but rather in validating a hypothesis about how threat actors may be abusing systems.

“I find that valuable because that’s a repeatable process. When you’re trying to sell to your upper management what you want to do, you always want to use business terms: return on investment, high value target, synergy,” Nolette said. “In order to be a successful security practitioner, you have to know why the business [cares]. Security is not a money-maker. It is always a cost center. How to change that view with the upper management is to show them return on investment. By spending a few hours looking at this stuff, I just saved us a few million dollars.”

Apache Struts vulnerability affects versions since 2008

A security researcher discovered an Apache Struts vulnerability that affects versions of the web application development framework going back to 2008.

Man Yue Mo, researcher at the open source software project LGTM.com run by software analytics firm Semmle, Inc., headquartered in San Francisco, disclosed the remotely executable Apache Struts vulnerability, which he said was “a result of unsafe deserialization in Java” and could lead to arbitrary code execution. Mo originally disclosed the issue to Apache on July 17, 2017.  

Mo publicly disclosed the Apache Struts vulnerability on Sept. 5 and the Apache Struts group released patches the same day, but by the morning of Sept. 6 Mo updated his post because “multiple working exploits [were] observed on various places on the internet.”

“I verified that it was a genuine remote code execution [RCE] vulnerability before reporting it to the Struts security team. They have been very quick and responsive in working out a solution even though it is a fairly non-trivial task that requires API changes,” Mo wrote in a blog post. “Due to the severity of this finding I will not disclose more details at this stage. Rather, I will update this blog post in a couple of weeks’ time with more information.”

Mo’s discovery is the latest in a string of serious Apache Struts vulnerabilities that have been disclosed recently. In March, for example, an RCE vulnerability was patched after being actively exploited by attackers.

Boris Chen, vice president of engineering and co-founder of tCell, Inc., a web applications security company headquartered in San Francisco, said “serialization exploits resulting in RCE are one of the most serious yet underreported vulnerabilities that applications face today, and it doesn’t seem to be waning. For Apache Struts alone, this is the fourth RCE vulnerability this year.”

Michael Patterson, CEO of Plixer International, Inc., a network traffic analysis company based in Kennebunk, Me., said that this Apache Struts vulnerability “is a significant finding given that the majority of our largest companies are using Apache Struts.”  

“Although a patch for the vulnerability has since been released, given that many companies don’t stay on top of patches, there still could be plenty of time for malicious code writers to exploit it,” Patterson told SearchSecurity. “Most organizations are aware that there is absolutely no way to prevent being compromised.”

Brian Robison, senior director of security technology at Cylance Inc., said attacks like this are not new but should be a wake-up call.

“The newly discovered Apache Struts vulnerability is a stark reminder that while websites represent the front-line for most organizations, they can also become the front door for attackers. Many organizations develop layers of security to protect their public facing websites, but in some cases, those layers can’t stop something that looks like normal behavior,” Robison told SearchSecurity. “No matter whether someone is using Apache, IIS or any other web server, it is critical that they keep up with patches and security feeds. A web server that is left idle while the company focuses on building the content can quickly become ground-zero for a widespread attack.”

SHA-1 hashes recovered for 320M breached passwords

Even while one researcher attempted to highlight the need for better password security, other researchers found an opportunity to prove how easily SHA-1 hashes can be recovered.

Troy Hunt is well-known for running the website Have I Been Pwned (HIBP), which compiles data from data breaches in order to allow users to easily check if any of their passwords has been cracked. Hunt wanted to give users “more options” to search their passwords, and added an option to HIBP allowing users to search the SHA-1 hashes of passwords. This is what Hunt did for nearly 320 million passwords he added to HIBP recently.

Hunt admitted that using SHA-1 hashes was not the most secure path.

“What this means is that anyone using this data can take a plain text password from their end (for example during registration, password change or at login), hash it with SHA-1 and see if it’s previously been leaked,” Hunt wrote in a blog post. “It doesn’t matter that SHA-1 is a fast algorithm unsuitable for storing your customers’ passwords with because that’s not what we’re doing here, it’s simply about ensuring the source passwords are not immediately visible.”
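
The check Hunt describes can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and it assumes the leaked hashes have been loaded into a set of uppercase hexadecimal strings.

```python
import hashlib
from typing import Set


def sha1_hex(password: str) -> str:
    """Return the SHA-1 digest of a password as uppercase hex (assumed format)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def is_pwned(password: str, leaked_hashes: Set[str]) -> bool:
    """Check whether the password's SHA-1 hash appears in the leaked set."""
    return sha1_hex(password) in leaked_hashes
```

A registration or password-change form could call `is_pwned` on the submitted value and ask the user to pick something else when it returns `True`.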

Both “for research purposes and of course to satisfy [their] curiosity while using this opportunity as a challenge,” the CynoSure Prime password research collective attempted to recover the SHA-1 hashes used on the passwords dumped by Hunt.

“Out of the roughly 320 million hashes, we were able to recover all but 116 of the SHA-1 hashes, a roughly 99.9999% success rate,” CynoSure wrote in its analysis. “In addition, we attempted to take it a step further and resolve as many ‘nested’ hashes (hashes within hashes) as possible to their ultimate plaintext forms. Through the use of MDXfind [a proprietary hash finding utility] we were able to identify over 15 different algorithms in use across the pwned-passwords-1.0.txt and the successive update-1 and update-2 packages following that.”

Kyle Hanslovan, CEO of Huntress Labs, a cybersecurity managed services provider based in Baltimore, said “the techniques used by CynoSure Prime are not overly sophisticated, thus it’s extremely likely threat actors have done the same. What’s most impressive about CynoSure Prime’s research is the speed they normalized the data, cracked the hashes, and analyzed the results.”

Rod Schultz, chief product officer at Rubicon Labs, provider of secure identity for IoT and based in San Francisco, noted the SHA-1 hashes had not been reversed and that SHA-1 has been deprecated due to “its vulnerability with collisions.”

“A hash algorithm maps information to what is supposed to be a unique fingerprint (a mapping space). When a mapping is found to not be unique, then we call this a collision, as two different pieces of information are now colliding in the mapped space,” Schultz told SearchSecurity via email. “It is possible to take a fingerprint and find the original password, reverse the mapping, but only with precomputed tables. This is what the researchers have done, and it’s possible because SHA-1 has a mapping space that is no longer big enough.”
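
The idea of recovering a password by lookup rather than by inverting the hash can be shown with a toy table. This sketch is not the CynoSure Prime method; the candidate list and function names are assumptions made purely for illustration.

```python
import hashlib
from typing import Dict, Iterable, Optional


def build_table(candidates: Iterable[str]) -> Dict[str, str]:
    """Precompute SHA-1 (lowercase hex) -> plaintext for candidate passwords."""
    return {hashlib.sha1(c.encode("utf-8")).hexdigest(): c for c in candidates}


def recover(hash_hex: str, table: Dict[str, str]) -> Optional[str]:
    """Look up a hash in the table; return None if no candidate produced it."""
    return table.get(hash_hex.lower())
```

The hash itself is never reversed here: a password is recovered only if it was already among the candidates used to build the table.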

Experts say SHA-1 hashes don’t increase risk

Gabriel Gumbs, vice president of product strategy at STEALTHbits Technologies, a data security software company based in Hawthorne, N.J., said in his own research over the years, he has found that many passwords released by Hunt “were already known about and exchanged on the dark web.”

“These were passwords that had already been compromised in real world breaches, so Troy’s call was not just a personal one, but one based on his best judgement having been in the information security field for some time,” Gumbs told SearchSecurity. “Any additional measures to obfuscate the information while allowing checks against the data continue to allow some to think that SHA-1 is still acceptable for use. Could Troy have done things differently? Possibly, but the outcome would not likely lead to a renewed conversation about password security.”

Leigh-Anne Galloway, cyber security resilience lead at Positive Technologies, an enterprise security company based in Framingham, Mass., said it’s important to remember that “none of the usernames associated with the passwords or the passwords themselves were released by crypto busters.”

“These only become an issue where they are associated with a username or other piece of personal information that could lead to a nefarious individual obtaining the username, which would still require some effort to reverse engineer,” Galloway told SearchSecurity. “In order to obtain the password for a user, you would need to know the salt (a random piece of data) used in the generation of the hash itself. Also keep in mind that this data is associated with accounts that have already been potentially compromised, and is therefore somewhat dated.”
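
To illustrate the role a salt plays in general, the sketch below hashes the same password with and without one. It makes no claim about how the published hashes were produced, the function names are hypothetical, and SHA-1 appears only because it is the algorithm under discussion, not as a recommendation.

```python
import hashlib
import os


def unsalted_sha1(password: str) -> str:
    """SHA-1 of the password alone; identical passwords give identical hashes."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def salted_sha1(password: str, salt: bytes) -> str:
    """SHA-1 of salt + password; the result depends on both inputs."""
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


def new_salt(length: int = 16) -> bytes:
    """Generate a random salt from the operating system's entropy source."""
    return os.urandom(length)
```

Because the salted digest changes with the salt, a table precomputed from unsalted candidates no longer matches, and anyone checking a guess has to know the salt that was used.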

Gumbs said SHA-1 has been known “since 2005 to not be a secure mechanism for hashing passwords any longer.”

“And since the first proof of concepts that found weaknesses in SHA-1, several more have been produced in the last decade,” Gumbs said. “There is no additional risk added by the researchers, in fact, they are likely doing more to bring awareness to the very real and existing risk of continuing to use the hashing function.”