Tag Archives: research

CodeTalk: Rethinking accessibility for IDEs

By Suresh Parthasarathy, Senior Research Developer; Gopal Srinivasa, Senior Research Software Development Engineer

From left to right: CodeTalk team members Priyan Vaithilingam, Suresh Parthasarathy, Venkatesh Potluri, Manohar Swaminathan and Gopal Srinivasa of Microsoft Research India.

Software programming productivity tools known as integrated development environments, or IDEs, are supposed to be a game changer for Venkatesh Potluri, a research fellow in Microsoft’s India research lab. Potluri is a computer scientist who regularly needs to write code efficiently and accurately for his research in human-computer interaction and accessibility. Instead, IDEs are one more source of frustration for Potluri: he is blind and unable to see the features that make IDEs a boon to the productivity of sighted programmers, such as squiggly red lines that automatically appear beneath potential code errors.

Potluri uses a screen reader to hear the code that he types. He scrolls back and forth through the screen to maintain context. But a screen reader conveys only part of what an IDE offers, because much of the information these tools provide is visual. For example, code is syntax-highlighted in bright colors, errors are automatically marked with squiggles, and the debugger uses several windows to provide the full context of a running program. Performance analysis tools use charts and graphs to highlight bottlenecks, and architecture analysis tools use graphical models to show code structure.

“IDEs provide a lot of relevant information while writing code; a lot of this information, such as the current state of the program being debugged, real-time error alerts and code refactoring suggestions, are not announced to screen reader users,” Potluri said. “As a developer using a screen reader, the augmentation IDEs provide is not of high value to me.”

Soon after Potluri joined Microsoft Research India in early 2017, he and his colleagues Priyan Vaithilingam and Saqib Shaikh launched Project CodeTalk to increase the value of IDEs for the community of blind and low vision users. According to a recent survey posted on the developer community website Stack Overflow, users who self-identify as blind or low vision make up one percent of the programmer population, compared with 0.4 percent of the general population. Team members realized that while a lot of work had gone into making IDEs more accessible, the efforts had fallen short of meeting the needs of blind and low vision developers.

As a first step, the team explored their personal experiences with IDE technologies. Potluri, for example, described trying to fix one last bug at the end of a long day: listening carefully to the screen reader and concentrating hard to hold the structure of the code file in his mind, only to have the screen reader go silent a few seconds after program execution. Uncertain whether the program completed successfully or terminated with an exception, he has to take extra steps to recheck it, steps that keep him at work late into the night.


The CodeTalk team also drew insights from a survey of blind and low vision developers that was led by senior researcher Manohar Swaminathan. The effort generated ideas for the development of an extension that improves the experience of the blind and low vision community of developers who use Microsoft’s Visual Studio, a popular IDE that supports multiple programming languages and is customizable. The CodeTalk extension and source code are now available on GitHub.

Highlights of the extension include quick access to code constructs and functions, which speeds up coding; the ability to learn the context of where the cursor is in the code; navigation through chunks of code with simple keystrokes; and auditory cues when the code has errors and while debugging. The extension also introduces a novel concept called Talkpoints, which can be thought of as audio-based breakpoints.

Together, these features make debugging and syntax checking, two critical features of IDEs, far more accessible to blind and low vision developers, according to a study the CodeTalk team conducted with blind and low vision programmers. Real-time error information and Talkpoints were particularly appreciated as significant productivity boosters. The team also began using the extension for their own development and discovered that the features were useful for sighted users as well.
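The post describes Talkpoints only as audio-based breakpoints and does not say how the Visual Studio extension implements them. As a rough illustration of the idea, here is a minimal Python sketch that uses the standard `sys.settrace` hook to announce a message whenever execution reaches a chosen line, without pausing the program the way a breakpoint would. The keying scheme (function name plus line offset from its `def` line), the `talkpoints` helper and the use of `print` as a stand-in for a text-to-speech engine are all hypothetical.

```python
import sys
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical talkpoint key: (function name, line offset from its `def` line).
TalkpointKey = Tuple[str, int]


@contextmanager
def talkpoints(points: Dict[TalkpointKey, str],
               speak: Callable[[str], None] = print) -> Iterator[None]:
    """Announce a message each time execution reaches a talkpoint line.

    Unlike a breakpoint, nothing pauses: the message template is filled in
    from the frame's local variables and handed to `speak`, a stand-in for
    a text-to-speech engine. Templates should only name variables that are
    already assigned when the line is reached.
    """
    watched = {name for name, _ in points}

    def local_trace(frame, event, arg):
        if event == "line":
            offset = frame.f_lineno - frame.f_code.co_firstlineno
            template = points.get((frame.f_code.co_name, offset))
            if template is not None:
                speak(template.format_map(dict(frame.f_locals)))
        return local_trace  # keep receiving events for this frame

    def global_trace(frame, event, arg):
        # Only trace frames of functions that actually carry a talkpoint.
        if event == "call" and frame.f_code.co_name in watched:
            return local_trace
        return None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        yield
    finally:
        sys.settrace(previous)


if __name__ == "__main__":
    def average(values):
        total = sum(values)
        return total / len(values)

    # Offset 2 is the `return` line, reached after `total` is assigned.
    with talkpoints({("average", 2): "total is {total}"}):
        average([2, 4, 6])  # prints "total is 12" and keeps running
```

Because the program never stops, a developer hears each value as execution passes the line, instead of having to pause, inspect and resume.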

CodeTalk is one step in a long journey of exploring ways to make IDEs more accessible, and research is ongoing to define and meet the needs of blind and low vision developers. The source code is available on GitHub, where contributions are welcome, and the Visual Studio extension is available for download.

You can read more about this story on Microsoft’s Research Blog.

CodeTalk team members include Suresh Parthasarathy, Gopal Srinivasa, Priyan Vaithilingam, Manohar Swaminathan and Venkatesh Potluri from Microsoft Research India and Saqib Shaikh from Microsoft Research Cambridge.

Debugging data: Microsoft researchers look at ways to train AI systems to reflect the real world – The AI Blog

Hanna Wallach is a senior researcher in Microsoft’s New York City research lab. Photo by John Brecher.

Artificial intelligence is already helping people do things like type faster texts and take better pictures, and it’s increasingly being used to make even bigger decisions, such as who gets a new job and who goes to jail. That’s prompting researchers across Microsoft and throughout the machine learning community to ensure that the data used to develop AI systems reflect the real world, are safeguarded against unintended bias and handled in ways that are transparent and respectful of privacy and security.

Data is the food that fuels machine learning. It’s the representation of the world that is used to train machine learning models, explained Hanna Wallach, a senior researcher in Microsoft’s New York research lab. Wallach is a program co-chair of the Annual Conference on Neural Information Processing Systems, which runs from Dec. 4 to Dec. 9 in Long Beach, California. The conference, better known as “NIPS,” is expected to draw thousands of computer scientists from industry and academia to discuss machine learning, the branch of AI that focuses on systems that learn from data.

“We often talk about datasets as if they are these well-defined things with clear boundaries, but the reality is that as machine learning becomes more prevalent in society, datasets are increasingly taken from real-world scenarios, such as social processes, that don’t have clear boundaries,” said Wallach, who together with the other program co-chairs introduced a new subject area at NIPS on fairness, accountability and transparency. “When you are constructing or choosing a dataset, you have to ask, ‘Is this dataset representative of the population that I am trying to model?’”

Kate Crawford, a principal researcher at Microsoft’s New York research lab, calls it “the trouble with bias,” and it’s the central focus of an invited talk she will be giving at NIPS.

“The people who are collecting the datasets decide that, ‘Oh this represents what men and women do, or this represents all human actions or human faces.’ These are types of decisions that are made when we create what are called datasets,” she said. “What is interesting about training datasets is that they will always bear the marks of history, that history will be human, and it will always have the same kind of frailties and biases that humans have.”

Researchers are also looking at the separate but related issue of whether there is enough diversity among AI researchers. Research has shown that more diverse teams choose more diverse problems to work on and produce more innovative solutions. Two events co-located with NIPS will address this issue: the 12th Women in Machine Learning Workshop, where Wallach, who co-founded Women in Machine Learning, will give an invited talk on the merger of machine learning with the social sciences, and the Black in AI workshop, which was co-founded by Timnit Gebru, a post-doctoral researcher at Microsoft’s New York lab.

“In some types of scientific disciplines, it doesn’t matter who finds the truth, there is just a particular truth to be found. AI is not exactly like that,” said Gebru. “We define what kinds of problems we want to solve as researchers. If we don’t have diversity in our set of researchers, we are at risk of solving a narrow set of problems that a few homogeneous groups of people think are important, and we are at risk of not addressing the problems that are faced by many people in the world.”

Timnit Gebru is a post-doctoral researcher at Microsoft’s New York City research lab. Photo by Peter DaSilva.

Machine learning core

At its core, NIPS is an academic conference with hundreds of papers that describe the development of machine learning models and the data used to train them.

Microsoft researchers authored or co-authored 43 accepted conference papers. They describe everything from the latest advances in retrieving data stored in synthetic DNA to a method for repeatedly collecting telemetry data from user devices without compromising user privacy.

Nearly every paper presented at NIPS over the past three decades considers data in some way, noted Wallach. “The difference in recent years, though,” she added, “is that machine learning no longer exists in a purely academic context, where people use synthetic or standard datasets. Rather, it’s something that affects all kinds of aspects of our lives.”

The application of machine-learning models to real-world problems and challenges is, in turn, bringing into focus issues of fairness, accountability and transparency.

“People are becoming more aware of the influence that algorithms have on their lives, determining everything from what news they read to what products they buy to whether or not they get a loan. It’s natural that as people become more aware, they grow more concerned about what these algorithms are actually doing and where they get their data,” said Jenn Wortman Vaughan, a senior researcher at Microsoft’s New York lab.

The trouble with bias

Data is not something that exists in the world as an object that everyone can see and recognize, explained Crawford. Rather, data is made. When scientists first began to catalog the history of the natural world, they recognized types of information as data, she noted. Today, scientists also see data as a construct of human history.

Crawford’s invited talk at NIPS will highlight examples of machine learning bias such as news organization ProPublica’s investigation that exposed bias against African-Americans in an algorithm used by courts and law enforcement to predict the tendency of convicted criminals to reoffend, and then discuss how to address such bias.

“We can’t simply boost a signal or tweak a convolutional neural network to resolve this issue,” she said. “We need to have a deeper sense of what is the history of structural inequity and bias in these systems.”

One method to address bias, according to Crawford, is to take what she calls a social system analysis approach to the conception, design, deployment and regulation of AI systems, thinking through all of their possible effects. She recently described the approach in a commentary for the journal Nature.

Crawford noted that this isn’t a challenge that computer scientists will solve alone. She is also a co-founder of the AI Now Institute, a first-of-its-kind interdisciplinary research institute based at New York University that was launched in November to bring together social scientists, computer scientists, lawyers, economists and engineers to study the social implications of AI, machine learning and algorithmic decision making.

Jenn Wortman Vaughan is a senior researcher at Microsoft’s New York City research lab. Photo by John Brecher.

Interpretable machine learning

One way to address concerns about AI and machine learning is to prioritize transparency by making AI systems easier for humans to interpret. At NIPS, Vaughan will give a talk describing a large-scale experiment that she and colleagues are running to learn what factors make machine learning models interpretable and understandable for people who are not machine learning experts.

“The idea here is to add more transparency to algorithmic predictions so that decision makers understand why a particular prediction is made,” said Vaughan.

For example, does the number of features or inputs to a model impact a person’s ability to catch instances where the model makes a mistake? Do people trust a model more when they can see how a model makes its prediction as opposed to when the model is a black box?

The research, said Vaughan, is a first step toward the development of “tools aimed at helping decision makers understand the data used to train their models and the inherent uncertainty in their models’ predictions.”

Patrice Simard, a distinguished engineer at Microsoft’s Redmond, Washington, research lab and a co-organizer of a NIPS symposium on interpretable machine learning, said the field should take a cue from computer programming, which has learned the art of decomposing problems into smaller problems with simple, understandable steps. “But in machine learning, we are completely behind. We don’t have the infrastructure,” he said.

To catch up, Simard advocates a shift to what he calls machine teaching: giving machines features to look for when solving a problem, rather than having them search for patterns in mountains of data. Instead of training a car-buying model on millions of images of cars labeled as good or bad, he explained, teach the model about features such as fuel economy and crash-test safety.

The teaching strategy is deliberate, he added, and results in an interpretable hierarchy of concepts used to train machine learning models.
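To make the idea of a taught hierarchy of concepts concrete, here is a minimal Python sketch of the car-buying example. It is not Simard's system: the `Concept` class, the feature names `mpg` and `crash_stars`, and all bounds and weights are hypothetical choices a "teacher" might make. Each leaf scales one raw feature into [0, 1], each inner concept is a weighted average of its children, and `explain` prints the whole breakdown, which is what makes the result interpretable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Car = Dict[str, float]


@dataclass
class Concept:
    """A teacher-defined concept: either a scaled raw feature (a leaf) or a
    weighted average of sub-concepts (an inner node)."""
    name: str
    feature: Optional[str] = None  # leaf: key into the car's raw features
    lo: float = 0.0                # leaf: raw value that maps to score 0
    hi: float = 1.0                # leaf: raw value that maps to score 1
    children: List[Tuple["Concept", float]] = field(default_factory=list)

    def score(self, car: Car) -> float:
        if self.feature is not None:
            scaled = (car[self.feature] - self.lo) / (self.hi - self.lo)
            return min(1.0, max(0.0, scaled))
        total_weight = sum(weight for _, weight in self.children)
        return sum(child.score(car) * weight
                   for child, weight in self.children) / total_weight

    def explain(self, car: Car, depth: int = 0) -> List[str]:
        """Return an indented breakdown of the score, one concept per line."""
        lines = [f"{'  ' * depth}{self.name}: {self.score(car):.2f}"]
        for child, _ in self.children:
            lines.extend(child.explain(car, depth + 1))
        return lines


# Hypothetical concepts, bounds and weights chosen by a "teacher".
economy = Concept("fuel economy", feature="mpg", lo=20, hi=50)
safety = Concept("crash-test safety", feature="crash_stars", lo=1, hi=5)
good_buy = Concept("good buy", children=[(economy, 1.0), (safety, 2.0)])

if __name__ == "__main__":
    print("\n".join(good_buy.explain({"mpg": 35, "crash_stars": 5})))
    # good buy: 0.83
    #   fuel economy: 0.50
    #   crash-test safety: 1.00
```

Every number in the output traces back to a concept a person defined, so a mistaken verdict can be debugged by inspecting, say, the safety score rather than millions of learned weights.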

Researcher diversity

One step to safeguard against unintended bias creeping into AI systems is to encourage diversity in the field, noted Gebru, the co-organizer of the Black in AI workshop co-located with NIPS. “You want to make sure that the knowledge that people have of AI training is distributed around the world and across genders and ethnicities,” she said.

The importance of researcher diversity struck Wallach, the NIPS program co-chair, at her fourth NIPS conference in 2005. For the first time, she was sharing a hotel room with three roommates, all of them women. One of them was Vaughan, and the two of them, along with one of their roommates, co-founded the Women in Machine Learning group, which is now in its 12th year and has held a workshop co-located with NIPS since 2008. This year, more than 650 women are expected to attend.

Wallach will give an invited talk at the Women in Machine Learning Workshop about how she applies machine learning in the context of social science to measure unobservable theoretical constructs such as community membership or topics of discussion.

“Whenever you are working with data that is situated within society contexts,” she said, “necessarily it is important to think about questions of ethics, fairness, accountability, transparency and privacy.”


John Roach writes about Microsoft research and innovation. Follow him on Twitter.

Multiple Intel firmware vulnerabilities in Management Engine

New research has uncovered five Intel firmware vulnerabilities related to the controversial Management Engine, leading one expert to question why the Intel ME cannot be disabled.

The research that led to finding the Intel firmware vulnerabilities was undertaken “in response to issues identified by external researchers,” according to Intel. This likely refers to a flaw in Intel Active Management Technology — part of the Intel ME — found in May 2017 and a supposed Intel ME kill switch found in September. Due to issues like these, Intel “performed an in-depth comprehensive security review of our Intel Management Engine (ME), Intel Server Platform Services (SPS), and Intel Trusted Execution Engine (TXE) with the objective of enhancing firmware resilience.”

In a post detailing the Intel firmware vulnerabilities, Intel said the flaws could allow an attacker to gain unauthorized access to a system, impersonate the ME/SPS/TXE, execute arbitrary code or cause a system crash.

Mark Ermolov and Maxim Goryachy, researchers at Positive Technologies Research, an enterprise security company based in Framingham, Mass., were credited with finding three Intel firmware vulnerabilities, one in each of Intel ME, SPS and TXE.

“Intel ME is at the heart of a vast number of devices worldwide, which is why we felt it important to assess its security status. It sits deep below the OS and has visibility of a range of data, everything from information on the hard drive to the microphone and USB,” Goryachy told SearchSecurity. “Given this privileged level of access, a hacker with malicious intent could also use it to attack a target below the radar of traditional software-based countermeasures such as anti-virus.”

How dangerous are Intel ME vulnerabilities

The Intel ME has been a controversial feature because of its highly privileged level of access and the fact that it can continue to run even when the system is powered off. Some have even suggested it could be used as a backdoor into any system running on Intel hardware.

Tod Beardsley, research director at Rapid7, said that given Intel ME’s “uniquely sensitive position on the network,” he’s happy the security review was done, but he had reservations.


“It is frustrating that it’s difficult to impossible to completely disable this particular management application, even in sites where it’s entirely unused. The act of disabling it tends to require actually touching a keyboard connected to the affected machine,” Beardsley told SearchSecurity. “This doesn’t lend itself well to automation, which is a bummer for sites that have hundreds of affected devices whirring away in far-flung data centers. It’s also difficult to actually get a hold of firmware to fix these things for many affected IoT devices.”

James Maude, senior security engineer at Avecto Limited, an endpoint security software company based in the U.K., said that the Intel firmware vulnerabilities highlight the importance of controlling user privileges because some of the flaws require higher access to exploit.

“From hardware to software, admin accounts with wide-ranging privilege rights present a large attack surface. The fact that these critical security gaps have appeared in hardware that can be found in almost every organization globally demonstrates that all businesses need to bear this in mind,” Maude told SearchSecurity. “Controlling privilege isn’t difficult to do, but it is key to securing systems. It’s time for both enterprises and individual users to realize that they can’t rely solely on inbuilt security — they must also have robust security procedures in place.”

However, Beardsley noted that all of the firmware vulnerabilities across the affected Intel products require physical access to the machine in order to exploit.

“For the majority of issues that require local access, the best advice is simply not to allow untrusted users physical access to the affected systems,” Beardsley said. “This is pretty easy for server farms, but can get trickier for things like point-of-sale systems, kiosks, and other computing objects where low-level employees or the public are expected to touch the machines. That said, it’s nothing a little epoxy in the USB port can’t solve.”

AI’s sharing economy: Microsoft creates publicly available datasets

From left, Adam Atkinson of Microsoft Research Maluuba, Yoshua Bengio of University of Montreal and Samira Ebrahimi Kahou of Microsoft Research Maluuba are among the AI experts who worked on the FigureQA dataset. Photo courtesy of Microsoft Research Maluuba.

Samira Ebrahimi Kahou and her colleagues at Microsoft Research Maluuba recently set out to solve an interesting research problem: How could they use artificial intelligence to correctly reason about information found in graphs and pie charts?

One big obstacle, they discovered, was that the research area was so new that there weren’t any existing datasets available for them to test their hypotheses.

So, they made one.

The FigureQA dataset, which the team released publicly earlier this fall, is one of a number of datasets, metrics and other tools for testing AI systems that Microsoft researchers and engineers have created and shared in recent years. Researchers all over the world use them to see how well their AI systems do at everything from translating conversational speech to predicting the next word a person may want to type.

The teams say these tools provide a codified way for everyone from academic researchers to industry experts to test their systems, compare their work and learn from each other.

“It clarifies our goals, and then others in the research community can say, ‘OK, I see where you’re going,’” said Rangan Majumder, a partner group program manager within Microsoft’s Bing division who also leads development of the MS MARCO machine reading comprehension dataset. The year-old dataset is getting an update in the next few months.

For people used to the technology industry’s more traditional way of doing things, that kind of information sharing can seem surprising. But in the field of AI, where academics and industry players are very much intertwined, researchers say this type of openness is becoming more common.

“Traditionally, companies have kept their research in-house. Now, we’re really seeing an industrywide impact where almost every company is publishing papers and trying to move the state of the art forward, rather than moving it into a walled garden,” said Rahul Mehrotra, a program manager at Microsoft’s Montreal-based Maluuba lab, which also has released two other datasets, NewsQA and Frames, in the past year.

Many AI experts say that this more collaborative culture is crucial to advancing the field of AI. They note that many of the early breakthroughs in the field were the result of researchers from competing institutions sharing knowledge and building on each other’s work.

“We can’t have all the ideas on the planet, so if someone else has a great idea and wants to try it out, we can give them a dataset to do that,” said Christian Federmann, a senior program manager with the Microsoft Translator team.

Federmann’s team developed the Microsoft Speech Language Translation Corpus so they and others could test bilingual conversational speech translation systems such as the Microsoft Translator live feature and Skype Translator. The corpus was recently updated with additional language pairs.

Federmann also notes that Microsoft is one of the few big players that has the budget and resources to create high-quality tools and datasets that allow the industry to compare its work.

That’s key to creating the kind of benchmarks that people can use to credibly showcase their achievements. For example, the recent milestones in conversational speech recognition are based on results measured on the Switchboard corpus.

Rangan Majumder, a partner group program manager within Microsoft’s Bing division, leads development of the MS MARCO machine reading comprehension dataset.

Paying it forward

Many of the teams that are developing datasets and other metrics say they are, in a sense, paying it forward because they also rely on datasets that others have created.

Mehrotra said that when Maluuba was a small startup, it relied heavily on a Microsoft dataset called MCTest. Now, as part of Microsoft, the team has been pleased to see that the datasets it is creating are being used by others in the field.

Devi Parikh, an assistant professor at Georgia Tech and research scientist at Facebook AI Research, said the FigureQA dataset Maluuba recently released is helpful because it allows researchers like herself to work on problems that require the use of multiple types of AI. To accurately read a graphic and answer a question about it requires both computer vision and natural language processing.

“From a research perspective, I think there’s more and more interest in working on problems that are at the intersection of subfields of AI,” she said.

Still, researchers and engineers working in the AI field say that while some information sharing is valuable, there are also times when competing researchers want to be able to compare their systems without revealing all the information about the data they are using.

Doug Orr, a senior software engineering lead with SwiftKey, which Microsoft acquired last year, said his team wanted to create a standard way to measure how well a system predicts what a person will type next. That’s a key component of SwiftKey’s systems, which offer personalized predictions based on a person’s communication style.

Instead of sharing a dataset, the team created a set of metrics that researchers can use with any dataset. The metrics, which are available on GitHub, allow researchers to have standardized benchmarks with which they can measure their own improvement and compare their results to others, without having to share proprietary data.

Orr said the metrics have benefited the team internally by giving it a better sense of how much its systems are improving over time, and that they have allowed everyone in the field to be more transparent about how they are performing against each other.
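The post does not list which metrics the SwiftKey team published. As a hypothetical illustration of a metric that works with any dataset, here is a Python sketch of two common next-word measures, top-k accuracy and mean reciprocal rank; the function name, the predictor signature and the choice of metrics are assumptions, not the team's GitHub code. Because the metric only calls the model and counts hits, it can run locally against proprietary text and report just the aggregate numbers.

```python
from typing import Callable, Dict, List, Sequence

# A predictor maps the words typed so far to a ranked list of guesses
# for the next word. It can be any model; only its outputs are scored.
Predictor = Callable[[Sequence[str]], List[str]]


def next_word_metrics(predict: Predictor,
                      sentences: Sequence[Sequence[str]],
                      k: int = 3) -> Dict[str, float]:
    """Score next-word predictions with top-k accuracy and mean reciprocal rank.

    For every position after the first word of each sentence, the predictor
    sees the preceding words; a hit means the true next word is among its
    first k guesses.
    """
    hits = 0
    reciprocal_rank_sum = 0.0
    total = 0
    for words in sentences:
        for i in range(1, len(words)):
            guesses = list(predict(words[:i]))[:k]
            target = words[i]
            if target in guesses:
                hits += 1
                reciprocal_rank_sum += 1.0 / (guesses.index(target) + 1)
            total += 1
    if total == 0:
        return {"top_k_accuracy": 0.0, "mrr": 0.0}
    return {"top_k_accuracy": hits / total, "mrr": reciprocal_rank_sum / total}


if __name__ == "__main__":
    # A trivial baseline that always suggests the same three words.
    def baseline(_context: Sequence[str]) -> List[str]:
        return ["the", "cat", "sat"]

    private_text = [["the", "cat", "sat"], ["a", "dog"]]
    # top_k_accuracy is 2/3: "cat" and "sat" are guessed, "dog" is not.
    print(next_word_metrics(baseline, private_text, k=3))
```

Two teams can compare systems by exchanging only these scores, computed the same way on their own data.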

Majumder, from the Bing team, says his team sees value in testing their systems with any and all available benchmarks, including internal data they don’t share publicly, datasets they build for public use and ones that others create, such as the SQuAD dataset.

When people join his team from other areas of the company, he says they often have to get used to the fact that they are entering a hybrid area where the team is developing products while also making AI research breakthroughs.

In the field of AI, he says, “what we have is somewhere in between engineering and science.”


Allison Linn is a senior writer at Microsoft. Follow her on Twitter.

Tags: AI, Big Data

Neural fuzzing: applying DNN to software security testing

William Blum, Principal Research Engineering Lead. (Photography by Scott Eklund/Red Box Pictures)

Microsoft researchers have developed a new method for discovering software security vulnerabilities that uses machine learning and deep neural networks to help the system root out bugs better by learning from past experience. This new research project, called neural fuzzing, is designed to augment traditional fuzzing techniques, and early experiments have demonstrated promising results.

Software security testing is a hard task that is traditionally done by security experts through costly and targeted code audits, or by using very specialized and complex security tools to detect and assess vulnerabilities in code. We recently released a tool, called Microsoft Security Risk Detection, that significantly simplifies security testing and does not require you to be an expert in security in order to root out software bugs. The Azure-based tool is available to Windows users and in preview for Linux users.

Fuzz testing
The key technology underpinning Microsoft Security Risk Detection is fuzz testing, or fuzzing. It’s a program analysis technique that looks for inputs causing error conditions that have a high chance of being exploitable, such as buffer overflows, memory access violations and null pointer dereferences.

Fuzzers come in different categories:

  • Blackbox fuzzers, also called “dumb fuzzers,” rely solely on the sample input files to generate new inputs.
  • Whitebox fuzzers analyze the target program either statically or dynamically to guide the search for new inputs aimed at exploring as many code paths as possible.
  • Greybox fuzzers, just like blackbox fuzzers, don’t have any knowledge of the structure of the target program, but make use of a feedback loop to guide their search based on observed behavior from previous executions of the program.
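To make the greybox feedback loop concrete, here is a minimal Python sketch of a coverage-guided fuzzer. It is not AFL or Microsoft Security Risk Detection: the toy target, its hypothetical branch labels and its planted bug are invented for illustration, and a raised Python exception stands in for a real crash such as a memory access violation. The greybox step is the feedback check: a mutant stays in the corpus only if it reached behavior not observed in previous executions.

```python
import random
from typing import Callable, FrozenSet, List, Set, Tuple

# A target runs one input and reports which (hypothetical) branches it hit.
Target = Callable[[bytes], FrozenSet[str]]


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Blind mutation: overwrite one randomly chosen byte with a random value."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def greybox_fuzz(target: Target, seeds: List[bytes], iterations: int,
                 rng: random.Random) -> Tuple[List[bytes], Set[str], List[bytes]]:
    """Coverage-guided loop: keep a mutant only if it reaches new branches."""
    corpus = list(seeds)
    coverage: Set[str] = set()
    crashes: List[bytes] = []
    for seed in seeds:
        coverage |= target(seed)
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        try:
            hit = target(child)
        except Exception:
            crashes.append(child)  # an error condition worth triaging
            continue
        if not hit <= coverage:  # the feedback loop: new behavior observed
            coverage |= hit
            corpus.append(child)
    return corpus, coverage, crashes


def toy_parser(data: bytes) -> FrozenSet[str]:
    """Stand-in target that checks an ELF-like magic prefix and has a planted bug."""
    hit = {"start"}
    if len(data) > 0 and data[0] == 0x7F:
        hit.add("magic0")
        if len(data) > 1 and data[1] == ord("E"):
            hit.add("magic1")
            _ = data[8]  # planted bug: IndexError on inputs shorter than 9 bytes
    return frozenset(hit)


if __name__ == "__main__":
    corpus, coverage, crashes = greybox_fuzz(toy_parser, [b"AAAA"], 20_000,
                                             random.Random(0))
    print(sorted(coverage), len(corpus), len(crashes))
```

With 4-byte seeds, every input that passes both magic-byte checks crashes, so `magic1` only ever shows up as a crash, never as coverage. A blackbox fuzzer would run the same `mutate` step but skip the coverage check and always mutate the original seeds.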

Figure 1 – Crashes reported by AFL (experimental support in Microsoft Security Risk Detection).

Neural fuzzing
Earlier this year, Microsoft researchers, including myself, Rishabh Singh and Mohit Rajpal, began a research project looking at ways to improve fuzzing techniques using machine learning and deep neural networks. Specifically, we wanted to see what a machine learning model could learn if we were to insert a deep neural network into the feedback loop of a greybox fuzzer.

For our initial experiment, we looked at whether a model could learn over time by observing past fuzzing iterations of an existing fuzzer.
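The post does not spell out the model's inputs and outputs beyond noting, later in the post, that it currently learns fuzzing locations. The Python sketch below shows only that data flow, under stated assumptions: past iterations are recorded as (mutated byte offsets, found a new path?) pairs, and a simple smoothed count replaces the deep neural network (the actual experiments used neural models such as LSTMs). The function names are hypothetical. The learned weights then bias which byte the mutator overwrites, so the fuzzer's attention shifts toward offsets that have paid off before.

```python
import random
from typing import List, Sequence, Tuple

# One past fuzzing iteration: the byte offsets that were mutated and
# whether the resulting input reached a new code path.
Observation = Tuple[Sequence[int], bool]


def learn_location_weights(history: Sequence[Observation],
                           length: int) -> List[float]:
    """Estimate how promising each byte offset is to mutate.

    Count-based stand-in for the neural model: offsets involved in
    productive mutations gain weight, and +1 smoothing keeps every
    offset possible.
    """
    counts = [1.0] * length
    for offsets, found_new_path in history:
        if found_new_path:
            for offset in offsets:
                if 0 <= offset < length:
                    counts[offset] += 1.0
    total = sum(counts)
    return [count / total for count in counts]


def guided_mutate(data: bytes, weights: Sequence[float],
                  rng: random.Random) -> bytes:
    """Overwrite one byte, picking the offset in proportion to its weight."""
    if not data:
        return bytes([rng.randrange(256)])
    if len(weights) < len(data):
        raise ValueError("need one weight per input byte")
    buf = bytearray(data)
    offset = rng.choices(range(len(buf)), weights=weights[:len(buf)], k=1)[0]
    buf[offset] = rng.randrange(256)
    return bytes(buf)


if __name__ == "__main__":
    # Offset 0 (say, a magic byte) kept paying off; offset 2 never did.
    history = [([0], True)] * 9 + [([1], True), ([2], False)]
    weights = learn_location_weights(history, length=4)
    print([round(w, 3) for w in weights])  # offset 0 gets 10/14 of the mass
    print(guided_mutate(b"ABCD", weights, random.Random(0)))
```

Swapping the count table for a trained network changes how the weights are produced, not how the mutator consumes them.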

We applied our methods to a type of greybox fuzzer called American fuzzy lop, or AFL.

We tried four different types of neural networks and ran the experiment on four target programs: parsers for the ELF, PDF, PNG and XML file formats.

The results were very encouraging: we saw significant improvements over traditional AFL in terms of code coverage, unique code paths and crashes for most of the four input formats.

  • The AFL system using deep neural networks based on the long short-term memory (LSTM) model gives around a 10 percent improvement in code coverage over traditional AFL for two file parsers: ELF and PNG.
  • When looking at unique code paths, neural AFL discovered more unique paths than traditional AFL for all parsers except PDF. For the PNG parser, after 24 hours of fuzzing it found twice as many unique code paths as traditional AFL.

Figure 2 – Input gain over time (in hours) for the libpng file parser.

  • A good way to evaluate fuzzers is to compare the number of crashes reported. For the ELF file parser, neural AFL reported more than 20 crashes, whereas traditional AFL did not report any. This is astonishing given that neural AFL was trained on AFL itself. We also observed more crashes for text-based file formats like XML, where neural AFL found 38 percent more crashes than traditional AFL. For PDF, traditional AFL did better overall than neural AFL in terms of new code paths found, though neither system reported any crashes.

Figure 3 – Reported crashes over time (in hours) for readelf (left) and libxml (right).

Overall, neural fuzzing outperformed traditional AFL in every instance except the PDF case, where we suspect the large size of the PDF files incurs noticeable overhead when querying the neural model.

In general, we believe our neural fuzzing approach yields a novel way to perform greybox fuzzing that is simple, efficient and generic.

  • Simple: The search is not based on sophisticated hand-crafted heuristics; the system learns a strategy from an existing fuzzer. We just give it sequences of bytes and let it figure out all sorts of features and automatically generalize from them to predict which types of inputs are more important than others and where the fuzzer’s attention should be focused.
  • Efficient: In our AFL experiment, in the first 24 hours we explored significantly more unique code paths than traditional AFL. For some parsers we even report crashes not already reported by AFL.
  • Generic: Although we’ve tested it only on AFL, our approach could be applied to any fuzzer, including blackbox and random fuzzers.

We believe our neural fuzzing research project is just scratching the surface of what can be achieved using deep neural networks for fuzzing. Right now, our model only learns fuzzing locations, but we could also use it to learn other fuzzing parameters such as the type of mutation or strategy to apply. We are also considering online versions of our machine learning model, in which the fuzzer constantly learns from ongoing fuzzing iterations.

William Blum leads the engineering team for Microsoft Security Risk Detection.


Bad Rabbit ransomware data recovery may be possible

Two different security research firms uncovered important information about the Bad Rabbit ransomware attacks, including the motives and a possible way to recover data without paying.

A threat research team from FireEye found a connection between the Bad Rabbit ransomware and “Backswing,” which FireEye described as a “malicious JavaScript profiling framework.” According to the researchers, Backswing has been seen in use in the wild since September 2016 and recently some sites harboring the framework were redirecting to Bad Rabbit distribution URLs.

“Malicious profilers allow attackers to obtain more information about potential victims before deploying payloads (in this case, the Bad Rabbit ‘flash update’ dropper),” FireEye researchers wrote. “The distribution of sites compromised with Backswing suggest a motivation other than financial gain. FireEye observed this framework on compromised Turkish sites and Montenegrin sites over the past year. We observed a spike of Backswing instances on Ukrainian sites, with a significant increase in May 2017. While some sites hosting Backswing do not have a clear strategic link, the pattern of deployment raises the possibility of a strategic sponsor with specific regional interests.”

Researchers added that using Backswing to gather information on targets and the growing number of malicious websites containing the framework could point to “a considerable footprint the actors could leverage in future attacks.”

Bad Rabbit ransomware recovery

Meanwhile, researchers from Kaspersky Lab discovered flaws in the Bad Rabbit ransomware that could give victims a chance to recover encrypted data without paying the ransom.

The Kaspersky team wrote in a blog post that early reports that the Bad Rabbit ransomware leaked the encryption key were false, but the team did find a flaw in the code where the malware doesn’t wipe the generated password from memory, leaving a slim chance to extract it before the process terminates.

However, the team also detailed an easier way to potentially recover files.

“We have discovered that Bad Rabbit does not delete shadow copies after encrypting the victim’s files,” Kaspersky researchers wrote. “It means that if the shadow copies had been enabled prior to infection and if the full disk encryption did not occur for some reason, then the victim can restore the original versions of the encrypted files by the means of the standard Windows mechanism or 3rd-party utilities.”

MakeCode for Minecraft makes learning to code super fun

A few years ago, my group in Microsoft’s research organization began to experiment with tools that make it possible for kids to learn how to code in the context of Minecraft, the wildly popular game where players build fantastical virtual worlds out of digital blocks, create and play mini-games within the game, and learn to survive monster-filled nights.

Confused? That’s okay. Many grownups don’t understand Minecraft. Even if they think they do, they don’t. That no-rules, open-world environment is all part of its appeal. Our goal is to leverage this enthusiasm to teach kids how to code while playing Minecraft. After all, game playing is the most natural way for humans to learn.

The research is an outgrowth of our TouchDevelop program, which we started in 2011 to teach people how to program and build apps using the touchscreen on their phones. These devices are much more powerful, graphics-rich and sensor-rich computers than the ones we learned to code on as kids. Our TouchDevelop group wanted anyone to be able to program their phones as easily as we programmed 8-bit computers.

Then Minecraft emerged as the game people everywhere were playing and we found ourselves wanting to code inside Minecraft, too. The rest, as they say, is history.

Students in my after-school computer science classes who were lucky enough to tinker with coding in Minecraft went nutso crazy, in a good way. The ability to write code and immediately see the results in Minecraft, such as avatars that can jump 100 blocks high, dig through mountains and make it rain chickens, sent my students running around the classroom from screen to screen to see what their classmates did and shouting the IP addresses of their servers across the room.

Today, our Microsoft Research and Microsoft MakeCode teams are excited to make this learning experience widely available through Microsoft MakeCode for Minecraft on Windows 10.

The MakeCode for Minecraft editor has the pixelated look and feel of Minecraft. MakeCode allows coding with visual blocks, based on a drag and drop interface for beginners, as well as in text with a JavaScript interface for the more experienced learners.

Coding with blocks or text, MakeCode teaches programming 101, including variables, control flow, if statements, loops and functions. More advanced users smoothly ramp up to more complex concepts such as recursion, fractals and object-oriented or distributed programming.

The Microsoft MakeCode team also works on other editors for programming physical things such as microcontrollers, including the micro:bit and Adafruit Circuit Playground Express. In all these scenarios, the coding is directly linked to building something real, which is the primary reason most computer programmers learn to code in the first place.

Instead of thinking they are coding, students are playing a game; they are building their next superpower. Minecraft is a game, and MakeCode for Minecraft fits the coding experience into the game itself. Check it out.


DEFCON hopes voting machine hacking can secure systems

A new report pushes recommendations based on the research done into voting machine hacking at DEFCON 25, including basic cybersecurity guidelines, collaboration with local officials and an offer of free voting machine penetration testing.

It took less than an hour for hackers to break into the first voting machine at the DEFCON conference in July. This week, DEFCON organizers released a new report that details the results from the Voting Village and the steps needed to ensure election security in the future.

Douglas Lute, former U.S. ambassador to NATO and retired U.S. Army lieutenant general, wrote in the report that “last year’s attack on America’s voting process is as serious a threat to our democracy as any I have ever seen in the last 40+ years – potentially more serious than any physical attack on our Nation.”

“Loss of life and damage to property are tragic, but we are resilient and can recover. Losing confidence in the security of our voting process — the fundamental link between the American people and our government — could be much more damaging,” Lute wrote. “In short, this is a serious national security issue that strikes at the core of our democracy.”

In an effort to reduce the risks from voting machine hacking, DEFCON itself will focus more on election systems. Jeff Moss, founder of DEFCON, said during a press conference for the report that access to voting machines is still a major hurdle.

“The part that’s really hard to get our hands on is the back-end software that ties the voting machines together — to tabulate, to accumulate votes, to provision a voting ballot, to run the election, to figure out a winner — and boy we really want to have a complete voting system to attack, so people can attack the network, they can attack the physical machines, they can go after the databases,” Moss said. “This is the mind-boggling part: just as this is the first time this is really being done — no NDAs — there’s never been a test of a complete system. We want a full end-to-end system so it’s one less thing people can argue about. We can say, ‘See? We did it here too.’”

DEFCON had obtained the voting machines tested at the 2017 conference from second-hand markets, like eBay, but hopes to have more cooperation from election officials and the companies that make the voting equipment. Moss said it is still unclear what exactly DEFCON will be allowed to do in 2018 because the DMCA exemption that allows voting machine hacking currently needs to be renewed.

Immediate voting machine security

DEFCON officials noted that election security needs to be improved before the 2018 DEFCON conference so local officials can prepare for the 2018 mid-term elections.

John Gilligan, board chair and interim CEO for the Center for Internet Security (CIS), said his organization was working “to take the elections ecosystem and to develop a handbook of best practices” around election security. CIS has invited DHS, NIST, the Election Assistance Commission, the National Association of Secretaries of State and other election officials to collaborate on the process.

“We have 400 or 500 people who currently collaborate with us, but we’re going to expand that horizon a bit because there are those who have specific expertise in election systems. The view is: let’s get together and very quickly — by the end of this calendar year — produce a set of best practices that will be given to the state and local governments,” Gilligan said in a news conference on Tuesday. “Our effort will complement what the Election Assistance Commission is developing presently with NIST.”

Jake Braun, cybersecurity lecturer at the University of Chicago and CEO of private equity firm Cambridge Global, headquartered in Washington, said the DEFCON team would provide free voting machine pen testing to any election officials that want the help.


“If you’re an election official, the thing you can do coming out of this is to contact DEFCON and offer to give out your schemes, your databases, give access to whatever else you want tested. This is essentially free testing and training for your staff, and that would normally cost you millions of dollars to purchase on your own.”

Moss said industry fear of hackers is common, but stressed that the team only wanted to help.

“This is the first scrutiny the manufacturers have had and they don’t know what to do. And that’s a pretty routine response. We saw that from the medical device world, car world, access control, ATMs,” Moss said. “When these industries first come into contact with hackers and people who are giving an honest opinion of their technology, they pull back and hide for a while. If you’re doing a good job, we’ll tell you, ‘Hey, that’s awesome.’ And, if you’re doing a poor job, we’ll say, ‘Can you please fix that?’ But the best part is it’s free. You’re getting some of the world’s best hackers doing pro bono work, giving away reports for free — normally these people make thousands of dollars a day — and they’re doing it just because they want to see what’s possible.”

The DEFCON voting machine hacking report noted a number of misconceptions surrounding the security of elections, but Harri Hursti, founding partner at Nordic Innovations Lab, said one of the biggest issues was the idea that there had “never been a documented incident where votes have been changed during a real election.”

“These machines don’t have the capability of providing you forensic evidence to see that. They cannot prove they are honest; they cannot prove they were not hacked. They simply don’t have the fundamental, basic capabilities of providing you that data,” Hursti said in the press conference. “The only way you can see if the machine was hacked is if the attacker wanted to be found. That’s the sad truth. It can be done without leaving a trace.”

Opportunities abound in the creative and collaborative culture of STEM

Jennifer Chayes, Distinguished Scientist and Managing Director, Microsoft Research New England & New York City

By Jennifer Chayes, Technical Fellow and Managing Director, Microsoft Research New England & New York City 

The explosion of data available today everywhere from biomedicine to the arts is opening new opportunities for researchers with backgrounds in science, technology, engineering and math to pursue creative and collaborative endeavors that have deep societal impact.

My research has always been interdisciplinary, which by nature is collaborative and creative. You need experts from different fields and you must make scientific leaps to bring the perspectives, results and methodology from one discipline to another. I encourage the scientists who work with me to trust their scientific intuition and make those leaps. Of course, we must also do the hard work to fill in the gaps after we leap. The work is deeply satisfying and delivers genuine societal impact.

The explosion of biomedical data, for example, is generating opportunities for researchers in STEM fields to make tremendous contributions to goals such as increasing longevity and quality of life.

For example, a researcher in one of my labs worked in collaboration with physicians to develop a reinforcement-learning algorithm for a tool that provides personalized exercise incentives for diabetics. Different patients react differently to specific incentives at specific times. A text message saying that you are exercising less than 90 percent of other patients might cause one patient to exercise more and another to give up exercise altogether. The reinforcement learning algorithm produces the right messages for the right patients at the right time. These personalized messages significantly improve most patients’ exercise regimes and, in some cases, decrease the need for diabetes medication. This reinforcement learning algorithm is now being applied more widely.
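The post does not describe the algorithm's internals. As a generic illustration of how a learner can personalize messages from observed responses, here is a minimal epsilon-greedy bandit sketch in Python. A simple bandit is swapped in here for the reinforcement-learning system the researchers built, and the class name, message labels, reward definition and epsilon value are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


class EpsilonGreedyMessenger:
    """Per-patient epsilon-greedy bandit over candidate messages.

    Generic illustration only: message names, the reward signal and epsilon
    are assumptions, not details of the lab's algorithm.
    """

    def __init__(self, messages: List[str], epsilon: float, seed: int = 0):
        self.messages = messages
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (patient, message) -> [total reward, times sent]
        self.stats: Dict[tuple, list] = defaultdict(lambda: [0.0, 0])

    def _mean(self, patient: str, message: str) -> float:
        total, count = self.stats[(patient, message)]
        # Untried messages score infinity so each is sent at least once.
        return total / count if count else float("inf")

    def choose(self, patient: str) -> str:
        # Explore with probability epsilon; otherwise send the best-known message.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.messages)
        return max(self.messages, key=lambda m: self._mean(patient, m))

    def update(self, patient: str, message: str, reward: float) -> None:
        # Reward might be, e.g., 1.0 if the patient exercised afterward (assumed).
        entry = self.stats[(patient, message)]
        entry[0] += reward
        entry[1] += 1


if __name__ == "__main__":
    bandit = EpsilonGreedyMessenger(["peer_comparison", "encouragement"], 0.2, seed=1)
    # Simulated patients who each respond to a different kind of message.
    prefers = {"A": "peer_comparison", "B": "encouragement"}
    for _ in range(200):
        for patient, liked in prefers.items():
            msg = bandit.choose(patient)
            bandit.update(patient, msg, 1.0 if msg == liked else 0.0)
    bandit.epsilon = 0.0
    print({p: bandit.choose(p) for p in prefers})
```

Keeping separate statistics per patient is what lets the learner steer one patient toward peer comparisons while sending another a different kind of message, mirroring the observation that different patients react differently to the same incentive.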

My labs are also involved in a project that aims to predict which cancer patients are good candidates for specific cancer immunotherapy drugs. These drugs enhance the ability of our immune systems to go after cancer cells. The approach is more targeted than chemotherapy or radiation, more effective at eliminating cancer cells and less damaging to non-cancerous tissue. Cancer immunotherapy is a new field with more questions than answers. The organization Stand Up to Cancer is sponsoring about ten researchers from my labs to work on a host of projects with biologists and oncologists from many top biomedical institutions to answer these questions.

For example, individuals differ not only in their genomes, but also in the makeup of their immune cells. Two people who are genetically quite similar can develop vastly different repertoires of T-cells, which are white blood cells that scan for abnormalities and infections. An interplay of our genomes, our environments and our T-cell repertoires determines how well we might react to a specific cancer immunotherapy drug. This interplay leads to a high-dimensional sparse-statistics problem that new techniques in machine learning and statistics can help solve.

A new field forming at the boundary of AI and ethics is another area where STEM researchers are working collaboratively and creatively to generate substantial societal impact. Researchers are developing machine-learning models, for example, to de-bias data sets and make decisions that are fairer than those made by humans. Computer scientists, ethicists, lawyers and social scientists are collaborating to formulate notions of fairness, accountability and transparency. This work will improve the fairness of search engine outputs as well as of job placement, school admission and legal decisions.

Biomedicine and ethical data-driven decision-making are just two fields where the explosion of data is opening opportunities for STEM researchers to generate impact. Opportunities for impact are everywhere, from the social sciences to the arts. The keys to success are to follow your passion, develop the confidence to let your scientific intuition lead the way and then do the work to realize your vision.