
Playing to the crowd and other social media mandates with Dr. Nancy Baym – Microsoft Research


Dr. Nancy Baym, Principal Researcher from Microsoft Research

Episode 41, September 12, 2018

Dr. Nancy Baym is a communication scholar, a Principal Researcher in MSR’s Cambridge, Massachusetts, lab, and something of a cyberculture maven. She’s spent nearly three decades studying how people use communication technologies in their everyday relationships and written several books on the subject. The big take away? Communication technologies may have changed drastically over the years, but human communication itself? Not so much.

Today, Dr. Baym shares her insights on a host of topics ranging from the arduous maintenance requirements of social media, to the dialectic tension between connection and privacy, to the funhouse mirror nature of emerging technologies. She also talks about her new book, Playing to the Crowd: Musicians, Audiences and the Intimate Work of Connection, which explores how the internet transformed – for better and worse – the relationship between artists and their fans.



Nancy Baym: It’s not just that it’s work, it’s that it’s work that never, ever ends. Because your phone is in your pocket, right? So, you’re sitting at home on a Sunday morning, having a cup of coffee and even if you don’t do it, there’s always the possibility of, “Oh, I could tweet this out to my followers right now. I could turn this into an Instagram story.” So, the possibility of converting even your most private, intimate moments into fodder for your work life is always there, now.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Dr. Nancy Baym is a communication scholar, a Principal Researcher in MSR’s Cambridge, Massachusetts, lab, and something of a cyberculture maven. She’s spent nearly three decades studying how people use communication technologies in their everyday relationships and written several books on the subject. The big take away? Communication technologies may have changed drastically over the years, but human communication itself? Not so much.

Today, Dr. Baym shares her insights on a host of topics ranging from the arduous maintenance requirements of social media, to the dialectic tension between connection and privacy, to the funhouse mirror nature of emerging technologies. She also talks about her new book, Playing to the Crowd: Musicians, Audiences and the Intimate Work of Connection, which explores how the internet transformed – for better and worse – the relationship between artists and their fans. That and much more on this episode of the Microsoft Research Podcast.

Host: Nancy Baym, welcome to the podcast.

Nancy Baym: Nice to be here.

Host: So, you’re a principal researcher at the MSR lab in Cambridge, Massachusetts, not to be confused with the one in Cambridge, England. Give our listeners an overview of the work that goes on in New England and of your work in particular. What are the big issues you’re looking at? Why is the work important? Basically, what gets you up in the morning?

Nancy Baym: So, the lab in New England is one of Microsoft’s smaller research labs. We’re very interdisciplinary, so we have people in my basic area, which is social media and social issues around technology from humanistic and social scientific perspectives. And we have that alongside people working on machine learning and artificial intelligence, people working on economics, people working on cryptography, people working on math and complexity theory, people doing algorithmic game theory, and then we also have a bioinformatics and medicine component to this program as well. So, we’re really interested in getting people from very different perspectives together and listening to each other and seeing what kinds of new ideas get sparked when you get people from radically different disciplines together in the same environment and you give them long periods of time to get to know one another and get exposed to the kinds of work that they do. So, that’s the lab as a whole. My group is… we call ourselves the Social Media Collective, which is a, sort of, informal name for it. It’s not an official title but it’s sort of an affectionate one. There are three core people here in our New England lab, which would be me, Mary Gray and Tarleton Gillespie, and then we have a postdoc and we have, in the summer, PhD interns, we have a research assistant, and we’re all interested in questions around how people use technologies, the kinds of work that people do through technologies, the kinds of work that technologies create for people, and the ways that that affects them, their identities, their relationships, their communities, societies as a whole.

Host: You know, as you talk about the types of researchers that you have there, I wonder, is New England unique among the labs at Microsoft?

Nancy Baym: I think we are, in that we are more interdisciplinary than many of them. I mean our Redmond lab, obviously, has got people from a huge range of disciplines, but it’s also got a huge number of people, whereas we’re a much smaller group. We’re on one floor of a building and there are, you know, anywhere from twenty to fifty of us, depending on how many visitors are in the lab and how many interns are around or what not, but that’s still a really small fraction of the Redmond group. So, I think anybody in a particular field finds themselves with many fewer colleagues from their own field relative to their colleagues as a whole in this lab. Whereas, I think most of our labs are dominated much more by people from computer science. Obviously, computer science is well-represented here, but we have a number of other fields as well. So, I think that foregrounding of interdisciplinarity is unique to this lab.

Host: That’s great. So, the social science research in the context of social computing and social media, it’s an interesting take on research in general at Microsoft, which is a high-tech company. How do you think the work that you do informs the broader work of Microsoft Research and Microsoft in general?

Nancy Baym: I would like to think that the kinds of work that I do, and that my colleagues are doing, are helping the company, and technology companies in general, think in more sophisticated ways about the ways that the technologies that we create get taken up and get used and with what consequences. I think that people who build technologies, they really want to help people do things. And they’re focused on that mission. And it can be difficult to think about, what are all the ways that that might get taken up besides the way that I imagine it will get taken up, besides the purpose that I’m designing it for? So, in some sense, I think part of our group is here to say, here’s some unexpected things you might not be thinking about. Here’s some consequences, or in the case of my own work, I’d like to think about the ways that technologies are often pushing people toward more connection and more time with others and more engagement and more sharing and more openness. And yet, people have very strong needs for privacy and for distance and for boundaries and what would it mean, for example, to think about how we could design technologies that helped people draw boundaries more efficiently rather than technologies that were pushing them toward openness all the time?

Host: I love that. And I’m going to circle back, in a bit, to some of those issues of designing for dialectic and some of the issues around unintended consequences. But first, I want to talk about a couple books you wrote. Before we talk about your newest book, I want to spend a little time talking about another book you wrote called Personal Connections in the Digital Age. And in it, you challenge conventional wisdom that tends to blame new technologies for what we might call old problems. Talk a little bit about Personal Connections in the Digital Age.

Nancy Baym: That book came out of a course that I had been teaching for, oh gosh, fifteen, sixteen, seventeen years, something like that, about communication and the internet, and one of the things that tends to come up is just what you’re talking about. This idea that people tend to receive new technologies as though this is the first time these things have ever been disrupted. So, part of what that book tries to do is to show how the way that people think and talk about the internet has these very long histories in how people think and talk about other communication technologies that have come before. So, for example, when the telephone was invented, there was a lot of concern that the telephone was going to lead to social disengagement, particularly among women, who would spend all the time talking on the phone and would stop voting. Um… (laughter) which doesn’t sound all that different from some contemporary ways that people talk about phones! Only now it’s the cell phones that are going to cause all that trouble. It’s that, but it’s also questions around things like, how do we present ourselves online? How do we come to understand who other people are online? How does language change when it’s used online? How do we build relationships with other people? How do we maintain relationships with people who we may have met offline? And also, how do communities and social networks form and get maintained through these communication technologies? So, it’s a really broad sweep. I think of that book as sort of the “one stop shop” for everything you need to know about personal connections in the digital age. If you just want to dive in and have a nice little compact introduction to the topic.

Host: Right. There are other researchers looking into these kinds of things as well. And is your work sort of dovetailing with those findings in that area of personal relationships online?

Nancy Baym: Yeah, yeah. There’s quite a bit of work in that field. And I would say that, for the most part, the body of work which I review pretty comprehensively in Personal Connections in the Digital Age tends to show this much more nuanced, balanced, “for every good thing that happens, something bad happens,” and for all of the sort of mythologies about “it’s destroying children” or “you can’t trust people you meet online,” or “people aren’t their real selves” or even the idea that there’s something called “real life,” which is separate from what happens on the internet, the empirical evidence from research tends to show that, in fact, online interaction is really deeply interwoven with all of our other forms of communication.

Host: I think you used the word “moral panic” which happens when a new technology hits the scene, and we’re all convinced that it’s going to ruin “kids today.” They won’t have manners or boundaries or privacy or self-control, and it’s all technology’s fault. So that’s cool that you have a kind of answer to that in that book. Let’s talk about your new book which is super fascinating: Playing to the Crowd: Musicians, Audiences and the Intimate Work of Connection. Tell us how this book came about and what was your motivation for writing it?

Nancy Baym: So, this book is the result of many years of work, but it came to fruition because I had done some early work about online fan community, particularly soap opera fans, and how they formed community in the early 1990s. And then, at some point, I got really interested in what music fans were doing online and so I started a blog where I was posting about music fans and other kinds of fans and the kinds of audience activities that people were doing online and how that was sort of messing with relationships between cultural producers and audiences. And that led to my being invited to speak at music industry events. And what I was seeing there was a lot of people with expertise saying things like, “The problem is, of course, that people are not buying music anymore, so the solution to this problem is to use social media to connect with your audience because if you can connect with them, and you can engage them, then you can monetize them.” And then I was seeing the musicians ask questions, and the kinds of questions that they were asking seemed very out-of-step with the kind of advice that they were being given. So, they would be asking questions like, do I have to use all of the sites? How do I know which ones to use? So, I got really interested in this question, of sort of, what, from the point of view from these people who were being told that their livelihood depends on creating some kind of new social relationship using these media with audiences, what is this call to connect and engage really about? What does it feel like to live with that? What are the issues it raises? Where did it come from? 
And then this turned into a much larger-scoped project thinking about musicians as a very specific case, but one with tremendous resonance for the ways that so many workers in a huge variety of fields now, including research, feel compelled to maintain some kind of visible, public persona that engages with and courts an audience so that when our next paper comes out, or our next record drops, or our next film is released or our next podcast comes out, the audience is already there and interested and curious and ready for it.

Host: Well, let me interject with a question based on what you said earlier. How does that necessarily translate into monetization? I can see it translating into relationship and, you know, followership, but is there any evidence to support the, you know…?

Nancy Baym: It’s magic, Gretchen, magic!

Host: OK. I thought so! I knew it!

Nancy Baym: You know, I work with economists and I keep saying, “Guys, let’s look at this. This is such a great research problem.” Is it true, right? Because you will certainly hear from people who work at labels or work in management who will say, “We see that our artists who engage more do better.” But in terms of any large-scale “what works for which artists when?” and “does it really work across samples?” the million-dollar question that you just asked is, does it actually work? And I don’t know that we know the answer to that question. For some individuals, some of the time, yes. For the masses, reliably, we don’t know.

Host: Well and the other thing is, being told that you need to have this social media presence. It’s work, you know?

Nancy Baym: That’s exactly the point of the book, yeah. And it’s not just that it’s work, it’s that it’s work that never, ever ends. Because your phone is in your pocket, right? So, you’re sitting at home on a Sunday morning, having a cup of coffee, and even if you don’t do it, there’s always the possibility of, “Oh, I could tweet this out to my followers right now. I could turn this into an Instagram story.” So, the possibility of converting even your most private, intimate moments into fodder for your work life is always there, now. And the promise is, “Oh, if you get a presence, then magic will happen.” But first of all, it’s a lot of work to even create the presence and then to maintain it, you have to sell your personality now. Not just your stuff. You have to be about who you are now and make that identity accessible and engaging and what not. And yet it’s not totally clear that that’s, in fact, what audiences want. Or if it is what audiences want, which audiences and for which kinds of products?

(music plays)

Host: Well, let’s get back to the book a little bit. In one chapter, there’s a subsection called How Music Fans Came to Rule the Internet. So, Nancy, how did music fans come to rule the internet?

Nancy Baym: So, the argument that I make in that chapter is that from the earliest, earliest days of the internet, music fans, and fans in general, were not just using the internet for their fandom, but were people who were also actively involved in creating the internet and creating social computing. So, I don’t want to say that music fans are the only people who were doing this, because they weren’t, but, from the very beginnings of online interaction, in like 1970, you already had the very people who are inventing the concept of a mailing list, at the same time saying, “Hey, we could use one of these to exchange Grateful Dead tickets, ‘cause I have some extra ones and I know there’s some other people in this building who might want them.” So, you have people at Stanford’s Artificial Intelligence laboratory in the very beginning of the 1970s saying, “Hey, we could use this enormous amount of computing power that we’ve got to digitize The Grateful Dead lyrics.” You have community computing projects like Community Memory being launched in the Bay Area putting their first terminal in a record store as a means of bringing together community. And then, from those early, early moments throughout, you see over and over and over again, music fans creating different forms of online community that then end up driving the way that the internet develops, peer-to-peer file sharing being one really clear example of a case where music fans helped to develop a technology to serve their needs, and by virtue of the success of that technology, ended up changing not just the internet, but industries that were organized around distributing cultural materials.

Host: One of the reviewers of Playing to the Crowd, and these reviews tend to be glowing, right? But he said, “It’ll change the way we think about music, technology and people.” So, even if it didn’t change everything about the way we think about music, technology and people, what kinds of “ah-ha” findings might people expect to find in the book?

Nancy Baym: I think one of the big ah-has is the extent to which music is a form of communication which has become co-opted, in so many ways, by commercial markets, and alongside that, the ways in which personal relationships and personal communication, have also become co-opted by commercial markets. Think about the ways that communication platforms monetize our everyday, friendly interaction through advertising. And the way that these parallel movements of music and relational communication from purely social activities to social activities that are permeated by commercial markets raises dialectic tensions that people then have to deal with as they’re continually navigating moving between people and events and circumstances and moments in a world that is so infused by technology and where our relationships are infused by technology.

Host: So, you’ve used the word “dialectic” in the context of computer interface design, and talked about the importance of designing for dialectic. Talk about what you mean by that and what kinds of questions arise for a developer or a designer with that mind set?

Nancy Baym: So, “dialectic” is one of the most important theoretical concepts to me when I think about people’s communication and people’s relationships in this project, but, in general, it’s a concept that I come back to over and over and over, and the idea is that we always have competing impulses that are both valid, and which we have to find balance between. So, a very common dialectic in interpersonal relationships is the desire to, on the one hand, be connected to others, and on the other, to be autonomous from others. So, we have that push and pull between “I want us to be part of each other’s lives all the time, and also leave me alone to make my own decisions.” (laughter) So that dialectic tension is not that one is right and one is wrong. It’s that both are valid, and, as some of the theorists I cite on this argue, there are probably infinite dialectic tensions between “I want this” and its opposite, right? And so, if we think about social interaction, instead of it being some sort of linear model where we start at point A with somebody and we move onto B and then C and then D, if we think of it instead as, even as we’re moving from A to B to C, that’s a tightrope. But at any given moment we can be toppling into one side or the other if we’re not balancing them carefully. So, if we think about a lot of the communication technologies that are available to us right now, they are founded, often quite explicitly, on a model of openness and connection and sharing. So, those are really, really valuable positions. But they’re also ends of dialectics that have opposite ends that are also very valid. So, all of these ways in which we’re pushed to be more open, more connected, to share more things, they are actually always in conflict within us with desires to be protective of other people or protective of ourselves, to have some distance from other people, to have autonomy.
And to be able to have boundaries that separate us from others, as well as boundaries that connect us to one another. So, my question for designers is, how could we design in ways that make it easier for people to adjust those balances? In a way, you could sort of think about it as, what if we made the tightrope, you know, thicker so that it were easier for people to balance on, and you didn’t need to be so good at it, to make it work moment-to-moment?

Host: You know, everything you’ve just said makes me think of, you know, say, someone who wants to get involved in entertainment, in some way, and one of the plums of that is being famous, right? And then you find…

Nancy Baym: Until they are.

Host: …Until you are… that you don’t have control over all the attention you get and so that dialectic of “I want people to notice me/I want people to leave me alone” becomes wildly exacerbated there. But I think, you know, we all see “over-sharers,” as my daughter calls them, on social media. It’s like keep looking at me all the time. It’s like too much information. Have some privacy in your life…

Nancy Baym: Well you know, but that’s a great case, because I would say too much information is not actually a property of information, or of the person sending that information, it’s a property of the person receiving that information. Because, in fact, for some, it’s not going to be too much information. For some, it’s going to be exactly the right amount of information. So, I think of the example of, from my point of view, a number of people who are parents of young children post much too much information on social networks. In particular, I’m really, really turned off by hearing about the details of their trivial illnesses that they’re going through at any given moment. You know, I mean if they got a real illness, of course I want to hear about it, but if you know, they got a fever this week and they’re just feeling a little sick, I don’t really need daily updates on their temperature, for instance. Um… on the other hand, I look at that, and I say, “Oh, too much information.” But then I say, “I’m not the audience for that.” They’ve got 500-600 friends. They probably put that there for grandma and the cousins who actually really do care. And I’m just not the audience. So, it’s not that that’s too much information. It’s that that information wasn’t meant for me. And instead of blaming them for having posted it, maybe I should just look away and move on to the next item in my feed. That’s ok, too. I’m sure that some of the things that I share strike some people as too much information but then, I’ll tell you what, some of the things that I post that I think of as too much information, those are often the ones that people will later, in other contexts, say, “Oh my gosh, it meant so much to me that you posted about… whatever.” So, you know, we can’t just make these judgements about the content of what other people are producing without understanding the contexts in which it’s being received, and by whom.

Host: That is such a great reminder to us to have grace.

Nancy Baym: Grace for other people, that too, yeah.

Host: You’ve been watching, studying and writing about cyberculture for a long time. Going back a ways, what did you see, or even foresee, when you started doing this research and what if anything has surprised you along the way?

Nancy Baym: Well, it’s a funny thing. I mean, when I started doing this research, it was 1991. And the landscape has changed so much since then, so that the kinds of things that I could get away with being an insightful scholar for saying in 1991 are practically laughable now, because people just didn’t understand, at that time, that these technologies were actually going to be really socially useful. That people were going to use these technologies to present themselves to others, to form relationships, to build communities, that they were going to change the way audiences engaged, that they were going to change politics, that they were going to change so many practices of everyday life. And I think that those of us who were involved in cyberculture early, whether it was as researchers or just participants, could see that what was happening there was going to become something bigger than it was in those early days.

(music plays)

Host: I ask all of the researchers that come on the podcast some version of the question, “Is there anything that keeps you up at night?” To some degree, I think your work addresses that. You know, what ought we to be kept up at night about, and how ought we to address it? Is there anything that keeps you up at night, or anything that should keep us up at night, that we should be thinking about critically as we’re in this landscape now?

Nancy Baym: Oh gosh, do any of us sleep anymore at all? (laughter) I mean I think what keeps me up nights is thinking, is it still ok to study the personal and the ordinary when it feels like we’re in such extraordinary, tumultuous and frightening times, uh, nationally and globally? And I guess what I keep coming back to, when I’m lying awake at 4 in the morning saying, “Oh, maybe I just need to start studying social movements and give up on this whole interpersonal stuff.” And then I say to myself, “Wait a minute. The reason that we’re having so much trouble right now, at its heart, is that people are not having grace in their relations with one another,” to go back to your phrase. That what we really, really need right now more than anything is to be reconnected to our capacity for human connection with others. And so, in that sense, then, I kind of put myself to sleep by saying, “OK, there’s nothing more important than actual human connection and respect for one another.” And so that’s what I’m trying to foster in my work. So, I’m just going to call that my part and write a check for some of those other causes I can’t contribute to directly.

Host: I, I love that answer. And that actually leads beautifully into another question which is that your social science work at MSR is unique at industrial research labs. And I would call Microsoft, still, an industrial, you know, situation.

Nancy Baym: Definitely.

Host: So, you get to study unique and challenging research problems.

Nancy Baym: I have the best job in the world.

Host: No, I do, but you got a good one. Because I get to talk to people like you. But what do you think compels a company like Microsoft, perhaps somewhat uniquely, to encourage researchers like you to study and publish the things you do? What’s in it for them?

Nancy Baym: My lab director, Jennifer Chayes, talks about it as being like a portfolio, which I think is a great way to think about it. So, you have this cast of researchers in your portfolio and each of them is following their own path to satisfying their curiosity and by having some of those people in that portfolio who really understand people, who really understand the way that technologies play out in ordinary people’s everyday lives and lived experiences, there may be moments where that’s exactly the stock you need at that moment. That’s the one that’s inflating and that’s the expertise that you need. So, given that we’re such a huge company, and that we have so many researchers studying so many topics, and that computing is completely infused with the social world now… I mean, if we think about the fact that we’ve shifted to so much cloud and that clouds are inherently social in the sense that it’s not on your private device, you have to trust others to store your data, and so many things are now shared that used to be individualized in computing. So, if computing is infused with the social, then it just doesn’t even really make sense for a tech company to not have researchers who understand the social, and who are studying the social, and who are on hand with that kind of expertise.

Host: As we close, Nancy, what advice would you give to aspiring researchers, maybe talking to your 25-year-old self, who might be interested in entering this field now, which is radically different from where it was when you started looking at it? What would you say to people that might be interested in this?

Nancy Baym: I would say, remember that there is well over a hundred years of social theory out there right now, and the fact that we have new communication technologies does not mean that people have started from scratch in their communication, and that we need to start from scratch in making sense of it. I think it’s more important than ever, when we’re thinking about new communication technologies, to understand communication behavior and the way that communication works, because that has not fundamentally transformed. The media through which we communicate have, but the way communication works to build identity, community, relationships, that has not fundamentally, magically, become something different. The same kind of interpersonal dynamics are still at play in many of these things. I think of the internet and communication technologies as being like funhouse mirrors. Where some phenomena get made huge and others get made small, so there’s a lot of distortion that goes on. But nothing entirely new is reflected that never existed before. So, it’s really important to understand the precedents for what you’re seeing, both in terms of theory and similar phenomena that might have occurred in earlier incarnations, in order to be able to really understand what you’re seeing in terms of both what is new, but also what’s not new. Because otherwise, what I see a lot in young scholarship is, “Look at this amazing thing people are doing in this platform with this thingy.” And it is really interesting, but it also actually looks a whole lot like what people were doing on this other platform in 1992, which also kind of looks a lot like what people were doing with ‘zines in the 1920s. And if we want to make arguments about what’s new and what’s changing because of these things, it’s so important that we understand what’s not new and what these things are not changing.

(music plays)

Host: Nancy Baym, it’s been an absolute delight talking to you today. I’m so glad you took time to talk to us.

Nancy Baym: Alrighty, bye.

To learn more about Dr. Nancy Baym, and how social science scholars are helping real people understand and navigate the digital world, visit Microsoft.com/research.

Robot social engineering works because people personify robots

Brittany “Straithe” Postnikoff is a graduate researcher at the University of Waterloo in Ontario who has been researching robot social engineering — the intersection of human-robot interaction and security and privacy — for the past four years.

Postnikoff has found human-robot interaction (HRI) to be surprisingly close to human-human interaction in many cases, which piqued her interest in the security and privacy concerns that could arise if robots were used in social engineering attacks. Her research has included using cameras on robots to spy on people, getting victims to give personal information and even using a robot to change people’s opinions.

Although her research is still in its early days, Postnikoff has found striking results, not the least of which is how little security is built into consumer robots on the market today.

How did you begin studying robot social engineering? Did you start on the social engineering side or on the robotics side?
Brittany ‘Straithe’ Postnikoff: I guess I started on the social engineering side, but I didn’t understand that at the time. For background, I collect post-secondary pieces of paper. I have college diplomas in both business administration and business information technology. And in both of those programs, I acquired people management skills, which I learned were useful in social engineering attacks when I attended DEF CON for the first time.
As for robot social engineering, I casually began studying this topic shortly after I started university to get my computer science degree. I had joined a very small robotics team, and within my first three months of university, the team flew to China for a competition.

During this competition, the robots and the humans on the team wore matching blue jerseys and toques that looked like Jayne’s from ‘Firefly.’ You can look up ‘Jennifer the Skiing Robot’ to see what we looked like.

So many people stopped my teammate and me during the competition to take photos with us and our robots. We noticed this wasn’t happening to the other teams. What was really interesting to me was that people cheered for us and our robots even if we were their competition.

I wondered why. Why are people cheering for our robot instead of theirs? Why are we getting all this extra attention? It’s then I started to see opportunities to blend my marketing and human resources knowledge with robots, security and privacy.

Luckily, my undergraduate university was also host to a human-robot interaction lab. I joined the lab the next semester and learned from my senior researchers about concepts like robot use of authority, body positioning and gesturing that are the foundation of the robot social engineering research that I now pursue full time.

Are there any major differences between what people would normally think of as social engineering and robot social engineering?
Postnikoff: Well, the biggest and clearest difference is that the attack is performed by a robot instead of a human. Otherwise, the base attacks are generally quite close to human-performed attacks.

Like a human, robots can make use of authority to convince people to do things; they can make use of empathy to convince someone to take particular actions and so on. What is important for a robot social engineering attack is that the robot has a body and it’s able to interact with humans on a social level.

The interesting thing about the embodiment of a robot is that people will believe each physical robot is its own individual entity, especially if the robot is known to act autonomously. It doesn’t normally occur to people that a typically autonomous robot acting erratically might have been taken over by a third party.
In researching your work, it appears that the human empathy toward the robot is a big part of the attack. Is that right?
Postnikoff: Yes, just like with some human-performed social engineering attacks, robots that are able to interact on a social level with humans can make use of a victim’s empathetic side in order to perform some attacks. For example, a malicious entity could infect a robot with ransomware and only restore the robot once the ransom has been paid.

If it’s a robot that someone is extremely attached to, in need of, or if they have put a lot of work into personalizing the robot and training it, this could be a particularly devastating attack.

What is next in your robot social engineering research?
Postnikoff: Next in my research is performing more attacks in both controlled environments and in the wild in order to collect some stats on how effective it really is. I think it’s important to determine how widespread this issue could become. Hopefully, I’ll be able to post those results publicly in a couple months.
How does artificial intelligence factor into your research into robot social engineering?
Postnikoff: Artificial intelligence is very important, but tangential to the research that I’m currently pursuing. In HRI, we often use the ‘Wizard of Oz’ technique, which involves a person sitting behind a curtain — or in a different room — and controlling the robot while another person is in the same room as the robot and interacting with it. The people interacting with the robot often can’t tell that the robot is being controlled and will assume that the robot is acting on its own. For this reason, I don’t spend time researching AI, because we can fake it effectively enough for our purposes at this time.

Many other experts are working on AI, and my time is better spent focusing on how the physical embodiment of robots and the actions of robots can impact the people they interact with.
How many robots do you have right now?
Postnikoff: Right now, I have direct access to about 30 robots, but I only have five different models of robots. Thankfully, I have a lot of friends and contacts who are in other universities and companies that are willing to let me play with their robots and complete tests and experiments once in a while.

Sometimes, I help them set up their own experiments to try with the robots, and they let me know what happened as a result. Or, I provide them with the background information and resources they need for their own research. Additionally, people will send me robots to perform experiments on if I promise to do security assessments on them.

To me, these are all win-win scenarios.
Are they all consumer robots?

Postnikoff: For the most part, yes. I try to work through all the different types of robots — consumer, industrial, medical and so on. But, unfortunately, many of the medical and industrial robots are quite pricey and are harder to get access to. This leaves me to experiment primarily with consumer robots.

Consumer robots are also more likely to be widespread, which does offer some benefits considering the research that I do — especially when I can show what sorts of things I can do inside somebody’s home. That said, much of my research also applies to what can happen inside companies that make use of robots — banks and malls — when they don’t understand what can be done with a social robot if it’s not adequately secured.
How have you found the security to be in the robots you use?
Postnikoff: Not great. A number of the robots I deal with really do need a lot of help. And that’s one reason why I’m trying to bring awareness of this topic to the security and privacy community, especially before robots become more widespread.

What’s interesting here is that the topic of robot security overlaps heavily with IoT security, and most of what is being done in that field to make devices more secure also applies to robots.
With the robots that you use where you’re controlling them, is it generally difficult to get control access?
Postnikoff: It depends on the robot, but many are surprisingly easy to gain control over. There were some first-year computer science students at my university that I was mentoring, and after a bit of instruction and background, they were able to get into the robots, even though they had no experience doing this sort of thing just hours before.

A number of the robots I looked at have had default credentials, sent usernames and passwords in plaintext, transmitted unencrypted video streams and so on. These are a lot of the same problems that plague many of the other devices that people in this industry see.
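The weaknesses described above are easy to check for in practice. The following is a hypothetical sketch (not taken from Postnikoff's tooling) of two such checks: flagging a robot that still accepts a well-known default login, and spotting credentials sent in plaintext in an HTTP-style request body. The credential list and field names are illustrative assumptions.

```python
import re

# Well-known factory-default logins (illustrative, not exhaustive).
COMMON_DEFAULTS = {("admin", "admin"), ("root", "root"), ("admin", "1234")}

def uses_default_credentials(username: str, password: str) -> bool:
    """Return True if the robot still accepts a well-known default login."""
    return (username.lower(), password) in COMMON_DEFAULTS

# Matches form fields like "user=alice&pass=secret" in an unencrypted body.
CRED_PATTERN = re.compile(r"(?:username|user|login)=(\S+)&(?:password|pass)=(\S+)")

def find_plaintext_credentials(payload: str):
    """Return (user, password) pairs transmitted in plaintext."""
    return CRED_PATTERN.findall(payload)
```

Either finding on a captured network trace is a strong sign the robot could be taken over by anyone on the same network.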
What kinds of robot social engineering attacks have you run?

Postnikoff: One of my favorite attacks is putting snacks on top of the Roomba-like robot as a way to get access into a locked space.

First, I research who might be in the space, then write that person’s name on a nameplate and put it on the robot, along with the robot’s nametag and the snacks. I use an app to drive the robot to the door, and I get it to run into the door a few times. People hear the robot’s knock, answer the door and might let it in. Meanwhile, I’m able to use the app to look through the robot’s camera and hear through its microphones to absorb what is happening in the space.

There is a paper out by [Serena] Booth et al. called ‘Piggybacking Robots‘ that does a great job of describing a similar attack that inspired me to try this. So, if you ever try one of those food delivery robots that are in D.C. or the Silicon Valley area, you might not want to let them into your house if you don’t have to. You never know who might be piggybacking on the robot’s camera or video feed.
Do you have to be within Bluetooth range to be able to control the robots, or can they be controlled over the internet?
Postnikoff: Some yes; others no. A lot of the robots that I’m personally dealing with have remote-access capabilities. That is actually a common feature that companies selling consumer robots like to boast about. They might say that if you want to check if your front door is locked, you can hop into the robot, point it at your door and use the robot’s camera to check if the door is locked. That might be great for you, but this same capability is also pretty great for an attacker if they can get remote access.
Is there anything else people should know about robot social engineering research?
Postnikoff: Robot social engineering attacks are starting to happen in the wild. I have had a number of groups approach me with incidents involving their social robots that could easily be classified as robot social engineering attacks. If we start focusing on this issue now, we can prevent greater issues in the future.

Amanda Rousseau talks about computer forensics investigations

Amanda Rousseau, the senior malware researcher at Endgame who is also known as Malware Unicorn,  began her career working for the Department of Defense Cyber Crime Center performing computer forensics investigations before moving into the private sector.

At Black Hat USA 2018, Rousseau talked about her experiences with dead box computer forensics investigations — studying a device after a crime has been committed in order to find evidence — how to de-stress after spending a week reverse engineering malware encryption, and how to tell the difference between code written by a script-kiddie and a nation-state actor.

This interview was edited for length and clarity.

What was your role in computer forensics investigations? 
Amanda Rousseau: When I did forensics, I did criminal investigation. So if there was a murder, if there was domestic terrorism or something like that, they would give me the hard drive and I would analyze it. 
It was very specific; it’s not really intrusions, right? Intrusions are more dynamic. But even when you talk about attribution, I cringe because no one really wants to put their finger on where it came from, exactly. If you get it wrong, you could start a war.

I was never on threat intel, thank goodness. I was mainly doing case-by-case, just looking at a certain thing in malware, writing a report on it, giving it up to someone else so that they can do the groundwork. I was more behind the scenes.

Even now, I feel like it’s my job to take out all of the interesting information for them to put the clues together on there. Because an FBI agent, or someone that’s doing the investigation, knows much more than I do outside of what I see. I can only give my nonbiased results from what I’ve analyzed. And they can put the clues together themselves.

It takes a team. It takes a team to do that kind of stuff.

When it comes to computer forensics investigations, what were the challenges in ensuring the evidence was accurate? 

If there was a murder, if there was domestic terrorism or something like that, they would give me the hard drive and I would analyze it.
Amanda Rousseausenior malware researcher, Endgame

Rousseau: We had to prove that that person was at the computer at that time. Because there would be incidents where the wife’s husband, boyfriend, or whatever would be at her computer or vice versa. So you really couldn’t put that person at the computer doing that thing. Maybe there was a camera that took a picture that [proved] they were there, or maybe their alibi would prove that they were at the computer. But it’s really hard, even for that tiny moment in time, for dead box forensics.
For intrusion forensics, it’s completely different. You can trace the IP [address] to the server, and it’s another jump server, and then you see who owns the server, and then the people on the ground have to go trace who’s at that address who owns the server and you get all the credit card accounts that paid for that server.
What was the most difficult thing that you had to do in dead box cyber forensics investigations?
Rousseau: One difficult thing was when I was learning; it was just a learning curve. All you had to do was do it more and practice. It’s kind of like reversing; the more you do it, the more experience you get and [you] see quicker ways to do things.

I think when I did intrusions investigation, the hardest thing to do was encryption, because you have to sit there and try to identify encryption algorithms backwards. And so you’re sitting there with pen and paper like, ‘OK. This bit gets flipped here.’ And you’re writing the whole algorithm down and trying to visualize it. And then you’d identify, ‘Oh, it’s doing this.’ And that’s like a week’s worth of work. But it’s fun. It’s like a puzzle to me.
A week-long puzzle, though. It sounds taxing.
Rousseau: Yeah. You really have to time-manage your brain. Like, ‘OK, it’s the end of the day. I’ll put my notes down.” Next day, pick it back up, figure it out. 
What’s a good way to decompress when trying to reverse encryption like that? 
Rousseau: You know, it’s funny because there’s a lot of reverse engineers that are runners, or triathletes. So I haven’t done a lot of running this year, but before, I was marathon training. Because you’re sitting there for hours and hours … just staring at code. We forget to stand up and move around and everything. But running was my only way to …
Overcompensate with marathons. 
Rousseau: Yeah, exactly. 

Now, rather than cyber forensics investigations, you’re mainly doing reverse engineering of malware. Can you walk us through that process? 
Rousseau: Pretty much my day-to-day job is looking at malware, taking it apart, writing a detection for it, doing the research. It’s either short term or long term, depending on what the product needs, or what the customer needs at that time, pretty much. 
There’s a process. If you’re looking at thousands of samples, you’ve got to have a way to triage all of that and bubble up the things that are important, or the ones that you should be looking at. Same with the file itself. I don’t want to just start from the beginning. I want to look at a clue and start there. 
A lot of the research that I did for my Black Hat talk was triage analysis. My boss asked me to do 1,000 samples in three days, manual analysis. I’m like, ‘I can do one sample in a few hours, but I don’t know if I can do all 1,000 samples in three days.’ 
So I developed this tool that helped me print out all the stuff that I needed in order to look at samples. I don’t have to look at every single sample, but just the ones that are important because otherwise I would be there forever. 
How do you determine what is important?
Rousseau: In a binary, you have these things called libraries that load — imports, pretty much. And a lot of these imports give you an idea of what the program is doing. So as an indicator, say it is loading user32.dll; that suggests it could be doing user-related actions on the system. If you load in Winsock, it’s for sockets, right?

All of these different clues as to what libraries are loading, you can kind of get a sense of what it’s actually going to do, even the function that it’s going to call. Because then you kind of build in, ‘OK, well, it’s going to do something to the file system, it’s going to open up a socket and connect out to some IP address. I’m going to have to look for an IP address, I’m going to have to look for some strings creating a file in the file system.’ That kind of stuff.

But in order to do that, I need to disassemble it and see when that happens, in what order it happens. Because goodware can do the same thing, but depending on the context — the order — is it doing it all in one function, or is it spread out? Some of those little clues pinpoint the ones that you need to look at.
And these clues help you understand what kind of malware you’re studying? 
Rousseau: Yeah, and it depends on the motive. If you’re ransomware, you’re going to do encryption; you’re going to do file system activity; you’re going to call out to some onion server for the Bitcoin. If you’re spyware, you’re going to be doing keylogging; you’re going to be accessing the camera; you’re going to be trying to take screenshots of the desktop. So those are all different libraries to look at.
If you’re just a regular Trojan or a remote access Trojan, you’re going to be calling back out to your [command-and-control network]. You’ll receive instructions to do stuff. So if you know what kind of class they are, you’re looking for those indicators to place them into that class of malware. 
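The triage idea Rousseau describes — mapping a sample's imports to the class of malware they hint at — can be sketched as a toy scoring function. The indicator lists below are illustrative assumptions, not a real detection ruleset (user32.dll, for instance, is imported by plenty of benign software), and this is not Rousseau's actual tool.

```python
# Illustrative indicator libraries per malware class (assumed, not a real ruleset).
INDICATORS = {
    "ransomware": {"advapi32.dll", "bcrypt.dll", "cryptsp.dll"},  # crypto APIs
    "spyware":    {"user32.dll", "gdi32.dll", "avicap32.dll"},    # input, screen, camera
    "trojan":     {"ws2_32.dll", "wininet.dll"},                  # C2 communication
}

def guess_malware_class(imports):
    """Score each class by how many of its indicator libraries the sample imports,
    and return the best-scoring class (or 'unknown' if nothing matched)."""
    imported = {name.lower() for name in imports}
    scores = {cls: len(libs & imported) for cls, libs in INDICATORS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

In real triage, a tool like this only bubbles up which samples deserve manual disassembly; as Rousseau notes, the order and context of the calls still have to be confirmed by hand.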

Have you seen any trends in the code across different malware types? 
Rousseau: Yeah, it’s funny because with ransomware, there were two main libraries that a lot of the ransomware stemmed off of. It’s kind of like this growing tree of variations of the same code. And because some idiot posted it on GitHub somewhere, all these little 19-year-old to 26-year-olds are playing with this code and making ransomware to make a quick buck. 
The ones that do well are the crimeware people that adopt ransomware and make it more like a business, a little large-scale business. 
Rousseau: Right, right. But when you’re reversing, you can see different code, kind of a mishmash of someone writing it this way and another. It’s like handwriting. You can tell when there’s two different types of handwriting on a page. It’s like that in code for me.

If you look at enough of it you can identify, ‘OK, this is kind of weird. Someone wrote it backwards,’ or that kind of thing. Even with WannaCry, the code for the exploit is completely different than the actual ransomware code. Actual ransomware code is really crappily done, but the exploit code was beautiful. So you know they were kind of mishmashed together. 
Well, the exploit code came from
Rousseau: It was released, yeah, from … Yeah.
I guess we know that the government has really good coders. I guess that’s the key there. 
Rousseau: Yeah, the nation-state stuff, you can tell the level of expertise in that developer because usually, that whole thing will look similar. If it’s one or two guys, maybe it will look different. But the more common malware, they buy that stuff off of black market deployment and it comes in a kit. And these kits, they add on their own pictures or whatever they want in the thing. So it kind of has this variant of this s—– code with whatever s—– code that they add in, pretty much.

Microsoft’s Top 100 Security Researchers – Black Hat 2018 Edition

This morning we are excited to unveil the security researcher leaderboard at the Black Hat Security Conference.  This list recognizes the top security researchers who have contributed research to Microsoft products and services.  If you are curious about how we built the list, check out our blog from last week on The Making of the Top 100 Researcher List.

We appreciate all the work and partnerships with the security community over the years.  This is a good annual reflection point on the past year’s contributions.  Keep up the great work and we look forward to hearing from you this year too.

Microsoft’s Top 100 Security Researcher List

Ranking Researcher Name
1 Ashar Javed
2 Junghoon Lee
3 Yuki Chen
4 Cameron Vincent
5 Richard Shupak
6 Suresh Chelladurai
7 MaoFeng Ran
8 Mateusz Jurczyk
9 Ivan Fratric
10 Gal De Leon
11 Jaanus Kääp
12 James Forshaw
13 Kai Song
13 Hui Gao
15 Andreas Sandblad
16 Ajay Kulal
17 Yeivin Nadav
18 Fan Xiaocao
19 Liu Long
20 Zhang Yunhai
21 Dmitri Kaslov
22 Marcin Towalski
23 Qixun Zhao
24 Wayne Low
25 Huang Anwen
26 Dhanesh Kizhakkinan
27 Peter Hlavaty
28 Simon Zuckerbraun
29 Xiao Wei
30 Yassine Nafiai
31 Alex Ionescu
32 WenQunWang
32 Debasish Mandal
34 Ismail Talib
35 Cem Karagun
36 Adrian Ivascu
36 Ahmed Jerbi
38 Kdot
39 Zhong Zhaochen
40 Hung Huynh
40 Rancho Han
42 Jens Muller
43 Linan Hao
43 Lucas Leong
43 Ying Xinlei
43 J00Sean
47 Hamza Bettache
48 Aradnok
48 Zhou Yu
50 Mohamed Hamed
51 Vikash Chaudhary
52 Alec Blance
53 Zhenhuan Li
54 Xiong Wenbin
54 Richard Zhu
56 Minh Tran
57 Frans Rosen
57 Steven Seeley
59 Mario Gomes
60 Matt Nelson
61 Zhang Sen
62 Scott Bell
62 Honggang Ren
62 Ke Liu
63 Nethaniel Gelernter
63 Vladislav Stolyarov
67 Ivan Vagunin
67 Mustafa Hasan
69 SaifAllah Massaoud
70 Adesh Nandkishor Kolte
70 Roman Blachman
70 Omair
73 Tao Yan
73 Giwan Go
73 Nick Freeman
76 Amal Mohandas
77 Lucas Moreira Giungi
78 Marcin Wiazowski
79 Adam Bauer
79 Oleksandr Mirosh
79 Yangkang
79 Wanglu
79 Yong Chuan Koh
79 Jin Chen
79 Rgod
79 Ding Maoyin
79 Song Shenlei
88 Jovon Itwaru
88 Hungtt28
90 Abdulrahman Alqabandi
90 Christian Holler
92 Arik Isik
92 Manish Kumar Gupta
92 Kévin Chalet
92 Linang Yin
96 Ahmed Radi
97 Guangmingliu
97 Amir Shaahin
97 Omair Ahmed
97 nyaacate

Phillip Misner,

Principal Security Group Manager

Microsoft Security Response Center

New MalwareTech indictment adds four more charges

The court saga of Marcus Hutchins, a security researcher from England also known as MalwareTech, will continue after a superseding indictment filed by the U.S. government added new charges to his case.

Hutchins was originally arrested in August 2017 on charges of creating and distributing the Kronos banking Trojan. The superseding MalwareTech indictment, filed on Wednesday, adds four new charges to the original six, including the creation of the UPAS kit malware, conspiracy to commit wire fraud, and lying to the FBI.

Hutchins first gained prominence in May 2017 for being one of the researchers who helped slow the spread of the WannaCry ransomware, and he recently mused on Twitter about the connection between that act and the new MalwareTech indictment.

Hutchins also had strong language to describe the supplemental indictment, but one of his lawyers, Brian Klein, was more measured.

A question about the new MalwareTech indictment

The UPAS Kit described in the new filing was a form grabber that Hutchins admitted to creating, but he asserted it was not connected to Kronos. Marcy Wheeler, national security and civil liberties expert, questioned how this was included in the new MalwareTech indictment because of the time frames related to those charges.

The indictment noted that the UPAS Kit was originally sold and distributed in July 2012 and it alleged Hutchins developed Kronos “prior to 2014” and supplied it to the individual who sold the UPAS Kit. However, Wheeler pointed out in a blog post that there should be a five-year statute of limitations related to such charges, and even if the government could avoid that, Hutchins would have been a minor in 2012 when these actions allegedly took place.

Additionally, Wheeler noted that Hutchins admitted to creating the UPAS form grabber — although he denied it was part of Kronos — when he was first arrested by the FBI. The new MalwareTech indictment claims Hutchins lied to the FBI about creating Kronos, which would call into question the new charge that Hutchins lied to the FBI.

Accenture: Intelligent operations goal requires data backbone

A newly released report co-authored by Accenture and market researcher HfS reveals 80% of the global enterprises surveyed worry about digital disruption, but many of those companies lack the data backbone that could help them compete.

The report stated that large organizations are “concerned with disruption and competitive threats, especially from new digital-savvy entrants.” Indeed, digital disrupters such as Uber and Lyft in personal transportation, Airbnb in travel and hospitality, and various fintech startups have upset the established order in those industries. The Accenture-HfS report views “intelligent operations” as the remedy for the digital challenge and the key to bolstering customer experience. But the task of improving operations calls for organizations to pursue more than a few mild course corrections, according to Debbie Polishook, group chief executive at Accenture Operations, a business segment that includes business process and cloud services.

In the past, enterprises that encountered friction in their operations would tweak the errant process, add a few more people and take on a Lean Six Sigma project, she noted. Those steps, however, won’t suffice in the current business climate, Polishook said.

“Given what is happening today with the multichannel, with the various ways customers and employees can interact with you, making tiny tweaks is not going to get it done and meet the expectations of your stakeholders,” she said.

[Graphic: Organizations struggle to leverage their data]

Hard work ahead

The report, which surveyed 460 technology and services decision-makers in organizations with more than $3 billion in revenue, suggested professional services firms such as Accenture will have their work cut out for them as they prepare clients for the digital era.

The survey noted most enterprises struggle to harness data with an eye toward improving operations and achieving competitive advantage. The report stated “nearly 80% of respondents estimate that 50% [to] 90% of their data is unstructured” and largely inaccessible. A 2017 Accenture report also pointed to a data backbone deficit among corporations: More than 90% of the respondents to that survey said they struggle with data access.

In addition, half of the Accenture-HfS report respondents who were surveyed acknowledged their back office isn’t keeping pace with the front office demands to support digital capabilities.

“Eighty percent of the organizations we talked to are concerned with digital disruption and are starting to note that their back office is not quite keeping up with their front office,” Polishook said. “The entire back office is the boat anchor holding them back.”

That lagging back office is at odds with enterprises’ desire to rapidly roll out products and services. An organization’s operations must be able to accommodate the demand for speed in the context of a digital, online and mobile world, Polishook said.

Enterprises need a “set of operations that can respond to these pressures,” she added. “Most companies are not there yet.”

One reason for the lag: Organizations tend to prioritize new product development and front office concerns when facing digital disruption. Back office systems such as procurement tend to languish.

“Naturally, as clients … are becoming disrupted in the market, they pay attention first to products and services,” Polishook said. “They are finding that is not enough.”

The report’s emphasis on revamped operations as critical to fending off digital disruption mirrors research from MIT Sloan’s Center for Information Systems Research. In a presentation in 2017, Jeanne Ross, principal research scientist at the center, identified a solid operational backbone as one of four keys to digital transformation. The other elements were strategic vision, a focus on customer engagement or digitized solutions and a plan for rearchitecting the business.

The path to intelligent operations

The Accenture-HfS report identified five essential components necessary for intelligent operations: innovative talent, a data backbone, applied intelligence, cloud computing and a “smart partnership ecosystem.”

As for innovative talent, the report cited “entrepreneurial drive, creativity and partnering ability” as enterprises’ top areas of talent focus.

There is a lot of heavy lifting to be done.
Debbie Polishookgroup chief executive, Accenture Operations

“One of the most important pieces getting to intelligent operations is the talent,” Polishook said. She said organizations in the past looked to ERP or business process management to boost operations, but contended there is no technology silver bullet.

The data-driven backbone is becoming an important focus for large organizations. The report stated more than 85% of enterprises “are developing a data strategy around data aggregation, data lakes, or data curation, as well as mechanisms to turn data into insights and then actions.” Big data consulting is already a growing market for channel partners.

In the area of applied intelligence, about 90% of the enterprises surveyed identified automation, analytics and AI as technologies that will emerge as the cornerstone of business and process transformation. Channel partners also cited AI and the expanded use of automation tools such as robotic process automation among the top anticipated trends of 2018.

Meanwhile, more than 90% of large enterprises expect to realize “plug-and-play digital services, coupled with enterprise-grade security, via the cloud,” according to the Accenture-HfS report. And a like percentage of respondents viewed partnering with an ecosystem as important for exploiting market opportunities. The report said enterprises of the future will create “symbiotic relationships with startups, academia, technology providers and platform players.”

The path to achieving intelligent operations calls for considerable effort among all partners involved in the transformation.

“There is a lot of heavy lifting to be done,” Polishook said.

Researchers use AI to improve accuracy of gene editing with CRISPR

From left, Nicolo Fusi, a researcher at Microsoft, Jennifer Listgarten, who recently joined the faculty at UC Berkeley, and John Doench, an associate director at the Broad Institute, collaborated on a method of using AI to improve gene editing results. Photo by Dana J. Quigley.

A collaboration between computer scientists and biologists from research institutions across the United States is yielding a set of computational tools that increase efficiency and accuracy when deploying CRISPR, a gene-editing technology that is transforming industries from healthcare to agriculture.

CRISPR works like a nano-sized sewing kit that can be designed to cut and alter DNA at a specific point in a specific gene.

The technology, for example, may lead to breakthrough applications such as modifying cells to combat cancer or produce high-yielding drought-tolerant crops such as wheat and corn.

Elevation, the newest tool released by the team, uses a branch of artificial intelligence known as machine learning to predict so-called off-target effects when editing genes with the CRISPR system.

Although CRISPR shows great promise in a number of fields, one challenge is that lots of genomic regions are similar, which means the nano-sized sewing kit can accidentally go to work on the wrong gene and cause unintended consequences – the so-called off-target effects.

“Off-target effects are something that you really want to avoid,” said Nicolo Fusi, a researcher at Microsoft’s research lab in Cambridge, Massachusetts. “You want to make sure that your experiment doesn’t mess up something else.”

Fusi and former Microsoft colleague Jennifer Listgarten, together with collaborators at the Broad Institute of MIT and Harvard, University of California Los Angeles, Massachusetts General Hospital and Harvard Medical School, describe Elevation in a paper published Jan. 10 in the journal Nature Biomedical Engineering.

Elevation and a complementary tool for predicting on-target effects called Azimuth are publicly available for free as a cloud-based end-to-end guide-design service running on Microsoft Azure as well as via open-source code.

Using the computational tools, researchers can input the name of the gene they want to modify and the cloud-based search engine will return a list of guides that researchers can sort by predicted on-target or off-target effects.


Nature as engineer

The CRISPR gene-editing system is adapted from a natural virus-fighting mechanism. Scientists discovered it in the DNA of bacteria in the late 1980s and figured out how it works over the course of the next several decades.

“The CRISPR system was not designed, it evolved,” said John Doench, an associate director at the Broad Institute who leads the biological portions of the research collaboration with Microsoft.

CRISPR stands for “clustered regularly interspaced short palindromic repeats,” which describes a pattern of repeating DNA sequences in the genomes of bacteria separated by short, non-repeating spacer DNA sequences.

The non-repeating spacers are copies of DNA from invading viruses, which molecular messengers known as RNA use as a template to recognize subsequent viral invasions. When an invader is detected, the RNA guides the CRISPR complex to the virus and dispatches CRISPR-associated (Cas) proteins to snip and disable the viral gene.

Modern adaptations

In 2012, molecular biologists figured out how to adapt the bacterial virus-fighting system to edit genes in organisms ranging from plants to mice and humans. The result is the CRISPR-Cas9 gene editing technique.

The basic system works like this: Scientists design synthetic guide RNA to match a DNA sequence in the gene they want to cut or edit and set it loose in a cell with the CRISPR-associated protein scissors, Cas9.
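At its core, guide targeting is sequence matching: sites that match the 20-nucleotide guide closely, but not exactly, are the source of off-target risk. As a rough illustration only (not the project's actual code, and ignoring details such as the PAM motif Cas9 requires next to a target site), candidate sites can be found by scanning a sequence and counting mismatches:

```python
def mismatches(guide: str, site: str) -> int:
    """Count positions where the guide and a candidate site differ."""
    return sum(a != b for a, b in zip(guide, site))

def candidate_sites(genome: str, guide: str, max_mismatch: int = 3):
    """Yield (position, site, mismatch count) for every near-match."""
    k = len(guide)
    for i in range(len(genome) - k + 1):
        site = genome[i:i + k]
        m = mismatches(guide, site)
        if m <= max_mismatch:
            yield i, site, m

guide = "ACGTACGTACGTACGTACGT"  # a toy 20-nt guide
genome = "TTT" + guide + "GG" + "ACGTACGTACGAACGTACGT" + "CC"
for pos, site, m in candidate_sites(genome, guide):
    print(pos, site, m)  # the exact hit, then a 1-mismatch site
```

Real guide-design tools must do this scan across an entire genome, and the mismatch count alone is a poor predictor of actual off-target activity; that gap is what the machine-learning models described below are meant to close.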

Today, the technique is widely used as an efficient and precise way to understand the role of individual genes in everything from people to poplar trees, and to change genes to do everything from fighting disease to growing more food.

“If you want to understand how gene dysfunction leads to disease, for example, you need to know how the gene normally functions,” said Doench. “CRISPR has been a complete game changer for that.”

An overarching challenge for researchers is deciding which guide RNA to use for a given experiment. Each guide is roughly 20 nucleotides; hundreds of potential guides exist for each target gene in a knockout experiment.

In general, each guide has a different on-target efficiency and a different degree of off-target activity.

The collaboration between the computer scientists and biologists is focused on building tools that help researchers search through the guide choices and find the best one for their experiments.

Several research teams have designed rules to determine where off-targets are for any given gene-editing experiment and how to avoid them. “The rules are very hand-made and very hand-tailored,” said Fusi. “We decided to tackle this problem with machine learning.”

Training models

To tackle the problem, Fusi and Listgarten trained a so-called first-layer machine-learning model on data generated by Doench and colleagues. These data reported the activity of all possible target regions that differ from the guide by a single nucleotide.

Then, using publicly available data that was previously generated by the team’s Harvard Medical School and Massachusetts General Hospital collaborators, the machine-learning experts trained a second-layer model that refines and generalizes the first-layer model to cases where there is more than one mismatched nucleotide.

The second-layer model is important because off-target activity can occur with far more than just one mismatch between guide and target, noted Listgarten, who joined the faculty at the University of California at Berkeley on Jan. 1.
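The layered training described above can be caricatured in a few lines. The sketch below is purely illustrative: it invents synthetic "measured" activities, combines single-mismatch penalties with a naive independence product, and recalibrates with a linear fit, whereas the published models are substantially more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)
SUBS = [(a, b) for a in "ACGT" for b in "ACGT" if a != b]

# Hidden "true" activity penalty for each single mismatch (a synthetic
# stand-in for what is actually measured in the lab).
true_penalty = {(pos, s): rng.uniform(0.2, 1.0)
                for pos in range(20) for s in SUBS}

# First layer: estimate each (position, substitution) penalty by
# averaging noisy single-mismatch activity measurements.
first_layer = {k: float(np.clip((v + rng.normal(0, 0.05, 5)).mean(), 0, 1))
               for k, v in true_penalty.items()}

def naive_combine(mms):
    """Multiply per-mismatch penalties, pretending they act independently."""
    return float(np.prod([first_layer[k] for k in mms]))

# Second layer: learn a correction from (synthetic) multi-mismatch data;
# here it is just a linear recalibration of the naive product.
xs, ys = [], []
for _ in range(200):
    n = int(rng.integers(2, 4))                      # 2 or 3 mismatches
    pos = rng.choice(20, size=n, replace=False)
    mms = [(int(p), SUBS[int(rng.integers(len(SUBS)))]) for p in pos]
    truth = float(np.prod([true_penalty[k] for k in mms])) ** 0.8  # mild interaction
    xs.append(naive_combine(mms))
    ys.append(truth)
a, b = np.polyfit(xs, ys, 1)

def off_target_activity(mms):
    """Two-layer prediction for a guide/target pair with several mismatches."""
    return float(a * naive_combine(mms) + b)
```

The structural point survives the simplification: the first layer is learned from clean single-mismatch data, and the second layer learns how those per-mismatch effects compose when several mismatches occur at once.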

Finally, the team validated their two-layer model on several other publicly available datasets as well as a new dataset generated by collaborators affiliated with Harvard Medical School and Massachusetts General Hospital.

Some model features are intuitive, such as a mismatch between the guide and nucleotide sequence, noted Listgarten. Others reflect unknown properties encoded in DNA that are discovered through machine learning.

“Part of the beauty of machine learning is if you give it enough things it can latch onto, it can tease these things out,” she said.

Off-target scores

Elevation provides researchers with two kinds of off-target scores for every guide: individual scores for one target region and a single overall summary score for that guide.

The individual target scores are machine-learning-based probabilities, one for every region of the genome where the guide could cause an unwanted edit. For every guide, Elevation returns hundreds to thousands of these off-target scores.

For researchers trying to determine which of potentially hundreds of guides to use for a given experiment, these individual off-target scores alone can be cumbersome, noted Listgarten.

The summary score is a single number that lumps the off-target scores together to provide an overview of how likely the guide is to disrupt the cell over all its potential off-targets.

“Instead of a probability for each point in the genome, it is what’s the probability I am going to mess up this cell because of all of the off-target activities of the guide?” said Listgarten.
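One way to read the summary score's intent: if each off-target site carries a probability p_i of an unwanted edit, and the sites are treated as independent, the chance of disturbing the cell at all is 1 minus the product of (1 − p_i). That independence assumption is our simplification, not necessarily how Elevation actually aggregates, but it shows why hundreds of tiny per-site probabilities still add up to real risk:

```python
import numpy as np

def summary_score(site_probs):
    """Probability of at least one off-target event, assuming independent sites."""
    p = np.asarray(site_probs, dtype=float)
    return float(1.0 - np.prod(1.0 - p))

# 500 sites, each with only a 0.1% chance of an unwanted cut:
print(round(summary_score([0.001] * 500), 3))   # ≈ 0.394
```

Even though no single site looks dangerous, the guide as a whole has nearly a 40 percent chance of disrupting the cell somewhere, which is exactly the kind of judgment the single summary number is meant to support.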

End-to-end guide design

Writing in Nature Biomedical Engineering, the collaborators describe how Elevation works in concert with a tool they released in 2016 called Azimuth that predicts on-target effects.

The complementary tools provide researchers with an end-to-end system for designing experiments with CRISPR-Cas9: they help researchers select a guide that achieves the intended effect, such as disabling a gene, while reducing mistakes such as cutting the wrong gene.

“Our job,” said Fusi, “is to get people who work in molecular biology the best tools that we can.”

In addition to Listgarten, Fusi and Doench, project collaborators include Michael Weinstein from the University of California Los Angeles, Benjamin Kleinstiver, Keith Joung and Alexander A. Sousa from Harvard Medical School and Massachusetts General Hospital, and Melih Elibol, Luong Hoang, Jake Crawford and Kevin Gao from Microsoft Research.


John Roach writes about Microsoft research and innovation. Follow him on Twitter.

Tags: CRISPR, healthcare

Debugging data: Microsoft researchers look at ways to train AI systems to reflect the real world – The AI Blog

Hanna Wallach is a senior researcher in Microsoft’s New York City research lab. Photo by John Brecher.

Artificial intelligence is already helping people do things like type faster texts and take better pictures, and it’s increasingly being used to make even bigger decisions, such as who gets a new job and who goes to jail. That’s prompting researchers across Microsoft and throughout the machine learning community to ensure that the data used to develop AI systems reflect the real world, are safeguarded against unintended bias and are handled in ways that are transparent and respectful of privacy and security.

Data is the food that fuels machine learning. It’s the representation of the world that is used to train machine learning models, explained Hanna Wallach, a senior researcher in Microsoft’s New York research lab. Wallach is a program co-chair of the Annual Conference on Neural Information Processing Systems, held Dec. 4 to Dec. 9 in Long Beach, California. The conference, better known as “NIPS,” is expected to draw thousands of computer scientists from industry and academia to discuss machine learning – the branch of AI that focuses on systems that learn from data.

“We often talk about datasets as if they are these well-defined things with clear boundaries, but the reality is that as machine learning becomes more prevalent in society, datasets are increasingly taken from real-world scenarios, such as social processes, that don’t have clear boundaries,” said Wallach, who together with the other program co-chairs introduced a new subject area at NIPS on fairness, accountability and transparency. “When you are constructing or choosing a dataset, you have to ask, ‘Is this dataset representative of the population that I am trying to model?’”

Kate Crawford, a principal researcher at Microsoft’s New York research lab, calls it “the trouble with bias,” and it’s the central focus of an invited talk she will be giving at NIPS.

“The people who are collecting the datasets decide that, ‘Oh this represents what men and women do, or this represents all human actions or human faces.’ These are types of decisions that are made when we create what are called datasets,” she said. “What is interesting about training datasets is that they will always bear the marks of history, that history will be human, and it will always have the same kind of frailties and biases that humans have.”

Researchers are also looking at the separate but related issue of whether there is enough diversity among AI researchers. Research has shown that more diverse teams choose more diverse problems to work on and produce more innovative solutions. Two events co-located with NIPS will address this issue: The 12th Women in Machine Learning Workshop, where Wallach, who co-founded Women in Machine Learning, will give an invited talk on the merger of machine learning with the social sciences, and the Black in AI workshop, which was co-founded by Timnit Gebru, a post-doctoral researcher at Microsoft’s New York lab.

“In some types of scientific disciplines, it doesn’t matter who finds the truth, there is just a particular truth to be found. AI is not exactly like that,” said Gebru. “We define what kinds of problems we want to solve as researchers. If we don’t have diversity in our set of researchers, we are at risk of solving a narrow set of problems that a few homogeneous groups of people think are important, and we are at risk of not addressing the problems that are faced by many people in the world.”

Timnit Gebru is a post-doctoral researcher at Microsoft’s New York City research lab. Photo by Peter DaSilva.

Machine learning core

At its core, NIPS is an academic conference with hundreds of papers that describe the development of machine learning models and the data used to train them.

Microsoft researchers authored or co-authored 43 accepted conference papers. They describe everything from the latest advances in retrieving data stored in synthetic DNA to a method for repeatedly collecting telemetry data from user devices without compromising user privacy.

Nearly every paper presented at NIPS over the past three decades considers data in some way, noted Wallach. “The difference in recent years, though,” she added, “is that machine learning no longer exists in a purely academic context, where people use synthetic or standard datasets. Rather, it’s something that affects all kinds of aspects of our lives.”

The application of machine-learning models to real-world problems and challenges is, in turn, bringing into focus issues of fairness, accountability and transparency.

“People are becoming more aware of the influence that algorithms have on their lives, determining everything from what news they read to what products they buy to whether or not they get a loan. It’s natural that as people become more aware, they grow more concerned about what these algorithms are actually doing and where they get their data,” said Jenn Wortman Vaughan, a senior researcher at Microsoft’s New York lab.

The trouble with bias

Data is not something that exists in the world as an object that everyone can see and recognize, explained Crawford. Rather, data is made. When scientists first began to catalog the history of the natural world, they recognized types of information as data, she noted. Today, scientists also see data as a construct of human history.

Crawford’s invited talk at NIPS will highlight examples of machine learning bias such as news organization ProPublica’s investigation that exposed bias against African-Americans in an algorithm used by courts and law enforcement to predict the tendency of convicted criminals to reoffend, and then discuss how to address such bias.

“We can’t simply boost a signal or tweak a convolutional neural network to resolve this issue,” she said. “We need to have a deeper sense of what is the history of structural inequity and bias in these systems.”

One method to address bias, according to Crawford, is to take what she calls a social system analysis approach to the conception, design, deployment and regulation of AI systems, thinking through all of their possible effects. She recently described the approach in a commentary for the journal Nature.

Crawford noted that this isn’t a challenge that computer scientists will solve alone. She is also a co-founder of the AI Now Institute, a first-of-its-kind interdisciplinary research institute based at New York University that was launched in November to bring together social scientists, computer scientists, lawyers, economists and engineers to study the social implications of AI, machine learning and algorithmic decision making.

Jenn Wortman Vaughan is a senior researcher at Microsoft’s New York City research lab. Photo by John Brecher.

Interpretable machine learning

One way to address concerns about AI and machine learning is to prioritize transparency by making AI systems easier for humans to interpret. At NIPS, Vaughan, one of the New York lab’s researchers, will give a talk describing a large-scale experiment that she and colleagues are running to learn what factors make machine learning models interpretable and understandable for non-machine learning experts.

“The idea here is to add more transparency to algorithmic predictions so that decision makers understand why a particular prediction is made,” said Vaughan.

For example, does the number of features or inputs to a model impact a person’s ability to catch instances where the model makes a mistake? Do people trust a model more when they can see how a model makes its prediction as opposed to when the model is a black box?

The research, said Vaughan, is a first step toward the development of “tools aimed at helping decision makers understand the data used to train their models and the inherent uncertainty in their models’ predictions.”

Patrice Simard, a distinguished engineer at Microsoft’s Redmond, Washington, research lab and a co-organizer of the conference’s symposium on interpretable machine learning, said the field should take a cue from computer programming, where practitioners have long mastered the art of decomposing problems into smaller problems with simple, understandable steps. “But in machine learning, we are completely behind. We don’t have the infrastructure,” he said.

To catch up, Simard advocates a shift to what he calls machine teaching – giving machines features to look for when solving a problem, rather than looking for patterns in mountains of data. Instead of training a machine learning model for car buying with millions of images of cars labeled as good or bad, teach a model about features such as fuel economy and crash-test safety, he explained.
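As a toy version of that contrast, a machine-taught scorer might look like the following sketch, where a teacher names the concepts and how to compute them, so every contribution to the result is inspectable. The features and weights here are invented for illustration and are not drawn from Simard’s actual work.

```python
# Hypothetical machine-teaching sketch: a car-buying score built from
# teacher-named concepts rather than learned from millions of labeled images.
FEATURES = {
    "fuel_economy": lambda car: min(car["mpg"] / 50.0, 1.0),
    "crash_test_safety": lambda car: car["safety_stars"] / 5.0,
    "affordability": lambda car: max(0.0, 1.0 - car["price"] / 60_000),
}
WEIGHTS = {"fuel_economy": 0.4, "crash_test_safety": 0.4, "affordability": 0.2}

def score(car):
    """Each named concept contributes explicitly, so the output is inspectable."""
    parts = {name: WEIGHTS[name] * f(car) for name, f in FEATURES.items()}
    return sum(parts.values()), parts

total, parts = score({"mpg": 40, "safety_stars": 5, "price": 30_000})
print(round(total, 2), parts)
```

Because every concept is named and weighted explicitly, a decision can be traced back to its components, which is the interpretable hierarchy of concepts Simard describes, in miniature.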

The teaching strategy is deliberate, he added, and results in an interpretable hierarchy of concepts used to train machine learning models.

Researcher diversity

One step to safeguard against unintended bias creeping into AI systems is to encourage diversity in the field, noted Gebru, the co-organizer of the Black in AI workshop co-located with NIPS. “You want to make sure that the knowledge that people have of AI training is distributed around the world and across genders and ethnicities,” she said.

The importance of researcher diversity struck Wallach, the NIPS program co-chair, at her fourth NIPS conference in 2005. For the first time, she was sharing a hotel room with three roommates, all of them women. One of them was Vaughan, and the two of them, along with one of their roommates, co-founded the Women in Machine Learning group, which is now in its 12th year and has held a workshop co-located with NIPS since 2008. This year, more than 650 women are expected to attend.

Wallach will give an invited talk at the Women in Machine Learning Workshop about how she applies machine learning in the context of social science to measure unobservable theoretical constructs such as community membership or topics of discussion.

“Whenever you are working with data that is situated within societal contexts,” she said, “it is necessarily important to think about questions of ethics, fairness, accountability, transparency and privacy.”



Google bug bounty pays $100,000 for Chrome OS exploit

A pseudonymous security researcher has struck it big for the second time, earning the top Google bug bounty in the Chrome Reward Program.

The researcher, who goes by the handle Gzob Qq, notified Google on Sept. 18, 2017, of a Chrome OS exploit that chained five separate vulnerabilities to gain root access with persistent code execution.

Google patched the issues in Chrome OS version 62, released on Nov. 15. The details of the exploit chain were then published, showing how Gzob Qq combined the five flaws into a full system takeover.

As part of the exploit chain, Gzob Qq used a memory access flaw in the V8 JavaScript engine (CVE-2017-15401), a privilege escalation bug in PageState (CVE-2017-15402), a command injection flaw in the network_diag component (CVE-2017-15403) and symlink traversal issues in both the crash_reporter (CVE-2017-15404) and cryptohomed (CVE-2017-15405).

Gzob Qq earned a Google bug bounty of $100,000 for the find, the top prize awarded as part of the Chrome Reward Program. Google raised the top Chrome reward from $50,000 to $100,000 in March 2015, and this is the second time since then that Gzob Qq has earned it.

In September 2016, Gzob Qq notified Google of a Chrome OS exploit chain using an overflow vulnerability in the DNS client library used by the Chrome OS network manager.

In addition to the Google bug bounty, Gzob Qq has also received credit for disclosing flaws in Ubuntu Linux.