Tag Archives: research

WannaMine cryptojacker targets unpatched EternalBlue flaw

New research has detailed successful cryptojacking attacks by the WannaMine malware, which come after almost a year of warnings about this specific cryptominer and more than a year and a half of warnings about the EternalBlue exploit.

The Cybereason Nocturnus research team and Amit Serper, head of security research for the Boston-based cybersecurity company, discovered a new outbreak of the WannaMine cryptojacker, which the researchers said gains access to computer systems “through an unpatched [Server Message Block, or SMB] service and gains code execution with high privileges” to spread to more systems.

Serper noted in a blog post that neither WannaMine nor the EternalBlue exploit is new, but attackers are still taking advantage of unpatched SMB services, even though Microsoft patched against EternalBlue in March 2017.

“Until organizations patch and update their computers, they’ll continue to see attackers use these exploits for a simple reason: they lead to successful campaigns,” Serper wrote in the blog post. “Part of giving the defenders an advantage means making the attacker’s job more difficult by taking steps to boost an organization’s security. Patching vulnerabilities, especially the ones associated with EternalBlue, falls into this category.”

It is fair to say that any unpatched system with SMB exposed to the internet has been compromised repeatedly and is definitely infected with one or more forms of malware.
Jake Williams, founder and CEO, Rendition Infosec

The EternalBlue exploit was famously part of the Shadow Brokers dump of National Security Agency cyberweapons in April 2017; less than one month later, the WannaCry ransomware was sweeping the globe and infecting unpatched systems. However, that was only the beginning for EternalBlue.

EternalBlue was added into other ransomware, like GandCrab, to help it spread faster. It was incorporated into the Petya/NotPetya outbreak. And there were constant warnings for IT to patch vulnerable systems.

WannaMine was first spotted in October 2017 by Panda Security. And in January 2018, Sophos warned users that WannaMine was still active and preying on unpatched systems. According to researchers at ESET, the EternalBlue exploit saw a spike in use in April 2018.

Jake Williams, founder and CEO of Rendition Infosec, based in Augusta, Ga., said there are many ways threat actors may use EternalBlue in attacks.

“It is fair to say that any unpatched system with SMB exposed to the internet has been compromised repeatedly and is definitely infected with one or more forms of malware,” Williams wrote via Twitter direct message. “Cryptojackers are certainly one risk for these systems. These systems don’t have much power for crypto-mining (most lack dedicated GPUs), but when compromised en-masse they can generate some profit for the attacker. More concerning in some cases are the use of these systems for malware command and control servers and launching points for other attacks.”
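Williams’ point reduces to two defensive actions: apply the MS17-010 patch and stop exposing SMB to the internet. As a minimal sketch of checking the second condition (this is not part of the Cybereason research, and the target address below is a placeholder), a quick TCP reachability test against port 445 can look like this:

```python
import socket

def smb_port_reachable(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the SMB port succeeds.

    A reachable port 445 does not prove the host is vulnerable to
    EternalBlue, only that SMB is exposed and worth investigating further.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Placeholder address; only test hosts you are authorized to scan.
    target = "192.0.2.10"
    print(f"SMB (445) reachable on {target}: {smb_port_reachable(target)}")
```

A positive result is only a prompt to verify patch status with a proper vulnerability scanner, not evidence of compromise.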

Putting the cloud under the sea with Ben Cutler – Microsoft Research


Ben Cutler from Microsoft Research. Photo by Maryatt Photography.

Episode 40, September 5, 2018

Data centers have a hard time keeping their cool. Literally. And with more and more data centers coming online all over the world, calls for innovative solutions to “cool the cloud” are getting loud. So, Ben Cutler and the Special Projects team at Microsoft Research decided to try to beat the heat by using one of the best natural venues for cooling off on the planet: the ocean. That led to Project Natick, Microsoft’s prototype plan to deploy a new class of eco-friendly data centers, under water, at scale, anywhere in the world, from decision to power-on, in 90 days. Because, presumably for Special Projects, go big or go home.

In today’s podcast we find out a bit about what else the Special Projects team is up to, and then we hear all about Project Natick and how Ben and his team conceived of, and delivered on, a novel idea to deal with the increasing challenges of keeping data centers cool, safe, green, and, now, dry as well!

Episode Transcript

Ben Cutler: In some sense we’re not really solving new problems. What we really have here is a marriage of these two mature industries. One is the IT industry, which Microsoft understands very well. And then the other is a marine technologies industry. So, we’re really trying to figure out how do we blend these things together in a way that creates something new and beneficial?

(music plays)

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Data centers have a hard time keeping their cool. Literally. And with more and more data centers coming online all over the world, calls for innovative solutions to “cool the cloud” are getting loud. So, Ben Cutler and the Special Projects team at Microsoft Research decided to try to beat the heat by using one of the best natural venues for cooling off on the planet: the ocean. That led to Project Natick, Microsoft’s prototype plan to deploy a new class of eco-friendly data centers, under water, at scale, anywhere in the world, from decision to power-on, in 90 days. Because, presumably for Special Projects, go big or go home.

In today’s podcast we find out a bit about what else the Special Projects team is up to, and then we hear all about Project Natick, and how Ben and his team conceived of, and delivered on, a novel idea to deal with the increasing challenges of keeping data centers cool, safe, green, and, now, dry as well! That and much more on this episode of the Microsoft Research Podcast.

Host: Ben Cutler. Welcome to the podcast.

Ben Cutler: Thanks for having me.

Host: You’re a researcher in Special Projects at MSR. Give us a brief description of the work you do. In broad strokes, what gets you up in the morning?

Ben Cutler: Well, so I think Special Projects is a little unusual. Rather than have a group that always does the same thing persistently, it’s more based on this idea of projects. We find some new idea, something, in our case, that we think is materially important to the company, and go off and pursue it. And it’s a little different in that we aren’t limited by the capabilities of the current staff. We’ll actually go out and find partners, whether they be in academia or very often in industry, who can kind of help us grow and stretch in some new direction.

Host: How did Special Projects come about? Has it always been “a thing” within Microsoft Research, or is it a fairly new idea?

Ben Cutler: Special Projects is a relatively new idea. In early 2014, my manager, Norm Whitaker, who’s a managing scientist inside Microsoft Research, was recruited to come here. Norm had spent the last few years of his career at DARPA, which is the Defense Advanced Research Projects Agency, which has a very long history in the United States, and a lot of seminal technology achievements, not just on the defense side, where we see things like stealth, but also on the commercial or consumer side, had their origins in DARPA. And so, we’re trying to bring some of that culture here into Microsoft Research and a willingness to go out and pursue crazy things and a willingness not just to pursue new types of things, but things that are in areas that historically we would never have touched as a company, and just be willing to crash into some new thing and see if it has value for us.

Host: So, that seems like a bit of a shift from Microsoft, in general, to go in this direction. What do you think prompted it, within Microsoft Research to say, “Hey let’s do something similar to DARPA here?”

Ben Cutler: I think if you look more broadly at the company, with Satya, we have this very different perspective, right? Which is, not everything is based on what we’ve done before. And a willingness to really go out there and draw in things from outside Microsoft and new ideas and new concepts in ways that we’ve never done, I think, historically as a company. And this is in some sense a manifestation of this idea of, you know, what can we do to enable every person in every organization on the planet to achieve more? And a part of that is to go out there and look at the broader context of things and what kind of things can we do that might be new that might help solve problems for our customers?

Host: You’re working on at least two really cool projects right now, one of which was recently in the news and we’ll talk about that in a minute. But I’m intrigued by the work you’re doing in holoportation. Can you tell us more about that?

Ben Cutler: If you think about what we typically do with a camera, we’re capturing this two-dimensional information. One stage beyond that is what’s called a depth camera, which, in addition to capturing color information, captures the distance to each pixel. So now I’m getting a perspective and I can actually see the distance and see, for example, the shape of someone’s face. Holoportation takes that a step further where we’ll have a room that we outfit with, say, several cameras. And from that, now, I can reconstruct the full, 3-D content of the room. So, you can kind of think of this as, I’m building a holodeck. And so now you can imagine I’m doing a video conference, or, you know, something as simple as like Facetime, but rather than just sort of getting that 2-D, planar information, I can actually now wear a headset and be in some immersive space that might be two identical conference rooms in two different locations and I see my local content, but I also see the remote content as holograms. And then of course we can think of other contexts like virtual environments, where we kind of share across different spaces, people in different locations. Or even, if you will, a broadcast version of this. So, you can imagine someone’s giving a concert. And now I can actually go be at that concert even if I’m not there. Or think about fashion. Imagine going to a fashion show and actually being able to sit in the front row even though I’m not there. Or, everybody gets the front row seats at the World Cup soccer.
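For readers curious how “distance to each pixel” becomes “the full, 3-D content of the room,” here is a minimal sketch of the standard pinhole-camera back-projection that turns one depth pixel into a 3-D point. It is an illustration only, not Microsoft’s holoportation pipeline, and the camera intrinsics are made-up example values:

```python
from typing import Tuple

def depth_pixel_to_point(
    u: int, v: int, depth_m: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Back-project one depth-camera pixel into a 3-D point (camera frame).

    (u, v) is the pixel location, depth_m the measured distance along the
    camera's optical axis, and (fx, fy, cx, cy) the camera intrinsics.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    z = depth_m
    return x, y, z

# Example with made-up intrinsics for a 640x480 depth camera.
point = depth_pixel_to_point(u=320, v=240, depth_m=1.5,
                             fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(point)  # roughly (0.0, 0.0, 1.5): a point straight ahead, 1.5 m away
```

Repeating this for every pixel of every camera, then fusing the overlapping point clouds, is what yields the reconstructed room.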

Host: Wow. It’s democratizing event attendance.

Ben Cutler: It really is. And you can imagine I’m visiting the Colosseum and a virtual tour guide appears with me as I go through it and can tell me all about that. Or some, you know, awesome event happens at the World Cup again, and I want to actually be on the soccer field where that’s happening right now and be able to sort of review what happened to the action as though I was actually there rather than whatever I’m getting on television.

Host: So, you’re wearing a headset for this though, right?

Ben Cutler: You’d be wearing an AR headset. For some of the broadcast things you can imagine not wearing a headset. It might be I’ve got it on my phone and just by moving my phone around I can kind of change my perspective. So, there’s a bunch of different ways that this might be used. So, it’s this interesting new capture technology. Much as HoloLens is a display, or a viewing technology, this is the other end, capture, and there’s different ways we can kind of consume that content. One might be with a headset, the other might just be on a PC using a mouse to move around much as I would on a video game to change my perspective or just on a cell phone, because today, there’s a relatively small number of these AR/VR headsets but there are billions of cell phones.

Host: Right. Tell me what you’re specifically doing in this project?

Ben Cutler: In the holoportation?

Host: Yeah.

Ben Cutler: So, really what’s going on right now is, when this project first started to outfit a room, to do this sort of a thing, might’ve been a couple hundred thousand dollars of cost, and it might be 1 to 3 gigabits of data between sites. So, it’s just not really practical, even at an enterprise level. And so, what we’re working on is, with the HoloLens team and other groups inside the company, to really sort of dramatically bring down that cost. So now you can imagine you’re a grandparent and you want to kind of play with your grandkids who are in some other location in the world. So, this is something that we think, in the next couple years, actually might be at the level the consumers can have access to this technology and use it every day.

Host: This is very much in the research stage, though, right?

Ben Cutler: We have an email address and we hear from people every day, “How do I buy this? How can I get this?” And you know, it’s like, “Hey, here’s our website. It’s just research right now. It’s not available outside the company. But keep an eye on this because maybe that will change in the future.”

Host: Yeah. Yeah, and that is your kind of raison d’etre is to bring these impossibles into inevitables in the market. That should be a movie. The Inevitables.

Ben Cutler: I think there’s something similar to that, but anyway…

Host: I think a little, yeah. So just drilling a little bit on the holoportation, what’s really cool I noticed on the website, which is still research, is moving from a room-based hologram, or holoported individual, into mobile holoportation. And you’ve recently done this, at least in prototype, in a car, yes?

Ben Cutler: We have. So, we actually took an SUV. We took out the middle seat. And then we mounted cameras in various locations. Including, actually, the headrests of the first-row passengers. So that if you’re sitting in that back row we could holoport you somewhere. Now this is a little different than, say, that room-to-room scenario. You can imagine, for example, the CEO of our company can’t make a meeting in person, so he’ll take it from the car. And so, the people who are sitting in that conference room will wear an AR headset like a HoloLens. And then Satya would appear in that room as though he’s actually there. And then from Satya’s perspective, he’d wear a VR headset, right? So, he would not be sitting in his car anymore. He would be holoported into that conference room.

(music plays)

Host: Let’s talk about the other big project you’re doing: Project Natick. You basically gave yourself a crazy list of demands and then said, “Hey, let’s see if we can do it!” Tell us about Project Natick. Give us an overview. What it is, how did it come about, where it is now, what does it want to be when it grows up?

Ben Cutler: So, Project Natick is an exploration of manufactured data centers that we place underwater in the ocean. And so, the genesis of this is kind of interesting, because it also shows not just research trying to influence the rest of the company, but that if you’re working elsewhere inside Microsoft, you can influence Microsoft Research. So, in this case, go back to 2013, and a couple employees, Sean James and Todd Rawlings, wrote this paper that said we should put data centers in the ocean, and the core idea was, the ocean is a place where you can get good cooling, and so maybe we should look at that for data centers. Historically, when you look at data centers, the dominant cost, besides the actual computers doing the work, is the air conditioning. And so, we have this ratio in the industry called PUE, or Power Usage Effectiveness. And if you go back a long time ago to data centers, PUEs might be as high as 4 or 5. A PUE of 5 says that, for every watt of power for computers, there’s an additional 4 watts for the air conditioning, which is just kind of this crazy, crazy thing. And so, industry went through this phase where we said, “OK, now we’re going to do this thing called hot aisle/cold aisle. We line up all the computers in a row, and cold air comes in one side and hot air goes out the other.” Now, modern data centers that Microsoft builds have a PUE of about 1.125. And the PUE we see of what we have right now in the water is about 1.07. So, we have cut the cooling cost. But more importantly, we’ve done it in a way that we’ve made the data center much colder. So, we’re about 10 degrees Celsius cooler than land data centers. And we’ve known, going back to the middle of the 20th century, that higher temperatures are a problem for components, and in fact, a 10-degree Celsius difference can mean a factor of 2 difference in the life expectancy of equipment. So, we think that this is one way to bring reliability up a lot. So, this idea of reliability is really a proxy for server longevity and how do we make things last longer? In addition to cooling, there’s other things that we have here. One of which is the atmosphere inside this data center is a dry nitrogen atmosphere. So, there’s no oxygen. And the humidity is low. And we think that helps get rid of corrosion. And then the other thing is, in land data centers, stuff comes in from outside. So, by having this sealed container, safe under the ocean, we hopefully have this environment that will allow servers to last much longer.
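For reference, PUE is total facility power divided by IT power, so the overhead (mostly cooling) per watt of IT load is simply PUE minus 1. A small sketch of the arithmetic behind the figures quoted above:

```python
def overhead_watts(pue: float, it_watts: float = 1.0) -> float:
    """Non-IT overhead (mostly cooling) per `it_watts` of IT load.

    PUE = total facility power / IT power, so overhead = (PUE - 1) * IT power.
    """
    return (pue - 1.0) * it_watts

for label, pue in [("older data center", 5.0),
                   ("modern Microsoft data center", 1.125),
                   ("Natick vessel (figure quoted in the episode)", 1.07)]:
    print(f"{label}: PUE {pue} -> {overhead_watts(pue):.3f} W overhead per 1 W of IT")
# older data center: 4.000 W, modern: 0.125 W, Natick: 0.070 W
```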

Host: How did data center technology and submarine technology come together so that you could put the cloud under water?

Ben Cutler: Natick is a little bit unusual as a research project because in some sense we’re not really solving new problems. What we really have here is a marriage of these two mature industries. One is the IT industry, which Microsoft understands very well. And then the other is a marine technologies industry. So, we’re really trying to figure out, how do we blend these things together in a way that creates something new and beneficial?

Host: And so, the submarine technology, making something watertight and drawing on the decades that people have done underwater things, how did you bring that together? Did you have a team of naval experts…?

Ben Cutler: So, the first time we did this, we just, sort of, crashed into it, and we, literally, just built this can and we just kind of dropped it in the water, and ok, we can do this, it kind of works. And so, then the second time around, we put out what we call a Request for Information. We’re thinking of doing this thing, and we did this to government and to academia and to industry, and just to see who’s interested in playing this space? What do they think about it? What kind of approaches would they take? And you know, we’re Microsoft. We don’t really know anything about the ocean. We’ve identified a bunch of folks we think do know about it. And on the industry side we really looked at three different groups. We looked to ship builders, we looked to people who were doing renewable energy in the ocean, which we should come back to that, and then we looked to oil and gas services industry. And so, we got their response and on the basis of that, we then crafted a Request for Proposal to actually go off and do something with us. And that identified what kind of equipment we put inside it, what our requirements were in terms of how we thought that this would work, how cool it had to be, the operating environment that needed to be provided for the servers, and also some more mundane stuff like, when you’re shipping it, what’s the maximum temperature things can get to when it’s like, sitting in the sun on a dock somewhere? And, on the basis of that, we got a couple dozen proposals from four different continents. And so, we chose a partner and then set forward. And so, in part, we were working with University of Washington Applied Physics Lab… is one of three centers of excellence for ocean sciences in the United States, along with Woods Hole and Scripps. And so, we leveraged that capability to help us go through the selection process. And then the company we chose to work with is a company called Naval Group, which is a French company, and among other things, they do naval nuclear submarines, surface ships, but they also do renewable energies. And, in particular, renewable energies in the ocean, so offshore wind, they do tidal energy which is to say, gaining energy from the motion of the tides, as well as something called OTEC which is Ocean Thermal Energy Conversion. So, they have a lot of expertise in renewable energy. Which is very interesting to us. Because another aspect of this that we like is this idea of co-location with offshore renewable energies. So, the idea is, rather than connecting to the grid, I might connect to renewable energies that get placed in the same location where we put this. That’s actually not a new idea for Microsoft. We have data centers that are built near hydroelectric dams or built near windfarms in Texas. So, we like this idea of renewable energy. And so, as we think about this idea of data centers in the ocean, it’s kind of a normal thing, in some sense, that this idea of the renewables would go with us.

Host: You mentioned the groups that you reached out to. Did you have any conversation with environmental groups or how this might impact sea life or the ocean itself?

Ben Cutler: So, we care a lot about that. We like the idea of co-location with the offshore renewables, not just for the sustainability aspects of this, but also for the fact that a lot of those things are going up near large population centers. So, it’s a way to get close to customers. We’re also interested in other aspects of sustainability. And those include things like artificial reefs. We’ve actually filed a patent application having to do with using undersea data centers, potentially, as artificial reefs.

Host: So, as you look to maybe, scaling up… Say this thing, in your 5-year experiment, does really well. And you say, “Hey, we’re going to deploy more of these.” Are you looking, then, with the sustainability goggles on, so to speak, for Natick staying green both for customers but also for the environment itself?

Ben Cutler: We are. And I think one thing people should understand too, is you look out at the ocean and it looks like this big, vast open space, but in reality, it’s actually very carefully regulated. So anywhere we go, there are always authorities and rules as to what you can do and how you do them, so there’s that oversight. And there’s also things that we look at directly, ourselves. One of the things that we like about these, is from a recyclability standpoint, it’s a pretty simple structure. Every five years, we bring that thing back to shore, we put a new set of servers in, refresh it, send it back down, and then when we’re all done we bring it back up, we recycle it, and the idea is you leave the seabed as you found it. On the government side, there’s a lot of oversight, and so, the first thing to understand is, typically, like, as I look at the data center that’s there now, the seawater that we eject back into the ocean is about 8/10 of a degree warmer, Celsius, than the water that came in. It’s a very rapid jet, so, it very quickly mixes with the other seawater. And in our case, the first time we did this, a few meters downstream it was a few thousandths of a degree warmer by the time we were that far downstream.

Host: So, it dissipates very quickly.

Ben Cutler: Water… it takes an immense amount of energy to heat it. If you took all of the energy generated by all the data centers in the world and pushed all of it into the ocean, per year you’d raise the temperature a few millionths of a degree. So, in net, we don’t really worry about it. The place that we worry about it is this idea of local warming. And so, one of the things that’s nice about the ocean is because there are these persistent currents, we don’t have buildup of temperature anywhere. So, this question of the local heating, it’s really just, sort of, make sure your density is modest and then the impact is really negligible. An efficient data center in the water actually has less impact on the oceans than an inefficient data center on land does.

Host: Let’s talk about latency for a second. One of your big drivers in putting these in the water, but near population centers, is so that data moves fairly quickly. Talk about the general problems of latency with data centers and how Natick is different.

Ben Cutler: So, there are some things that you do where latency really doesn’t matter. But I think latency gets you in all sorts of ways, and in sometimes surprising ways. The thing to remember is, even if you’re just browsing the web, when a webpage gets painted, there’s all of this back-and-forth traffic. And so, ok, so I’ve got now a data center that’s, say, 1,000 kilometers away, so it’s going to be 10 milliseconds, roundtrip, per each communication. But I might have a couple hundred of those just to paint one webpage. And now all of a sudden it takes me like 2 seconds to paint that webpage. Whereas it would be almost instantaneous if that data center is nearby. And think about, also, I’ve got factories and automation and I’ve got to control things. I need really tight controls there in terms of the latency in order to do that effectively. Or imagine a future where autonomous vehicles become real and they’re interacting with data centers for some aspect of their navigation or other critical functions. So, this notion of latency really matters in a lot of ways that will become, I think, more present as this idea of intelligent edge grows over time.
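The figures in that example follow from a simple model: light in fiber travels at roughly 200,000 km/s (about two-thirds of its speed in vacuum), and painting a chatty page can take on the order of a couple hundred sequential round trips. A short sketch of that back-of-the-envelope calculation, with those assumptions made explicit:

```python
FIBER_SPEED_KM_PER_S = 200_000  # roughly 2/3 the speed of light in vacuum

def round_trip_ms(distance_km: float) -> float:
    """Round-trip propagation delay over fiber, ignoring routing and queuing."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000

rtt = round_trip_ms(1_000)          # data center 1,000 km away
page_paint_s = 200 * rtt / 1000     # ~200 sequential round trips for one page
print(f"RTT: {rtt:.0f} ms, page paint: {page_paint_s:.1f} s")
# RTT: 10 ms, page paint: 2.0 s, matching the figures in the conversation
```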

Host: Right. And so, what’s Natick’s position there?

Ben Cutler: So, Natick’s benefit here, is more than half the world’s population lives within a couple hundred kilometers of the ocean. And so, in some sense, you’re finding a way to put data centers very close to a good percentage of the population. And you’re doing it in a way that’s very low impact. We’re not taking land because think about if I want to put a data center in San Francisco or New York City. Well turns out, land’s expensive around big cities. Imagine that. So, this is a way to go somewhere where we don’t have some of those high costs. And, potentially, with this offshore renewable energy, and not, as we talked about before, having any impact on the water supply.

Host: So, it could solve a lot of problems all at once.

Ben Cutler: It could solve a lot of problems in this very, sort of, environmentally sustainable way, as well as, in some sense, adding these socially sustainable factors as well.

Host: Yeah. Talk a little bit about the phases of this project. I know there’s been more than one. You alluded to that a little bit earlier. But what have you done stage wise, phase wise? What have you learned?

Ben Cutler: So, Phase 1 was a Proof of Concept, which is literally, we built a can, and that can had a single computer rack in it, and that rack only had 24 servers. And that was about one-third of the space of the rack. It was a standard, what we call, 42U rack, which reflects the size of the rack. Fairly standard for data centers. And then the other two-thirds were filled with what we call load trays. Think of them as, all they do is, they’ve got big resistors that generate heat. So, it’s like hairdryers. And so, they’re used, actually, today in data centers to just, sort of, commission new data centers. Test the cooling system, actually. In our case, we just wanted to generate heat. Could we put these things in the water? Could we cool it? What would that look like? What would be the thermal properties? So, that was a Proof of Concept just to see, could we do this? Could we just, sort of, understand the basics? Were our intuitions right about this? What sort of problems might we encounter? And just, you know, I hate to use… but, you know, get our feet wet. Learning how to interact…

Host: You had to go there.

Ben Cutler: It is astonishing the number of expressions that relate to water that we use.

Host: Oh gosh, the puns are…

Ben Cutler: It’s tough to avoid. So, we just really wanted to get some sense of what it was like to work with the marine industry. Every company and, to some degree, industry, has ways in which they work. And so, this was really an opportunity for us to learn some of those and become informed, before we go to this next stage that we’re at now. Which is more of a prototype stage. So, this vessel that we built this time, is about the size of a shipping container. And that’s by intent. Because then we’ve got something that’s of a size that we can use standard logistics to ship things around. Whether the back of a truck, or on a container ship. Again, keeping with this idea of, if something like this is successful, we have to think about what are the economics of this? So, it’s got 12 racks this time. It’s got 864 servers. It’s got FPGAs, which is something that we use for certain types of acceleration. And then, each of those 864 servers has 32 terabytes of disks. So, this is a substantial amount of capability. It’s actually located in the open ocean in realistic operating conditions. And in fact, where we are, in the winter, the waves will be up to 10 meters. We’re at 36 meters depth. So that means the water above us will vary between 26 and 46 meters deep. And so, it’s a really robust test area. So, we want to understand, can this really work? And what, sort of, the challenges might be in this realistic operating environment.

Host: So, this is Phase 2 right now.

Ben Cutler: This is Phase 2. And so now we’re in the process of learning and collecting data from this. And just going through the process of designing and building this, we learned all sorts of interesting things. And so, turns out, when you’re building these things to go under the ocean, one source of cyclic loading that you get is just from the waves going by. And so, as you design these things, you have to think about how many waves go by this thing over the lifetime? What’s the frequency of those waves? What’s the amplitude of those waves? And this all impacts your design, and what you need to do, based on where you’re going to put it and how long it will be there. So, we learned a whole bunch of stuff from this. And we expect everything will all be great and grand over the next few years here. But we’ll obviously be watching, and we’ll be learning. If there is a next phase, it would be a pilot. And now we’re talking about building something that’s larger scale. So, it might be multiple vessels. There might be a different deployment technology than what we used this time, to get greater efficiency. So, I think those are things that, you know, we’re starting to think about, but mostly, right now, we’ve got this great thing in the water and we’re starting to learn.

Host: Yeah. And you’re going to leave it alone for 5 years, right?

Ben Cutler: This thing will just be down there. Nothing will happen to it. There will be no maintenance until it’s time to retire the servers, which, in a commercial setting, might be every 5 years or longer. And then we’ll bring it back. So, it really is the idea of a lights-out thing. You put it there. It just does its thing and then we go and pull it back later. In an actual commercial deployment, we’d probably be deeper than 36 meters. The reason we’re at 36 meters, is, it turns out, 40 meters is a safe distance for human divers to go without a whole lot of special equipment. And we just wanted that flexibility in case we did need some sort of maintenance or some sort of help during this time. But in a real commercial deployment, we’d go deeper, and one of the reasons for that, also, is just, it will be harder for people to get to it. So, people worry about physical security. We, in some sense, have a simpler challenge than a submarine because a submarine is typically trying to hide from its adversaries. We’re not trying to hide. If we deploy these things, we’d always be within the coastal waters of a country and governed by the laws of that country. But we do also think about, let’s make this thing safe. And so, one of the safety aspects is not just the ability to detect when things are going around you, but also to put it in a place where it’s not easy for people to go and mess with it.

Host: Who’s using this right now? I mean this is an actual test case, so, it’s a data center that somebody’s accessing. Is it an internal data center or what’s the deal on that?

Ben Cutler: So, this data center is actually on our global network. Right now, it’s being used by people internally. We have a number of different teams that are using it for their own production projects. One group that’s working with it, is we have an organization inside Microsoft called AI for Earth. We have video cameras, and so, one of the things that they do is, they’re watching the different fish going by, and other types of much more bizarre creatures that we see. And characterizing and counting those, and so we can kind of see how things evolve over time. And one of the things we’re looking to do, potentially, is to work with other parties that do these more general assessments and then provide some of those AI technologies to them for their general research of marine environment and how, when you put different things in the water, how that affects things, either positively or negatively. Not just, sort of, what we’re doing, but other types of things that go in the water which might be things as simple as cables or marine energy devices or other types of infrastructure.

Host: I would imagine, when you deploy something in a brand-new environment, that you have unintended consequences or unexpected results. Is there anything interesting that’s come out of this deployment that you’d like to share?

Ben Cutler: So, I think when people think of the ocean, they think this is like a really hostile and dangerous place to put things. Because we’re all used to seeing big storms, hurricanes and everything that happens. And to be sure, right at that interface between land and water is a really dangerous place to be. But what you find is that, deep under the waves on the seabed, is a pretty quiet and calm place. And so, one of the benefits that we see out of this, is that even for things like 100-year hurricanes, you will hear, acoustically, what’s going on, on the surface, or near the land… waves crashing and all this stuff going on. But it’s pretty calm down there. The idea that we have this thing deep under the water that would be immune to these types of things is appealing. So, you can imagine this data center down there when a hurricane hits. The only connectivity back to land is going to be fiber. And that fiber is largely glass, with some insulating shell, so it might break off. But the data center will keep operating. Your data center will still be safe, even though there might be problems on land. So, this diversity of risk is another thing that’s interesting to people when we talk about Natick.

Host: What about deployment sites? How have you gone about selecting where you put Project Natick and what do you think about other possibilities in the future?

Ben Cutler: So, for this Phase 2, we’re in Europe. And Europe, today, is the leader in offshore renewable energies. Twenty-nine of the thirty largest offshore windfarms are located in Europe. We’re deployed at the European Marine Energy Center in the Orkney Islands of Scotland. The grid up there is 100% renewable energy. It’s a mix of solar and wind as well as these offshore energies that people are testing at the European Marine Energy Center or EMEC. So, tidal energy and wave energy. One of the things that’s nice about EMEC is people are testing these devices. So, in the future, we have the option to go completely off this grid. It’s 100% renewable grid, but we can go off and directly connect to one of those devices and test out this idea of a co-location with renewable energies.

Host: Did you look at other sites and say, hey, this one’s the best?

Ben Cutler: We looked at a number of sites. Both test sites for these offshore renewables as well as commercial sites. For example, go into a commercial windfarm right off the bat. And we just decided, at this research phase, we had better support and better capabilities in a site that was actually designed for that. One of the things is, as I might have mentioned, the waves there get very, very large in the winter. So, we wanted some place that had very aggressive waters so that we know that if we survive in this space that we’ll be good pretty much anywhere we might choose to deploy.

Host: Like New York. If you can make it there…

Ben Cutler: Like New York, exactly.

Host: You can make it anywhere.

Ben Cutler: That’s right.

(music plays)

Host: What was your path to Microsoft Research?

Ben Cutler: So, my career… I would say that there’s been very little commonality in what I’ve done. But the one thing that has been common is this idea of taking things from early innovation to market introduction. So, a lot of my early career was in startup companies, either as a founder or as a principal. I was in supercomputers, computer storage, video conferencing, different types of semiconductors, and then I was actually here at Microsoft earlier, and I was working in a group exploring new operating system technologies. And then, after that, I went to DARPA, where I spent a few years working on different types of information technology. And then I came back here. And, truthfully, when I first heard about this idea that they were thinking about doing these underwater data centers, it just sounded like the dumbest idea to me, and… But you know, I was willing to go and then, sort of, try and think through, ok, on the surface it sounds ridiculous. But a lot of things start that way. And you have to be willing to go in, understand the economics, understand the science and the technology involved, and then draw some conclusion of whether you think that can actually go somewhere reasonable.

Host: As we close, Ben, I’m really interested in what kinds of people you have on your team, what kinds of people might be interested in working on Special Projects here. Who’s a good fit for a Special Projects research career?

Ben Cutler: I think we’re looking for people who are excited about the idea of doing something new and don’t have fear of doing something new. In some sense, it’s a lot like people who’d go into a startup. And what I mean by that is, you’re taking a lot more risk, because I’m not in a large organization, I have to figure a lot of things out myself, I don’t have a team that will know all these things, and a lot of things may fall on the floor just because we don’t have enough people to get everything done. It’s kind of like driving down the highway and you’re, you know, lashed to the front bumper of the car. You’re fully exposed to all the risk and all the challenges of what you’re doing. And you’re, you know, wide open. There’s no end of things to do and you have to figure out what’s important, what to prioritize, because not everything can get done. But have the flexibility to really, then, understand that even though I can’t get everything done, I’m going to pick and choose the things that are most important and really drive in new directions without a whole lot of constraints on what you’re doing. So, I think that’s kind of what we look to. I have only two people who actually directly report to me on this project. That’s the team. But then I have other people who are core members, who worked on it, who report to other people, and then across the whole company, more than two hundred people touched this Phase 2, in ways large and small. Everything from helping us design the data center, to people who refurbished servers that went into this. So, it’s really a “One Microsoft” effort. And so, I think that there’s always opportunities to engage, not just by being on a team, but interacting and providing your expertise and your knowledge base to help us be successful. Because only in that way can we take these big leaps. And so, in some sense, we’re trying to make sure that Microsoft Research is really staying true to this idea of pursuing new things but not just five years out, in known fields, but look at these new fields. Because the world is changing. And so, we’re always looking for people who are open to these new ideas and frankly are willing to bring new ideas with them as to where they think we should go and why. And that’s how we as a company, I think, grow and see new markets and are successful.

(music plays)

Host: Ben Cutler, it’s been a pleasure. Thanks for coming on the podcast today.

Ben Cutler: My pleasure as well.

To learn more about Ben Cutler, Project Natick, and the future of submersible data centers, visit natick.research.microsoft.com.

Skip User Research Unless You’re Doing It Right — Seriously



Is your research timeless? It’s time to put disposable research behind us

Focus on creating timeless research. (Photo: Aron on Unsplash)

“We need to ship soon. How quickly can you get us user feedback?”

What user researcher hasn’t heard a question like that? We implement new tools and leaner processes, but try as we might, we inevitably meet the terminal velocity of our user research — the point at which it cannot be completed any faster while still maintaining its rigor and validity.

And, you know what? That’s okay! While the need for speed is valuable in some contexts, we also realize that if an insight we uncover is only useful in one place and at one time, it becomes disposable. Our goal should never be disposable research. We want timeless research.

Speed has its place

Now, don’t get me wrong. I get it. I live in this world, too. First to market, first to patent, first to copyright obviously requires an awareness of speed. Speed of delivery can also be the actual mechanism by which you get rapid feedback from customers.

I recently participated in a Global ResOps workshop. One thing I heard loud and clear was the struggle for our discipline to connect into design and engineering cycles. There were questions about how to address the “unreasonable expectations” of what we can do in short time frames. I also heard that researchers struggle with long and slow timelines: Anyone ever had a brilliant, generative insight ignored because “We can’t put that into the product for another 6 months”?

The good news is that there are methodologies such as “Lean” and “Agile” that can help us. Our goal as researchers is to use knowledge to develop customer-focused solutions. I personally love that these methodologies, when implemented fully, incorporate customers as core constituents in collaborative and iterative development processes.

In fact, my team has created an entire usability and experimentation engine using “Lean” and “Agile” methods. However, this team recognizes that letting speed dictate user research is a huge risk. If you cut corners on quality, customer involvement, and adaptive planning, your research could become disposable.

Do research right, or don’t do it at all

I know, that’s a bold statement. But here’s why: When time constraints force us to drop the rigor and process that incorporates customer feedback, the user research you conduct loses its validity and ultimately its value.

The data we gather out of exercises that over-index on speed are decontextualized and disconnected from other relevant insights we’ve collected over time and across studies. We need to pause and question whether this one-off research adds real value and contributes to an organization’s growing understanding of customers when we know it may skip steps critical to identifying insights that transcend time and context.

User research that takes time to get right has value beyond the moment for which it was intended. I’m betting you sometimes forgo conducting research if you think your stakeholders believe it’s too slow. But, if your research uncovered an insight after v1 shipped, you could still leverage that insight on v1+x.

For example, think of the last time a product team asked you, “We’re shipping v1 next week. Can you figure out if our customers want or need this?” As a researcher, you know you need more time to answer this question in a valid way. So, do you skip this research? No. Do you rush through your research, compromising its rigor? No. You investigate anyway and apply your learnings to v2.

To help keep track of these insights, we should build systems that capture our knowledge and enable us to resurface it across development cycles and projects. Imagine this: “Hey Judy, remember that thing we learned 6 months ago? Research just reminded me that it is applicable in our next launch!”

That’s what we’re looking for: timeless user insights that help our product teams again and again and contribute to a curated body of knowledge about our customers’ needs, beliefs, and behaviors. Ideally, we house these insights in databases, so they can be accessed and retrieved easily by anyone for future use (but that’s another story for another time). If we only focus on speed, we lose sight of that goal.
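The piece stops short of prescribing how such a system should be built, but even a tiny, searchable store makes the “capture and resurface” habit concrete. The following is a hypothetical sketch (the class and field names are mine, not an existing Microsoft tool):

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Insight:
    summary: str
    study: str
    recorded: date
    tags: List[str] = field(default_factory=list)

class InsightRepository:
    """Tiny in-memory store for resurfacing past research insights by tag."""

    def __init__(self) -> None:
        self._insights: List[Insight] = []

    def add(self, insight: Insight) -> None:
        self._insights.append(insight)

    def find_by_tag(self, tag: str) -> List[Insight]:
        return [i for i in self._insights if tag in i.tags]

repo = InsightRepository()
repo.add(Insight("Users abandon setup when asked for payment details early",
                 study="Onboarding usability, v1", recorded=date(2018, 3, 1),
                 tags=["onboarding", "checkout"]))

# Six months later, planning the next launch:
for hit in repo.find_by_tag("onboarding"):
    print(f"{hit.recorded}: {hit.summary} ({hit.study})")
```

In practice this would live in a shared, queryable database rather than in memory, but the principle is the same: every insight is stored once and retrieved many times.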

Creating timeless research

Here’s my point: we’ll always have to deal with requests to make our research faster, but once you or your user research team has achieved terminal velocity with any given method, stop trying to speed it up. Instead, focus on capturing each insight, leveling it up to organizational knowledge, and applying that learning in the future. Yes, that means when an important insight doesn’t make v1, go ahead and bring it back up to apply to v2. Timeless research is really about building long-term organizational knowledge and curating what you’ve already learned.

Disposable research is the stuff you throw away, after you ship. To be truly lean, get rid of that wasteful process. Instead, focus your research team’s time on making connections between past insights, then reusing and remixing them in new contexts. That way, you’re consistently providing timeless research that overcomes the need for speed.

Have you ever felt pressure to bypass good research for the sake of speed? Tell me about it in the comments, or tweet @insightsmunko.


To stay in-the-know with what’s new at Microsoft Research + Insight, follow us on Twitter and Facebook. And if you are interested in becoming a user researcher at Microsoft, head over to careers.microsoft.com.

WhatsApp vulnerabilities let hackers alter messages

Attackers are able to intercept and manipulate messages in the encrypted messaging app WhatsApp.

According to new research from Check Point, there are WhatsApp vulnerabilities that enable attackers to manipulate and modify messages in both public and private conversations. This type of manipulation could make it easy to continue the spread of misinformation.

WhatsApp, which is owned by Facebook, has over 1.5 billion users who send approximately 65 billion messages daily. The Check Point researchers warned of online scams, rumors and the spread of fake news with a user base that large, and WhatsApp has already been used for a number of these types of scams.

The new WhatsApp vulnerabilities that Check Point outlined in its blog post involve social engineering techniques that can be used to deceive users in three ways: by changing the identity of the sender of a message in a group, by changing the text of someone else’s reply, and by sending a private message to a group member whose reply is then visible to the entire group.

“We believe these vulnerabilities to be of the utmost importance and require attention,” the researchers wrote.

The WhatsApp vulnerabilities have to do with the communications between the mobile version of the application and the desktop version. Check Point was able to discover them by decrypting the communications between the mobile and desktop version.

“By decrypting the WhatsApp communication, we were able to see all the parameters that are actually sent between the mobile version of WhatsApp and the Web version. This allowed us to then be able to manipulate them and start looking for security issues,” the researchers wrote in their blog post detailing the WhatsApp vulnerabilities.

In the first attack outlined by Check Point’s Dikla Barda, Roman Zaikin and Oded Vanunu, hackers can change the identity of a sender in a group message, even if they are not part of the group. The researchers were also able to change the text of the message to something completely different.

In the second attack, a hacker can change someone’s reply to a message. In doing this, “it would be possible to incriminate a person, or close a fraudulent deal,” the Check Point team explained.

In the final attack disclosed, “it is possible to send a message in a group chat that only a specific person will see, though if he replies to this message, the entire group will see his reply.” This means that the person who responds could reveal information to the group that he did not intend to.
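To see why the second technique works, note that in the scenario Check Point describes, a quoted reply carries its own copy of the original text inside the reply payload, so an attacker positioned to decrypt, alter and re-encrypt that traffic can rewrite what the group believes was originally said. The toy sketch below is purely conceptual; the message structure and field names are hypothetical and are not WhatsApp’s actual protocol:

```python
from copy import deepcopy

# Hypothetical stand-in for a reply that quotes an earlier group message.
# Real WhatsApp traffic is an encrypted binary protocol; the point here is
# only that the quoted text travels inside the reply payload itself.
reply = {
    "group": "Family chat",
    "sender": "Alice",
    "text": "Great, count me in!",
    "quoted": {"sender": "Bob", "text": "Dinner at 7 on Friday?"},
}

def tamper_with_quote(message: dict, fake_text: str) -> dict:
    """Return a copy of the reply whose quoted 'original' has been rewritten."""
    forged = deepcopy(message)
    forged["quoted"]["text"] = fake_text
    return forged

forged = tamper_with_quote(reply, "Please wire the deposit to this account: ...")
# Group members rendering `forged` now see Bob apparently asking for money,
# even though Bob never sent that text.
print(forged["quoted"])
```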

Check Point said it disclosed these vulnerabilities to WhatsApp before making them public.

In other news

  • Computers at the office of PGA America have reportedly been infected with ransomware. According to a report from Golfweek, employees of the golf organization noticed the infection earlier this week when a ransom note appeared on their screens when they tried to access the affected files. “Your network has been penetrated. All files on each host in the network have been encrypted with a strong algorythm (sic),” the note said, according to Golfweek. The files contained information for the PGA Championship at Bellerive and the Ryder Cup in France, including “extensive” promotional materials. According to the Golfweek report, no specific ransom amount was demanded, though the hacker included a bitcoin wallet number.
  • Microsoft may be adding a new security feature to Windows 10 called “InPrivate Desktop.” According to a report from Bleeping Computer, the feature acts like a “throwaway sandbox for secure, one-time execution of untrusted software” and will only be available on Windows 10 Enterprise. Bleeping Computer became aware of this previously undisclosed feature through a Windows 10 Insider Feedback Hub quest and said that it will enable “administrators to run untrusted executables in a secure sandbox without fear that it can make any changes to the operating system or system’s files.” The Feedback Hub said it is an “in-box, speedy VM that is recycled when you close” the application, according to the report. There are no details yet about when this feature may be rolled out.
  • Comcast Xfinity reportedly exposed personal data of over 26.5 million of its customers. Security researcher Ryan Stevenson discovered two previously unreported vulnerabilities in Comcast Xfinity’s customer portals and through those vulnerabilities, partial home addresses and Social Security numbers of Comcast customers were exposed. The first vulnerability could be exploited by refreshing an in-home authentication page that lets users pay their bills without signing into their accounts. Through this, hackers could have figured out the customer’s IP address and partial home address. The second vulnerability was on a sign-up page for Comcast’s Authorized Dealer and revealed the last four digits of a customer’s SSN. There is no evidence yet that the information was actually stolen, and Comcast patched the vulnerabilities after Stevenson reported them.

Malvertising campaign tied to legitimate online ad companies

Check Point Research uncovered an extensive malvertising campaign that has ties to legitimate online advertising companies.

Check Point’s report, titled “A Malvertising Campaign of Secrets and Lies,” detailed how a threat actor group used more than 10,000 compromised WordPress sites and multiple exploit kits to spread a variety of malware, including ransomware and banking Trojans. The group, which Check Point refers to as “Master134,” was responsible for a “well-planned” malvertising campaign that involved several online advertisement publishers, resellers and networks, including a company known as AdsTerra that Check Point claims was “powering the whole process.”

The technical aspects of the Master134 campaign aren’t novel, according to Check Point. The threat actors used unpatched WordPress sites that were vulnerable to remote code execution attacks and then redirected traffic from those sites to pages run by ad networks, which in turn redirected users to a malicious domain that downloads malware to users’ systems.

Check Point researchers took a closer look at how traffic was directed to the malicious domains and found “an alarming partnership between a threat actor disguised as a publisher and several legitimate resellers.” According to the report, Master134 sells its traffic or “ad space” to the AdsTerra network, which then sells it to advertising resellers such as ExoClick, AdKernel, EvoLeads and AdventureFeeds.

The resellers then sell the Master134 traffic to their clients, but Check Point said its researchers discovered an odd pattern with the sales. “All the clients who bid on the traffic directed via AdsTerra, from Master134, happen to be threat actors, and among them some of the exploit kit land’s biggest players,” the report claimed.

Check Point Research speculated that threat actors operating these malicious domains and exploit kits pay Master134 for traffic or “victims,” which are supplied to them via a seemingly legitimate channel of ad networks. While the vendor didn’t accuse AdsTerra or the resellers of knowingly participating in the malvertising campaign, the report did say the ad networks would need to “turn a blind eye” for this scheme to be successful.

“[A]lthough we would like to believe that the resellers that purchase Master134’s ad space from AdsTerra are acting in good faith, unaware of Master134’s malicious intentions, an examination of the purchases from AdsTerra showed that somehow, space offered by Master134 always ended up in the hands of cyber criminals, and thus enables the infection chain to be completed,” the report stated.

SearchSecurity contacted AdsTerra, ExoClick, EvoLeads, AdventureFeeds and AdKernel for comment on the Check Point report.

AdKernel denied any involvement with the Master134 group or related threat actors. Judy Shapiro, chief strategy advisor, emailed a statement to SearchSecurity claiming the Check Point report is false and that AdKernel is an ad-serving technology provider, not an ad network or reseller. Shapiro also wrote that AdKernel did not own the malicious domains cited in the Check Point report, and that those domains were “owned by ad network clients of AdKernel.” The company, however, did not say who those clients were.

The other four companies had not responded at press time.

The Check Point Research report had strong words for the online advertising industry and its inability or unwillingness to prevent such malvertising campaigns from taking advantage of their networks.

“[W]hen legitimate online advertising companies are found at the heart of a scheme, connecting threat actors and enabling the distribution of malicious content worldwide, we can’t help but wonder — is the online advertising industry responsible for the public’s safety?” the report asked. “Indeed, how can we be certain that the advertisement we encounter while visiting legitimate websites are not meant to harm us?”

60 seconds with … Cambridge Research Lab Director Chris Bishop

I joined Microsoft’s Research Lab in Cambridge, UK, back when the lab first opened in 1997, and was named Lab Director two-and-a-half years ago, so I’ve been involved in growing and shaping the lab for more than two decades. Today my role includes leadership of the MSR Cambridge lab, as well as coordination of the broader Microsoft presence in Cambridge. I am fortunate in being supported by a very talented leadership team and a highly capable and motivated team of support staff.

What were your previous jobs?

My background is in theoretical physics. After graduating from Oxford, I did a PhD in quantum field theory at the University of Edinburgh, exploring some of the fundamental mathematics of matter, energy, and space-time. After my PhD I wanted to do something that would have potential for practical impact, so I joined the UK’s national fusion research lab to work on the theory of magnetically confined plasmas as part of a long-term goal to create unlimited clean energy. It was during this time that there were some breakthroughs in the field of neural networks. I was very inspired by the concept of machine intelligence, and the idea that computers could learn for themselves. Initially I started applying neural networks to problems in fusion research, and we became the first lab to use neural networks for real-time feedback control of a high-temperature fusion plasma.

In fact, I found neural networks so fascinating that, after about eight years working on fusion research, I took a rather radical step and switched fields into machine learning. I became a Professor at Aston University in Birmingham, where I set up a very successful research lab. Then I took a sabbatical and came to Cambridge for six months to run a major, international programme called “Neural Networks and Machine Learning” at the Isaac Newton Institute. The programme started on July 1, 1997, on the very same day that Microsoft announced it was opening a research lab in Cambridge, its first outside the US. I was approached by Microsoft to join the new lab, and have never looked back.

What are your aims at Microsoft?

My ambition is for the lab to have an impact on the real world at scale by tackling very hard research problems, and by leveraging the advantages and opportunities we have as part of Microsoft. I often say that I want the MSR Cambridge lab to be a critical asset for the company.

I’m also very passionate about diversity and inclusion, and we have introduced multiple initiatives over the last year to support this. We are seeing a lot of success in bringing more women into technical roles in the lab, across both engineering and research, and that’s very exciting to see.

What’s the hardest part of your job?

A core part of my job is to exercise judgment in situations where there is no clear right answer. For instance, in allocating limited resources I need to look at the risk, the level of investment, the potential for impact, and the timescale. At any one time there will be some things we are investing in that are quite long term but where the impact could be revolutionary, along with other things that have perhaps been researched for several years which are beginning to get real traction, all the way to things that have had real-world impact already. The hardest part of my job is to weigh up all these factors and make some difficult decisions on where to place our bets.

What’s the best part of your job?

The thing I enjoy most is the wonderful combination of technology and people. Those are two aspects I find equally fascinating, yet they offer totally different kinds of challenges. We, as a lab, are constantly thinking about technology, trends and opportunities, but also about the people, teams, leadership, staff development and recruitment, particularly in what has become a very competitive talent environment. The way these things come together is fascinating. There is never a dull day here.

What is a leader?

I think of leadership as facilitating and enabling, rather than directing. One of the things I give a lot of attention to is leadership development. We have a leadership team for the lab and we meet once a week for a couple of hours. I think about the activities of that team, but also about how we function together. It’s the diversity of the opinions of the team members that creates a value that’s greater than the sum of its parts. Leadership is about harnessing the capabilities of every person in the lab and allowing everyone to bring their best game to the table. I therefore see my role primarily as drawing out the best in others and empowering them to be successful.

What are you most proud of?

Last year I was elected a Fellow of the Royal Society, and that was an incredibly proud moment. There’s a famous book I got to sign, and you can flip back and see the signatures of Isaac Newton, Charles Darwin, Albert Einstein, and pretty much every scientist you’ve ever heard of. At the start of the book is the signature of King Charles II who granted the royal charter, so this book contains over three-and-a-half centuries of scientific history. That was a very humbling but thrilling moment.

Another thing I’m very proud of was the opportunity to give the Royal Institution Christmas Lectures. The Royal Institution was set up more than 200 years ago – Michael Faraday was one of the early directors – and around 14 Nobel prizes have been associated with the Institution, so there is a tremendous history there too. These days it’s most famous for the Christmas Lectures, which were started by Faraday. Ever since the 1960s these lectures have been broadcast on national television at Christmas, and I watched them as a child with my mum and dad. They were very inspirational for me and were one of the factors that led me to choose a career in science. About 10 years ago, I had the opportunity to give the lectures, which would have been inconceivable to me as a child. It was an extraordinary moment to walk into that famous iconic theatre, where Faraday lectured many times and where so many important scientific discoveries were first announced.

One Microsoft anecdote that relates to the lectures is that getting selected was quite a competitive process. It eventually came down to a shortlist of five people, and I was very keen to be chosen, especially as it was the first time in the 200-year history of the lectures that they were going to be on the subject of computer science. I was thinking about what I could do to get selected, so I wrote to Bill Gates, explained how important these lectures were and asked him whether, if I was selected, he would agree to join me as a guest in one of the lectures. Fortunately, he said yes, and so I was able to include this in my proposal to the Royal Institution. When I was ultimately selected, I held Bill to his promise and interviewed him via satellite on live television during one of the lectures.


What inspires you?

I love the idea that through our intellectual drive and curiosity we can use technology to make the world a better place for millions of people. For example, the field of healthcare today largely takes a one-size-fits-all approach that reactively waits until patients become sick before responding, and which is increasingly associated with escalating costs that are becoming unsustainable. The power of digital technology offers the opportunity to create a data-driven approach to healthcare that is personalised, predictive and preventative, and which could significantly reduce costs while also improving health and wellbeing. I’ve made Healthcare AI one of the focal points of the Cambridge lab, and I find it inspiring that the combination of machine learning, together with Microsoft’s cloud, could help to bring about a much-needed transformation in healthcare.

What is your favourite Microsoft product?

A few years ago, the machine learning team here in Cambridge built a feature, in collaboration with the Exchange team, called Clutter. It separates the email you should pay attention to now from the messages that can be left to, say, a Friday afternoon. I love it because it’s used by tens of millions of people, and it has some very beautiful research ideas at the heart of it – something called a hierarchical Bayesian machine learning model. This gives it a nice out-of-the-box experience, a sort of average that does OK for everybody, but as you engage with it, it personalises and learns your particular preferences for what constitutes urgent versus non-urgent email. The other reason I’m particularly fond of it is that when I became Lab Director, the volume of email in my inbox quadrupled. That occurred just as we were releasing the Clutter feature, so it arrived just in time to save me from being overwhelmed.
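Clutter’s production model isn’t spelled out here, but a toy sketch can make the hierarchical Bayesian idea Bishop describes concrete. The Python below is purely illustrative and is not Clutter’s implementation: the class, the Beta-Bernoulli formulation and every number are invented, and they only show how a shared prior gives each new user a sensible out-of-the-box default that personal feedback gradually overrides.

# Toy sketch of hierarchical Bayesian personalisation (not Clutter's code).
# A global prior, learned across many users, is the out-of-the-box behaviour;
# each user's own feedback updates a personal Beta-Bernoulli posterior.

class UrgencyModel:
    """Estimate of P(mail from a given sender is urgent) for one user."""

    def __init__(self, prior_urgent: float, prior_not_urgent: float):
        self.urgent = prior_urgent          # shared pseudo-counts (the prior)
        self.not_urgent = prior_not_urgent

    def observe(self, was_urgent: bool) -> None:
        # The user's behaviour (acted on now vs. left for later) is the signal.
        if was_urgent:
            self.urgent += 1
        else:
            self.not_urgent += 1

    def p_urgent(self) -> float:
        return self.urgent / (self.urgent + self.not_urgent)


GLOBAL_PRIOR = (2.0, 8.0)   # invented: most mail is non-urgent on average

alice = UrgencyModel(*GLOBAL_PRIOR)   # two users start from the same prior...
bob = UrgencyModel(*GLOBAL_PRIOR)

for _ in range(10):                   # ...but their feedback differs
    alice.observe(True)               # Alice always acts on this sender now
    bob.observe(False)                # Bob always leaves it for later

print(f"new user: {UrgencyModel(*GLOBAL_PRIOR).p_urgent():.2f}")   # 0.20
print(f"Alice:    {alice.p_urgent():.2f}")                         # 0.60
print(f"Bob:      {bob.p_urgent():.2f}")                           # 0.10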

What was the first bit of technology that you were excited about?

When I was a child I was very excited about the Apollo moon landings. I was at an age where I could watch them live on television and knew enough to understand what an incredible achievement they were. Just think of that Saturn launch vehicle that’s 36 storeys high, weighs 3,000 tonnes, is burning 15 tonnes of fuel a second, and yet it’s unstable. So, it must be balanced, rather like balancing a broom on your finger, by pivoting those massive engines backwards and forwards on hydraulic rams in response to signals from gyroscopes at the top of the rocket. It’s that combination of extreme brute force with exquisite precision, along with dozens of other extraordinary yet critical innovations, that made the whole adventure just breath-taking. And the filtering algorithms used by the guidance system are an elegant application of Bayesian inference, so it turns out that machine learning is, literally, rocket science.
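The Bayesian filtering Bishop mentions can be illustrated in a few lines. The sketch below is a generic one-dimensional Kalman filter, not the Apollo guidance software; every constant is made up, and it simply shows how a Gaussian belief is refined by fusing a prediction with each noisy sensor reading.

# Toy one-dimensional Kalman filter: Bayesian fusion of a prediction with
# noisy measurements. Illustrative only; not the Apollo guidance code.
import random

def kalman_1d(measurements, process_var=0.01, sensor_var=0.5):
    mean, var = 0.0, 1.0                  # initial Gaussian belief
    estimates = []
    for z in measurements:
        var += process_var                # predict: uncertainty grows over time
        gain = var / (var + sensor_var)   # how much to trust the new reading
        mean += gain * (z - mean)         # update: shift belief toward reading
        var *= 1.0 - gain                 # update: uncertainty shrinks
        estimates.append(mean)
    return estimates

random.seed(0)
true_value = 1.0
noisy = [true_value + random.gauss(0.0, 0.7) for _ in range(50)]
print(f"last raw reading:  {noisy[-1]:+.3f}")
print(f"filtered estimate: {kalman_1d(noisy)[-1]:+.3f}")   # closer to +1.0 than raw readings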


The nine roles you need on your data science research team

It’s easy to focus too much on building a data science research team loaded with Ph.D.s to do machine learning at the expense of developing other data science skills needed to compete in today’s data-driven, digital economy. While high-end, specialty data science skills for machine learning are important, they can also get in the way of a more pragmatic and useful adoption of data science. That’s the view of Cassie Kozyrkov, chief decision scientist at Google and a proponent of the democratization of data-based organizational decision-making.

To start, CIOs need to expand their thinking about the types of roles involved in implementing data science programs, Kozyrkov said at the recent Rev Data Science Leaders Summit in San Francisco.

For example, it’s important to think about data science research as a specialty role developed to provide intelligence for important business decisions. “If an answer involves one or more important decisions, then you need to bring in the data scientists,” said Kozyrkov, who designed Google’s analytics program and trained more than 15,000 Google employees in statistics, decision-making and machine learning.

But other tasks related to data analytics, like making informational charts, testing out various algorithms and making better decisions, are best handled by other data science team members with entirely different skill sets.

Data science roles: The nine must-haves

There are a variety of data science research roles for an organization to consider, each suited to certain characteristics. Most enterprises have already filled several of these positions correctly, but many will also have people with the wrong skills or motivations in certain roles. This mismatch can slow things down or demotivate others throughout the enterprise, so it’s important for CIOs to carefully consider who staffs these roles to get the most from their data science research.

Here is Kozyrkov’s rundown of the essential data science roles and the part each plays in helping organizations make more intelligent business decisions.

Data engineers are people who have the skills and ability to get data required for analysis at scale.

Basic analysts can be anyone in the organization with a willingness to explore data and plot relationships using various tools. Kozyrkov suggested it may be hard for data scientists to cede some responsibility for basic analysis to others, but, in the long run, the value of data scientists will grow as more people throughout the company take on basic analytics.

Expert analysts, on the other hand, should be able to search through data sets quickly. You don’t want to put a software engineer or very methodical person in this role, because they are too slow.

“The expert software engineer will do something beautiful, but won’t look at much of your data sets,” Kozyrkov said. You want someone who is sloppy and will run around your data. It is also worth buffering expert analysts from software developers inclined to complain about their sloppy, yet quickly produced, code.

Statisticians are the spoilsports who will explain how your latest theory does not hold up for 20 different reasons. These people can kill motivation and excitement. But they are also important for coming to conclusions safely for important decisions.

A machine learning engineer is not a researcher who builds algorithms. Instead, these AI-focused computer programmers excel at moving a lot of data sets through a variety of software packages to decide if the output looks promising. The best person for this job is not a perfectionist who would slow things down by looking for the best algorithm.

A good machine learning engineer, in Kozyrkov’s view, is someone who doesn’t know what they are doing and will try out everything quickly. “The perfectionist needs to have the perfection encouraged out of them,” she said.

Too many businesses are trying to staff the team with a bunch of Ph.D. researchers. These folks want to do research, not solve a business problem.
Cassie Kozyrkov, chief decision scientist at Google

A data scientist is an expert who is well-trained in statistics and also good at machine learning. They tend to be expensive, so Kozyrkov recommended using them strategically.

A data science manager is a data scientist who wakes up one day and decides he or she wants to do something different to benefit the bottom line. These folks can connect the decision-making side of business with the data science of big data. “If you find one of these, grab them and never let them go,” Kozyrkov said.

A qualitative expert is a social scientist who can assess decision-making. This person is good at helping decision-makers set up a problem in a way that can be solved with data science. They tend to have better business management training than some of the other roles.

A data science researcher has the skills to craft customized data science and machine learning algorithms. Data science researchers should not be an early hire. “Too many businesses are trying to staff the team with a bunch of Ph.D. researchers. These folks want to do research, not solve a business problem,” Kozyrkov said. “This is a hire you only need in a few cases.”

Prioritize data science research projects

CIOs looking to build a data science research team should also develop a strategy for prioritizing and assigning data science projects. (See the aforementioned advice on hiring data science researchers.)

Decisions about what to prioritize should involve front-line business managers, who can decide what data science projects are worth pursuing.

In the long run, some of the most valuable skills lie in learning how to bridge the gap between business decision-makers and other roles. Doing this in a pragmatic way requires training in statistics, neuroscience, psychology, economic management, social sciences and machine learning, Kozyrkov said. 

Microsoft and National Geographic form AI for Earth Innovation Grant partnership | Stories

New grant offering will support research and scientific discovery with AI technologies in the areas of agriculture, biodiversity conservation, climate change and water

REDMOND, Wash., and WASHINGTON, D.C. — July 16, 2018 — On Monday, Microsoft Corp. and National Geographic announced a new partnership to advance scientific exploration and research on critical environmental challenges with the power of artificial intelligence (AI). The newly created $1 million AI for Earth Innovation Grant program will provide award recipients with financial support, access to Microsoft cloud and AI tools, inclusion in the National Geographic Explorer community, and affiliation with National Geographic Labs, an initiative launched by National Geographic to accelerate transformative change and exponential solutions to the world’s biggest challenges by harnessing data, technology and innovation. Individuals and organizations working at the intersection of environmental science and computer science can apply today at https://www.nationalgeographic.org/grants/grant-opportunities/ai-earth-innovation/.

“National Geographic is synonymous with science and exploration, and in Microsoft we found a partner that is well-positioned to accelerate the pace of scientific research and new solutions to protect our natural world,” said Jonathan Baillie, chief scientist and executive vice president, science and exploration at the National Geographic Society. “With today’s announcement, we will enable outstanding explorers seeking solutions for a sustainable future with the cloud and AI technologies that can quickly improve the speed, scope and scale of their work as well as support National Geographic Labs’ activities around technology and innovation for a planet in balance.”

“Microsoft is constantly exploring the boundaries of what technology can do, and what it can do for people and the world,” said Lucas Joppa, chief environmental scientist at Microsoft. “We believe that humans and computers, working together through AI, can change the way that society monitors, models and manages Earth’s natural systems. We believe this because we’ve seen it — we’re constantly amazed by the advances our AI for Earth collaborators have made over the past months. Scaling this through National Geographic’s global network will create a whole new generation of explorers who use AI to create a more sustainable future for the planet and everyone on it.”

The $1 million AI for Earth Innovation Grant program will provide financial support to between five and 15 novel projects that use AI to advance conservation research toward a more sustainable future. The grants will support the creation and deployment of open-sourced trained models and algorithms that will be made broadly available to other environmental researchers, which offers greater potential to provide exponential impact.

Qualifying applications will focus on one or more of the core areas: agriculture, biodiversity conservation, climate change and water. Applications are open as of today and must be submitted by Oct. 8, 2018. Recipients will be announced in December 2018. Those who want more information and to apply can visit https://www.nationalgeographic.org/grants/grant-opportunities/ai-earth-innovation/.

About the National Geographic Society

The National Geographic Society is a leading nonprofit that invests in bold people and transformative ideas in the fields of exploration, scientific research, storytelling and education. The Society aspires to create a community of change, advancing key insights about the planet and probing some of the most pressing scientific questions of our time, all while ensuring that the next generation is armed with geographic knowledge and global understanding. Its goal is measurable impact: furthering exploration and educating people around the world to inspire solutions for the greater good. For more information, visit www.nationalgeographic.org.

About Microsoft

Microsoft (Nasdaq “MSFT” @microsoft) enables digital transformation for the era of an intelligent cloud and an intelligent edge. Its mission is to empower every person and every organization on the planet to achieve more.

For more information, press only:

Microsoft Media Relations, WE Communications for Microsoft, (425) 638-7777, rrt@we-worldwide.com

Note to editors: For more information, news and perspectives from Microsoft, please visit the Microsoft News Center at http://news.microsoft.com. Web links, telephone numbers and titles were correct at time of publication, but may have changed. For additional assistance, journalists and analysts may contact Microsoft’s Rapid Response Team or other appropriate contacts listed at http://news.microsoft.com/microsoft-public-relations-contacts.

TextWorld: A learning environment for training reinforcement learning agents, inspired by text-based games – Microsoft Research

Today, fresh out of the Microsoft Research Montreal lab, comes an open-source project called TextWorld. TextWorld is an extensible Python framework for generating text-based games. Reinforcement learning researchers can use TextWorld to train and test AI agents in skills such as language understanding, affordance extraction, memory and planning, exploration and more. Researchers can study these in the context of generalization and transfer learning. TextWorld further runs existing text-based games, like the legendary Zork, for evaluating how well AI agents perform in complex, human-designed settings.

Figure 1 – Enter the world of TextWorld. Get the code at aka.ms/textworld.

Text-based games – also known as interactive fiction or adventure games – are games in which the play environment and the player’s interactions with it are represented solely or primarily via text. As players move through the game world, they observe textual descriptions of their surroundings (typically divided into discrete ‘rooms’), what objects are nearby, and any other pertinent information. Players issue text commands to an interpreter to manipulate objects, other characters in the game, or themselves. After each command, the game usually provides some feedback to inform players how that command altered the game environment, if at all. A typical text-based game poses a series of puzzles to solve, treasures to collect, and locations to reach. Goals and waypoints may be specified explicitly or may have to be inferred from cues.

Figure 2 – An example game from TextWorld with a house-based theme.

Text-based games couple the freedom to explore a defined space with the restrictions of a parser and game world designed to respond positively to a relatively small set of textual commands. An agent that can competently navigate a text-based game needs not only to generate coherent textual commands but also to issue the right commands in the right order, with few to no mistakes in between. Text-based games encourage experimentation, and successful playthroughs involve multiple game losses and in-game “deaths.” Close observation and creative interpretation of the text the game provides, along with a generous supply of common sense, are also integral to winning text-based games. The relatively simple obstacles present in a TextWorld game serve as an introduction to the basic real-life challenges posed by text-based games. In TextWorld, an agent needs to learn how to observe, experiment, fail and learn from failure.

TextWorld has two main components: a game generator and a game engine. The game generator converts high-level game specifications, such as number of rooms, number of objects, game length, and winning conditions, into executable game source code in the Inform 7 language. The game engine is a simple inference machine that ensures each step of the generated game is valid, using algorithms such as one-step forward and backward chaining.

Figure 3 – An overview of the TextWorld architecture.
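To give a feel for the framework, here is a minimal interaction loop with a trivial agent. It assumes the beta-era Python API (textworld.start, env.reset, env.step); the game file path and the command list are hypothetical, and a real reinforcement learning agent would choose commands by reading the game’s text rather than at random.

# Minimal TextWorld loop; a sketch under the assumptions above, not a spec.
import random
import textworld

env = textworld.start("games/simple_game.ulx")   # hypothetical generated game
game_state = env.reset()

# Placeholder "agent": pick each command at random from a tiny vocabulary.
COMMANDS = ["go north", "go south", "go east", "go west",
            "look", "take key", "open door"]

reward, done, moves = 0, False, 0
while not done and moves < 100:
    command = random.choice(COMMANDS)
    game_state, reward, done = env.step(command)
    moves += 1

print(f"Finished after {moves} moves with reward {reward}.")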

“One reason I’m excited about TextWorld is the way it combines reinforcement learning with natural language,” said Geoff Gordon, Principal Research Manager at Microsoft Research Montreal. “These two technologies are both really important, but they don’t fit together that well yet. TextWorld will push researchers to make them work in combination.” Gordon pointed out that reinforcement learning has had a number of high-profile successes recently (like Go or Ms. Pac-Man), but in all of these cases the agent has fairly simple observations and actions (for example, screen images and joystick positions in Ms. Pac-Man). In TextWorld, the agent has to both read and produce natural language, which has an entirely different and, in many cases, more complicated structure.

“I’m excited to see how researchers deal with this added complexity,” said Gordon.

Microsoft Research Montreal specializes in state-of-the-art research in machine reading comprehension, dialogue, reinforcement learning, and FATE (Fairness, Accountability, Transparency, and Ethics in AI). The lab was founded in 2015 as Maluuba and acquired by Microsoft in 2017. For more information, check out Microsoft Research Montreal.

This release of TextWorld is a beta and we are encouraging as much feedback as possible on the framework from fellow researchers across the world. You can send your feedback and questions to textworld@microsoft.com. Also, for more information and to get the code, check out TextWorld, and our related publications TextWorld: A Learning Environment for Text-based Games and Counting to Explore and Generalize in Text-based Games. Thank you!

The case for cloud storage as a service at Partners

Partners HealthCare relies on its enterprise research infrastructure and services group, or ERIS, to provide an essential service: storing, securing and enabling access to the data files that researchers need to do their work.

To do that, ERIS stood up a large network storage service providing up to 50 TB of capacity, so the research departments could consolidate their network drives, while also managing access to those files through a permission system.

But researchers were contending with growing demands to better secure data and track access, said Brent Richter, director of ERIS at the nonprofit Boston-based healthcare system. Federal regulations and state laws, as well as standards and requirements imposed by the companies and institutions working with Partners, required increasing amounts of access controls, auditing capabilities and security layers.

That put pressure on ERIS to devise a system that could better meet those heightened healthcare privacy and security requirements.

“We were thinking about how do we get audit controls, full backup and high availability built into a file storage system that can be used at the endpoint and that still carries the nested permissions that can be shared across the workgroups within our firewall,” he explained.

Hybrid cloud storage as a service

At the time, ERIS was devising security plans based on the various requirements established by the different contracts and research projects, filling out paperwork to document those plans and performing time-intensive audits.

It was then that ERIS explored ClearSky Data. The cloud-storage-as-a-service provider was already being used by another IT unit within Partners for block storage; ERIS decided six months ago to pilot the ClearSky Data platform.

“They’re delivering a network service in our data center that’s relatively small; it has very fast storage inside of it that provides that cache, or staging area, for files that our users are mapping to their endpoints,” Richter explained.

From there, automation and software systems from ClearSky Data take those files and move them to its local data center, which is in Boston. “It replicates the data there, and it also keeps the server in our data center light. [ClearSky Data] has all the files on it, but not all the data in the files on it; it keeps what our users need when they’re using it.”

Essentially, ClearSky Data delivers on-demand primary storage, off-site backup and disaster recovery as a single service, he said.

All this, however, is invisible to the end users, he added. The researchers accessing data stored on the ClearSky Data platform, as well as the one built by ERIS, do not notice the differences in the technologies as they go about their usual work.
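The pattern Richter describes is, in effect, a small capacity-bounded cache sitting in front of a much larger remote tier. The Python below is an invented toy with no relation to ClearSky Data’s actual software; it only illustrates how a local cache can keep recently used files close to users while full copies live in the backing store.

# Toy illustration of cache-plus-remote-tier storage (not ClearSky Data's code).
from collections import OrderedDict

class TieredStore:
    def __init__(self, cache_capacity: int, backing: dict):
        self.cache = OrderedDict()           # local appliance: small and fast
        self.capacity = cache_capacity
        self.backing = backing               # remote data center: large, slower

    def read(self, path: str) -> bytes:
        if path in self.cache:               # cache hit: serve locally
            self.cache.move_to_end(path)
            return self.cache[path]
        data = self.backing[path]            # cache miss: fetch from remote tier
        self.cache[path] = data
        if len(self.cache) > self.capacity:  # evict the least recently used file
            self.cache.popitem(last=False)
        return data

# Only the two most recently read files stay on the local tier.
remote = {f"/research/file{i}.dat": b"..." for i in range(5)}
store = TieredStore(cache_capacity=2, backing=remote)
for name in ("file0", "file1", "file2"):
    store.read(f"/research/{name}.dat")
print(list(store.cache))   # ['/research/file1.dat', '/research/file2.dat']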

ClearSky benefits for Partners

ERIS’ decision to move to ClearSky Data’s fully managed service delivered several specific benefits, Richter said.

He said the new approach reduced the system’s on-premises storage footprint, while accelerating a hybrid cloud strategy. It delivered high performance, as well as more automated security and privacy controls. And it offered more data protection and disaster recovery capabilities, as well as more agility and elasticity.

Richter said buying the capabilities also helped ERIS to stay focused on its mission of delivering the technologies that enable the researchers.

“We could design and engineer something ourselves, but at the end of the day, we’re service providers. We want to provide our service with all the needed security so our users would just be able to leverage it, so they wouldn’t have to figure out whether it met the requirements on this contract or another,” Richter said.

He noted, too, that the decision to go with a hybrid cloud storage-as-a-service approach allowed ERIS to focus on activities that differentiate the Partners research community, such as supporting its data science efforts.

“It allows us to focus on our mission, which is providing IT products and services that enable discovery and research,” he added.

Pros and cons of IaaS platform

Partners’ storage-as-a-service strategy fits into the broader IaaS market, which has traditionally been broken into two parts: compute and storage, said Naveen Chhabra, a senior analyst serving infrastructure and operations professionals at Forrester Research Inc.

[Cloud storage as a service] allows us to focus on our mission, which is providing IT products and services that enable discovery and research.
Brent Richter, director of ERIS at Partners HealthCare

In that light, ClearSky Data is one of many providers offering not just cloud storage, but the other infrastructure layers — and, indeed, the whole ecosystem — needed by enterprise IT departments, with AWS, IBM and Google being among the biggest vendors in the space, Chhabra said.

As for the cloud-storage-as-a-service approach adopted by Partners, Chhabra said it can offer enterprise IT departments flexibility, scalability and faster time to market — the benefits that traditionally come with cloud. Additionally, it can help enterprise IT move more of their workloads to the cloud.

There are potential drawbacks in a hybrid cloud storage-as-a-service setup, however, Chhabra said. Applying and enforcing access management policies in an environment that spans both on-premises and IaaS platforms can be challenging for IT, especially as deployment size grows. And while implementing cloud-storage-as-a-service platforms, and IaaS in general, isn't particularly challenging from a technology standpoint, moving applications onto the new platform may not be as seamless or frictionless as promoted.

“The storage may not be as easily consumable by on-prem applications. [For example,] if you have an application running on-prem and it tries to consume the storage, there could be an integration challenge because of different standards,” he said.

IaaS may also be more expensive than keeping everything on premises, he said, adding that the higher costs aren’t usually significant enough to outweigh the benefits. “It may be fractionally costlier, and the customer may care about it, but not that much,” he said.

Competitive advantage

ERIS’ pilot phase with ClearSky Data involves standing up a Linux-based file service, as well as a Windows-based file service.

Because ERIS uses a chargeback system, Richter said, the research groups his team serves can opt for the older, slightly less expensive internal system or for ClearSky Data's infrastructure.

“For those groups that have these contracts with much higher data and security controls than our system can provide, they now have an option that fulfills that need,” Richter said.

That itself provides Partners a boost in the competitive research market, he added.

“For our internal customers who have these contracts, they then won’t have to spend a month auditing their own systems to comply with an external auditor that these companies bring as part of the sponsored research before you even get the contract,” Richter said. “A lot of these departments are audited to make sure they have a base level [of security and compliance], which is quite high. So, if you have that in place already, that gives you a competitive advantage.”