Episode 10, January 31, 2018
Neural Program Synthesis and the Quest to Democratize Programming with Dr. Rishabh Singh
Every day, computers take on more and more of our daily tasks. Fill in a few cells on your spreadsheet? It’ll fill in the rest. Ask your car for directions? It’ll get you there. We can program computers to do almost anything. But what about programming computers to… program computers? That’s a task that Dr. Rishabh Singh, and the team in the Cognition group at Microsoft Research, are tackling with Neural Program Synthesis, also known as artificial programming.
Today, Dr. Singh explains how deep neural networks are already training computers to do things like take classes and grade assignments, shares how programmers can perform complicated, high-level debugging through the delightfully named process of neural fuzzing, and lays out his vision to democratize computer programming in the brave new world of Software 2.0.
Rishabh Singh: The idea is, instead of programmers specifying or writing a program step-by-step with every statement, instead of specifying the complete logic, what if we can allow programmers to specify their intent for what they want to program to do at a high level?
Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research, and the scientists behind it. I’m your host, Gretchen Huizinga.
Every day, computers take on more and more of our daily tasks. Fill in a few cells on your spreadsheet? It’ll fill in the rest. Ask your car for directions? It’ll get you there. Anymore, we can program computers to do almost anything. But what about programming computers to… program computers? That’s a task that Dr. Rishabh Singh, and the team in the Cognition group at Microsoft Research, are tackling with neural program synthesis, also known as artificial programming. Today, Dr. Singh explains how deep neural networks are already training computers to do things like take classes and grade assignments, shares how programmers can perform complicated, high-level debugging through the delightfully-named process of neural fuzzing, and lays out his vision to democratize computer programming in the brave new world of Software 2.0.
That and much more on today’s episode of the Microsoft Research Podcast.
Host: Rishabh Singh, welcome to the podcast.
Rishabh Singh: Thanks a lot, Gretchen, yes.
Host: It’s great to have you here.
Rishabh Singh: Great to be here.
Host: Yeah. So, your research resides under Research in Software Engineering or the RiSE group at MSR, but your specific area is the relatively new group, just called Cognition. Tell us what goes on under the Cognition umbrella.
Rishabh Singh: Yeah, I can briefly talk about that. So, about a couple of years back, we found this group called Cognition where the idea was how do we build the next generation of AI systems that go towards general intelligence? And the idea was to bring together people from deep learning, reinforcement learning and programming languages such that we can build systems that can learn complex algorithmic tasks. So, for example actually, we have seen a lot of recent breakthroughs in deep learning where systems have achieved even super human performances on recognizing images, understanding text, uh, speech… But one thing that has been there amongst all these successes is that these networks, what they are learning are still relatively simpler models, tasks where these networks learned patterns. But to get towards more general intelligence, we need systems that can learn to perform a complex sequence of tasks. And that was something when we started this group, we wanted to build these next generation of network architectures that can learn to generate programs. And more broadly we call this area neural program synthesis, basically training these networks to generate programs.
Host: Okay. So, let’s do a little bit of a level-set, because our audience is fairly sophisticated, as you know.
Rishabh Singh: Yeah.
Host: But I always like to do a layman’s version of what we’re going to spend a half hour talking about.
Rishabh Singh: Um-hum.
Host: And you’ve sort of alluded to it already, but give us a kind of definition of program synthesis, because that’s sort of the nugget of what you’re working on, and why we need it.
Rishabh Singh: Yeah, that’s a great point actually. So, the idea is, instead of programmers specifying or writing a program step-by-step with every statement, instead of specifying the complete logic… what if we can allow programmers to specify their intent for what they want to program to do at a high level? Some examples might be maybe we can specify the intent by either using a few number of input/output examples or test cases or maybe in natural language. And the goal of program synthesis is to take those under-specified specifications and generate programs that are consistent with it. Now, the reason we need program synthesis is first, not everybody who is using these sophisticated software systems, might be professional programmers. So, for them to be able to get maximum leverage out of these systems, we can enable them to also perform complicated tasks by having simpler mechanisms for them to specify the intent. One example, is let’s say Microsoft Excel, where millions and millions of users use the system, but not all of them are programmers. They come from various backgrounds. If it can allow them to specify the intents using examples or natural language, then we can really democratize the notion of programming. So that’s one reason, actually, I’m quite excited about that. But even for more sophisticated users, let’s say professional programmers, there are many tasks that are quite tricky and complex. Instead of them having to reason about it all manually, we now have efficient search algorithms that we can leverage to remove some of that burden out of these programming tasks and let the machine perform that kind of tricky reasoning and have programmers perform more high-level reasoning.
Host: You gave a talk, that people can find on the MSR website, called Programming Synthesis for the Masses…
Rishabh Singh: Right.
Host: And you talk about democratizing computer programming which sounds like what you just explained right there.
Rishabh Singh: Yes, yes.
Host: Is taking a very complex activity, programming a computer, to do something you want it to do.
Rishabh Singh: Right.
Host: But allowing it to be accessible to… you mentioned students, end-users and programmers…
Rishabh Singh: Right.
Host: Is there more to the masses? I mean, would it be useful for someone like me?
Rishabh Singh: Actually, the hope is yeah. The hope is, I think, there are newer and newer applications people are using program synthesis for. For example, there are now speech systems where somebody can specify their high-level intent using speech for these voice assistants. And then in the back-end, the idea is to take that high-level specification and break it down into smaller sub-tasks and then compose them to perform the original task that the user intended to do. So, I think some flavor of these techniques are now being used in almost every day use-cases now. There are many classes of users and I believe, yeah, program synthesis techniques can help pretty much any scenario.
Host: So, are you aiming for a time when I could just articulate what I’d like to do and then the computer’s resources that it can draw on, could actually take cues from me and execute?
Rishabh Singh: Yeah, exactly. That’s the vision that we want to get towards. I think certain domains we can first target to do such things. For example, in Excel let’s say, a user can specify what they want to do either in natural language or speech and then we can have the machine in the background search over the space of programs and then find one that fits their intent. And then there can be many other domains where something like that could possible.
Host: For my benefit, can you explain when it says “search over the range of programs,” what kind of programs would it be looking for?
Rishabh Singh: Yeah, that actually goes back to the definition of program synthesis. The way we frame this problem is, it’s essentially a search problem where the goal is, we have some programming language – it could either been Turing complete, specifying all possible programs – or it could be domain-specific as I was saying, maybe for programs inside Excel or programs for web extraction. But the search problem essentially says that a programming language defines a space of programs that could be written in that language. And given a specification, the goal is to search over this large space of programs to find one program that satisfies the specification users provided. So, it’s this giant search problem. And that’s why actually it’s one of the hardest problems as well, because the search space is humungous. Actually, in some cases even infinite, because if the language allows for recursion and arbitrary expansions, yeah. There’s a giant space, and a lot of research that happens is how to make the search more tractable, or how to actually make the search process more efficient in different domains.
Host: That sounds like you’ve got your work cut out for you and a lot of research opportunities over the next – I don’t know, years? Decades?
Rishabh Singh: Yeah. And that’s the exciting part, yeah. I think we’ve already had quite a few recent breakthroughs and certainly, recently, with more advances in hardware, in CPU speed and then also newer search methods like deep learning techniques and constraint solving, people are now trying to combine these newer techniques to make the search process even more efficient.
Host: You know, I read a lot of your papers before this podcast and… One of them was called Artificial Programming. That one resonates to me because that makes sense. I get what you are saying. It’s like artificial intelligence, artificial programming. But it is program synthesis, yes?
Rishabh Singh: Right, exactly. We were trying to think of a better way to describe this whole system. But actually, one more motivation of defining it that was, there’s been a lot of work previously in program synthesis which has mostly been based on rule-based or more algorithmic way to perform search, which has required lots and lots of human insights to make the search tractable. And, in this case actually, what we were trying to do was, can we frame the learning problem in terms of learning abstractions using neural networks, and that which was related to artificial intelligence? So that’s one thing. But, on top of that, there’s several other benefits we get if it can learn programs. First, that we can learn interpretable models. So, whatever the network is learning, since it’s producing a program, one can look at what the network has produced, they can debug the learned program, they can modify it if doesn’t work the right way. But at least if it’s more interpretable than the current networks that are basically learning a set of low-level weight matrices. And also, one more benefit we get from learning programs is that these models tend to be more generalizable. So, these programs tend to also work well on newer inputs or newer data sets that we have not seen in training.
Host: That was actually a question I had in trying to wrap my brain around this, because having talked to a couple of other researchers out here on programming languages specifically, there’s so many of them and they fall into different categories and they are each useful for a particular kind of task. And within program synthesis, how does it sort through the kinds of languages that might be useful for that particular program? I mean is that part of the search?
Rishabh Singh: Yeah, that’s actually an excellent point. Yeah. So, a lot of research actually also goes on into designing these languages. The task is actually we want to design, for any given domain, we want to come up with languages that are expressive because we want it to be able to express many, many tasks in that domain. But at the same time, we want to constrain it such that it is easier to learn or perform search in some sense. So, there’s always this trade-off between expressivity and conciseness of the language. And a lot of work actually goes into designing languages specifically for program synthesis.
Rishabh Singh: And actually, similar things have happened in our work as well. We’ve been looking at various domains, designing various domain-specific languages for different program synthesis applications, and since we wanted to get towards a system that can learn programs in a very general-purpose language, we’ve been taking steps towards adding newer features, one at a time, to start from maybe a very small language, then adding few features and then getting towards a complete language, let’s say, like Python or R.
Host: The work you’ve done in neural program synthesis for learning string transformations in a functional language like AutoFill and FlashFill for Excel. But you’ve got this recent work using a more challenging imperative language, Karel, in which you’ve built this program that took an introductory computer science class and passed. I don’t think it was a grade you might have been happy with, but for a machine it was pretty good. Tell us that story.
Rishabh Singh: Yeah. So, FlashFill was a system we developed a few years back for Excel users to enable them to program using fewer input/output examples for string transformation, but this was mostly an algorithmic system where we came up with new efficient ways to search over the space of programs in the domain-specific language we designed for FlashFill. And we were actually quite happy that we can build these systems that can learn programs in a language like FlashFill which is mostly a functional language. It’s learning composition of functional abstractions. Then we wanted to see actually what happens if we make the language a little bit more complex. So, what if we can add, a bit of control flow over these functions like loops and conditionals? And interestingly enough, we found this language called Karel which is taught in Stanford’s introductory programming class. Students, they are basically given a set of input/output worlds where robots are in certain position and there are markers and blocks. And so they have to write these programs in the Karel language to basically move robot, starting from the input world such that it ends up in the output world. And that was one of the reasons we chose Karel, because it had control flow and it added one level more complexity over the FlashFill language. Secondly, as you mentioned Gretchen, we also wanted to see actually how well it would do in not just learning programs that we generate in the language synthetically, but also, can it solve problems in a class test setting that students typically go through? And yet, it was interesting that when we trained the system to perform program synthesis in this language, for the class test right now, it was able to solve 7 out of 16 problems which was yeah, which is still I think better than what we expected. But we are now improving the system to get even better, yeah.
Host: So, it needs to get better grades in order to, you know, get in the parade.
Rishabh Singh: Yeah. So, the hope is actually, keep improving the system and then we can see, yeah, how well it performs compared to yeah, human programmers that are also learning to program from scratch.
Host: Which is interesting because you are like working side-by-side with beginning programmers and this program that’s a beginning programmer.
Rishabh Singh: Exactly, right, right. So, the hope is that these systems also starting from somewhat scratch and then, maybe in the beginning they are not so good at learning programs, but over time, over experience, they are getting better and better. And that’s also one of the things we want to look at in future where right now we’re training these systems individually for every domain. For FlashFill we were training a separate system, for Karel we are training a separate system. Is there a way we can leverage knowledge across domains? Can we build a system that actually learns from experiences, not just in one domain, but be able to transfer the knowledge to other domains a well?
Host: So, doesn’t that kind of lead into the work you are doing in inductive programming?
Rishabh Singh: Basically, this leads towards the notion of general intelligence, that humans, we are able to learn something in one context and then use it later on. We don’t have to learn everything from scratch. In the same way, right, if we can build a system that can evolve over time and continuously keeps adapting itself and improving itself from experiences, that will lead towards more general intelligence.
Host: Yeah. So, I guess where I was heading with the question is like you talk about these very specific tasks with a specific language within Excel or within Karel or so on. And the goal eventually would be that these systems could work among the different kinds of things and make the same kinds of gains that humans do with their general intelligence.
Rishabh Singh: Yeah, absolutely, right. And also similar to humans, right, we are able to write really complex and large programs in general-purpose languages. So, if we are able to learn across domains, hopefully we can also increase the complexity of the domain-specific language in which we learning, and hopefully at some point, it becomes almost a general-purpose language.
Host: So, I used to teach English, and I know what it’s like to stare at a stack of essays, overwhelmed by the task of having to grade them and make meaningful contributions, and I often remember thinking, “I’m only ever going to be asymptotically approaching zero. I’m never going to get this stack done.” Tell me the story about Auto Grader and our shared experience, our shared pain and what you did about it.
Rishabh Singh: Actually, this was something that started when I was a teaching assistant for the introductory programming class at MIT. And actually, we used to go through the same process. We had a few hundred students in the class, and they would submit their programming assignments, and quite often actually this would happen, they would make small mistakes in their programs, and if you just give them traditional feedback that says either your program works or it doesn’t work, first of all, there might be just some small mistakes and we are penalizing students for that. And secondly, there’s no meaningful feedback to students as well about how they can improve, if you just say pass or fail.
Rishabh Singh: So, that’s why actually we, when we were TAs actually we used to go through each program manually, try to understand what the problem might be, and give some more detailed feedback on what might be going wrong and how they can think about improving their solutions. And then actually around similar time, edX was also starting out and we decided to also offer this introductory programming class on edX. And the scale there was completely different from the classroom setting. We had maybe 200, 300 students in the class; whereas on edX, I think the very first offering had more than 100,000 students sign-up and there was no way actually we can manually go through all the submissions and give meaningful feedback. And then actually, that’s when the notion of this, notion of auto grader or giving automated feedback to students came up. The idea was actually, since specifically in programs… programming courses, we thought actually program synthesis might help in this setting. So, let’s say a student submits a solution for any given problem. Now, there are many, many different ways to solve the same problem. There are many different algorithms. And even for a given algorithm, there are many different programmatic choices one can make because the languages are very expressive. So, we cannot just come up with a solution that can do any kind of syntactic reasoning because that would be too much work and it’s not even possible to come up with all possible ways students are going to solve the problem. But that’s where we can leverage program synthesis. The way we framed this problem was, given an incorrect submission, can we find minimum number of changes to the original student submission that is incorrect such that it becomes functionally equivalent to teacher’s submission. And one thing that this particular system needed was, basically a teacher had to come in and define a set of common mistakes students are making. So there was a little bit of teacher effort, but that wasn’t too much. And the hope was once that is there, then it could be reused across multiple years and more similar assignments. So, that was something we were doing when I was at MIT doing my PhD. But afterwards, when I came to Microsoft Research, we have the Learning Experiences team here that also offers many, many interesting courses on edX platform which was quite interesting, and it was lovely to talk to them. And we’ve been collaborating with them now to also build these systems to help students taking those courses. One interesting thing we’d recently done is to also reduce the amount of effort teachers now need to do. So instead of teachers having to specify space of edits or changes, we leverage all the previous student submissions. Since the scale is quite big, we can actually automatically try to learn the space of changes from the data itself. So, our current auto grading system basically just takes a set of previous submissions, a few thousands, and then automatically learns the space of changes and then uses program synthesis techniques to find changes to new student submissions.
Host: So, is this actually in use in a lot of venues, educational venues?
Rishabh Singh: Yeah, actually we just deployed it, one version of it I think a few months back to the C# programming course here at Microsoft. So currently we are still doing some studies about what level of feedbacks are actually more helpful to students because we can either give a bit of hint about saying, “There is something wrong about your program at this place,” or we can go all the way and say, “Here is the mistake and this is how you want to change it”. So, we’ve been right now just taking initial feedback and trying to see what different kinds of feedbacks, what kind of effect it has on students and what is the right time to provide what kinds of hints.
Host: Let’s talk a little bit about neural networks, and how, kind of, this neural deep learning is representing what some people are calling a fundamental shift in how we write software. Some have referred to it as software 2.0.
Rishabh Singh: Right.
Host: Tell me about that.
Rishabh Singh: Yeah, so actually this is quite fascinating. I think we are changing at a really rapid pace now. And, as AI is becoming more and more prevalent and then we are using newer and newer machine-learning systems as a core part of our software, as a result of that actually, not all software is written by humans, or programmers, specifically. So, there are all these components that are typically learned from different data sets. So, it changes the way we think about how software is going to be developed in future. Since many parts of it are going to come from these learned systems, we need ways to effectively manage how to compose piece of code that is written by programmers, piece of code that is coming as a black box from outside, a learned model. And at the same time, it also introduces many challenges. Yeah. First, is interpretability, that if you are just using black box models, it’s hard to really give guarantees about what exactly is happening. These are going to be stochastic models, so we will have to write code that somehow understands that variability and tries to ensure that even if it is off by some amount, the software will still be making reasonable decisions.
Host: That’s a hard task in itself.
Rishabh Singh: Yeah, that’s a great challenge. The way I think about software 2.0 going forward is, we are going to have a shift in how programmers think about programming, in the sense that they don’t have to specify each and every step of the computation imperatively. They can actually provide the intent at more high-level concepts, like test cases or maybe partial program. They can write piece of code, but keep some parts of the code empty or maybe in natural language they can specify what they want this particular function to do. And then we’re going to have these synthesizers or smart compilers that would take these ambiguous, under-specified intent and convert that into code for us or something that we can execute. It’s still unclear what exactly that future is going to be.
Rishabh Singh: And hopefully at some point we’ll converge to one common framework where we can have this notion of multi-modal specification, where we allow all different ways to specify the intent and have these smart compilers or synthesis systems that can understand them and generate code that satisfies the intent.
Host: So, the sensational way of describing Software 2.0 is that, at some point, humans will stop writing code and the machines will take over. But what I’m hearing is that this complementarity of humans and machines is going to remain and that it’s never going to be, you know, this is going to replace Software 1.0…
Rishabh Singh: Yeah, that’s actually a great way to put it, Gretchen, yeah, basically it’s not the case that the goal is to replace programmers; the goal is actually to empower programmers to achieve even more. And even here actually, programmers still need to provide the specification or intent on what the synthesis system needs to perform. The way we have evolved over time, we used to have these low-level assembly languages, in the beginning, where somebody would go in and write all these low-level assembly instructions, move data from one register to another. Then we evolved into high-level languages where the idea was instead of writing assembly code, we’ll have compilers which will go from high-level languages to assembly, and programmers can actually specify, write their programs in a high-level of abstraction. Then we’ll have these more, even smarter compilers that can take those kinds of ambiguous, under-specified intent and still be able to generate code that is consistent. Yeah, we’re not going to replace programmers, but just make them even more – or basically make everybody a super programmer, right?
Host: So, along the spectrum of ability, you are going to make professional programmers be able to do better at what they do, and programmers that aren’t even programmers, be able to program within specific parameters.
Rishabh Singh: Yeah, exactly. There’s this whole spectrum, in the sense, yeah, we can make developers even more productive. But, yeah, there’s this large class of audience who are not necessarily developers or programmers, but we can also enable them to perform tasks, like programmatic tasks in various domains, without having to know how to program things, yeah. To be able to… for them to specify what they want the system to do at a high-level abstraction and let the machine generate the code for them.
Host: That’s really exciting. I mean, that’s enabling access to what’s usually been reserved for a very small subset of the population on the planet.
Rishabh Singh: Yeah, and that’s something I’m actually most excited about, where we can build systems which can enable people from all different professions to use computing and achieve these tasks that require programming of some sort.
Host: Yeah, yeah. So, let’s talk about fuzzing for a minute. Regular fuzzing versus neural fuzzing. Will you unpack that for us?
Rishabh Singh: Oh, sure. Actually, let me start by – so fuzzing is this process where which has had quite remarkable success in almost every software company. The idea is actually, we want to write code that is secure, that doesn’t have any bugs. And the idea of fuzzing is actually to automate some of that testing process. So, instead of a human or a tester writing these programs to test, the idea is, we’ll have a process that can generate random inputs. It can either take an input, put it randomly again and again, and check if any one of those inputs crashes the system. Then we know there is something wrong with the code and we also have an input that triggered the bug. It has been actually quite successful in various companies and it has found many, many safety-critical bugs or security bugs. So, this works quite well for binary formats where the inputs are, let’s say, an image and I can put up pixels and check if any one of them crashes the system or leads to some vulnerability. But when we have more structured input formats, let’s say a PDF file or a Word file or a .doc format, the problem is, if we make random changes to those files, very quickly they would become invalid files. And the parser would just say, “Oh this is an invalid PDF file.” So, one of the challenges is how do we do fuzzing for these more structured formats such that we can still generate new random inputs, but that still adheres to the grammar of the format? And that’s where we were using some of these neural synthesis techniques to be able to learn grammars. And over time, it turns out actually, it’s able to automatically learn some notion of grammar of these formats that makes the fuzzing process much more efficient. It only generates good-looking inputs. And it also reflects then when we were trying out these new systems that are augmented with these neural grammar learning techniques, they are able to generate inputs that cover more parts of the code. And they also find many more crashes than the current state-of-the-art system. So that was also a really nice application of learning high-level representations of different formats. It’s actually interesting now with recent advances in these learning techniques, the systems are becoming more efficient. So now we are seeing more and more usage of these learning techniques for fuzzing.
Host: Talk a little bit about your path to Microsoft Research from your beginnings in India. What got you into this and how did you end up here of all places?
Rishabh Singh: Oh, that’s – that’s quite deep. Yeah, so actually, um, growing up, I was in India and then I did my undergraduate at IIT Kharagpur in computer science. So, I really liked mathematics, but at the same time, this notion of being able to automate things by writing programs, that was quite fascinating, and I joined MIT after finishing up my undergraduate degree to do research in software verification. That was my starting point. But then, slowly I got more interested in program synthesis where, instead of writing code and then doing verification, what if actually we write code that was correct by construction? So then, after graduating, I was considering academic positions and Microsoft Research. And I really liked the atmosphere here, we had great sets of researchers, but also, I think MSR, it has some unique advantages compared to academic places which I really liked. And then I decided to be here, and it’s been almost 3 and a half, 4 years now. It’s been quite a lot of fun.
Host: Yeah. So, as we think of your future, what excites you right now? What are you working on? I know your, sort of, nugget is program synthesis, but is there any new thread coming off of that that’s got you really excited?
Rishabh Singh: Yeah, that’s something at Microsoft Research, I found which was quite unique compared to other places. When we formed this group, Cognition group, we were able to bring in researchers from many different backgrounds, come together and work on this larger vision of doing neural program synthesis. And for a long time, we were using more algorithmic and more rule-based systems to perform search. Whereas, I think the fascinating thing has been in the last few years we’ve been building these architectures and learning search algorithms from data itself. And that’s something I’m actually quite excited about. And also this notion of actually combining these more neural approaches with more symbolic approaches, and this combination of symbolic AI and neural AI is something also quite exciting and that’s what something in the group also we are hoping to, explore even more going forward.
Host: Yeah, yeah. If you had to say, at the end of your career, “This is what I would define success as, in my field,” what would be the definition of success?
Rishabh Singh: So, I think one of my dreams… I don’t know when we will ever get there, or if we will ever get there, but… so, right now we are able to synthesize small snippets, few lines, but if we can build a system that can actually be able to assist developers write sizable amounts of code from high-level specifications or high-level intent, that would be quite magical if we can do that. And that’s one of my dreams at maybe end of the career if I can see that happening; that would be super fantastic, yeah.
Host: Rishabh Singh, it’s been great having you.
Rishabh Singh: Thanks a lot, Gretchen, for having me, yeah. It was great to talk to you.
To learn more about Dr. Rishabh Singh’s work, and the fascinating advances in machine learning and neural program synthesis, visit Microsoft.com/research