
Researchers use AI to improve accuracy of gene editing with CRISPR

From left, Nicolo Fusi, a researcher at Microsoft, Jennifer Listgarten, who recently joined the faculty at UC Berkeley, and John Doench, an associate director at the Broad Institute, collaborated on a method of using AI to improve gene editing results. Photo by Dana J. Quigley.

A collaboration between computer scientists and biologists from research institutions across the United States is yielding a set of computational tools that increase efficiency and accuracy when deploying CRISPR, a gene-editing technology that is transforming industries from healthcare to agriculture.

CRISPR is a nano-sized sewing kit that can be designed to cut and alter DNA at a specific point in a specific gene.

The technology may lead, for example, to breakthrough applications such as modifying cells to combat cancer or producing high-yielding, drought-tolerant crops such as wheat and corn.

Elevation, the newest tool released by the team, uses a branch of artificial intelligence known as machine learning to predict so-called off-target effects when editing genes with the CRISPR system.

Although CRISPR shows great promise in a number of fields, one challenge is that many genomic regions are similar, which means the nano-sized sewing kit can accidentally go to work on the wrong gene and cause unintended consequences, the so-called off-target effects.

“Off-target effects are something that you really want to avoid,” said Nicolo Fusi, a researcher at Microsoft’s research lab in Cambridge, Massachusetts. “You want to make sure that your experiment doesn’t mess up something else.”

Fusi and former Microsoft colleague Jennifer Listgarten, together with collaborators at the Broad Institute of MIT and Harvard, University of California Los Angeles, Massachusetts General Hospital and Harvard Medical School, describe Elevation in a paper published Jan. 10 in the journal Nature Biomedical Engineering.

Elevation and a complementary tool for predicting on-target effects called Azimuth are publicly available for free as a cloud-based end-to-end guide-design service running on Microsoft Azure as well as via open-source code.

Using the computational tools, researchers can input the name of the gene they want to modify and the cloud-based search engine will return a list of guides that researchers can sort by predicted on-target or off-target effects.
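To make the sorting step concrete, here is a minimal Python sketch of ranking a list of guides by either score. It is not the service's actual API: the `GuideCandidate` class, its field names, and the `rank_guides` function are hypothetical, and the sketch assumes that a higher on-target score is better and a lower off-target score is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideCandidate:
    """Hypothetical record for one guide returned by a design tool."""
    sequence: str
    on_target: float   # assumed: higher means more predicted on-target activity
    off_target: float  # assumed: higher means more predicted off-target activity


def rank_guides(guides, by="on_target"):
    """Return a new list of guides, best first, for the chosen score."""
    if by == "on_target":
        # Best guides have the highest predicted on-target activity.
        return sorted(guides, key=lambda g: g.on_target, reverse=True)
    if by == "off_target":
        # Best guides have the lowest predicted off-target activity.
        return sorted(guides, key=lambda g: g.off_target)
    raise ValueError(f"unknown score: {by!r}")
```

A researcher could call `rank_guides(guides, by="off_target")` to put the guides with the least predicted off-target activity at the top of the list.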


Nature as engineer

The CRISPR gene-editing system is adapted from a natural virus-fighting mechanism. Scientists discovered it in the DNA of bacteria in the late 1980s and figured out how it works over the course of the next several decades.

“The CRISPR system was not designed, it evolved,” said John Doench, an associate director at the Broad Institute who leads the biological portions of the research collaboration with Microsoft.

CRISPR stands for “clustered regularly interspaced short palindromic repeats,” which describes a pattern of repeating DNA sequences in the genomes of bacteria separated by short, non-repeating spacer DNA sequences.

The non-repeating spacers are copies of DNA from invading viruses, which molecular messengers known as RNA use as a template to recognize subsequent viral invasions. When an invader is detected, the RNA guides the CRISPR complex to the virus and dispatches CRISPR-associated (Cas) proteins to snip and disable the viral gene.

Modern adaptations

In 2012, molecular biologists figured out how to adapt the bacterial virus-fighting system to edit genes in organisms ranging from plants to mice and humans. The result is the CRISPR-Cas9 gene editing technique.

The basic system works like this: Scientists design synthetic guide RNA to match a DNA sequence in the gene they want to cut or edit and set it loose in a cell with the CRISPR-associated protein scissors, Cas9.

Today, the technique is widely used as an efficient and precise way to understand the role of individual genes in everything from people to poplar trees, and to learn how to change genes for purposes ranging from fighting disease to growing more food.

“If you want to understand how gene dysfunction leads to disease, for example, you need to know how the gene normally functions,” said Doench. “CRISPR has been a complete game changer for that.”

An overarching challenge for researchers is deciding which guide RNA to use for a given experiment. Each guide is roughly 20 nucleotides long, and hundreds of potential guides exist for each target gene in a knockout experiment.

In general, each guide has a different on-target efficiency and a different degree of off-target activity.
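A short Python sketch shows why the number of candidate guides grows so quickly. It relies on a background fact the article does not spell out: the commonly used Cas9 from Streptococcus pyogenes cuts only where the 20-nucleotide target is immediately followed by a short "NGG" motif, known as a PAM. The function name `find_candidate_guides` is hypothetical, and the sketch scans only the given strand, so a real search that also covers the reverse strand would find more candidates.

```python
GUIDE_LENGTH = 20  # the article's "roughly 20 nucleotides"
PAM_LENGTH = 3     # assumption: SpCas9-style "NGG" motif after the target


def find_candidate_guides(sequence):
    """List (start, guide, pam) for every 20-mer followed by an NGG motif.

    Only the given strand is scanned; windows containing letters other
    than A, C, G, T are skipped.
    """
    seq = sequence.upper()
    window = GUIDE_LENGTH + PAM_LENGTH
    candidates = []
    for start in range(len(seq) - window + 1):
        guide = seq[start:start + GUIDE_LENGTH]
        pam = seq[start + GUIDE_LENGTH:start + window]
        if pam[1:] == "GG" and set(guide + pam) <= set("ACGT"):
            candidates.append((start, guide, pam))
    return candidates
```

Every position that passes this check is a possible guide, which is how a single gene with thousands of nucleotides can yield hundreds of options.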

The collaboration between the computer scientists and biologists is focused on building tools that help researchers search through the guide choices and find the best one for their experiments.

Several research teams have designed rules to determine where off-targets are for any given gene-editing experiment and how to avoid them. “The rules are very hand-made and very hand-tailored,” said Fusi. “We decided to tackle this problem with machine learning.”

Training models

To tackle the problem, Fusi and Listgarten trained a so-called first-layer machine-learning model on data generated by Doench and colleagues. These data reported on the activity for all possible target regions with just one nucleotide mismatch with the guide.

Then, using publicly available data that was previously generated by the team’s Harvard Medical School and Massachusetts General Hospital collaborators, the machine-learning experts trained a second-layer model that refines and generalizes the first-layer model to cases where there is more than one mismatched nucleotide.

The second-layer model is important because off-target activity can occur with far more than just one mismatch between guide and target, noted Listgarten, who joined the faculty at the University of California at Berkeley on Jan. 1.

Finally, the team validated their two-layer model on several other publicly available datasets as well as a new dataset generated by collaborators affiliated with Harvard Medical School and Massachusetts General Hospital.
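The two-layer idea can be illustrated with a toy Python sketch. It is not Elevation's implementation. The first-layer function below is a made-up rule standing in for a model trained on single-mismatch data, and the summary features passed to the second layer (count, product and sum of the single-mismatch scores) are an assumption about what a second-layer model might consume. All function names are hypothetical.

```python
from math import prod


def find_mismatches(guide, target):
    """Return (position, guide_nt, target_nt) for each mismatched position."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return [(i, g, t) for i, (g, t) in enumerate(zip(guide, target)) if g != t]


def first_layer_score(position, guide_nt, target_nt, guide_length=20):
    """Toy stand-in for a model trained on single-mismatch data.

    Made-up rule for illustration only: the score falls linearly from
    about 1.0 at the first position to 0.5 at the last. A real model
    would also use the nucleotides themselves; they are ignored here.
    """
    return 1.0 - 0.5 * (position + 1) / guide_length


def second_layer_features(guide, target):
    """Summarize per-mismatch scores as inputs for a second-layer model.

    The second layer itself (a learned model) is deliberately left out.
    """
    scores = [
        first_layer_score(i, g, t, len(guide))
        for i, g, t in find_mismatches(guide, target)
    ]
    return {
        "n_mismatches": len(scores),
        "product": prod(scores),  # naive combination of the single scores
        "sum": sum(scores),
    }
```

The point of the second layer is that a simple combination such as the product may not describe sites with several mismatches well, so a model trained on multi-mismatch data learns how to correct it.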

Some model features are intuitive, such as a mismatch between the guide and nucleotide sequence, noted Listgarten. Others reflect unknown properties encoded in DNA that are discovered through machine learning.

“Part of the beauty of machine learning is if you give it enough things it can latch onto, it can tease these things out,” she said.

Off-target scores

Elevation provides researchers with two kinds of off-target scores for every guide: an individual score for each potential target region and a single overall summary score for that guide.

Individual target scores are machine-learning-based probabilities, provided for every region of the genome, that something bad could happen there. For every guide, Elevation returns hundreds to thousands of these off-target scores.

For researchers trying to determine which of potentially hundreds of guides to use for a given experiment, these individual off-target scores alone can be cumbersome, noted Listgarten.

The summary score is a single number that lumps the off-target scores together to provide an overview of how likely the guide is to disrupt the cell over all its potential off-targets.

“Instead of a probability for each point in the genome, it is what’s the probability I am going to mess up this cell because of all of the off-target activities of the guide?” said Listgarten.
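One simple way to picture a summary score is sketched below in Python. It is not how Elevation computes its summary score. The sketch swaps in a plain independence assumption: if each potential off-target site is disrupted independently with its own probability, the chance that at least one site is disrupted is one minus the product of the chances that each site is left alone. The function name `summary_score` is hypothetical.

```python
from math import prod


def summary_score(site_probabilities):
    """Probability that at least one off-target site is disrupted.

    Assumes sites are independent, which is a simplification made for
    this sketch and not a claim about the published model.
    """
    probabilities = list(site_probabilities)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p!r}")
    # P(no site disrupted) is the product of the per-site "safe" chances.
    return 1.0 - prod(1.0 - p for p in probabilities)
```

Under this assumption, many sites with small individual probabilities can still add up to a large overall risk, which is why a single summary number is easier to compare across guides than thousands of per-site scores.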

End-to-end guide design

Writing in Nature Biomedical Engineering, the collaborators describe how Elevation works in concert with a tool they released in 2016 called Azimuth that predicts on-target effects.

The complementary tools provide researchers with an end-to-end system for designing experiments with CRISPR-Cas9. They help researchers select a guide that achieves the intended effect, such as disabling a gene, and reduce mistakes such as cutting the wrong gene.

“Our job,” said Fusi, “is to get people who work in molecular biology the best tools that we can.”

In addition to Listgarten, Fusi and Doench, project collaborators include Michael Weinstein from the University of California Los Angeles, Benjamin Kleinstiver, Keith Joung and Alexander A. Sousa from Harvard Medical School and Massachusetts General Hospital, and Melih Elibol, Luong Hoang, Jake Crawford and Kevin Gao from Microsoft Research.


John Roach writes about Microsoft research and innovation.


Opportunities abound in the creative and collaborative culture of STEM

Jennifer Chayes, Distinguished Scientist and Managing Director, Microsoft Research New England & New York City

By Jennifer Chayes, Technical Fellow and Managing Director, Microsoft Research New England & New York City 

The explosion of data available today everywhere from biomedicine to the arts is opening new opportunities for researchers with backgrounds in science, technology, engineering and math to pursue creative and collaborative endeavors that have deep societal impact.

My research has always been interdisciplinary, which by nature is collaborative and creative. You need experts from different fields and you must make scientific leaps to bring the perspectives, results and methodology from one discipline to another. I encourage the scientists who work with me to trust their scientific intuition and make those leaps. Of course, we must also do the hard work to fill in the gaps after we leap. The work is deeply satisfying and delivers genuine societal impact.

The explosion of biomedical data, for example, is generating opportunities for researchers in STEM fields to make tremendous contributions to goals such as increasing the longevity and quality of life.

For example, a researcher in one of my labs worked in collaboration with physicians to develop a reinforcement-learning algorithm for a tool that provides personalized exercise incentives for diabetics. Different patients react differently to specific incentives at specific times. A text message saying that you are exercising less than 90 percent of other patients might cause one patient to exercise more and another to give up exercise altogether. The reinforcement learning algorithm produces the right messages for the right patients at the right time. These personalized messages significantly improve most patients’ exercise regimes and, in some cases, decrease the need for diabetes medication. This reinforcement learning algorithm is now being applied more widely.

My labs are also involved in a project that aims to predict which cancer patients are good candidates for specific cancer immunotherapy drugs. These drugs enhance the ability of our immune systems to go after cancer cells. The approach is more targeted than chemotherapy or radiation, more effective at eliminating cancer cells and less damaging to non-cancerous tissue. Cancer immunotherapy is a new field with more questions than answers. The organization Stand Up to Cancer is sponsoring about ten researchers from my labs to work on a host of projects with biologists and oncologists from many top biomedical institutions to answer these questions.

For example, individuals differ not only in their genomes, but also in the makeup of their immune cells. Two people who are genetically quite similar can develop vastly different repertoires of T-cells, which are white blood cells that scan for abnormalities and infections. An interplay of our genomes, our environments and our T-cell repertoires determines how well we might react to a specific cancer immunotherapy drug. This interplay leads to a high-dimensional sparse-statistics problem that new techniques in machine learning and statistics can help solve.

A new field forming at the boundary of AI and ethics is another area where STEM researchers are working collaboratively and creatively to generate substantial societal impact. Researchers are developing machine-learning models, for example, to de-bias data sets and make decisions that are fairer than those made by humans. Computer scientists, ethicists, lawyers and social scientists are collaborating to formulate notions of fairness, accountability and transparency. This work will improve the fairness of search engine outputs as well as job placement, school admission and legal decisions, for example.

Biomedicine and ethical data-driven decision-making are just two fields where the explosion of data is opening opportunities for STEM researchers to generate impact. Opportunities for impact are everywhere, from the social sciences to the arts. The keys to success are to follow your passion, develop the confidence to let your scientific intuition lead the way and then do the work to realize your vision.