Powerful CRISPR/Cas9 screens via computational prediction of DNA repair profile

Wellcome Trust Sanger Institute

Dr Felicity Allen’s project will make novel large-scale measurements of the DNA mutations generated by CRISPR/Cas9 and use them to build a predictive machine learning model of gene editing outcomes. This will address two costly problems in genome-wide gene knockout experiments: redundant design and inaccurate quantification.

CRISPR/Cas9, a recently discovered DNA editing system, is revolutionising biological research across medicine, agriculture and fundamental cell biology. Perhaps its simplest, yet most exciting, application is to disable a gene to test whether this affects a chosen cell function, such as normal development or cancerous growth. Large-scale experimental designs allow this to be carried out for all genes in the human genome simultaneously, in a comprehensive, unbiased fashion. While this has transformed the way we answer a wide range of biological questions, we still cannot efficiently measure or predict the exact editing outcome within each gene. Because the outcome determines whether the gene is effectively disabled, this limits the power of the overall method. In this project, Felicity will build models and tools to solve this problem.

The CRISPR/Cas9 system disables a gene by causing short insertions or deletions in its DNA sequence. Only some of these possible mutations will succeed in preventing the gene from functioning. Despite the central role of these editing outcomes, it is currently too laborious and expensive to measure which of them occurred within each experiment.

The generated mutations are known to be non-random and to depend on the DNA sequence of the gene target. This motivates a systematic study of the exact nature of that link, and the development of a predictive tool, as proposed here.

Together with collaborators at the Wellcome Trust Sanger Institute, Felicity has designed a novel approach to efficiently measure the mutations for 90,000 CRISPR/Cas9 gene edits. With the data from this experiment, Felicity will use statistical and machine learning methods to develop a model that predicts the distribution of mutations for each gene target. She will then produce two tools: a design tool that uses these predictions to select targets with desirable mutations, and an analysis tool that accounts for variability in editing outcomes when identifying causal genes from large-scale CRISPR/Cas9 experiments. These will be the first public CRISPR/Cas9 tools to be informed by the diversity of generated mutations, and will help researchers worldwide who use this transformative technology to interpret their experimental results.
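
To make the idea of a prediction-driven design tool concrete, the following is a minimal sketch and not the project's actual method. It assumes that a predictive model has already produced, for each candidate target, a probability for each insertion or deletion size, and it scores targets with a common heuristic that the summary above does not itself state: an indel whose length is not a multiple of three shifts the reading frame and is therefore more likely to disable the gene. The function names, target names and probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: indel size in base pairs -> predicted probability.
# Negative sizes are deletions, positive sizes are insertions, 0 means no edit.
IndelProfile = Dict[int, float]


def frameshift_fraction(profile: IndelProfile) -> float:
    """Return the share of predicted outcomes whose size is not a multiple of 3.

    Assumption (not from the project summary): frameshifting indels are
    treated as the "desirable" gene-disabling outcomes.
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain positive probability mass")
    shifted = sum(p for size, p in profile.items() if size % 3 != 0)
    return shifted / total


def rank_targets(predictions: Dict[str, IndelProfile]) -> List[Tuple[str, float]]:
    """Sort candidate targets from highest to lowest frameshift fraction."""
    scored = [(name, frameshift_fraction(p)) for name, p in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up predictions for two hypothetical targets within the same gene.
example_predictions = {
    "target_A": {-1: 0.5, -3: 0.3, 1: 0.2},
    "target_B": {-3: 0.6, -6: 0.2, 2: 0.2},
}

for name, score in rank_targets(example_predictions):
    print(f"{name}: predicted frameshift fraction {score:.2f}")
```

In this illustration, target_A would be preferred because more of its predicted mutations shift the reading frame. A real design tool would likely weigh further factors, such as where in the gene the edit falls.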