Synthetic biologists, our group included, have been drawn to the power of CRISPR technologies [1] to revolutionize how we program cell behavior, for example by designing programmable CRISPR-based genetic circuits. Genetic circuits, like digital ones, take input signals from the surrounding environment (e.g. relevant biologics, chemicals, temperature, UV) and, through a series of logic gates, perform computation or “signal processing”, producing useful output that can be read out or passed to biological actuators within a programmed cell. To date, most genetic circuits have been built using transcription factors (TFs), which have inherent limitations: (1) only a handful of orthogonal TFs have been characterized [2], (2) proteins require a lot of encoding space (i.e. more DNA), and (3) TFs are toxic when expressed at too high a level. These issues have generally limited TF-based genetic circuits to about 10 regulators, where each regulator emulates a logic function (e.g. a NOT gate).
In our first year of graduate school, during our synthetic biology graduate course, we were brainstorming novel types of genetic circuits to work on for a collaborative side project. We came up with an idea for a large-scale analog genetic circuit, with potentially thousands of regulators, to be implemented in Escherichia coli. Even to this day, this scale and complexity have not been realized in an engineered biological program, but we realized CRISPR could help get us there. A CRISPR-based circuit would be relatively small in construct size, as only one protein “master” regulator (dCas9) needs to be co-expressed with the sgRNA regulators. We also noted that if the guides were designed to be orthogonal to the genome, off-target binding could be largely mitigated, minimizing cell toxicity. There seemed to be an issue, however: most CRISPR-based systems at the time had only 2-3 co-expressed sgRNAs, with a few examples expressing up to 12 sgRNAs. The question was: what limits the scalability of CRISPR systems?
To address this, we conducted a “red teaming” exercise [3], which involved simulating the design, synthesis, assembly, integration and expression of the proposed system to identify project failure points. One core design aspect stood out as the primary issue: repetitive DNA causes problems.
To build our many-regulator CRISPR circuit, we wanted to order large arrays of sgRNAs as gBlock Gene Fragments from Integrated DNA Technologies, Inc. (IDT). Hierarchical assembly of individual sgRNAs using cloning methods such as Golden Gate or Gibson assembly would have required too much cloning time and would generally have limited the scale of the sgRNA array that we could build. Instead, we recognized that we could fit 13 sgRNAs on a 3 kb gBlock (the maximum size offered by IDT), and that if we could order a series of gBlocks to assemble together, we could quickly build systems with many co-expressed sgRNAs. There was one problem, though: gene synthesis companies couldn’t synthesize this design because of the repeated genetic elements (promoters, sgRNA handles, terminators). Existing gene fragment synthesis technology generally relies on assembly via hybridization between many small oligonucleotides, and if these oligos are repetitive, they mis-hybridize and synthesis fails.
Intriguingly, during the red teaming exercise, we recognized another related failure point: repetitive DNA is susceptible to homologous recombination in vivo. In E. coli, the minimal length required for recombination to occur efficiently is approximately 20 base pairs [4]. We acknowledged that our engineered CRISPR systems could be a resource burden or interfere with native host processes, potentially creating a selective pressure under which, over time, recombination would most likely delete portions of our arrays. Repetitive DNA in designed genetic systems has been the cause of failure in a few reported cases, in other cases reported to our group but never mentioned in publication, and in potentially many more undiagnosed cases.
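To make the recombination risk concrete, here is a minimal Python sketch that scans a sequence for exact repeats at or above the roughly 20 bp threshold mentioned above. The function name `find_repeats`, the flanking sequences and the placeholder handle are all illustrative assumptions, not part of our published tooling, and the sketch ignores reverse-complement repeats.

```python
from collections import defaultdict


def find_repeats(seq, k=20):
    """Return {k-mer: [start positions]} for every k-mer that occurs more than once.

    k=20 follows the approximate minimal homology length quoted in the text;
    only exact repeats on the given strand are considered (an assumption).
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Hypothetical toy array: the same 25 nt placeholder "handle" is used twice.
handle = "GTTTTAGAGCTAGAAATAGCAAGTT"
array = "ACGTACGGTCT" + handle + "TTGACCATGCA" + handle + "CCGTA"

repeats = find_repeats(array, k=20)
for kmer, starts in sorted(repeats.items(), key=lambda item: item[1]):
    print(kmer, starts)
```

Any k-mer reported here marks a stretch of homology long enough, by the threshold above, to be a candidate site for recombination between two copies of a reused part.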
With these challenges in mind, we compiled and characterized a non-repetitive parts library including all of the regulatory elements needed for sgRNA array design: constitutive promoters, sgRNA handles, transcriptional terminators, and biologically neutral spacers (Fig. 1a). The most significant challenge here was to design unique sgRNA handles that performed as well as the wild-type handle. Previous work had identified some of the sequence-structure features required for Cas9-handle recognition [5,6], but no comprehensive mutagenesis study had been conducted. We tested non-repetitive sgRNA handles across 3 design-build-test-learn cycles, using an in vivo mRFP1 reporter knockdown assay and machine learning to iteratively improve the design constraint, resulting in 28 highly functional non-repetitive handles for CRISPR-Cas9 applications.
We then designed and thoroughly characterized 3 “extra-long sgRNA arrays” (ELSAs), with up to 22 co-expressed sgRNAs in E. coli, for various demonstrative applications including a succinate-producing strain via metabolic rewiring, a biocontainment strain with inducible multi-auxotrophy, and an antibiotic-susceptible strain with reduced persister cell survival (Fig. 1b-c). Notably, to achieve synthesis success and genetic stability of these ELSAs, we developed and used an automated design algorithm, the ELSA Calculator, which combines the non-repetitive parts so that the array stays within a maximum DNA repeat length (e.g. L=12 bp) and satisfies 23 design rules pertaining to synthesis and stability (link below).
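The repeat-length constraint can be illustrated with a small greedy filter in Python. This is only a sketch of the general idea and not the ELSA Calculator itself: it assumes "maximum repeat length L" means that no subsequence longer than L bp may occur twice, it checks repeats only within and between parts (not across the junctions created when parts are joined), it ignores reverse complements, and it implements none of the 23 design rules. The function names and toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_nonrepetitive(parts, max_repeat=12):
    """Greedily keep parts that share no repeat longer than max_repeat bp.

    parts is a list of (name, sequence) pairs, considered in order.
    A part is skipped if it contains an internal (max_repeat + 1)-mer repeat
    or shares a (max_repeat + 1)-mer with any part already chosen.
    """
    k = max_repeat + 1
    seen = set()
    chosen = []
    for name, seq in parts:
        part_kmers = kmers(seq, k)
        n_windows = max(len(seq) - k + 1, 0)
        if len(part_kmers) < n_windows:
            continue  # the part repeats one of its own k-mers
        if part_kmers & seen:
            continue  # the part shares a k-mer with a chosen part
        chosen.append(name)
        seen |= part_kmers
    return chosen


# Hypothetical toy parts: p3 reuses a 16 bp stretch of p1 and is rejected.
toy_parts = [
    ("p1", "ACGTTGCAAGGCTTACGATCGGATC"),
    ("p2", "TTGACGGCTAGCTCAGTCCTAGGTA"),
    ("p3", "GGGACGTTGCAAGGCTTACCC"),
]
print(select_nonrepetitive(toy_parts, max_repeat=12))
```

A greedy pass like this depends on the order in which parts are considered; a real design tool would also need to check the assembled array as a whole, since new repeats can appear where parts meet.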
This work is the first of many steps towards realizing large-scale CRISPR systems. We hope to soon see even larger toolboxes of non-repetitive sgRNA handles, both for S. pyogenes Cas9 and for other Cas proteins and CRISPR systems. Of note, the final design constraint for the sgRNA handle defines a design space estimated to contain hundreds of thousands of non-repetitive handles that could be co-expressed! We’re excited to see how other researchers across the life sciences make use of ELSAs, and to see the future of non-repetitive DNA design in synthetic biology.
1. Doudna, J.A. & Charpentier, E. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096 (2014).
2. Stanton, B.C. et al. Genomic mining of prokaryotic repressors for orthogonal logic gates. Nature Chemical Biology 10, 99 (2014).
3. Hoffman, B.G. Red Teaming: How Your Business Can Conquer the Competition by Challenging Everything. (Crown Business, 2017).
4. Shen, P. & Huang, H.V. Homologous recombination in Escherichia coli: dependence on substrate length and homology. Genetics 112, 441-457 (1986).
5. Briner, A.E. et al. Guide RNA functional modules direct Cas9 activity and orthogonality. Molecular Cell 56, 333-339 (2014).
6. Nishimasu, H. et al. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156, 935-949 (2014).