Is DNA all you need?
AI + Medicine Newsletter 2024-03-05
2 min read · Mar 14, 2024
Most genetic information is stored in DNA sequences. The volume of accumulated genomic data is also much larger than that of other omics data, such as transcriptomic (RNA) and proteomic (protein) data. When building large foundational models for biology, should we only consider DNA sequences?
Researchers from the Arc Institute, Stanford, and TogetherAI recently developed a large DNA-only foundational model called Evo. It has 7 billion parameters and was trained on 85k prokaryotic genomes totaling 300 billion nucleotides. The model performed well across multiple tasks, including ones involving RNA and proteins:
- Zero-shot protein fitness prediction
- Zero-shot non-coding RNA fitness prediction
- Zero-shot mRNA expression prediction
- Zero-shot protein expression prediction
- Zero-shot gene essentiality prediction
- Generative design of CRISPR-Cas systems
- Generative design of transposable elements
- Generating genome sequences containing plausible high-level genomic organization
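
To make the zero-shot tasks in the list above more concrete: one common way to use a sequence model without task-specific training is to score each candidate sequence by its likelihood under the model and treat a higher likelihood as a proxy for higher fitness. The sketch below illustrates that idea only. It is not Evo and does not use Evo's API; a tiny bigram (first-order Markov) model stands in for the real model, and the names `train_bigram` and `mean_log_likelihood`, along with the toy sequences, are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: sequences contain only these four nucleotides


def train_bigram(seqs, pseudocount=1.0):
    """Estimate P(next | prev) from training sequences with additive smoothing.

    Toy stand-in for a large DNA language model, for illustration only.
    """
    counts = {a: Counter() for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev in ALPHABET:
        total = sum(counts[prev].values()) + pseudocount * len(ALPHABET)
        model[prev] = {
            nxt: (counts[prev][nxt] + pseudocount) / total for nxt in ALPHABET
        }
    return model


def mean_log_likelihood(model, seq):
    """Average per-transition log-probability of seq under the model."""
    log_probs = [math.log(model[p][n]) for p, n in zip(seq, seq[1:])]
    return sum(log_probs) / len(log_probs)


# Hypothetical toy data: the "genomes" are repeats of the same codon.
model = train_bigram(["ATGATGATGATG", "ATGATGATG"])
wild_type = "ATGATGATG"
mutant = "ATGCTGATG"  # one substitution relative to wild_type

# Zero-shot scoring: no fitness labels were used, only sequence likelihood.
for name, seq in [("wild type", wild_type), ("mutant", mutant)]:
    print(name, round(mean_log_likelihood(model, seq), 3))
```

In this toy setup the mutant scores lower than the wild type because its substitution creates transitions the model rarely saw during training. With a real model, such scores would then be compared against experimental fitness measurements.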