Is DNA all you need?

Most genetic information is stored in DNA sequences. The volume of accumulated genomic data also far exceeds that of other omics data, such as transcriptomic (RNA) and proteomic (protein) data. When building large foundation models for biology, should we consider only DNA sequences?

Researchers from the Arc Institute, Stanford, and Together AI recently developed a large DNA-only foundation model called Evo. It has 7 billion parameters and was trained on 85k prokaryotic genomes totaling 300 billion nucleotides. The model performed well across multiple tasks, including tasks that involve RNA and proteins:

  • Zero-shot protein fitness prediction
  • Zero-shot non-coding RNA fitness prediction
  • Zero-shot mRNA expression prediction
  • Zero-shot protein expression prediction
  • Zero-shot gene essentiality prediction
  • Generative design of CRISPR-Cas systems
  • Generative design of transposable elements
  • Generation of genome sequences with plausible high-level genomic organization
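
To make "zero-shot" concrete: a common way to get such predictions from a sequence model is to score each sequence by the likelihood the model assigns to it, with no task-specific training, and treat a higher likelihood as a proxy for higher fitness. The sketch below illustrates that idea only. It assumes a toy first-order Markov model over nucleotides as a stand-in for a real DNA language model; the function names and the example sequences are hypothetical, and none of this is Evo's actual API or data.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(seqs):
    """Count nucleotide-to-nucleotide transitions.

    Toy stand-in (assumption) for a trained DNA language model.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    return counts

def mean_log_likelihood(counts, seq):
    """Average log-probability per transition, using add-one smoothing."""
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts[prev]
        p = (row[nxt] + 1) / (sum(row.values()) + len(ALPHABET))
        total += math.log(p)
    return total / (len(seq) - 1)

# Hypothetical "genomes" the toy model has seen.
training = ["ATGGCGATGGCGATGGCG", "ATGGCGATGGCAATGGCG"]
model = train_markov(training)

# Hypothetical wild-type sequence and a heavily mutated variant.
wild_type = "ATGGCGATGGCG"
variant = "ATGTTTTTGGCG"

# Zero-shot scoring: no fitness labels are used anywhere.
wt_score = mean_log_likelihood(model, wild_type)
var_score = mean_log_likelihood(model, variant)
print(f"wild type: {wt_score:.3f}, variant: {var_score:.3f}")
```

In this toy setting the variant contains transitions the model has never seen, so it receives a lower score than the wild type. With a real model, one would then compare such scores against measured fitness values, for example by rank correlation.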

Encode Box