Deep learning applications in biology — the publication trend

Encode Box
Jul 23, 2019

Deep learning has been disrupting many areas, including biology. The following two resources are very good collections of deep learning applications in biology, and they give a general idea of how deep learning has been applied in the field.

  1. https://github.com/greenelab/deep-review
  2. https://github.com/hussius/deeplearning-biology

Is deep learning in biology and drug development just hype? Am I too late to the party?

I cannot answer these questions for you, but I can list some data and resources (like the two above) to help you find your own answers. As a start, this post focuses on the trend in related publications. (More posts will follow.)

PubMed is like Google Scholar for biology. Its bibliographic database (called MEDLINE) contains more than 25 million references to journal articles in the life sciences, with a concentration on biomedicine.

I used a simple search query to retrieve relevant papers in PubMed.

("deep learning"[Title/Abstract]) AND ("neural network"[Title/Abstract] OR "layer"[Title/Abstract] OR "convolutional"[Title/Abstract])

Searching for “deep learning” alone returns a lot of old papers about education, so I added a few other keywords. The database and the search query cannot cover every publication about deep learning in biology, nor can they filter out deep learning applications in other fields, but I believe the trend they show is reasonably accurate.
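The yearly counts behind a trend like this can be requested from NCBI's E-utilities ESearch endpoint. The following is a minimal Python sketch, not the R code used for this post. It only builds the request URL and parses a decoded JSON response; the function names are hypothetical, the query is written with straight quotes and upper-case Boolean operators, and the assumed response layout (`esearchresult` → `count`) should be checked against the E-utilities documentation before relying on it.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The search query from this post (straight quotes, upper-case operators).
QUERY = (
    '("deep learning"[Title/Abstract]) AND '
    '("neural network"[Title/Abstract] OR "layer"[Title/Abstract] '
    'OR "convolutional"[Title/Abstract])'
)


def build_count_url(term, year):
    """Return an ESearch URL asking for the hit count in one publication year."""
    params = {
        "db": "pubmed",
        "term": term,
        "rettype": "count",  # ask only for the number of hits
        "retmode": "json",
        "datetype": "pdat",  # restrict by publication date
        "mindate": str(year),
        "maxdate": str(year),
    }
    return ESEARCH_URL + "?" + urlencode(params)


def parse_count(response):
    """Extract the hit count from a decoded ESearch JSON response (assumed layout)."""
    return int(response["esearchresult"]["count"])


if __name__ == "__main__":
    # Print the URLs only; fetching them is left to the reader.
    for year in range(2011, 2020):
        print(year, build_count_url(QUERY, year))
```

Fetching each URL (for example with `urllib.request.urlopen`) and passing the decoded JSON to `parse_count` would give one count per year. Keeping the network call out of the helper functions makes them easy to check offline.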

Number of papers about deep learning in biology published in each year (as of 07/22/2019)

The earliest paper in the results was “Discovering Binary Codes for Documents by Learning Deep Generative Models” (2011) by Geoffrey Hinton, the deep learning pioneer, but it was not actually a biological application. The earliest paper in the results about a deep learning application in biology was, I think, “Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules” in 2014. Surprisingly, it was not about image analysis.

The number of related publications has roughly tripled every year over the last 5 years. Compared with two other revolutionary technologies in biology, next-generation sequencing (NGS) and CRISPR, the rise of deep learning has only just started but is much faster. (The number of papers about NGS and CRISPR roughly doubled every year in the beginning.)
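A "times per year" figure like the ones above can be computed from yearly counts in two ways: as the ratio between consecutive years, or as a geometric mean over a whole period. Here is a minimal Python sketch of both. The function names are hypothetical, and the example counts are made up for illustration; they are not the numbers from the figures in this post.

```python
def yearly_growth_factors(counts_by_year):
    """Map each year to count(year) / count(year - 1).

    Years whose previous year is missing or has a zero count are skipped.
    """
    factors = {}
    for year in sorted(counts_by_year):
        previous = counts_by_year.get(year - 1)
        if previous:
            factors[year] = counts_by_year[year] / previous
    return factors


def average_growth_factor(counts_by_year):
    """Geometric-mean yearly growth between the first and last year."""
    years = sorted(counts_by_year)
    if len(years) < 2:
        raise ValueError("need counts for at least two years")
    first, last = years[0], years[-1]
    if counts_by_year[first] <= 0:
        raise ValueError("the first year's count must be positive")
    return (counts_by_year[last] / counts_by_year[first]) ** (1 / (last - first))


if __name__ == "__main__":
    # Hypothetical counts, NOT the data behind this post.
    example = {2014: 10, 2015: 30, 2016: 90, 2017: 270}
    print(yearly_growth_factors(example))
    print(average_growth_factor(example))
```

When the data ends partway through a year (the counts here were collected on 07/22/2019), the final partial year should be left out of either calculation, since it would understate the growth.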

Number of papers about next-generation sequencing published in each year (as of 07/22/2019)
Number of papers about CRISPR published in each year (as of 07/22/2019)

More importantly, we are currently at the convergence of NGS, CRISPR, and deep learning.

“It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity,…”

Who published the most?

  1. Shen, Dinggang (33 papers). The University of North Carolina at Chapel Hill. Medical image analysis
  2. Liu, Fang (13 papers). The University of Wisconsin-Madison. Medical image analysis
  3. Xing, Lei (12 papers). Stanford University. Medical image analysis
  4. Acharya, U Rajendra (11 papers). Singapore University of Social Sciences. Biomedical signal processing
  5. Nie, Dong (11 papers). The University of North Carolina at Chapel Hill (Ph.D. student in the lab of Dinggang Shen, No. 1 on this list). Medical image analysis
  6. Baldi, Pierre (9 papers). University of California, Irvine. Chemoinformatics
  7. Chan, Heang-Ping (9 papers). The University of Michigan. Medical image analysis
  8. Chen, Hao (9 papers). The Chinese University of Hong Kong. Medical image analysis
  9. Dou, Qi (9 papers). The Chinese University of Hong Kong (currently at Imperial College London). Medical image analysis
  10. Park, Kang Ryoung (9 papers). Dongguk University. Medical image analysis
  11. van Ginneken, Bram (9 papers). Radboud University. Medical image analysis

I manually checked the names and separated the papers of different people who share the same name.
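The manual check can be narrowed down by first counting papers per (name, affiliation) pair and then flagging every name that appears with more than one affiliation. The following Python sketch shows the idea under the assumption that each paper is available as a list of (name, affiliation) tuples. The function names and sample records are hypothetical placeholders, not data from this analysis.

```python
from collections import Counter


def count_papers_per_author(papers):
    """Count papers per (name, affiliation) pair.

    `papers` is an iterable with one list of (name, affiliation) tuples per paper.
    """
    counts = Counter()
    for authors in papers:
        # set() so that a pair listed twice on one paper is counted once.
        counts.update(set(authors))
    return counts


def ambiguous_names(counts):
    """Return the names that occur with more than one affiliation."""
    affiliations = {}
    for name, affiliation in counts:
        affiliations.setdefault(name, set()).add(affiliation)
    return {name: affs for name, affs in affiliations.items() if len(affs) > 1}


if __name__ == "__main__":
    # Placeholder records for illustration only.
    papers = [
        [("Doe, Jane", "University A"), ("Roe, Sam", "University A")],
        [("Doe, Jane", "University B")],
        [("Doe, Jane", "University A")],
    ]
    counts = count_papers_per_author(papers)
    print(counts.most_common())
    print(ambiguous_names(counts))
```

A flagged name is only a candidate for review. The same person can move between institutions (as one author in the list above did), so the final decision still needs a human look.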

Which journal published the most?

  1. Sensors (Basel): 207 papers
  2. Scientific Reports: 84 papers
  3. Engineering in Medicine and Biology Society, Annual International Conference of the IEEE: 63 papers
  4. Medical Image Analysis: 63 papers
  5. Medical Physics: 63 papers
  6. PLoS ONE: 62 papers
  7. IEEE Transactions on Image Processing: 59 papers
  8. Bioinformatics: 58 papers
  9. IEEE Transactions on Medical Imaging: 55 papers
  10. Computers in Biology and Medicine: 53 papers

What are the most frequent keywords?

Top keywords
Word cloud of paper titles
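Word frequencies of the kind that feed a word cloud can be obtained by splitting titles into lower-case words, dropping common stop words, and counting what remains. Below is a minimal Python sketch that uses the two paper titles quoted earlier in this post as input. The stop-word list is a short hand-picked assumption, and the function name is hypothetical.

```python
import re
from collections import Counter

# A short, hand-picked stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}


def title_word_counts(titles):
    """Count lower-cased words across titles, keeping hyphenated words whole."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", title.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    titles = [
        "Discovering Binary Codes for Documents by Learning Deep Generative Models",
        "Deep architectures and deep learning in chemoinformatics: the prediction "
        "of aqueous solubility for drug-like molecules",
    ]
    print(title_word_counts(titles).most_common(2))
    # prints [('deep', 3), ('learning', 2)]
```

Because every paper in the result set matches "deep learning" by construction, it may make sense to add the search terms themselves to the stop-word list so that the more informative words stand out.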

Clearly, medical image analysis (mostly of cancer MRI/CT images) is the area where deep learning can be applied most directly. Other areas, such as chemoinformatics, medical signal processing, and bioinformatics, still have relatively few applications, but they are emerging.

There is so much we can do: not simply applying a deep learning algorithm to any biological problem, but really thinking about the connection between the biological question and the algorithm. I can think of many examples (not just image analysis) where deep learning is a good fit, directly or with a little tweaking. Stay tuned. I will illustrate many ideas (including some of my own) in future posts.

R code:
