Quick Intro to BioPython

Jacob Heyman
2 min read · Feb 28, 2021

Taking a look at BioPython and some applications of the package

A Little Personal Background

My introduction to data science came through my research in bioinformatics. In our lab, we were analyzing the genomic data of algae biofuel candidates. My initial job was manually looking for telomere sequences in one of our algae candidates that had just been sequenced. I found odd duplications of the telomere sequences in the genome, which led us to believe there were sequencing errors in the genome. To clean up the errors, we decided to use Python and BioPython, a package built for working with DNA data, to parse through the 2.5 million base pairs of the genome and remove sequence redundancies without compromising the overall genome.

What Can BioPython Do

BioPython is a free tool designed to work with the complex nature of DNA data. DNA can be represented as a string built from four bases (A, C, G, and T) that together code for complex proteins. One neat feature of BioPython is the Seq object, which lets you perform biological processes in code, for example translating mRNA into protein.

from Bio.Seq import Seq
# Store an mRNA string as a Seq object, then translate it codon by codon
messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
print(repr(messenger_rna.translate()))  # Seq('MAIVMGR*KGAR*')

In this code snippet from BioPython's documentation, a string of mRNA is stored as a Seq object and translated into the corresponding amino acids with the .translate() method. Each * in the output marks a stop codon. This can be extremely useful for quickly seeing which proteins a sequence codes for.

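from Bio.Seq import Seq
my_rna = Seq("AGUACACUGGU")
# Convert the RNA back to DNA, then take the reverse complement
print(repr(my_rna.back_transcribe().reverse_complement()))  # Seq('ACCAGTGTACT')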

In this example, the RNA is first converted back to DNA with .back_transcribe(), and then .reverse_complement() returns the reverse complement of that DNA sequence.
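The Seq methods also chain in the other direction. As a minimal sketch, assuming we start from the DNA version of the mRNA used earlier (the variable names are just for illustration), the snippet below transcribes DNA to mRNA and then translates only up to the first stop codon using the to_stop argument of .translate():

```python
from Bio.Seq import Seq

# DNA version of the mRNA from the earlier example (T in place of U)
coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

# Transcribe the coding strand into mRNA (T -> U)
mrna = coding_dna.transcribe()

# Translate, stopping at the first in-frame stop codon
# instead of marking every stop with '*'
protein = mrna.translate(to_stop=True)

print(mrna)
print(protein)
```

Stopping at the first stop codon gives a single peptide rather than a string with * markers in it, which is often easier to read when you only care about the first open reading frame.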

SeqIO and Working with Genomic Data

One of the main struggles of working with genomic data is the sheer size and complexity of the data. SeqIO is BioPython's input/output interface for reading and writing sequence files in a simple, uniform way. It primarily reads sequence data and returns SeqRecord objects, which hold the sequence along with information such as an ID, a name, a description, and additional annotations. SeqIO lets you keep your data organized and labeled while you perform further analysis or edit the data.
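To make this concrete, here is a small sketch of reading records with SeqIO.parse. The FASTA text, the contig names, and the summarize_records helper are all made up for illustration; in practice you would pass a path to a real genome file instead of an in-memory string.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA text standing in for a real genome file
fasta_text = """>contig_1 example algae contig
ATGGCCATTGTAATGGGCCGCTGA
>contig_2 another example contig
TTAGGGTTAGGGTTAGGG
"""


def summarize_records(handle):
    """Return (id, description, sequence length) for each FASTA record."""
    return [
        (record.id, record.description, len(record.seq))
        for record in SeqIO.parse(handle, "fasta")
    ]


summary = summarize_records(StringIO(fasta_text))
for record_id, description, length in summary:
    print(record_id, length, description)
```

Because SeqIO.parse yields one SeqRecord at a time, the same loop works on a large genome file without loading every sequence into memory at once.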

Wrap Up

This was just a very brief description of some basic functions of BioPython. The package is a valuable tool for data scientists working with genomic data. The ability to parse, annotate, and identify key features in DNA greatly increases our ability to analyze genetic data and improve research methods.
