Bioinformatics Analysis Term Review

Jacob Heyman
3 min read · Apr 12, 2021

A quick review of some essential genome sequence analysis terminology

Next Generation Sequence Analysis

Next-generation sequencing (NGS) is the current standard method for producing genomic sequence data. It sequences many DNA fragments at once in a massively parallel process, which allows for fast and accurate genome sequencing. An accurate genome is essential for many applications across research and medicine. When working with NGS data, understanding the terminology is a prerequisite for any analysis. In this short blog I will cover those essential terms so anyone can get a start on understanding NGS.

NGS Terminology

Contig: A contiguous stretch of sequence built by joining a collection of overlapping sequenced clones.

Scaffold: A set of contigs connected through the linkage of paired ends. (The paired ends come from the bacterial artificial chromosome (BAC) vectors used to clone large segments of DNA.)

Fingerprint clone contigs: Contigs formed by joining clones whose restriction digest fingerprints overlap.

Sequence clone layout: Sequence clones assigned to a physical map of fingerprint clone contigs.

Initial sequence contigs: Contigs formed by merging overlapping reads from a single clone.

Merged sequence contigs: New contigs produced by joining initial sequence contigs wherever they overlap.
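To make the idea of merging overlapping sequences concrete, here is a minimal sketch in Python. It assumes exact, error-free overlaps between two sequences, which real assemblers do not rely on; the function names, the `min_overlap` parameter, and the example reads are all made up for illustration.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that matches a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink from there.
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two sequences into one contig if they overlap, else return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep all of `left` and append only the non-overlapping tail of `right`.
    return left + right[size:]


# Hypothetical reads that share the four bases "GTAC".
read_a = "ATGCGTAC"
read_b = "GTACCTGA"
print(merge_reads(read_a, read_b))
```

The same function works whether the inputs are raw reads (giving an initial sequence contig) or two existing contigs (giving a merged sequence contig), since both steps come down to finding a shared overlap.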

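N50 Length: A summary statistic for contig or scaffold size. It is the length L such that contigs (or scaffolds) of length L or longer contain at least half of the total assembled bases.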
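N50 is usually defined as the length L such that contigs of length L or longer account for at least half of all assembled bases. A minimal sketch of that calculation in Python follows; the function name `n50` and the contig lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical assembly: 300 bases in total, so the halfway point is 150.
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(contig_lengths))  # 80 + 70 reaches 150, so this prints 70
```

A higher N50 generally means the assembly is made of longer pieces, but it says nothing about whether those pieces are correct.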

STS: Sequence-tagged site (a short, unique site in the genome that is easily amplified by PCR)

EST: Expressed sequence tag (a short sequence read from a cDNA clone, which marks a gene that is expressed)

SSR: Simple sequence repeat, a short motif of a few base pairs repeated many times in tandem. SSRs are good markers for mapping.
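As a rough illustration, tandem repeats of a short motif can be found with a regular expression. This is only a sketch: the motif size range of 1 to 6 bases, the minimum of four repeats, the function name `find_ssrs`, and the example sequence are all assumptions chosen for the example, and it only finds perfect repeats.

```python
import re


def find_ssrs(sequence, min_unit=1, max_unit=6, min_repeats=4):
    """Find perfect tandem repeats in a DNA string.

    Returns a list of (start_index, motif, repeat_count) tuples,
    with start_index counted from 0.
    """
    # Capture a short motif (shortest first), then require further copies
    # of it so the motif occurs at least `min_repeats` times in a row.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Hypothetical sequence containing the motif "CA" five times in a row.
print(find_ssrs("GGCACACACACATT"))  # [(2, 'CA', 5)]
```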

SNP: Single nucleotide polymorphism, a variation at a single base-pair position in the genome.
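In its simplest form, spotting single-base differences means comparing two aligned sequences position by position. The sketch below assumes the two sequences are already aligned and the same length. Real variant calling works from many aligned reads and has to separate true variants from sequencing errors, which this toy example ignores. The function name `find_single_base_differences` and both sequences are made up for illustration.

```python
def find_single_base_differences(reference, sample):
    """Compare two aligned, equal-length sequences base by base.

    Returns a list of (position, reference_base, sample_base) tuples,
    with positions counted from 1.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for index, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base != sample_base:
            differences.append((index + 1, ref_base, sample_base))
    return differences


# Hypothetical sequences that differ only at position 4 (C versus T).
print(find_single_base_differences("ATGCCGTA", "ATGTCGTA"))  # [(4, 'C', 'T')]
```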

Wrap up

These are some of the most common terms when it comes to sequence analysis. I hope that these visualizations and brief descriptions are useful for understanding some of the basics of genome data analysis.
