From FASTQ to BAM: A Beginner’s Guide
Welcome to the part of bioinformatics where the work gets real, where clean theory meets noisy data. You have your new FASTQ files from the sequencer and now need a clear plan for analyzing them. Alongside this article, we have created a free course for you that starts from scratch. The course is available here.
We believe the most crucial phase of any NGS analysis pipeline starts with converting FASTQ files into clean, well-aligned BAM files. Everything that comes after depends on the quality of this foundation.
1. What You’re Actually Looking At
A FASTQ file contains reads – short sequences of DNA or RNA – plus a quality score for each base. They often come in pairs (R1 and R2), because paired-end sequencing, which reads each fragment from both ends, is very common on modern instruments.
A read in a FASTQ file looks like this:
@SRR12345.1
AGCTTAGCTAAGCTTAGCTA
+
??BFGGGHHHIIJJJJJJIJ
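Each 4-line block is one read: a sequence ID line starting with @, the nucleotide sequence, a + separator line (which may optionally repeat the ID), and the quality string, which has one character per base.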
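To make the 4-line layout concrete, here is a minimal Python sketch that reads such blocks and decodes the quality string. It assumes an uncompressed file and the Phred+33 encoding used by current Illumina output; the names `FastqRecord`, `parse_fastq`, and `phred_scores` are made up for this example and do not come from any library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per 4-line block of an uncompressed FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (a blank line also stops this simple parser)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


# The example read from above, fed in as an in-memory file.
example = "@SRR12345.1\nAGCTTAGCTAAGCTTAGCTA\n+\n??BFGGGHHHIIJJJJJJIJ\n"
record = next(parse_fastq(io.StringIO(example)))
print(record.read_id, phred_scores(record.quality)[:3])
# prints: SRR12345.1 [30, 30, 33]
```

A Phred score of 30 corresponds to an estimated 1-in-1000 chance that the base call is wrong, so the example read above is of good quality throughout.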
2. Quality Control
Before doing anything, you need to know whether your reads are decent. Raw reads are rarely perfect: they can carry leftover adapter sequence, low-quality tails, and randomly miscalled bases.
Tools: FastQC, 123FASTQ
You’ll get an HTML report. Pay attention to:
- Per base sequence quality (should mostly be green)
- Adapter content (ideally zero)
- Overrepresented sequences (often adapters or contamination)
If the per-base quality plot looks like a downhill ski slope, your data quality stinks.
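To get a feel for what the per-base quality check is measuring, here is a rough Python sketch that averages the Phred score at each read position. It is only an illustration, assuming Phred+33 quality strings; the real report shows a fuller distribution per position, and `mean_quality_per_position` is a name invented for this example.

```python
from typing import Iterable, List


def mean_quality_per_position(
    quality_strings: Iterable[str], offset: int = 33
) -> List[float]:
    """Average Phred score at each read position (reads may differ in length)."""
    totals: List[int] = []
    counts: List[int] = []
    for quality in quality_strings:
        for i, ch in enumerate(quality):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Toy quality strings (made up): good at the start, worse toward the 3' end.
toy_qualities = ["IIII?5", "IIIH?+", "IIII5"]
for position, mean in enumerate(mean_quality_per_position(toy_qualities), start=1):
    print(f"position {position}: mean Phred {mean:.1f}")
```

If the means fall steadily toward the end of the read, that is the ski slope, and trimming is the usual next move.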
3. Trimming: Cleaning Before Cooking
You remove adapters and low-quality bases using a trimmer.
Tools: 123FASTQ, Trimmomatic, fastp
After trimming, run the quality control step (step 2) again to confirm the reads look good enough.
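The idea behind quality trimming can be sketched in a few lines of Python. This is a deliberately simplified version that only chops low-quality bases off the 3' end; it does not remove adapters, and the tools listed above use more careful strategies. The function name `trim_low_quality_tail` and the default threshold of 20 are assumptions made for this example.

```python
from typing import Tuple


def trim_low_quality_tail(
    sequence: str, quality: str, min_quality: int = 20, offset: int = 33
) -> Tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below min_quality."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    end = len(quality)
    # Walk backwards from the last base until a good-enough base is found.
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


# Toy read (made up): the last two bases have Phred scores of 10 and 2.
trimmed_sequence, trimmed_quality = trim_low_quality_tail("AGCTTAGCTA", "IIIIIII?+#")
print(trimmed_sequence, trimmed_quality)
# prints: AGCTTAGC IIIIIII?
```

Note that the sequence and quality strings are always cut at the same point, so the trimmed read is still a valid FASTQ record.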
4. Alignment: Putting Reads Where They Belong
This is the step where your sequences are mapped back to a reference genome (human, mouse, etc.) so you know where they came from.
Tools:
- HISAT2 or STAR for RNA-seq
- BWA-MEM or Bowtie 2 for DNA-seq (STAR and HISAT2 are splice-aware aligners built for RNA-seq)
The aligner writes its output as a SAM file (text-based). It’s a massive file and needs compression.
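Because SAM is plain text, you can inspect it with very little code. The sketch below splits one alignment line into a few of its 11 mandatory tab-separated columns and decodes two bits of the FLAG field. The alignment line is a made-up toy value, and `SamAlignment` and the helper functions are names invented for this example.

```python
from typing import NamedTuple


class SamAlignment(NamedTuple):
    qname: str  # read name
    flag: int   # bitwise flags describing the alignment
    rname: str  # reference sequence (e.g. chromosome) name
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str  # how the read lines up against the reference


def parse_sam_line(line: str) -> SamAlignment:
    """Parse the first six of the 11 mandatory columns of a SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    return SamAlignment(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
    )


def is_unmapped(alignment: SamAlignment) -> bool:
    return bool(alignment.flag & 0x4)  # bit 0x4: read is unmapped


def is_reverse_strand(alignment: SamAlignment) -> bool:
    return bool(alignment.flag & 0x10)  # bit 0x10: read maps to the reverse strand


# Toy alignment line (made-up values). Header lines start with "@" and are skipped.
toy_line = "SRR12345.1\t16\tchr1\t10468\t60\t20M\t*\t0\t0\tAGCTTAGCTAAGCTTAGCTA\t??BFGGGHHHIIJJJJJJIJ"
if not toy_line.startswith("@"):
    alignment = parse_sam_line(toy_line)
    print(alignment.rname, alignment.pos, is_unmapped(alignment), is_reverse_strand(alignment))
    # prints: chr1 10468 False True
```

Every aligned read gets a line like this, which is why SAM files grow so large.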
5. Convert SAM to BAM (Because Nobody Likes Gigantic Text Files)
The SAM file is converted to BAM – a binary format that’s smaller and indexable. Always store BAM files coordinate-sorted: indexing requires it, and many downstream tools expect it.
Tool: SAMtools
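As a rough sketch, the conversion can be driven from Python by calling SAMtools. The example assumes `samtools` is installed and on your PATH; the file names `sample.sam` and `sample.sorted.bam`, the thread count, and the helper names are placeholders chosen for illustration. `samtools sort` reads the SAM file and writes coordinate-sorted BAM, and `samtools index` then builds the index next to it.

```python
import shutil
import subprocess
from typing import List


def sort_and_index_commands(
    sam_path: str, bam_path: str, threads: int = 4
) -> List[List[str]]:
    """Build the two SAMtools commands: sort SAM into BAM, then index the BAM."""
    return [
        # -@ sets extra threads, -o names the output file (BAM by default)
        ["samtools", "sort", "-@", str(threads), "-o", bam_path, sam_path],
        ["samtools", "index", bam_path],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    for command in commands:
        subprocess.run(command, check=True)


# Placeholder file names; printing only, so nothing is executed here.
commands = sort_and_index_commands("sample.sam", "sample.sorted.bam")
for command in commands:
    print(" ".join(command))
# To actually run them: run_commands(commands)
```

Once the sorted BAM and its index exist, the original SAM file can usually be deleted to save space.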
Now you have your sorted BAM file, the standard format for aligned reads. You can visualize it in IGV or feed it into downstream tools.
Up to this point, the steps are almost the same in every NGS pipeline. For a detailed walkthrough of these steps, visit our completely free course here!