RNA-seq Commands

Commands and Resources

Use this page as the command-line reference for the RNA-seq tutorial series. The original STAR workflow is preserved for learning, while the HISAT2 workflow is marked as the recommended path for the current run.

Section A: STAR (reference)
Section B: HISAT2 (recommended)

Before you run commands

Confirm your read layout: paired-end or single-end.
Keep your reference FASTA and GTF from the same genome build.
Adjust `sample_R1.fastq.gz` and similar placeholders to match your real filenames.
Create output folders before running tools that write into them.
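The build-matching item above can be partly checked mechanically. The following is a minimal sketch, assuming an uncompressed FASTA and GTF; the function name `missing_seqnames` is hypothetical and not part of any tool. It prints sequence names that the GTF uses but the FASTA does not define. Empty output suggests consistent naming (for example, both files use `chr1` or both use `1`), although it does not prove the two files come from the same build.

```shell
# Hypothetical helper (not part of any tool): print sequence names that appear
# in column 1 of the GTF but not among the FASTA header names.
missing_seqnames() {
  fasta=$1
  gtf=$2
  fa_names=$(mktemp)
  gtf_names=$(mktemp)
  # FASTA header name = text after '>' up to the first whitespace
  grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$fa_names"
  # GTF sequence name = first tab-separated column of non-comment lines
  grep -v '^#' "$gtf" | cut -f1 | sort -u > "$gtf_names"
  # comm -13 keeps only lines unique to the second file (the GTF names)
  comm -13 "$fa_names" "$gtf_names"
  rm -f "$fa_names" "$gtf_names"
}

# Usage (paths follow this page's layout):
# missing_seqnames reference/genome.fa reference/annotation.gtf
```

If the function prints anything, the annotation refers to sequences the genome file lacks, which usually shows up later as empty or near-empty count tables.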

Section A: Original planned STAR workflow

This section documents the originally planned STAR workflow for learning and reference. The actual local run switched to HISAT2 (Section B), but the STAR commands stay here so viewers can follow the planned approach.

Conda environment and package installation: Reference workflow setup

conda create -n rnaseq_tutorial python=3.10 -y
conda activate rnaseq_tutorial
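# conda-forge is listed before bioconda, as Bioconda recommends, so shared dependencies resolve consistently
conda install -c conda-forge -c bioconda fastqc cutadapt star samtools subread -y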

Suggested folder structure: Create clear output directories first

mkdir -p raw_data trimmed_data qc_reports reference star_index alignments counts logs

Dataset download placeholder: Replace with your actual source URL or accession logic

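# Run from the project root so that later steps can keep using raw_data/... paths.
# Quote the placeholder URLs (unquoted < and > are shell redirections) and use one URL per mate.
wget "<dataset_R1_url>" -O raw_data/sample_R1.fastq.gz
wget "<dataset_R2_url>" -O raw_data/sample_R2.fastq.gz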

Raw FastQC: Check baseline read quality

fastqc raw_data/sample_R1.fastq.gz raw_data/sample_R2.fastq.gz -o qc_reports

Cutadapt trimming: Example paired-end trimming block

cutadapt \
  -a AGATCGGAAGAGC \
  -A AGATCGGAAGAGC \
  -o trimmed_data/sample_trimmed_R1.fastq.gz \
  -p trimmed_data/sample_trimmed_R2.fastq.gz \
  raw_data/sample_R1.fastq.gz raw_data/sample_R2.fastq.gz

Reference genome and annotation download: Use matched build files

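# Run from the project root so that later steps can use reference/genome.fa and reference/annotation.gtf.
wget "<genome_fasta_url>" -O reference/genome.fa.gz
wget "<annotation_gtf_url>" -O reference/annotation.gtf.gz
gunzip reference/genome.fa.gz
gunzip reference/annotation.gtf.gz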

STAR index building: Reference workflow index

# --sjdbOverhang should be (read length - 1); the value 99 below assumes 100 bp reads
STAR \
  --runThreadN 8 \
  --runMode genomeGenerate \
  --genomeDir star_index \
  --genomeFastaFiles reference/genome.fa \
  --sjdbGTFfile reference/annotation.gtf \
  --sjdbOverhang 99
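The `--sjdbOverhang` value depends on read length, so it helps to measure the reads rather than assume 100 bp. The sketch below is an illustration only: `max_read_length` is a hypothetical helper, and it samples just the first 10,000 records of a gzipped FASTQ file, which is assumed to be representative of the whole file.

```shell
# Hypothetical helper: report the longest read among the first 10,000 records
# of a gzipped FASTQ file (sequence lines are every 4th line, offset 2).
max_read_length() {
  gzip -cd "$1" | head -n 40000 |
    awk 'NR % 4 == 2 { if (length($0) > m) m = length($0) } END { print m + 0 }'
}

# Usage: derive the overhang as (longest read - 1)
# overhang=$(( $(max_read_length raw_data/sample_R1.fastq.gz) - 1 ))
# echo "--sjdbOverhang $overhang"
```

For 100 bp reads this yields 99, matching the value used in the index command above.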

STAR alignment: Reference-only workflow block

# With this prefix, the sorted BAM is written to alignments/sample_Aligned.sortedByCoord.out.bam
STAR \
  --runThreadN 8 \
  --genomeDir star_index \
  --readFilesIn trimmed_data/sample_trimmed_R1.fastq.gz trimmed_data/sample_trimmed_R2.fastq.gz \
  --readFilesCommand zcat \
  --outFileNamePrefix alignments/sample_ \
  --outSAMtype BAM SortedByCoordinate

Section B: Working local workflow used in practice

This is the recommended workflow for the current tutorial run. It uses HISAT2-based alignment and keeps the rest of the upstream analysis chain explicit and reproducible.

Conda environment setup: Recommended workflow setup

conda create -n rnaseq_hisat2 python=3.10 -y
conda activate rnaseq_hisat2
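# conda-forge is listed before bioconda, as Bioconda recommends, so shared dependencies resolve consistently
conda install -c conda-forge -c bioconda fastqc cutadapt hisat2 samtools subread -y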

Directory setup: Recommended folder layout

mkdir -p raw_data trimmed_data qc_reports reference hisat2_index alignments counts logs

Dataset download: Adjust to your dataset source

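# Run from the project root so that later steps can keep using raw_data/... paths.
# Quote the placeholder URLs (unquoted < and > are shell redirections) and use one URL per mate.
wget "<dataset_R1_url>" -O raw_data/sample_R1.fastq.gz
wget "<dataset_R2_url>" -O raw_data/sample_R2.fastq.gz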
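After downloading, it can be worth confirming that both files are intact and really form a pair before spending time on QC and alignment. The sketch below is one possible check, not part of the tutorial's recorded run; `count_reads` and `check_pair` are hypothetical helper names, and the check assumes standard four-line FASTQ records.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ (4 lines per record).
count_reads() {
  lines=$(gzip -cd "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Hypothetical helper: succeed only if both mates are intact gzip files
# that contain the same number of reads.
check_pair() {
  gzip -t "$1" 2>/dev/null || { echo "corrupt or missing: $1"; return 1; }
  gzip -t "$2" 2>/dev/null || { echo "corrupt or missing: $2"; return 1; }
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  if [ "$n1" -ne "$n2" ]; then
    echo "read count mismatch: $n1 vs $n2"
    return 1
  fi
  echo "ok: $n1 read pairs"
}

# Usage:
# check_pair raw_data/sample_R1.fastq.gz raw_data/sample_R2.fastq.gz
```

A truncated download typically fails the `gzip -t` test, while a mismatch in read counts points to files that do not belong together.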

Raw FastQC: QC before trimming

fastqc raw_data/sample_R1.fastq.gz raw_data/sample_R2.fastq.gz -o qc_reports

Cutadapt trimming: Paired-end example

cutadapt \
  -a AGATCGGAAGAGC \
  -A AGATCGGAAGAGC \
  -o trimmed_data/sample_trimmed_R1.fastq.gz \
  -p trimmed_data/sample_trimmed_R2.fastq.gz \
  raw_data/sample_R1.fastq.gz raw_data/sample_R2.fastq.gz
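The trimming command above handles one sample. With several samples, the same command can be generated per file instead of edited by hand. The following is a rough sketch that only prints the commands (a dry run); `mate_of` and `trim_command` are hypothetical helpers, and the sketch assumes every file follows the `_R1.fastq.gz` / `_R2.fastq.gz` naming used on this page and that paths contain no spaces.

```shell
# Hypothetical helper: map an R1 path to its R2 mate, assuming the
# "_R1.fastq.gz" / "_R2.fastq.gz" naming used on this page.
mate_of() {
  echo "${1%_R1.fastq.gz}_R2.fastq.gz"
}

# Hypothetical helper: print (not run) the cutadapt command for one R1 file.
trim_command() {
  r1=$1
  r2=$(mate_of "$r1")
  sample=$(basename "$r1" _R1.fastq.gz)
  echo "cutadapt -a AGATCGGAAGAGC -A AGATCGGAAGAGC" \
    "-o trimmed_data/${sample}_trimmed_R1.fastq.gz" \
    "-p trimmed_data/${sample}_trimmed_R2.fastq.gz" \
    "$r1 $r2"
}

# Dry run over every R1 file in raw_data/
for r1 in raw_data/*_R1.fastq.gz; do
  [ -e "$r1" ] || continue   # skip the unexpanded pattern when nothing matches
  trim_command "$r1"
done
```

Printing first makes it easy to inspect the derived filenames; once they look right, the printed lines can be run as they are.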

Reference genome and annotation: Matched genome build required

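# Run from the project root so that later steps can use reference/genome.fa and reference/annotation.gtf.
wget "<genome_fasta_url>" -O reference/genome.fa.gz
wget "<annotation_gtf_url>" -O reference/annotation.gtf.gz
gunzip reference/genome.fa.gz
gunzip reference/annotation.gtf.gz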

HISAT2 index building: Recommended aligner index

hisat2-build reference/genome.fa hisat2_index/genome_index

HISAT2 alignment: Recommended workflow aligner step

hisat2 \
  -x hisat2_index/genome_index \
  -1 trimmed_data/sample_trimmed_R1.fastq.gz \
  -2 trimmed_data/sample_trimmed_R2.fastq.gz \
  -S alignments/sample.sam

SAM to BAM conversion and sorting: BAM processing

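# -b writes BAM; current samtools auto-detects the input format, so the legacy -S flag is not needed
samtools view -b -o alignments/sample.bam alignments/sample.sam
samtools sort -o alignments/sample.sorted.bam alignments/sample.bam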

BAM indexing and summary: Quick validation step

samtools index alignments/sample.sorted.bam
samtools flagstat alignments/sample.sorted.bam > logs/sample.flagstat.txt
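The flagstat report is plain text, so the headline mapping rate can be pulled out for a quick pass/fail look. This is a small sketch rather than an established procedure: `mapped_percent` is a hypothetical helper, and it relies on the report containing a line of the form `N + M mapped (P% : ...)`.

```shell
# Hypothetical helper: extract the overall "mapped" percentage from a
# samtools flagstat report (first line of the form "N + M mapped (P% : ...)").
mapped_percent() {
  sed -n 's/.* mapped (\([0-9.]*\)%.*/\1/p' "$1" | head -n 1
}

# Usage (the 70 threshold is an arbitrary illustration, not a recommendation):
# pct=$(mapped_percent logs/sample.flagstat.txt)
# awk -v p="$pct" 'BEGIN { exit !(p >= 70) }' || echo "low mapping rate: $pct%"
```

An unexpectedly low percentage is often a sign of a mismatched reference or leftover adapter sequence, both of which are cheaper to catch here than after counting.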

featureCounts: Gene-level count generation

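# -p marks the input as paired-end; --countReadPairs (Subread 2.0.2 or later) counts fragments instead of individual reads
featureCounts \
  -p --countReadPairs \
  -a reference/annotation.gtf \
  -o counts/gene_counts.txt \
  alignments/sample.sorted.bam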
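The featureCounts table carries annotation columns (Chr, Start, End, Strand, Length) ahead of the counts, which many downstream tools do not need. As a minimal sketch, the hypothetical helper `simple_counts` below reduces the table to gene ID and count, assuming a single BAM file was counted so that the count sits in column 7.

```shell
# Hypothetical helper: reduce featureCounts output to "gene<TAB>count".
# Skips the leading "#" comment line and the column header row; column 7 is
# the count for the first (here, only) BAM file.
simple_counts() {
  grep -v '^#' "$1" | awk -F '\t' 'NR > 1 { print $1 "\t" $7 }'
}

# Usage:
# simple_counts counts/gene_counts.txt > counts/gene_counts.simple.tsv
```

With more than one BAM file, the additional samples appear in columns 8 and onward, so the `awk` expression would need to be extended accordingly.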