RNA-seq Commands

Commands and Resources

Use this page as the command-line reference for the RNA-seq tutorial series. The original STAR workflow is preserved for learning, while the HISAT2 workflow is marked as the recommended path for the current run.


Before you run commands

  • Confirm your read layout: paired-end or single-end.
  • Keep your reference FASTA and GTF from the same genome build.
  • Adjust `sample_R1.fastq.gz` and similar placeholders to match your real filenames.
  • Create output folders before running tools that write into them.
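The first and last checks above can be scripted. The following is a minimal sketch rather than part of the tutorial itself: the function name `detect_layout` is hypothetical, and it assumes mates follow the `_R1`/`_R2` naming used by the placeholders on this page.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (illustrative, not part of the tutorial).

# Print "paired-end" if the R2 mate of the given R1 file exists,
# "single-end" if only R1 exists; return non-zero if R1 is missing.
detect_layout() {
  local r1="$1"
  local r2
  # Assumption: mates differ only by _R1 vs _R2 in the file name.
  r2=$(printf '%s\n' "$r1" | sed 's/_R1/_R2/')
  if [ ! -f "$r1" ]; then
    echo "missing input: $r1" >&2
    return 1
  fi
  if [ "$r2" != "$r1" ] && [ -f "$r2" ]; then
    echo "paired-end"
  else
    echo "single-end"
  fi
}

# Example, run from the project root:
#   detect_layout raw_data/sample_R1.fastq.gz
#   mkdir -p trimmed_data qc_reports alignments counts logs
```

If the function reports single-end, the paired-end blocks on this page need adapting before use (for example, dropping the R2 arguments).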

Section A: Original planned STAR workflow

This section is kept for learning and reference. It documents the approach originally planned, even though the actual local workflow switched to HISAT2 (Section B).

Conda environment and package installation: Reference workflow setup
conda create -n rnaseq_tutorial python=3.10 -y
conda activate rnaseq_tutorial
# Bioconda packages depend on conda-forge, so list both channels.
conda install -c conda-forge -c bioconda fastqc cutadapt star samtools subread -y
Suggested folder structure: Create clear output directories first
mkdir -p raw_data trimmed_data qc_reports reference star_index alignments counts logs
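Dataset download placeholder: Replace with your actual source URL or accession logic
# Write into raw_data/ from the project root so the relative paths in later steps still resolve.
# R1 and R2 are separate files, so each placeholder stands for its own URL.
wget <dataset_R1_url> -O raw_data/sample_R1.fastq.gz
wget <dataset_R2_url> -O raw_data/sample_R2.fastq.gz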
Raw FastQC: Check baseline read quality
fastqc raw_data/sample_R1.fastq.gz raw_data/sample_R2.fastq.gz -o qc_reports
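FastQC writes a `summary.txt` inside each result archive, with one tab-separated line per module (status, module name, file name). As a rough sketch, the hypothetical helper `failed_modules` below lists the modules flagged FAIL. It assumes `unzip` is available and that FastQC used its default `_fastqc.zip` output naming.

```shell
#!/usr/bin/env bash
# Read a FastQC summary.txt stream on stdin and print the modules marked FAIL.
failed_modules() {
  awk -F '\t' '$1 == "FAIL" { print $2 }'
}

# Example, run from the project root:
#   unzip -p qc_reports/sample_R1_fastqc.zip '*/summary.txt' | failed_modules
```

A FAIL here is a prompt to open the HTML report for that module, not proof that the data are unusable.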
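Cutadapt trimming: Example paired-end trimming block
# -a is the 3' adapter for R1 and -A the 3' adapter for R2; replace the sequence if your library kit differs.
cutadapt \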
  -a AGATCGGAAGAGC \
  -A AGATCGGAAGAGC \
  -o trimmed_data/sample_trimmed_R1.fastq.gz \
  -p trimmed_data/sample_trimmed_R2.fastq.gz \
  raw_data/sample_R1.fastq.gz raw_data/sample_R2.fastq.gz
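Because cutadapt processes the two mates together, both trimmed files should end up with the same number of reads. A quick sanity check, sketched here with hypothetical helper names and assuming standard four-line FASTQ records:

```shell
#!/usr/bin/env bash
# Count reads in a gzipped FASTQ file (assumes four lines per record).
count_reads() {
  local lines
  lines=$(gzip -dc "$1" | wc -l)
  echo $((lines / 4))
}

# Succeed only if both mate files contain the same number of reads.
pairs_in_sync() {
  [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Example, run from the project root:
#   pairs_in_sync trimmed_data/sample_trimmed_R1.fastq.gz \
#                 trimmed_data/sample_trimmed_R2.fastq.gz && echo "pairs in sync"
```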
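Reference genome and annotation download: Use matched build files
# Run from the project root; later steps expect reference/genome.fa and reference/annotation.gtf.
wget <genome_fasta_url> -O reference/genome.fa.gz
wget <annotation_gtf_url> -O reference/annotation.gtf.gz
gunzip reference/genome.fa.gz
gunzip reference/annotation.gtf.gz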
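A common symptom of mismatched files is that the FASTA and GTF name their sequences differently (for example `chr1` versus `1`). The sketch below, using a hypothetical helper name, prints GTF sequence names that have no matching FASTA header. Matching names are necessary but not sufficient evidence that both files come from the same build.

```shell
#!/usr/bin/env bash
# Print each GTF sequence name (column 1) that has no header in the FASTA.
gtf_names_missing_from_fasta() {
  local fasta="$1" gtf="$2"
  awk -F '\t' '
    NR == FNR {                  # first file: the FASTA
      if (/^>/) { split(substr($0, 2), parts, /[ \t]/); in_fasta[parts[1]] = 1 }
      next
    }
    /^#/ || NF == 0 { next }     # skip GTF comment and blank lines
    !($1 in in_fasta) && !seen[$1]++ { print $1 }
  ' "$fasta" "$gtf"
}

# Example, run from the project root (no output means every name matched):
#   gtf_names_missing_from_fasta reference/genome.fa reference/annotation.gtf
```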
STAR index building: Reference workflow index
# --sjdbOverhang is normally read length minus 1; 99 suits 100 bp reads.
STAR \
  --runThreadN 8 \
  --runMode genomeGenerate \
  --genomeDir star_index \
  --genomeFastaFiles reference/genome.fa \
  --sjdbGTFfile reference/annotation.gtf \
  --sjdbOverhang 99
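The STAR manual describes the ideal `--sjdbOverhang` as the maximum read length minus 1. One way to derive the value from the data instead of typing it by hand is sketched below; the helper name is hypothetical, and sampling only the first 10,000 reads is an assumption made for speed.

```shell
#!/usr/bin/env bash
# Print (longest read length - 1) over the first 10,000 reads of a gzipped FASTQ.
suggest_sjdb_overhang() {
  gzip -dc "$1" | head -n 40000 |
    awk 'NR % 4 == 2 && length($0) > max { max = length($0) } END { print max - 1 }'
}

# Example, run from the project root:
#   suggest_sjdb_overhang trimmed_data/sample_trimmed_R1.fastq.gz
```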
STAR alignment: Reference-only workflow block
STAR \
  --runThreadN 8 \
  --genomeDir star_index \
  --readFilesIn trimmed_data/sample_trimmed_R1.fastq.gz trimmed_data/sample_trimmed_R2.fastq.gz \
  --readFilesCommand zcat \
  --outFileNamePrefix alignments/sample_ \
  --outSAMtype BAM SortedByCoordinate
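With the prefix above, STAR writes its sorted BAM to `alignments/sample_Aligned.sortedByCoord.out.bam` and a run summary to `alignments/sample_Log.final.out`. As a sketch, the hypothetical helper below pulls the uniquely mapped percentage out of that summary. It assumes the `label | value` line layout used by current STAR releases.

```shell
#!/usr/bin/env bash
# Print the "Uniquely mapped reads %" value from a STAR Log.final.out file.
uniquely_mapped_percent() {
  awk -F '|' '/Uniquely mapped reads %/ { gsub(/[ \t%]/, "", $2); print $2 }' "$1"
}

# Example, run from the project root:
#   uniquely_mapped_percent alignments/sample_Log.final.out
```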

Section B: Working local workflow used in practice

This is the recommended workflow for the current tutorial run. It uses HISAT2-based alignment and keeps the rest of the upstream analysis chain explicit and reproducible.

Conda environment setup: Recommended workflow setup
conda create -n rnaseq_hisat2 python=3.10 -y
conda activate rnaseq_hisat2
# Bioconda packages depend on conda-forge, so list both channels.
conda install -c conda-forge -c bioconda fastqc cutadapt hisat2 samtools subread -y
Directory setup: Recommended folder layout
mkdir -p raw_data trimmed_data qc_reports reference hisat2_index alignments counts logs
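Dataset download: Adjust to your dataset source
# Write into raw_data/ from the project root so the relative paths in later steps still resolve.
# R1 and R2 are separate files, so each placeholder stands for its own URL.
wget <dataset_R1_url> -O raw_data/sample_R1.fastq.gz
wget <dataset_R2_url> -O raw_data/sample_R2.fastq.gz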
Raw FastQC: QC before trimming
fastqc raw_data/sample_R1.fastq.gz raw_data/sample_R2.fastq.gz -o qc_reports
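Cutadapt trimming: Paired-end example
# -a is the 3' adapter for R1 and -A the 3' adapter for R2; replace the sequence if your library kit differs.
cutadapt \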
  -a AGATCGGAAGAGC \
  -A AGATCGGAAGAGC \
  -o trimmed_data/sample_trimmed_R1.fastq.gz \
  -p trimmed_data/sample_trimmed_R2.fastq.gz \
  raw_data/sample_R1.fastq.gz raw_data/sample_R2.fastq.gz
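Reference genome and annotation: Matched genome build required
# Run from the project root; later steps expect reference/genome.fa and reference/annotation.gtf.
wget <genome_fasta_url> -O reference/genome.fa.gz
wget <annotation_gtf_url> -O reference/annotation.gtf.gz
gunzip reference/genome.fa.gz
gunzip reference/annotation.gtf.gz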
HISAT2 index building: Recommended aligner index
hisat2-build reference/genome.fa hisat2_index/genome_index
HISAT2 alignment: Recommended workflow aligner step
hisat2 \
  -x hisat2_index/genome_index \
  -1 trimmed_data/sample_trimmed_R1.fastq.gz \
  -2 trimmed_data/sample_trimmed_R2.fastq.gz \
  -S alignments/sample.sam
SAM to BAM conversion and sorting: BAM processing
# Current samtools auto-detects the input format, so the legacy -S flag is not needed.
samtools view -b alignments/sample.sam > alignments/sample.bam
samtools sort -o alignments/sample.sorted.bam alignments/sample.bam
BAM indexing and summary: Quick validation step
samtools index alignments/sample.sorted.bam
samtools flagstat alignments/sample.sorted.bam > logs/sample.flagstat.txt
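`samtools flagstat` reports the overall mapping rate on its `mapped` line. A small sketch, with a hypothetical helper name, that extracts that percentage from the saved report so it can be logged or compared across samples. It assumes the default text output format.

```shell
#!/usr/bin/env bash
# Print the overall mapped percentage from a saved samtools flagstat report.
# Matching on the fourth field skips the separate "primary mapped" line.
mapped_percent() {
  awk '$4 == "mapped" { gsub(/[(%]/, "", $5); print $5; exit }' "$1"
}

# Example, run from the project root:
#   mapped_percent logs/sample.flagstat.txt
```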
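featureCounts: Gene-level count generation
# -p marks the input as paired-end; --countReadPairs (Subread 2.0.2 and later) counts fragments instead of reads.
# On older Subread releases, -p alone implies fragment counting, so drop --countReadPairs there.
featureCounts \
  -p --countReadPairs \
  -a reference/annotation.gtf \
  -o counts/gene_counts.txt \
  alignments/sample.sorted.bam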
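featureCounts writes a comment line recording the command, then a header and one row per gene, with annotation columns (Chr, Start, End, Strand, Length) ahead of the per-BAM counts. Downstream tools usually want only the gene ID and the counts. A minimal sketch, with a hypothetical helper name and output path:

```shell
#!/usr/bin/env bash
# Reduce a featureCounts table to the Geneid column plus the count columns.
simple_counts() {
  grep -v '^#' "$1" | cut -f 1,7-
}

# Example, run from the project root:
#   simple_counts counts/gene_counts.txt > counts/gene_counts.simple.tsv
```

The companion file `counts/gene_counts.txt.summary` lists how many reads were assigned or left unassigned, which is worth checking alongside the counts.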