nf-core Samplesheet Creation
Create and validate input samplesheets for nf-core pipelines.
Quick Process
- Identify Pipeline: Know which pipeline you're creating a samplesheet for
- Check Schema: Find the samplesheet schema in assets/schema_input.json
- Create Template: Generate CSV/TSV with required columns
- Populate Data: Add your sample information
- Validate: Run the pipeline with --input to validate
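The Create Template step can be sketched as a small shell function that writes a header-only CSV. The function name make_template and the output file name are hypothetical; the column list should come from the pipeline's own assets/schema_input.json.

```shell
# Hypothetical helper: write a header-only samplesheet from a list of column names.
# Usage: make_template OUTPUT_FILE COLUMN [COLUMN...]
make_template() {
    local out="$1"
    shift
    local IFS=,                  # "$*" joins the remaining arguments with commas
    printf '%s\n' "$*" > "$out"
}
```

For the nf-core/rnaseq columns, this could be called as: make_template samplesheet.csv sample fastq_1 fastq_2 strandedness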
Common Samplesheet Formats
RNA-seq (nf-core/rnaseq)
csv
sample,fastq_1,fastq_2,strandedness
SAMPLE1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,auto
SAMPLE2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,auto
SAMPLE3_SE,/path/to/sample3.fastq.gz,,auto
ATAC-seq (nf-core/atacseq)
csv
sample,fastq_1,fastq_2,replicate
SAMPLE1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,1
SAMPLE1,/path/to/sample1_rep2_R1.fastq.gz,/path/to/sample1_rep2_R2.fastq.gz,2
SAMPLE2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,1
Variant Calling (nf-core/sarek)
csv
patient,sex,status,sample,lane,fastq_1,fastq_2
PATIENT1,XX,0,SAMPLE1_NORMAL,lane1,/path/to/normal_R1.fastq.gz,/path/to/normal_R2.fastq.gz
PATIENT1,XX,1,SAMPLE1_TUMOR,lane1,/path/to/tumor_R1.fastq.gz,/path/to/tumor_R2.fastq.gz
Metagenomics (nf-core/mag)
csv
sample,group,short_reads_1,short_reads_2,long_reads
SAMPLE1,GROUP1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,
SAMPLE2,GROUP1,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,/path/to/sample2.ont.fastq.gz
Amplicon (nf-core/ampliseq)
csv
sampleID,forwardReads,reverseReads,run
SAMPLE1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,run1
SAMPLE2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,run1
Finding Schema Requirements
Check Pipeline Documentation
bash
# Visit pipeline page
# e.g., https://nf-co.re/rnaseq/usage#samplesheet-input
Check Schema File
bash
# Look for input schema
cat assets/schema_input.json
Check nextflow_schema.json
bash
# Find input parameter schema reference
grep -A5 '"input"' nextflow_schema.json
Schema Structure
The input schema is a JSON Schema that defines each column, its type, and its validation rules:
json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "items": {
    "type": "object",
    "required": ["sample", "fastq_1"],
    "properties": {
      "sample": {
        "type": "string",
        "pattern": "^\\S+$",
        "description": "Sample name"
      },
      "fastq_1": {
        "type": "string",
        "format": "file-path",
        "exists": true,
        "description": "Path to R1 FASTQ"
      },
      "fastq_2": {
        "type": "string",
        "format": "file-path",
        "exists": true,
        "description": "Path to R2 FASTQ (optional)"
      },
      "strandedness": {
        "type": "string",
        "enum": ["auto", "forward", "reverse", "unstranded"],
        "description": "Strandedness"
      }
    }
  }
}
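To see which columns a schema like this one marks as required, the "required" array can be pulled out with standard text tools. The sketch below is only an illustration: the function name required_columns is hypothetical, it treats the file as text rather than parsing JSON, and it assumes the first "required" array contains plain column names. A JSON-aware tool such as jq would be more robust if it is available.

```shell
# Hypothetical helper: print the names in the first "required" array, one per line.
# Text-based sketch, not a JSON parser; assumes simple column names
# without commas, quotes, or spaces inside them.
required_columns() {
    tr -d '\n' < "$1" \
        | grep -o '"required"[[:space:]]*:[[:space:]]*\[[^]]*]' \
        | head -n 1 \
        | sed 's/.*\[//; s/]//' \
        | tr -d '" \t' \
        | tr ',' '\n'
}
```

Called as required_columns assets/schema_input.json, it should list one required column name per line.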
Generating Samplesheets
From File Listing
bash
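# Generate from a directory of FASTQs (run inside that directory)
# Write the header first, then append one row per R1 file
echo "sample,fastq_1,fastq_2,strandedness" > samplesheet.csv
for f in *_R1.fastq.gz; do
    [ -e "$f" ] || continue                 # glob matched nothing
    sample=$(basename "$f" _R1.fastq.gz)
    r2="$PWD/${sample}_R2.fastq.gz"
    [ -f "$r2" ] || r2=""                   # no R2 file: leave fastq_2 empty
    echo "${sample},$PWD/${f},${r2},auto"
done >> samplesheet.csv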
From Manifest
bash
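# If you have a tab-separated manifest file with a header row
# (assumes columns 1-3 hold the sample name, R1 path, and R2 path)
awk -F'\t' 'BEGIN {print "sample,fastq_1,fastq_2,strandedness"} NR>1 {print $1","$2","$3",auto"}' manifest.tsv > samplesheet.csv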
Validation
Using nf-schema
When a pipeline uses the nf-schema plugin, the samplesheet is validated automatically at startup:
bash
nextflow run nf-core/rnaseq --input samplesheet.csv --outdir results -profile test,docker
# Validation errors are shown immediately
Manual Validation
bash
# Check that every FASTQ listed in the samplesheet exists
while IFS=, read -r sample fastq_1 fastq_2 strand; do
    [ -f "$fastq_1" ] || echo "Missing: $fastq_1"
    [ -n "$fastq_2" ] && [ ! -f "$fastq_2" ] && echo "Missing: $fastq_2"
done < <(tail -n +2 samplesheet.csv)
Common Issues
"File not found"
- Use absolute paths
- Check file permissions
- Verify file extensions match exactly
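Relative paths are a frequent cause of this error. As a rough sketch, the function below rewrites relative paths into absolute ones. The name absolutize_paths is hypothetical, and the code assumes a comma-separated sheet with a header row, FASTQ paths in columns 2 and 3, and no quoted fields that contain commas.

```shell
# Hypothetical helper: prefix relative paths in columns 2 and 3 with a base directory.
# Usage: absolutize_paths SAMPLESHEET [BASE_DIR]   (BASE_DIR defaults to $PWD)
absolutize_paths() {
    local sheet="$1" base="${2:-$PWD}"
    awk -F, -v OFS=, -v base="$base" '
        NR == 1 { print; next }              # keep the header unchanged
        {
            for (i = 2; i <= 3; i++)
                if ($i != "" && $i !~ /^\//) $i = base "/" $i
            print
        }' "$sheet"
}
```

The result goes to standard output, so it can be redirected to a new file and compared with the original before replacing it.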
"Invalid sample name"
- No spaces in sample names
- Use only alphanumeric characters and underscores
- Match the pattern defined in the schema
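Sample names can be checked before launching the pipeline. The sketch below follows the stricter advice above (letters, digits, and underscores only) rather than any particular schema pattern. The function name check_sample_names is hypothetical, and the sample name is assumed to be the first column.

```shell
# Hypothetical helper: report rows whose first column is not made up of
# letters, digits, and underscores. Returns non-zero if any row fails.
check_sample_names() {
    awk -F, 'NR > 1 && $1 !~ /^[A-Za-z0-9_]+$/ {
        printf "line %d: invalid sample name \"%s\"\n", NR, $1
        bad = 1
    }
    END { exit bad + 0 }' "$1"
}
```

If the pipeline's schema allows other characters, the regular expression should be adjusted to match it.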
"Missing required column"
- Check column names match exactly (case-sensitive)
- Include all required columns
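A header check along these lines can catch this error early. The function name check_header is hypothetical; the required column names are passed as arguments and compared case-sensitively against the first line of the file.

```shell
# Hypothetical helper: check that the header row contains every required column.
# Usage: check_header SAMPLESHEET COLUMN [COLUMN...]
check_header() {
    local sheet="$1" header col missing=0
    shift
    header=$(head -n 1 "$sheet" | tr -d '\r')
    for col in "$@"; do
        case ",$header," in
            *",$col,"*) ;;                       # exact, case-sensitive match
            *) echo "Missing required column: $col"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

For example, check_header samplesheet.csv sample fastq_1 prints nothing and returns zero when both columns are present.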
"Invalid value"
- Check enum values match allowed options
- Verify numeric values are in range
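As an illustration of an enum check, the sketch below tests the strandedness column against the four values listed in the example schema (auto, forward, reverse, unstranded). The function name check_strandedness is hypothetical, and strandedness is assumed to be the fourth column, as in the RNA-seq format.

```shell
# Hypothetical helper: report strandedness values outside the allowed set.
# Assumes strandedness is column 4 and the file uses Unix line endings.
check_strandedness() {
    awk -F, 'NR > 1 && $4 !~ /^(auto|forward|reverse|unstranded)$/ {
        printf "line %d: invalid strandedness \"%s\"\n", NR, $4
        bad = 1
    }
    END { exit bad + 0 }' "$1"
}
```

The same pattern applies to other enum columns by changing the column number and the list of allowed values.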
Best Practices
- Use absolute paths: /full/path/to/file.fastq.gz
- Consistent naming: Follow pipeline conventions
- No special characters: Avoid spaces and quotes in values
- Validate early: Test with one sample first
- Document samples: Keep metadata in a separate file
- Version control: Track the samplesheet in git
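For the "validate early" practice, a small test samplesheet can be cut from the full one. This is a minimal sketch with a hypothetical function name, subset_samplesheet. It keeps the header plus the first N data rows, so for formats where one sample spans several rows (lanes or replicates) N may need to be larger than 1.

```shell
# Hypothetical helper: write the header plus the first N data rows to a new file.
# Usage: subset_samplesheet INPUT OUTPUT [N]   (N defaults to 1)
subset_samplesheet() {
    local n="${3:-1}"
    head -n "$((n + 1))" "$1" > "$2"
}
```

The resulting file can then be passed to --input for a quick trial run before launching the full samplesheet.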
Converting Formats
CSV to TSV
bash
# Assumes no quoted fields containing commas; tr handles tabs portably
tr ',' '\t' < samplesheet.csv > samplesheet.tsv
TSV to CSV
bash
# Assumes no field contains a comma; tr handles tabs portably
tr '\t' ',' < samplesheet.tsv > samplesheet.csv
Excel to CSV
- Save As → CSV UTF-8
- Or use: xlsx2csv samplesheet.xlsx > samplesheet.csv
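Files exported from Excel may begin with a UTF-8 byte order mark and may use Windows (CRLF) line endings, either of which can make the first column name or the last value fail to match. The sketch below strips both. The function name clean_excel_csv is hypothetical, and whether a given export needs this cleanup is an assumption to verify on the actual file.

```shell
# Hypothetical helper: strip a leading UTF-8 byte order mark and carriage returns.
# Usage: clean_excel_csv EXPORTED.csv > samplesheet.csv
clean_excel_csv() {
    local bom
    bom=$(printf '\357\273\277')             # the three BOM bytes: EF BB BF
    sed "1s/^${bom}//" "$1" | tr -d '\r'
}
```

The cleaned output goes to standard output, so the original export stays untouched.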