DESeq2

Differential expression analysis for (bulk) RNA-Seq data

Input data

Files:

Salmon or kallisto output files, with one subfolder per sample
A design.csv file describing each sample (i.e. sample metadata), for example:

sampleid,group,subjectid
SRR2123765,neg,1
SRR2123766,pos,1
SRR2123767,neg,2
SRR2123768,pos,2
SRR2123769,pos,3
SRR2123770,neg,3
SRR2123771,neg,4
SRR2123772,pos,4
...

The sample identifiers must be in the first column.
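
As a quick sanity check before running the analysis, the design file can be loaded and validated with a few lines of Python. This is only a sketch: the helper name `read_design` is hypothetical and not part of this tool, and it assumes a comma-separated file with a header row, as in the example above.

```python
import csv
import io


def read_design(handle):
    """Read a design file and return (sample_ids, rows).

    Hypothetical helper, not part of the tool itself. Assumes a
    comma-separated file with a header row whose first column
    holds the sample identifiers.
    """
    reader = csv.reader(handle)
    header = next(reader)
    # Skip blank lines; keep every other row as a column-name -> value dict.
    rows = [dict(zip(header, fields)) for fields in reader if fields]
    sample_ids = [row[header[0]] for row in rows]
    # Each sample should be described exactly once.
    duplicates = {s for s in sample_ids if sample_ids.count(s) > 1}
    if duplicates:
        raise ValueError(f"duplicate sample identifiers: {sorted(duplicates)}")
    return sample_ids, rows


example = io.StringIO(
    "sampleid,group,subjectid\n"
    "SRR2123765,neg,1\n"
    "SRR2123766,pos,1\n"
)
sample_ids, rows = read_design(example)
print(sample_ids)  # ['SRR2123765', 'SRR2123766']
```

To check a real file, pass an open file handle instead of the in-memory example, e.g. `read_design(open("my_project/design.csv", newline=""))`.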

Files are organized as follows:

my_project
|-- SRR2123771
|   |-- aux_info
|   |   |-- ambig_info.tsv
|   |   |-- exp3_seq.gz
|   |   |-- exp5_seq.gz
|   |   |-- expected_bias.gz
|   |   |-- exp_gc.gz
|   |   |-- fld.gz
|   |   |-- meta_info.json
|   |   |-- obs3_seq.gz
|   |   |-- obs5_seq.gz
|   |   |-- observed_bias_3p.gz
|   |   |-- observed_bias.gz
|   |   `-- obs_gc.gz
|   |-- libParams
|   |   `-- flenDist.txt
|   |-- logs
|   |   `-- salmon_quant.log
|   |-- cmd_info.json
|   |-- lib_format_counts.json
|   `-- quant.sf
|-- SRR2123772
|   |-- aux_info
|   |   |-- ambig_info.tsv
|   |   |-- exp3_seq.gz
|   |   |-- exp5_seq.gz
|   |   |-- expected_bias.gz
|   |   |-- exp_gc.gz
|   |   |-- fld.gz
|   |   |-- meta_info.json
|   |   |-- obs3_seq.gz
|   |   |-- obs5_seq.gz
|   |   |-- observed_bias_3p.gz
|   |   |-- observed_bias.gz
|   |   `-- obs_gc.gz
|   |-- libParams
|   |   `-- flenDist.txt
|   |-- logs
|   |   `-- salmon_quant.log
|   |-- cmd_info.json
|   |-- lib_format_counts.json
|   `-- quant.sf
`-- design.csv
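
The layout above can be checked programmatically before starting a run. The following is a minimal sketch, not the tool's own validation: `find_quant_files` is a hypothetical helper, and it assumes that each sample identifier in design.csv is also the name of that sample's subfolder. Salmon writes `quant.sf`; kallisto writes `abundance.h5` and `abundance.tsv`.

```python
from pathlib import Path
import tempfile

# Quantification files looked for in each sample folder, in order of
# preference: Salmon's quant.sf, then kallisto's abundance files.
QUANT_FILES = ("quant.sf", "abundance.h5", "abundance.tsv")


def find_quant_files(project_dir, sample_ids):
    """Map each sample id to its quantification file.

    Hypothetical helper. Raises FileNotFoundError naming every
    sample whose subfolder lacks a recognised quantification file.
    """
    project_dir = Path(project_dir)
    found, missing = {}, []
    for sample in sample_ids:
        candidates = [project_dir / sample / name for name in QUANT_FILES]
        existing = [path for path in candidates if path.is_file()]
        if existing:
            found[sample] = existing[0]
        else:
            missing.append(sample)
    if missing:
        raise FileNotFoundError(f"no quantification file for: {missing}")
    return found


# Small demonstration using a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for sample in ("SRR2123771", "SRR2123772"):
        (Path(tmp) / sample).mkdir()
        (Path(tmp) / sample / "quant.sf").touch()
    files = find_quant_files(tmp, ["SRR2123771", "SRR2123772"])
    found_names = {sample: path.name for sample, path in files.items()}

print(found_names)  # {'SRR2123771': 'quant.sf', 'SRR2123772': 'quant.sf'}
```

Reporting all missing samples at once, rather than stopping at the first, makes it easier to fix an incomplete project folder in one pass.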

Output files

expression_matrix.tsv : Normalized gene expression matrix
*_comparison.tsv : Comparison results (one file per contrast) from running the DESeq2 package

For example,

output/
|-- expression_matrix.tsv
`-- group-pos-vs-neg_comparison.tsv

Parameter settings

factor_of_interest : Column in design.csv that defines the comparison; optional
reference_level : Baseline level; must be one of the values in the factor_of_interest column
grouping_factor : Column in design.csv used for subgroup analysis; optional
blocking_factor : Column in design.csv used to control for blocking/pairing effects; optional
minimum_total_counts : Threshold on total read counts, used to filter out low-expression genes
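
To illustrate how two of these settings behave, here is a rough Python sketch. It is not the tool's implementation: both function names and the gene names are hypothetical, and it assumes that "total counts" means the sum of a gene's counts across all samples and that genes at or above the threshold are kept.

```python
def check_reference_level(rows, factor_of_interest, reference_level):
    """Return the non-reference levels of the factor of interest.

    Raises ValueError if reference_level does not occur in the
    factor_of_interest column of the design rows.
    """
    levels = {row[factor_of_interest] for row in rows}
    if reference_level not in levels:
        raise ValueError(
            f"{reference_level!r} is not a value of {factor_of_interest!r}; "
            f"choose one of {sorted(levels)}"
        )
    return sorted(levels - {reference_level})


def filter_low_counts(counts, minimum_total_counts):
    """Keep genes whose counts summed over samples reach the threshold.

    counts maps a gene name to its list of per-sample counts.
    Assumption: the comparison is 'greater than or equal to'.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if sum(values) >= minimum_total_counts
    }


design = [
    {"sampleid": "SRR2123765", "group": "neg"},
    {"sampleid": "SRR2123766", "group": "pos"},
]
others = check_reference_level(design, "group", "neg")
print(others)  # ['pos']

counts = {"geneA": [0, 1], "geneB": [25, 40]}
kept = filter_low_counts(counts, minimum_total_counts=10)
print(kept)  # {'geneB': [25, 40]}
```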

References

Please cite:

Bioconductor DESeq2 package