DESeq2
Differential expression analysis for (bulk) RNA-Seq data
Input data
Files:
- Salmon or Kallisto output files, with one subfolder per sample, named after the sample identifier
- A design.csv file with one row of metadata per sample, for example:
sampleid,group,subjectid
SRR2123765,neg,1
SRR2123766,pos,1
SRR2123767,neg,2
SRR2123768,pos,2
SRR2123769,pos,3
SRR2123770,neg,3
SRR2123771,neg,4
SRR2123772,pos,4
...
The sample identifiers must be in the first column.
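As an illustration of the design.csv rules above, here is a minimal Python sketch that reads the file. It assumes only what is stated: a header row, and sample identifiers in the first column whatever that column is called. The function name `read_design` is hypothetical and is not part of this tool.

```python
import csv
import io


def read_design(handle):
    """Parse design.csv: sample id from the first column, metadata from the rest.

    Returns (sample_ids, metadata), where metadata maps each sample id to a
    dict of {column_name: value}. All values are kept as strings.
    """
    reader = csv.reader(handle)
    header = next(reader)
    meta_cols = header[1:]  # first column is the sample id, whatever its name
    sample_ids, metadata = [], {}
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        sample_id, values = row[0], row[1:]
        if sample_id in metadata:
            raise ValueError(f"duplicate sample id in design.csv: {sample_id}")
        sample_ids.append(sample_id)
        metadata[sample_id] = dict(zip(meta_cols, values))
    return sample_ids, metadata


if __name__ == "__main__":
    example = "sampleid,group,subjectid\nSRR2123765,neg,1\nSRR2123766,pos,1\n"
    ids, meta = read_design(io.StringIO(example))
    print(ids)                 # ['SRR2123765', 'SRR2123766']
    print(meta["SRR2123766"])  # {'group': 'pos', 'subjectid': '1'}
```

For a real file, pass `open("design.csv", newline="")` instead of the in-memory string. Values stay strings on purpose: a column such as subjectid looks numeric but identifies subjects, so it should be treated as categorical.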
Files are organized as follows:
my_project
|-- SRR2123771
| |-- aux_info
| | |-- ambig_info.tsv
| | |-- exp3_seq.gz
| | |-- exp5_seq.gz
| | |-- expected_bias.gz
| | |-- exp_gc.gz
| | |-- fld.gz
| | |-- meta_info.json
| | |-- obs3_seq.gz
| | |-- obs5_seq.gz
| | |-- observed_bias_3p.gz
| | |-- observed_bias.gz
| | `-- obs_gc.gz
| |-- libParams
| | `-- flenDist.txt
| |-- logs
| | `-- salmon_quant.log
| |-- cmd_info.json
| |-- lib_format_counts.json
| `-- quant.sf
|-- SRR2123772
| |-- aux_info
| | |-- ambig_info.tsv
| | |-- exp3_seq.gz
| | |-- exp5_seq.gz
| | |-- expected_bias.gz
| | |-- exp_gc.gz
| | |-- fld.gz
| | |-- meta_info.json
| | |-- obs3_seq.gz
| | |-- obs5_seq.gz
| | |-- observed_bias_3p.gz
| | |-- observed_bias.gz
| | `-- obs_gc.gz
| |-- libParams
| | `-- flenDist.txt
| |-- logs
| | `-- salmon_quant.log
| |-- cmd_info.json
| |-- lib_format_counts.json
| `-- quant.sf
|-- design.csv
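Given the layout above, each sample's quantification file sits at `<project>/<sample id>/quant.sf`. The following Python sketch checks that every sample listed in design.csv has such a file and reads its estimated counts. The function names are hypothetical. The column names are those of Salmon's quant.sf (`Name`, `NumReads`) and Kallisto's abundance.tsv (`target_id`, `est_counts`). The demo writes a throwaway file with toy values.

```python
import csv
import tempfile
from pathlib import Path


def locate_quant_files(project_dir, sample_ids, filename="quant.sf"):
    """Map each sample id to <project_dir>/<sample_id>/<filename>.

    Raises FileNotFoundError naming every sample whose file is missing.
    """
    project = Path(project_dir)
    paths = {sid: project / sid / filename for sid in sample_ids}
    missing = [sid for sid, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"no {filename} for: {', '.join(missing)}")
    return paths


def read_counts(path, id_col="Name", count_col="NumReads"):
    """Return {transcript_id: estimated read count} from a tab-separated file.

    Defaults match Salmon's quant.sf; for Kallisto's abundance.tsv use
    filename="abundance.tsv", id_col="target_id", count_col="est_counts".
    """
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row[id_col]: float(row[count_col]) for row in reader}


if __name__ == "__main__":
    # Build a throwaway project with one sample folder (toy values only).
    with tempfile.TemporaryDirectory() as tmp:
        sample_dir = Path(tmp) / "SRR2123771"
        sample_dir.mkdir()
        (sample_dir / "quant.sf").write_text(
            "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
            "tx1\t1500\t1325.0\t12.5\t340.0\n"
        )
        paths = locate_quant_files(tmp, ["SRR2123771"])
        print(read_counts(paths["SRR2123771"]))  # {'tx1': 340.0}
```

These are transcript-level estimates. Before DESeq2 uses them, they are typically summarised to gene level, which in R is commonly done with tximport. That step is not shown here.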
Output files
- expression_matrix.tsv : Normalized gene expression matrix
- *_comparison.tsv : Comparison results from running the DESeq2 package, one file per contrast
For example,
output/
|-- expression_matrix.tsv
|-- group-pos-vs-neg_comparison.tsv
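Since results are written as one file per contrast, the expected file names can be predicted from the factor, its levels and the reference level. The Python sketch below assumes the `<factor>-<level>-vs-<reference>_comparison.tsv` pattern, which is inferred from the single example above. It also assumes that each non-reference level is compared with the reference. Neither is a documented rule, and `comparison_filenames` is a hypothetical helper.

```python
def comparison_filenames(factor, levels, reference):
    """Return one output file name per non-reference level of `factor`.

    Follows the '<factor>-<level>-vs-<reference>_comparison.tsv' pattern
    inferred from the example output listing (an assumption).
    """
    unique_levels = list(dict.fromkeys(levels))  # dedupe, keep first-seen order
    if reference not in unique_levels:
        raise ValueError(f"reference level {reference!r} not found in {factor!r}")
    return [
        f"{factor}-{level}-vs-{reference}_comparison.tsv"
        for level in unique_levels
        if level != reference
    ]


if __name__ == "__main__":
    groups = ["neg", "pos", "pos", "neg"]
    print(comparison_filenames("group", groups, "neg"))
    # ['group-pos-vs-neg_comparison.tsv']
```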
Parameter settings
- factor_of_interest: Column whose levels are compared; optional
- reference_level: Baseline level for the comparison; must be one of the values in the factor_of_interest column
- grouping_factor: Column defining subgroups for subgroup analysis; optional
- blocking_factor: Column used to control for blocking/pairing effects (e.g. subjectid in the design.csv example above); optional
- minimum_total_counts: Threshold on total counts used to filter out low-expression genes
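The Python sketch below shows how these parameters might interact. The function names are hypothetical. Placing the blocking variable before the variable of interest in the design formula follows the DESeq2 convention of putting the variable of interest last. Keeping genes whose summed count is at least `minimum_total_counts` is an assumed reading of that parameter. grouping_factor is not covered.

```python
def build_design_formula(factor_of_interest=None, blocking_factor=None):
    """Return a DESeq2-style design formula string.

    DESeq2 convention: the variable of interest goes last, after any
    blocking/pairing variable, e.g. '~ subjectid + group'.
    """
    terms = [t for t in (blocking_factor, factor_of_interest) if t]
    return "~ " + " + ".join(terms) if terms else "~ 1"


def check_reference_level(metadata, factor_of_interest, reference_level):
    """Ensure reference_level occurs in the factor_of_interest column.

    metadata maps sample id -> {column: value}. Returns the sorted levels.
    """
    levels = sorted({row[factor_of_interest] for row in metadata.values()})
    if reference_level not in levels:
        raise ValueError(
            f"reference_level {reference_level!r} not in {factor_of_interest!r}: {levels}"
        )
    return levels


def filter_low_counts(counts, minimum_total_counts):
    """Keep genes whose counts summed over all samples reach the threshold.

    counts maps gene id -> list of per-sample counts (assumed rule: >=).
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= minimum_total_counts}


if __name__ == "__main__":
    meta = {"S1": {"group": "neg"}, "S2": {"group": "pos"}}
    print(build_design_formula("group", "subjectid"))  # ~ subjectid + group
    print(check_reference_level(meta, "group", "neg"))  # ['neg', 'pos']
    print(filter_low_counts({"g1": [0, 1, 2], "g2": [10, 20, 5]}, 10))
    # {'g2': [10, 20, 5]}
```

In R, the blocking column must be a factor. Otherwise, subject IDs such as 1, 2, 3 would be modelled as a continuous covariate rather than as separate subjects.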
References
Please cite: