Seurat
A toolkit for processing single-cell data
Input data
Files:
- CellRanger output files from 10x Genomics (with one folder associated with each sample)
- A design.csv file describing each sample (i.e., sample metadata). The sample identifiers must be in the first column, for example:
sample_id,is_treated
sample_1,yes
sample_2,no
sample_3,yes
sample_4,no
Files are organized as follows:
my_project
|-- sample_1
| |-- barcodes.tsv.gz
| |-- features.tsv.gz
| `-- matrix.mtx.gz
|-- sample_2
| |-- barcodes.tsv
| |-- genes.tsv
| `-- matrix.mtx
|-- sample_3
| |-- barcodes.tsv
| |-- genes.tsv
| `-- matrix.mtx
|-- sample_4
| |-- barcodes.tsv
| |-- genes.tsv
| `-- matrix.mtx
`-- design.csv
Note: The folder names should match the sample identifiers in the design.csv file.
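The folder-name requirement can be checked before a run. The following is a minimal sketch, not part of the tool: the helper names `read_sample_ids` and `missing_sample_folders` are hypothetical, and it assumes design.csv has a single header row followed by one row per sample.

```python
import csv
from pathlib import Path


def read_sample_ids(design_path):
    """Return the sample identifiers from the first column of design.csv."""
    with open(design_path, newline="") as handle:
        rows = list(csv.reader(handle))
    # Assumption: the first row is a header; blank rows are ignored.
    return [row[0].strip() for row in rows[1:] if row and row[0].strip()]


def missing_sample_folders(project_dir):
    """List sample identifiers that have no matching folder in project_dir."""
    project = Path(project_dir)
    sample_ids = read_sample_ids(project / "design.csv")
    return [s for s in sample_ids if not (project / s).is_dir()]
```

Calling `missing_sample_folders("my_project")` returns an empty list when every sample listed in design.csv has a folder of the same name.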
Output files
- Clustering results (in metadata.tsv)
- Normalized gene expression matrix (matrix.mtx)
For example:
output/
|-- barcodes.tsv
|-- cluster_marker.tsv
|-- file_description.json
|-- genes.tsv
|-- matrix.mtx
|-- metadata.tsv
`-- tsne_embedding.tsv
Parameter settings
Options for cell filtering:
- cutoff_MT — Maximum percentage of mitochondrial reads allowed per cell (%), default: 25
- max_n_features — Maximum number of features (genes) detected per cell, default: 8000
- min_n_features — Minimum number of features (genes) detected per cell, default: 200
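To illustrate how the three filtering options interact, here is a rough sketch in plain Python. It is not the tool's implementation: the function `keep_cell` and the example cells are hypothetical, and it assumes that a cell sitting exactly on a threshold is removed (strict inequalities).

```python
def keep_cell(n_features, pct_mt,
              min_n_features=200, max_n_features=8000, cutoff_MT=25):
    """Return True if a cell passes all three filters.

    Assumption: thresholds are exclusive, so a cell exactly at a
    threshold is filtered out.
    """
    return min_n_features < n_features < max_n_features and pct_mt < cutoff_MT


# Hypothetical per-cell QC values: (barcode, features detected, % mitochondrial reads)
cells = [
    ("cell_a", 150, 3.0),    # too few features
    ("cell_b", 2500, 4.2),   # passes
    ("cell_c", 3000, 40.0),  # too many mitochondrial reads
    ("cell_d", 9000, 2.0),   # too many features
]

kept = [barcode for barcode, n, mt in cells if keep_cell(n, mt)]
```

With the default settings, only `cell_b` remains in `kept`.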
Options for dimension reduction:
- normalization_method — Normalization method, possible values: ‘LogNormalize’, ‘CLR’, ‘RC’, default: ‘LogNormalize’
- number_of_features — Number of features selected for PCA, default: 2000
- number_of_PCs — Number of principal components (PCs) selected, default: 40
- resolution — Cluster resolution, default: 0.5
- reduction_method — Nonlinear dimension reduction method, possible values: ‘tSNE’, ‘UMAP’, default: ‘tSNE’
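As an illustration of the default normalization method, Seurat's LogNormalize divides each feature count by the cell's total count, multiplies by a scale factor, and applies a natural-log transform with log1p. The sketch below reproduces that arithmetic for a single cell in plain Python. The function name `log_normalize` is hypothetical, and the scale factor of 10,000 is Seurat's default; whether this tool exposes or changes it is an assumption.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Apply LogNormalize-style scaling to the raw counts of one cell.

    Each count is divided by the cell total, multiplied by scale_factor,
    and transformed with log1p (natural log of 1 + x).
    """
    total = sum(counts)
    if total == 0:
        # A cell with no counts stays at zero everywhere.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Hypothetical raw counts for three genes in one cell.
normalized = log_normalize([0, 5, 5])
```

A gene with zero counts stays at zero, and two genes with equal counts receive equal normalized values.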
Options for cluster marker selection:
- log_fc_threshold — Minimum log fold change in average expression between two groups of cells, default: 0.25
- min_pct — Minimum fraction of cells expressing a gene in either of the two groups, default: 0.25
- number_of_markers — Number of markers selected, default: 5
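The marker options can be read as two filters followed by a cap on the number of genes. The sketch below shows one possible reading, not the tool's actual procedure: the function `select_markers` and the field names `log_fc`, `pct_in`, and `pct_out` are hypothetical, and it assumes that only up-regulated genes are kept and that candidates are ranked by log fold change.

```python
def select_markers(candidates, log_fc_threshold=0.25, min_pct=0.25,
                   number_of_markers=5):
    """Pick the top marker genes for one cluster from precomputed statistics.

    Assumptions: each candidate is a dict with the gene name, its log fold
    change versus the other cells, and the fraction of expressing cells
    inside (pct_in) and outside (pct_out) the cluster.
    """
    passing = [
        g for g in candidates
        if g["log_fc"] >= log_fc_threshold
        and max(g["pct_in"], g["pct_out"]) >= min_pct
    ]
    # Rank by effect size, largest first (an assumed ranking criterion).
    passing.sort(key=lambda g: g["log_fc"], reverse=True)
    return [g["gene"] for g in passing[:number_of_markers]]


# Hypothetical statistics for four genes in one cluster.
candidates = [
    {"gene": "GeneA", "log_fc": 1.2, "pct_in": 0.80, "pct_out": 0.10},
    {"gene": "GeneB", "log_fc": 0.1, "pct_in": 0.90, "pct_out": 0.85},
    {"gene": "GeneC", "log_fc": 2.0, "pct_in": 0.05, "pct_out": 0.02},
    {"gene": "GeneD", "log_fc": 0.6, "pct_in": 0.40, "pct_out": 0.30},
]

top = select_markers(candidates, number_of_markers=2)
```

Here `GeneB` fails the fold-change filter and `GeneC` is expressed in too few cells, so `top` contains `GeneA` and `GeneD`.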
References
Please cite: