MAESTRO (Model-based AnalysEs of Single-cell Transcriptome and RegulOme) is a comprehensive single-cell RNA-seq and ATAC-seq analysis suite built with Snakemake. MAESTRO combines several dozen tools and packages into an integrative pipeline. The pipeline takes scRNA-seq and scATAC-seq data from raw sequencing reads (fastq files) through alignment, quality control, cell filtering, normalization, unsupervised clustering, differential expression and peak calling, cell-type annotation, and transcriptional regulation analysis. MAESTRO currently supports the Smart-seq2, 10x Genomics, Drop-seq and SPLiT-seq scRNA-seq protocols. For scATAC-seq, it supports microfluidics-based, 10x Genomics and sci-ATAC-seq protocols.


MAESTRO provides eleven functions as sub-commands. To get the full list of sub-commands and their descriptions, run MAESTRO -h:


usage: MAESTRO [-h] [-v]
               {scrna-init,scatac-init,integrate-init,samples-init,mtx-to-h5,count-to-h5,merge-h5,scrna-qc,scatac-qc,scatac-peakcount,scatac-genescore}
Subcommand          Description
scrna-init          Initialize the MAESTRO scRNA-seq workflow.
scatac-init         Initialize the MAESTRO scATAC-seq workflow.
integrate-init      Initialize the MAESTRO integration workflow.
samples-init        Initialize samples.json file in the current directory.
mtx-to-h5           Convert 10X mtx format matrix to HDF5 format.
count-to-h5         Convert plain text count table to HDF5 format.
merge-h5            Merge multiple HDF5 files, e.g. different replicates.
scrna-qc            Perform quality control for scRNA-seq gene-cell count matrix.
scatac-qc           Perform quality control for scATAC-seq peak-cell count matrix.
scatac-peakcount    Generate peak-cell binary count matrix.
scatac-genescore    Calculate gene score based on the binarized scATAC peak count.
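
The loop below prints the help page of every sub-command in one pass. It is a minimal sketch that assumes each sub-command accepts -h, as argparse-based sub-commands normally do. Only the top-level -h is confirmed by the usage line above. The function name show_maestro_help is hypothetical and not part of MAESTRO.

```shell
#!/usr/bin/env bash
# Sketch: print the help page of every MAESTRO sub-command.
# Assumption: each sub-command accepts -h (typical argparse behavior).

# Sub-commands as listed in the usage line above.
subcommands=(
  scrna-init scatac-init integrate-init samples-init
  mtx-to-h5 count-to-h5 merge-h5
  scrna-qc scatac-qc scatac-peakcount scatac-genescore
)

# Hypothetical helper; returns 1 if MAESTRO is not on PATH.
show_maestro_help() {
  if ! command -v MAESTRO >/dev/null 2>&1; then
    echo "MAESTRO not found on PATH; activate the MAESTRO environment first." >&2
    return 1
  fi
  local sub
  for sub in "${subcommands[@]}"; do
    echo "=== MAESTRO ${sub} ==="
    MAESTRO "${sub}" -h
  done
}
```

After activating the MAESTRO environment, call show_maestro_help to list the options of each sub-command.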


The most common use of MAESTRO is to process single-cell data with a streamlined pipeline. Running MAESTRO comes down to the following three steps.

Version   Author      Date
df5cd89   baigal628   2021-06-22
317af89   baigal628   2021-06-21

Required annotations for MAESTRO workflow

The full MAESTRO workflow requires extra annotation and reference files. If you want to take full advantage of the pipeline, please download the following:

  • MAESTRO depends on STARsolo for mapping scRNA-seq data, and on chromap or minimap2 for mapping scATAC-seq data. Users need to generate the reference files for the alignment software and pass their paths to MAESTRO through command-line options. Pre-built reference files are provided below.

For scRNA-seq, please download the STARsolo index (STAR human and STAR mouse). If the sequencing platform is Smart-seq2, please also download the RSEM reference prefix (RSEM human and RSEM mouse).

For scATAC-seq, please download the reference files (human and mouse). If you use chromap as the mapping tool (it is much faster), please also build its index with chromap -i -r ref.fa -o ref.index.

Download index files for GRCh38

mkdir -p MAESTRO/references
cd MAESTRO/references/
tar xvf pbmc8k_fastqs.tar

mkdir scRNA
cd scRNA/
tar xvzf Refdata_scRNA_MAESTRO_GRCh38_1.2.2.tar.gz

cd ../
mkdir scATAC
cd scATAC/
tar xvzf Refdata_scATAC_MAESTRO_GRCh38_1.1.0.tar.gz

# Build the chromap index; this takes only a few minutes.
chromap -i -r Refdata_scATAC_MAESTRO_GRCh38_1.1.0/GRCh38_genome.fa -o GRCh38_chromap.index
  • MAESTRO uses LISA2 to evaluate transcription factor enrichment based on the marker genes of scRNA-seq clusters. To use LISA2, users need to download and install its reference data for either human or mouse. The input gene set can consist of official gene symbols only, RefSeq IDs only, Ensembl IDs only, Entrez IDs only, or a mixture of these identifiers.

Download lisa2 data files for GRCh38

cd ../
mkdir annotation
cd annotation/
  • MAESTRO uses giggle to identify enrichment of transcription factor peaks in scATAC-seq cluster-specific peaks. giggle is installed in the MAESTRO environment by default. The giggle index for the Cistrome database can be downloaded here. Users who want to predict TFs from scATAC-seq data need to download this file and provide the location of the giggle annotation to MAESTRO.

Download giggle annotation files for GRCh38

tar xvzf giggle.all.tar.gz
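
Before running the workflow, you can confirm that the reference files ended up where the commands above put them. The sketch below assumes the layout those commands create:

  • a MAESTRO/references directory with scRNA, scATAC and annotation subdirectories;
  • GRCh38_chromap.index inside scATAC.

check_maestro_refs is a hypothetical helper, not part of MAESTRO. The actual paths are still passed to MAESTRO through its command-line options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which reference components exist under a
# references directory laid out as in the download commands above.
# Returns the number of missing directories (0 means all present).
check_maestro_refs() {
  local root="${1:-MAESTRO/references}"
  local missing=0
  local d
  for d in scRNA scATAC annotation; do
    if [[ -d "${root}/${d}" ]]; then
      echo "ok       ${root}/${d}"
    else
      echo "missing  ${root}/${d}"
      missing=$((missing + 1))
    fi
  done
  # The chromap index is only needed when chromap is the scATAC-seq mapper,
  # so its absence is reported but not counted as missing.
  local index="${root}/scATAC/GRCh38_chromap.index"
  if [[ -f "${index}" ]]; then
    echo "ok       ${index}"
  else
    echo "absent   ${index} (needed only for chromap)"
  fi
  return "${missing}"
}
```

For example, check_maestro_refs MAESTRO/references prints one line per component. Its exit status is the number of missing directories, so it can guard a script with check_maestro_refs || exit 1.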

Test data

We also provide small data sets, subsampled from 10x Genomics fastq files, for users to try out the pipeline. The data can be downloaded from these links: scRNA-seq and scATAC-seq.

