Gali Bai

Computational Biologist

Dana-Farber Cancer Institute

Biography

I am a Computational Biologist at Dana-Farber Cancer Institute. My research focuses on developing and applying computational tools to high-throughput omics data to uncover the biology behind tumor-immune interactions. I currently work in the Cancer Immunologic Data Commons (CIDC) Bioinformatics Working Group as part of the CIMAC-CIDC project.

I use Snakemake to build bioinformatics pipelines and code in R and Python for data analysis. In the past year, we have developed CHIPS, a pipeline for the streamlined analysis of ChIP-seq, ATAC-seq, and DNase-seq data. I have also put most of my effort into implementing MAESTRO, a single-cell data analysis pipeline. Within the CIMAC-CIDC network, we collaborate with oncologists to analyze clinical trial data with our pipelines.

Download my resumé.

Interests
  • Single-cell Omics
  • Epigenomics
  • Immuno-Oncology
Education
  • M.S. in Genomics and System Biology, 2020

    Texas A&M University

  • B.S. in Agriculture and Biotechnology, 2018

    China Agricultural University

Experience

Investigating Killer-cell Immunoglobulin-like Receptor (KIR) allelic diversity in public datasets using T1000
Dana-Farber Cancer Institute
Jun 2021 – Present Boston, MA

Responsibilities include:

  • Compared T1000 genotyping results with those of popular genotyping tools such as arcasHLA and HISAT.
  • Evaluated the robustness of T1000 HLA genotyping by comparing allele variation between paired tumor and adjacent normal samples in TCGA RNA-seq data.
  • Improved T1000 sensitivity in KIR genotyping by tuning parameters for screening dominant and recessive alleles.

CHIPS: A Snakemake pipeline for quality control and reproducible processing of chromatin profiling data
Dana-Farber Cancer Institute
Oct 2020 – Nov 2021 Boston, MA

Responsibilities include:

  • Analyzed high-throughput chromatin profiling data by performing read alignment, peak calling, peak annotation, motif finding, and regulatory potential calculation for all genes.
  • Developed a Snakemake pipeline for reproducible analysis of ATAC-seq, ChIP-seq, and DNase-seq data.
  • Implemented an interactive HTML report that visualizes all CHIPS analysis results.
  • Built the Google Cloud computing environment for large-cohort data analysis.
  • Processed ATAC-seq data of 81 GD2+ solid tumors treated with anti-GD2 CAR-T cells.

Model-based AnalysEs of Transcriptome and RegulOme (MAESTRO) for single-cell analysis
Dana-Farber Cancer Institute
Oct 2020 – Oct 2021 Boston, MA

Responsibilities include:

  • Implemented analysis pipelines for multi-sample scRNA-seq, multi-sample scATAC-seq, and multiome data.
  • Integrated Chromap as the default scATAC-seq aligner.
  • Built comprehensive MAESTRO documentation using workflowr.
  • Achieved a 7x improvement in time and memory usage by optimizing the alignment and count matrix generation steps.

Recent & Upcoming Talks

Projects

MAESTRO

Model-based AnalysEs of Transcriptome and RegulOme (MAESTRO) for single-cell analysis

CHIPS

CHIPS (CHromatin enrIchment ProceSsor): an analysis pipeline in Snakemake that streamlines the processing of ChIP-seq, ATAC-seq, and DNase-seq data

Skills

  • R: 90%
  • Python: 85%
  • Statistics: 90%

Accomplishments

Coursera
Tidyverse Skills for Data Science in R Specialization
See certificate
Coursera
Python for Everybody Specialization
See certificate

Contact