Make sure you have R version 4.2.0 or later installed. We recommend installing CITEViz through the RStudio IDE using the following commands (the first line is only needed if devtools is not already installed):
install.packages("devtools")
devtools::install_github("https://github.com/maxsonBraunLab/CITEViz.git")
CITEViz accepts files in RDS (.rds) format. The RDS file should contain a Seurat object or a SingleCellExperiment object. If the file contains a list of Seurat objects, only the object labeled “integrated” will be used.
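Any R object can be written to an RDS file with base R's `saveRDS()`. The sketch below saves a toy list and applies the list rule described above through a hypothetical helper, `resolve_citeviz_input()`, which is not part of CITEViz. Character strings stand in for real Seurat objects (an assumption made only so the example runs without Seurat installed).

```r
# Hypothetical helper, NOT part of CITEViz: returns the object that the
# input rule above would select from an RDS file.
resolve_citeviz_input <- function(path) {
  obj <- readRDS(path)
  # Seurat and SingleCellExperiment objects are S4, so a plain list means a
  # collection of objects; keep only the element named "integrated".
  if (is.list(obj) && !isS4(obj)) {
    if (!"integrated" %in% names(obj)) {
      stop("a list input needs an element named 'integrated'")
    }
    obj <- obj[["integrated"]]
  }
  obj
}

# Toy input: strings stand in for Seurat objects (assumption for illustration)
path <- tempfile(fileext = ".rds")
saveRDS(list(batch1 = "unintegrated object", integrated = "integrated object"), path)
selected <- resolve_citeviz_input(path)
```

With real data, saving a single object is enough, e.g. `saveRDS(seurat_obj, "my_data.rds")`, where `seurat_obj` is a placeholder for your own object.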
The input Seurat or SingleCellExperiment object must contain cell embeddings for at least one dimensional reduction method (e.g. PCA, UMAP, or tSNE). If the object contains more than one reduction, each reduction must have a unique name. For a SingleCellExperiment object, the main experiment and any alternative experiments must contain assay data stored as “logcounts” (log-transformed, normalized counts) or “normcounts” (normalized counts). Please refer to the SingleCellExperiment reference manual on Bioconductor to determine which types of count data are present in your SCE object. By default, CITEViz uses logcounts to plot expression data; if logcounts are not available, it uses normcounts. If the SingleCellExperiment object was generated from Seurat-processed data or converted from a Seurat object, its logcounts correspond to the “data” slot of the matching assay in the Seurat object.
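These requirements can be checked before uploading. The sketch below uses standard SingleCellExperiment accessors (`reducedDimNames()`, `altExps()`, `assayNames()`) on a small simulated object. `check_sce_for_citeviz()` is a hypothetical helper that only restates the rules above; it is not CITEViz's own validation code, and the gene and ADT names are made up.

```r
library(SingleCellExperiment)

# Hypothetical helper (not CITEViz code): checks the documented input rules and
# returns, per experiment, the assay CITEViz would plot (logcounts first).
check_sce_for_citeviz <- function(sce) {
  red <- reducedDimNames(sce)
  if (length(red) == 0) stop("at least one dimensional reduction is required")
  if (anyDuplicated(red) > 0) stop("each reduction must have a unique name")
  exps <- c(list(main = sce), as.list(altExps(sce)))
  vapply(exps, function(x) {
    a <- assayNames(x)
    if ("logcounts" %in% a) return("logcounts")
    if ("normcounts" %in% a) return("normcounts")
    stop("an experiment has neither logcounts nor normcounts")
  }, character(1))
}

# Toy object: 20 genes x 10 cells with logcounts, a PCA, and an ADT altExp
set.seed(1)
counts <- matrix(rpois(200, 5), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))
sce <- SingleCellExperiment(assays = list(counts = counts, logcounts = log1p(counts)))
reducedDim(sce, "PCA") <- prcomp(t(logcounts(sce)))$x[, 1:2]
adt <- matrix(rpois(30, 20), nrow = 3,
              dimnames = list(paste0("ADT", 1:3), colnames(sce)))
altExp(sce, "ADT") <- SummarizedExperiment(assays = list(normcounts = adt))

check_sce_for_citeviz(sce)
saveRDS(sce, file.path(tempdir(), "toy_sce.rds"))  # file ready for upload
```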
In this vignette, we used a downsampled version of the PBMC dataset from Hao et al. (2021). The sample PBMC dataset for this vignette, a 10K-cell subset, can be downloaded from Google Drive.
To run CITEViz, use the following commands:
library(CITEViz)
run_app()
After calling run_app() to start CITEViz, a file can be uploaded using the file upload box at the top of any page in CITEViz. Clicking the file upload box opens a file explorer window showing files in the user’s local file system. The user can select an RDS file containing CITE-seq data to upload, and the same dataset is retained for exploration across all tabs.
The first step of analysis is to assess the quality of the sequencing data. CITEViz provides QC plots for common metrics such as gene or antibody-derived tag (ADT) counts per assay, number of unique ADTs, and mitochondrial expression, each of which can be grouped by any categorical metadata in the user’s Seurat or SingleCellExperiment object. Dotted lines in each QC plot mark the 50th, 75th, and 95th percentiles, i.e. the values at or below which 50%, 75%, and 95% of the data fall.
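The metrics and percentile lines can be reproduced outside the app for a quick sanity check. The base-R sketch below computes per-cell total counts, detected features, and percent mitochondrial counts on a simulated matrix, then the 50/75/95% cutoffs with `quantile()`. It assumes mitochondrial genes carry an `MT-` prefix and uses R's default quantile method; CITEViz may compute its lines differently.

```r
set.seed(42)
# Toy count matrix: 50 genes (5 mitochondrial, prefixed "MT-") x 200 cells
genes <- c(paste0("MT-", 1:5), paste0("GENE", 1:45))
counts <- matrix(rpois(50 * 200, lambda = 2), nrow = 50,
                 dimnames = list(genes, paste0("cell", 1:200)))

n_count   <- colSums(counts)        # total counts per cell
n_feature <- colSums(counts > 0)    # number of detected features per cell
pct_mt    <- 100 * colSums(counts[grepl("^MT-", genes), , drop = FALSE]) / n_count

# Values at or below which 50%, 75% and 95% of cells fall
cutoffs <- quantile(n_count, probs = c(0.50, 0.75, 0.95))

# Same cutoffs split by a toy categorical metadata column
sample_id <- rep(c("A", "B"), each = 100)
by_sample <- tapply(n_count, sample_id, quantile, probs = c(0.50, 0.75, 0.95))
```

For a Seurat object, `PercentageFeatureSet(object, pattern = "^MT-")` computes the mitochondrial percentage directly.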
The Clustering page allows the user to view cell clusters in two- and three-dimensional space. These clusters can be colored by any categorical metadata, and the user can select the dimensional reduction to view (e.g. UMAP or PCA) from a dropdown menu.
When the user’s cursor hovers over the 2D reduction plot, a Plotly toolbox with labeled options appears. From this toolbox, the user can zoom, pan, download, or reset the plot. The user can also use the box or lasso selection tool to select specific cells. The metadata for the selected cells appears in the scrollable, interactive data table below the plots, where it can be printed or copied to the clipboard.
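Because the Clustering page is built on Plotly, a comparable reduction plot can be made directly with the plotly R package. The sketch below uses a simulated embedding; the column names `UMAP_1`, `UMAP_2`, and `cluster` are assumptions, not names CITEViz requires.

```r
library(plotly)

set.seed(3)
# Toy 2D embedding with three clusters (stand-in for a UMAP stored in your object)
cells <- data.frame(
  barcode = sprintf("cell%03d", 1:300),
  UMAP_1  = c(rnorm(100, 0), rnorm(100, 5), rnorm(100, 10)),
  UMAP_2  = c(rnorm(100, 0), rnorm(100, 5), rnorm(100, 0)),
  cluster = factor(rep(c("0", "1", "2"), each = 100))
)

# Color by a categorical metadata column; hovering shows each cell's barcode
p <- plot_ly(cells, x = ~UMAP_1, y = ~UMAP_2, color = ~cluster,
             text = ~barcode, type = "scatter", mode = "markers")
# Open the plot with the lasso selection tool active
p <- layout(p, dragmode = "lasso")
# Print `p` in an interactive session to view it
```

Inside a Shiny app, box and lasso selections on such a plot are commonly read with `plotly::event_data("plotly_selected")`; this is one standard approach, not necessarily how CITEViz fills its data table.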
CITEViz supports RNA and ADT feature expression visualizations in both one-dimensional and two-dimensional formats. On the feature expression tabs of CITEViz, cell clusters are displayed in a dimensional reduction plot. These cell clusters can be colored by expression levels of selected RNA and/or ADT features. Similar to the clustering page, the user can select the type of dimensional reduction to view.
In 1D feature expression, cells in a dimensional reduction plot are colored by expression levels for one RNA or ADT feature. The user can select a specific RNA or ADT feature from a dropdown menu.
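Coloring by a single feature amounts to mapping each cell's expression value onto a color gradient. The base-R sketch below shows one way to do that; `expr_to_color()`, the grey-to-blue palette, and the toy `cd3e` values are illustrative assumptions rather than CITEViz's own palette or code.

```r
# Hypothetical helper (not CITEViz code): map one feature's expression values
# onto a light-grey-to-blue gradient, one color per cell.
expr_to_color <- function(x, n = 100, low = "lightgrey", high = "blue") {
  pal <- colorRampPalette(c(low, high))(n)
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(pal[1], length(x)))     # constant feature
  idx <- 1 + floor((x - rng[1]) / diff(rng) * (n - 1))  # bin index 1..n
  pal[idx]
}

# Toy log-normalized values for one feature across five cells
cd3e <- c(0, 0.5, 1.2, 2.4, 0)
cell_colors <- expr_to_color(cd3e)
```

The resulting vector can be passed as `col =` to a base `plot()` of the reduction coordinates.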
Visualizing the co-expression of features gives a more complete view of single-cell multi-omic datasets. In 2D feature expression, cells in a dimensional reduction plot are colored by the expression levels of two gene and/or ADT features simultaneously, which the user selects from dropdown menus. CITEViz can visualize co-expression of two features from the same assay (i.e. two genes or two ADTs) or from different assays (i.e. one gene and one ADT).
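The exact two-feature color scheme CITEViz uses is not described here. One simple way to encode two features at once, shown as an assumption rather than as CITEViz's method, is to rescale each feature to [0, 1] and use the two values as the red and blue channels of an RGB color, so double-positive cells appear magenta.

```r
# Rescale a numeric vector to [0, 1] (constant input maps to 0)
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(0, length(x)))
  (x - rng[1]) / diff(rng)
}

# Hypothetical blend: feature 1 -> red channel, feature 2 -> blue channel
coexpression_colors <- function(f1, f2) {
  r <- rescale01(f1)
  b <- rescale01(f2)
  rgb(red = r, green = rep(0, length(r)), blue = b)
}

# Toy values for two features in four cells
cd4 <- c(0, 4, 4, 0)
cd8 <- c(3, 0, 3, 0)
cols <- coexpression_colors(cd4, cd8)
```

In this simple scheme double-negative cells come out black; a real plot would usually blend toward a light background color instead.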
A key feature of CITEViz is that users can iteratively gate (subset) cells using antibody markers, and the selected cells are immediately highlighted in the dimensional reduction space (e.g. UMAP, PCA, or tSNE). Users can subset cells using one or more gates to explore specific cell populations, much as in flow cytometry.
For example, a two-layer gate can first isolate cells with specific levels of CD11b-1 and CD45-1 (a mixture of myeloid and lymphoid cells) from the whole cell population, and then isolate CD8+ CD4- cells from this first subset to view the CD8+ T-cell subpopulation.
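The layered logic of such a gate can be sketched in base R: the second gate is applied only to cells that passed the first. The simulated ADT values, the simplified column names (standing in for CD11b-1, CD45-1, CD4-1, CD8-1), the hypothetical `rect_gate()` helper, and all thresholds are assumptions for illustration. CITEViz gates drawn with the lasso tool are polygons, which would need a point-in-polygon test (e.g. `sp::point.in.polygon()`) instead of rectangles.

```r
set.seed(7)
n <- 500
# Toy ADT values for 500 cells
adt <- data.frame(
  barcode = sprintf("cell%03d", seq_len(n)),
  CD11b = rlnorm(n, meanlog = 1, sdlog = 1),
  CD45  = rlnorm(n, meanlog = 2, sdlog = 0.5),
  CD4   = rlnorm(n, meanlog = 1, sdlog = 1),
  CD8   = rlnorm(n, meanlog = 1, sdlog = 1)
)

# Hypothetical rectangular gate: keep rows whose x and y values fall inside the limits
rect_gate <- function(df, x, y, xlim, ylim) {
  keep <- df[[x]] >= xlim[1] & df[[x]] <= xlim[2] &
          df[[y]] >= ylim[1] & df[[y]] <= ylim[2]
  df[keep, , drop = FALSE]
}

# Layer 1: CD11b vs CD45 on all cells (thresholds chosen arbitrarily)
gate1 <- rect_gate(adt, "CD11b", "CD45", xlim = c(0, 5), ylim = c(5, Inf))
# Layer 2: CD4 vs CD8 within gate 1 only, keeping CD8+ CD4- cells
gate2 <- rect_gate(gate1, "CD4", "CD8", xlim = c(0, 2), ylim = c(4, Inf))
sizes <- c(all = nrow(adt), gate1 = nrow(gate1), gate2 = nrow(gate2))
```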
CITEViz uses interactive data visualization and exploration packages such as Plotly and DT. Plotly plots can be exported as PNG files, and DT datatables can be copied, printed, or exported as files for downstream analysis.
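For readers building similar views outside the app, the sketch below shows how these export options are typically enabled in the two packages: the DT "Buttons" extension adds copy, CSV, Excel, and print buttons, and plotly's `config()` sets the format and file name used by the toolbar's download button. The toy metadata and the `qc_boxplot` file name are assumptions; CITEViz's own settings may differ.

```r
library(DT)
library(plotly)

# Toy cell metadata (values made up for illustration)
meta <- data.frame(barcode = sprintf("cell%02d", 1:6),
                   cluster = c("0", "0", "1", "1", "2", "2"),
                   nCount_ADT = c(812, 954, 1203, 1100, 640, 700))

# DT table with copy / CSV / Excel / print buttons
tbl <- datatable(meta, extensions = "Buttons",
                 options = list(dom = "Bfrtip",
                                buttons = c("copy", "csv", "excel", "print")))

# Plotly toolbar download saved as PNG with a custom file name
p <- plot_ly(meta, x = ~cluster, y = ~nCount_ADT, type = "box")
p <- config(p, toImageButtonOptions = list(format = "png", filename = "qc_boxplot"))
```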
After gating for cell populations of interest (see the Gating section of this vignette), the metadata for each gate is saved in an internal Gate object. These metadata include cell barcodes, which can facilitate downstream analyses such as differential gene expression. To download this gating data as a list of gates in RDS format, the user can click the “List (.rds)” button at the bottom of the gating page.
Users can back-gate on a selection of cells in a reduction plot and highlight them in feature space. On the back-gating page, users can highlight cells in feature space following a “labels-first” or “top-down” workflow. For example, users can select a group of cells in the reduction plot (e.g. UMAP or PCA) and locate them in the feature scatter plot for more extensive data exploration.
Plot settings, cell population selection, and naming methods are analogous to the gating page. However, back-gating requires selecting a region of interest in the dimensional reduction space. The selected cells are highlighted in the feature scatter plot, whose axes can be configured to the user’s preferences.
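Back-gating reverses the direction of gating: the selection is made in reduction space and the same cells are then marked in feature space. The base-R sketch below uses a rectangular region on simulated UMAP coordinates; the region limits, column names, and data are assumptions, and CITEViz's own selection can also be a lasso polygon.

```r
set.seed(11)
n <- 300
# Toy data: UMAP coordinates plus two ADT features for the same cells
cells <- data.frame(
  barcode = sprintf("cell%03d", seq_len(n)),
  UMAP_1 = rnorm(n), UMAP_2 = rnorm(n),
  CD4 = rlnorm(n), CD8 = rlnorm(n)
)

# Hypothetical rectangular region of interest drawn on the UMAP plot
roi_x <- c(0, 2)
roi_y <- c(-1, 1)
in_roi <- with(cells, UMAP_1 >= roi_x[1] & UMAP_1 <= roi_x[2] &
                      UMAP_2 >= roi_y[1] & UMAP_2 <= roi_y[2])

# Back-gate: highlight the same cells in a CD4-vs-CD8 feature scatter
cells$highlight <- ifelse(in_roi, "selected", "other")
plot(cells$CD4, cells$CD8, col = ifelse(in_roi, "red", "grey70"), pch = 16,
     xlab = "CD4", ylab = "CD8")
```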
The gating data obtained from CITEViz can be read back into Seurat for differential expression analysis. In the following example, we find differentially expressed genes between gated CD14 and CD16 monocytes. The gate files used here can be downloaded from Google Drive.
library(Seurat)
# import the original data (the object that was uploaded to CITEViz)
pbmc <- readRDS("~/Downloads/small_pbmc.rds")
# import the gate lists downloaded from the CITEViz gating page
cd14_gate <- readRDS("~/Downloads/CD14-Monocytes.rds")
cd16_gate <- readRDS("~/Downloads/CD16-Monocytes.rds")
# extract the cell barcodes in each gate
cd14_barcodes <- cd14_gate$gate_1@subset_cells[[1]]
cd16_barcodes <- cd16_gate$gate_1@subset_cells[[1]]
# remove cells that are also in the CD16 gate so the two groups do not overlap
cd14_barcodes <- setdiff(cd14_barcodes, cd16_barcodes)
# differential expression between the two gated populations
diff_exp <- FindMarkers(pbmc, cells.1 = cd14_barcodes, cells.2 = cd16_barcodes)
diff_exp will contain results like the following:
| gene | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj |
|---|---|---|---|---|---|
| FCGR3A | 3.551593e-133 | -3.4302631 | 0.157 | 0.907 | 7.362098e-129 |
| CDKN1C | 5.972331e-123 | -2.6071919 | 0.088 | 0.729 | 1.238005e-118 |
| HES4 | 1.920188e-107 | -1.4098503 | 0.083 | 0.682 | 3.980357e-103 |
| CKB | 6.424118e-102 | -0.8753508 | 0.018 | 0.372 | 1.331655e-97 |
| RHOC | 7.989362e-95 | -1.5387006 | 0.139 | 0.775 | 1.656115e-90 |
| ADA | 1.289123e-93 | -0.8313632 | 0.072 | 0.612 | 2.672222e-89 |
| PTP4A3 | 3.844392e-82 | -0.6077669 | 0.049 | 0.496 | 7.969039e-78 |
| MS4A4A | 1.986837e-74 | -1.0672453 | 0.124 | 0.682 | 4.118514e-70 |
| CD79B | 6.349774e-69 | -0.8747941 | 0.075 | 0.527 | 1.316245e-64 |
| CTSL | 3.869449e-66 | -1.0800221 | 0.150 | 0.690 | 8.020980e-62 |
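A common next step is to filter this table. In `FindMarkers()` output, a positive `avg_log2FC` means higher expression in `cells.1` (here the CD14 gate) and a negative value means higher expression in `cells.2` (the CD16 gate). The sketch below rebuilds the rows shown above as a data frame and applies example cutoffs (adjusted p < 0.05, |log2FC| > 1), which are arbitrary choices for illustration. With the real `diff_exp`, gene names are the row names, so the same `subset()` call applies directly.

```r
# Values copied from the FindMarkers() output shown above
de <- data.frame(
  gene = c("FCGR3A", "CDKN1C", "HES4", "CKB", "RHOC",
           "ADA", "PTP4A3", "MS4A4A", "CD79B", "CTSL"),
  avg_log2FC = c(-3.4302631, -2.6071919, -1.4098503, -0.8753508, -1.5387006,
                 -0.8313632, -0.6077669, -1.0672453, -0.8747941, -1.0800221),
  p_val_adj = c(7.362098e-129, 1.238005e-118, 3.980357e-103, 1.331655e-97,
                1.656115e-90, 2.672222e-89, 7.969039e-78, 4.118514e-70,
                1.316245e-64, 8.020980e-62)
)

# Example thresholds: adjusted p < 0.05 and |log2FC| > 1
sig <- subset(de, p_val_adj < 0.05 & abs(avg_log2FC) > 1)
# Negative avg_log2FC = higher in cells.2 (CD16 gate)
sig$higher_in <- ifelse(sig$avg_log2FC > 0, "CD14 gate", "CD16 gate")
sig <- sig[order(sig$avg_log2FC), c("gene", "avg_log2FC", "higher_in")]
```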
Gating metadata for each monocyte gate (gate counter, upstream gates, x and y selection coordinates, etc.) are stored in the cd14_gate and cd16_gate objects; the cd14_barcodes and cd16_barcodes variables hold only the cell barcodes.
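When a downloaded list holds several gates, the same `@subset_cells[[1]]` pattern used above can be applied to every element. The sketch below defines a stand-in S4 class, `MockGate`, only so it runs without real gate files; it mimics just the `subset_cells` slot used in the code above and is not CITEViz's actual Gate class. The barcodes and the `"multiple"`/`"ungated"` labels are likewise hypothetical.

```r
library(methods)

# Stand-in class for illustration only; real gate lists come from readRDS()
setClass("MockGate", slots = c(subset_cells = "list"))
gates <- list(
  gate_1 = new("MockGate", subset_cells = list(c("AAAC-1", "AAAG-1", "AACT-1"))),
  gate_2 = new("MockGate", subset_cells = list(c("AACT-1", "ACGT-1")))
)

# Pull the barcodes out of every gate in the list
gate_barcodes <- lapply(gates, function(g) g@subset_cells[[1]])

# Label each cell with its gate; cells in several gates are flagged "multiple"
all_cells <- c("AAAC-1", "AAAG-1", "AACT-1", "ACGT-1", "AGGT-1")
labels <- vapply(all_cells, function(bc) {
  hits <- names(gate_barcodes)[vapply(gate_barcodes, function(b) bc %in% b, logical(1))]
  if (length(hits) == 0) "ungated" else if (length(hits) > 1) "multiple" else hits
}, character(1))
```

Because `labels` is named by barcode, it could be stored on a Seurat object with `AddMetaData(pbmc, metadata = labels, col.name = "citeviz_gate")`, assuming the names match the object's cell barcodes.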
Differential expression is a relatively simple way to reuse gated cells; users can also use CITEViz gates to support more sophisticated downstream analyses, such as batch correction or pseudotime inference on specific cell types of interest.
sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] CITEViz_0.1
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 viridisLite_0.4.2
#> [3] golem_0.4.1 dplyr_1.1.4
#> [5] lazyeval_0.2.2 fastmap_1.2.0
#> [7] SingleCellExperiment_1.26.0 promises_1.3.0
#> [9] digest_0.6.35 dotCall64_1.1-1
#> [11] mime_0.12 lifecycle_1.0.4
#> [13] SeuratObject_5.0.2 magrittr_2.0.3
#> [15] compiler_4.4.0 config_0.3.2
#> [17] rlang_1.1.3 sass_0.4.9
#> [19] tools_4.4.0 utf8_1.2.4
#> [21] yaml_2.3.8 data.table_1.15.4
#> [23] knitr_1.47 S4Arrays_1.4.1
#> [25] htmlwidgets_1.6.4 sp_2.1-4
#> [27] DelayedArray_0.30.1 abind_1.4-5
#> [29] purrr_1.0.2 BiocGenerics_0.50.0
#> [31] desc_1.4.3 grid_4.4.0
#> [33] stats4_4.4.0 fansi_1.0.6
#> [35] xtable_1.8-4 colorspace_2.1-0
#> [37] future_1.33.2 progressr_0.14.0
#> [39] ggplot2_3.5.1 globals_0.16.3
#> [41] scales_1.3.0 SummarizedExperiment_1.34.0
#> [43] cli_3.6.2 rmarkdown_2.27
#> [45] crayon_1.5.2 ragg_1.3.2
#> [47] generics_0.1.3 future.apply_1.11.2
#> [49] httr_1.4.7 cachem_1.1.0
#> [51] zlibbioc_1.50.0 parallel_4.4.0
#> [53] XVector_0.44.0 matrixStats_1.3.0
#> [55] vctrs_0.6.5 Matrix_1.7-0
#> [57] jsonlite_1.8.8 IRanges_2.38.0
#> [59] S4Vectors_0.42.0 listenv_0.9.1
#> [61] systemfonts_1.1.0 attempt_0.3.1
#> [63] plotly_4.10.4 tidyr_1.3.1
#> [65] jquerylib_0.1.4 glue_1.7.0
#> [67] parallelly_1.37.1 spam_2.10-0
#> [69] pkgdown_2.0.9 codetools_0.2-20
#> [71] DT_0.33 gtable_0.3.5
#> [73] later_1.3.2 GenomeInfoDb_1.40.1
#> [75] GenomicRanges_1.56.0 UCSC.utils_1.0.0
#> [77] munsell_0.5.1 tibble_3.2.1
#> [79] pillar_1.9.0 htmltools_0.5.8.1
#> [81] GenomeInfoDbData_1.2.12 R6_2.5.1
#> [83] textshaping_0.4.0 evaluate_0.23
#> [85] shiny_1.8.1.1 lattice_0.22-6
#> [87] Biobase_2.64.0 highr_0.11
#> [89] memoise_2.0.1 httpuv_1.6.15
#> [91] bslib_0.7.0 Rcpp_1.0.12
#> [93] SparseArray_1.4.8 xfun_0.44
#> [95] fs_1.6.4 MatrixGenerics_1.16.0
#> [97] pkgconfig_2.0.3