Read/write SingleCellExperiment objects using anndataR
Source:vignettes/usage_singlecellexperiment.Rmd
usage_singlecellexperiment.Rmd
This vignette demonstrates how to read and write
SingleCellExperiment
objects using the
{anndataR} package, leveraging the interoperability
between SingleCellExperiment
and the AnnData
format.
Check out ?anndataR
for a full list of the functions
provided by this package.
Introduction
SingleCellExperiment is a widely used class for storing single-cell
data in R, especially within the Bioconductor ecosystem.
{anndataR} enables conversion between
SingleCellExperiment
objects and AnnData
objects, allowing you to leverage the strengths of both the scverse and
Bioconductor ecosystems.
Prerequisites
Before you begin, make sure you have both SingleCellExperiment and {anndataR} installed. You can install them using the following code:
if (!requireNamespace("pak", quietly = TRUE)) {
install.packages("pak")
}
pak::pak(c("SingleCellExperiment", "SummarizedExperiment"))
pak::pak("scverse/anndataR")
Converting an AnnData Object to a SingleCellExperiment Object
Using an example .h5ad
file included in the package, we
will demonstrate how to read an .h5ad
file and convert it
to a SingleCellExperiment
object.
library(anndataR)
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#> order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#> rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
h5ad_file <- system.file("extdata", "example.h5ad", package = "anndataR")
Read the .h5ad
file:
adata <- read_h5ad(h5ad_file)
adata
#> AnnData object with n_obs × n_vars = 50 × 100
#> obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#> var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#> uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'Int', 'IntNA', 'IntScalar', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
#> obsm: 'X_pca', 'X_umap'
#> varm: 'PCs'
#> layers: 'counts', 'csc_counts', 'dense_X', 'dense_counts'
#> obsp: 'connectivities', 'distances'
Convert to a SingleCellExperiment
object:
sce_obj <- adata$as_SingleCellExperiment()
sce_obj
#> class: SingleCellExperiment
#> dim: 100 50
#> metadata(18): Bool BoolNA ... rank_genes_groups umap
#> assays(5): counts csc_counts dense_X dense_counts X
#> rownames(100): Gene000 Gene001 ... Gene098 Gene099
#> rowData names(11): String n_cells_by_counts ... dispersions
#> dispersions_norm
#> colnames(50): Cell000 Cell001 ... Cell048 Cell049
#> colData names(11): Float FloatNA ... log1p_total_counts leiden
#> reducedDimNames(2): X_pca X_umap
#> mainExpName: NULL
#> altExpNames(0):
Note that there is no one-to-one mapping possible between the AnnData and SingleCellExperiment data structures, so some information might be lost during conversion. It is recommended to carefully inspect the converted object to ensure that all necessary information has been transferred.
Customizing the conversion
You can customize the conversion process by providing specific
mappings for each slot in the SingleCellExperiment
object.
Each of the mapping arguments can be provided with one of the following:
- TRUE
: all items in the slot will be copied using the
default mapping - FALSE
: the slot will not be copied - a
(named) character vector: the names are the names of the slot in the
SingleCellExperiment
object, the values are the names of
the slot in the AnnData
object.
See ?as_SingleCellExperiment
for more details on how to
customize the conversion process. For instance:
sce_obj <- adata$as_SingleCellExperiment(
x_mapping = "counts",
assays_mapping = c("csc_counts"),
colData_mapping = c("Int", "IntNA"),
rowData_mapping = c(rowdata1 = "String", rowdata2 = "total_counts"),
reducedDims_mapping = list(
"pca" = c(sampleFactors = "X_pca", featureLoadings = "PCs"),
"umap" = c(sampleFactors = "X_umap")
),
colPairs_mapping = TRUE,
rowPairs_mapping = FALSE,
metadata_mapping = c(value1 = "Bool", value2 = "IntScalar")
)
sce_obj
Convert a SingleCellExperiment Object to an AnnData Object
The reverse conversion is also possible, allowing you to convert a
SingleCellExperiment
object back to an AnnData
object, or to just write out the SingleCellExperiment
object as an .h5ad
file.
write_h5ad(sce_obj, tempfile(fileext = ".h5ad"))
This is equivalent to converting the
SingleCellExperiment
object to an AnnData
object and then writing it out:
adata <- as_AnnData(sce_obj)
write_h5ad(adata, tempfile(fileext = ".h5ad"))
You can again customize the conversion process by providing specific
mappings for each slot in the AnnData
object. For more
details, see ?as_AnnData
.
Here’s an example:
as_AnnData(
sce_obj,
x_mapping = "counts",
layers_mapping = c("csc_counts"),
obs_mapping = c(metadata1 = "Int", metadata2 = "IntNA"),
var_mapping = FALSE,
obsm_mapping = list(X_pca = "X_pca", X_umap = "X_umap"),
obsp_mapping = TRUE,
uns_mapping = c("Bool", "IntScalar")
)
#> AnnData object with n_obs × n_vars = 50 × 100
#> obs: 'metadata1', 'metadata2'
#> uns: 'Bool', 'IntScalar'
#> obsm: 'X_pca', 'X_umap'
#> varm: 'X_pca'
#> layers: 'csc_counts'
#> obsp: 'connectivities', 'distances'
Session info
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] SingleCellExperiment_1.30.1 SummarizedExperiment_1.38.1
#> [3] Biobase_2.68.0 GenomicRanges_1.60.0
#> [5] GenomeInfoDb_1.44.0 IRanges_2.42.0
#> [7] S4Vectors_0.46.0 BiocGenerics_0.54.0
#> [9] generics_0.1.4 MatrixGenerics_1.20.0
#> [11] matrixStats_1.5.0 anndataR_0.1.0.9004
#> [13] BiocStyle_2.36.0
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 SparseArray_1.8.0 lattice_0.22-7
#> [4] digest_0.6.37 magrittr_2.0.3 evaluate_1.0.4
#> [7] grid_4.5.1 bookdown_0.43 fastmap_1.2.0
#> [10] jsonlite_2.0.0 Matrix_1.7-3 BiocManager_1.30.26
#> [13] httr_1.4.7 purrr_1.0.4 UCSC.utils_1.4.0
#> [16] textshaping_1.0.1 jquerylib_0.1.4 abind_1.4-8
#> [19] cli_3.6.5 rlang_1.1.6 crayon_1.5.3
#> [22] XVector_0.48.0 bit64_4.6.0-1 cachem_1.1.0
#> [25] DelayedArray_0.34.1 yaml_2.3.10 S4Arrays_1.8.1
#> [28] tools_4.5.1 GenomeInfoDbData_1.2.14 vctrs_0.6.5
#> [31] R6_2.6.1 lifecycle_1.0.4 fs_1.6.6
#> [34] htmlwidgets_1.6.4 bit_4.6.0 hdf5r_1.3.12
#> [37] ragg_1.4.0 desc_1.4.3 pkgdown_2.1.3
#> [40] bslib_0.9.0 glue_1.8.0 systemfonts_1.2.3
#> [43] xfun_0.52 knitr_1.50 htmltools_0.5.8.1
#> [46] rmarkdown_2.29 compiler_4.5.1