Read/write SingleCellExperiment objects using anndataR
Source:vignettes/usage_singlecellexperiment.Rmd
usage_singlecellexperiment.Rmd
This vignette demonstrates how to read and write
SingleCellExperiment
objects using the
{anndataR} package, leveraging the interoperability
between SingleCellExperiment
and the AnnData
format.
Introduction
SingleCellExperiment
is a widely used class for storing
single-cell data in R, especially within the
Bioconductor ecosystem. {anndataR}
enables conversion between SingleCellExperiment
objects and
AnnData
objects, allowing you to leverage the strengths of
both the scverse and Bioconductor
ecosystems.
Prerequisites
This vignette requires {SingleCellExperiment} in addition to {anndataR}. You can install them using the following code:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("SingleCellExperiment")
Converting an AnnData
object to a
SingleCellExperiment
object
Using an example .h5ad
file included in the package, we
will demonstrate how to read an .h5ad
file and convert it
to a SingleCellExperiment
object.
library(anndataR)
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#> order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#> rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
h5ad_file <- system.file("extdata", "example.h5ad", package = "anndataR")
Read the .h5ad
file as a
SingleCellExperiment
object:
sce_obj <- read_h5ad(h5ad_file, as = "SingleCellExperiment")
sce_obj
#> class: SingleCellExperiment
#> dim: 100 50
#> metadata(18): Bool BoolNA ... rank_genes_groups umap
#> assays(5): counts csc_counts dense_X dense_counts X
#> rownames(100): Gene000 Gene001 ... Gene098 Gene099
#> rowData names(11): String n_cells_by_counts ... dispersions
#> dispersions_norm
#> colnames(50): Cell000 Cell001 ... Cell048 Cell049
#> colData names(11): Float FloatNA ... log1p_total_counts leiden
#> reducedDimNames(2): X_pca X_umap
#> mainExpName: NULL
#> altExpNames(0):
This is equivalent to reading in the .h5ad
file and
explicitly converting.
adata <- read_h5ad(h5ad_file)
sce <- adata$as_SingleCellExperiment()
sce
#> class: SingleCellExperiment
#> dim: 100 50
#> metadata(18): Bool BoolNA ... rank_genes_groups umap
#> assays(5): counts csc_counts dense_X dense_counts X
#> rownames(100): Gene000 Gene001 ... Gene098 Gene099
#> rowData names(11): String n_cells_by_counts ... dispersions
#> dispersions_norm
#> colnames(50): Cell000 Cell001 ... Cell048 Cell049
#> colData names(11): Float FloatNA ... log1p_total_counts leiden
#> reducedDimNames(2): X_pca X_umap
#> mainExpName: NULL
#> altExpNames(0):
Note that there is no one-to-one mapping possible between the
AnnData
and SingleCellExperiment
data
structures, so some information might be lost during conversion. It is
recommended to carefully inspect the converted object to ensure that all
necessary information has been transferred.
Customizing the conversion
You can customize the conversion process by providing specific
mappings for each slot in the SingleCellExperiment
object.
Each of the mapping arguments can be provided with one of the
following: - TRUE
: all items in the slot will be copied
using the default mapping - FALSE
: the slot will not be
copied - A (named) character vector: the names are the names of the slot
in the SingleCellExperiment
object, the values are the
names of the slot in the AnnData
object.
See ?as_SingleCellExperiment
for more details on how to
customize the conversion process. For instance:
sce_obj <- adata$as_SingleCellExperiment(
x_mapping = "counts",
assays_mapping = c("csc_counts"),
colData_mapping = c("Int", "IntNA"),
rowData_mapping = c(rowdata1 = "String", rowdata2 = "total_counts"),
reducedDims_mapping = list(
"pca" = c(sampleFactors = "X_pca", featureLoadings = "PCs"),
"umap" = c(sampleFactors = "X_umap")
),
colPairs_mapping = TRUE,
rowPairs_mapping = FALSE,
metadata_mapping = c(value1 = "Bool", value2 = "IntScalar")
)
sce_obj
The mapping arguments can also be passed directly to
read_h5ad()
.
Convert a SingleCellExperiment
object to an
AnnData
object
The reverse conversion is also possible, allowing you to convert a
SingleCellExperiment
object back to an AnnData
object, or to just write out the SingleCellExperiment
object as an .h5ad
file.
write_h5ad(sce_obj, tempfile(fileext = ".h5ad"))
This is equivalent to converting the
SingleCellExperiment
object to an AnnData
object and then writing it out:
adata <- as_AnnData(sce_obj)
write_h5ad(adata, tempfile(fileext = ".h5ad"))
You can again customize the conversion process by providing specific
mappings for each slot in the AnnData
object. For more
details, see ?as_AnnData
.
Here’s an example:
as_AnnData(
sce_obj,
x_mapping = "counts",
layers_mapping = c("csc_counts"),
obs_mapping = c(metadata1 = "Int", metadata2 = "IntNA"),
var_mapping = FALSE,
obsm_mapping = list(X_pca = "X_pca", X_umap = "X_umap"),
obsp_mapping = TRUE,
uns_mapping = c("Bool", "IntScalar")
)
#> AnnData object with n_obs × n_vars = 50 × 100
#> obs: 'metadata1', 'metadata2'
#> uns: 'Bool', 'IntScalar'
#> obsm: 'X_pca', 'X_umap'
#> varm: 'X_pca'
#> layers: 'csc_counts'
#> obsp: 'connectivities', 'distances'
#> varp: 'test_varp'
The mapping arguments can also be passed directly to
write_h5ad()
.
Session info
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] SingleCellExperiment_1.30.1 SummarizedExperiment_1.38.1
#> [3] Biobase_2.68.0 GenomicRanges_1.60.0
#> [5] GenomeInfoDb_1.44.1 IRanges_2.42.0
#> [7] S4Vectors_0.46.0 BiocGenerics_0.54.0
#> [9] generics_0.1.4 MatrixGenerics_1.20.0
#> [11] matrixStats_1.5.0 anndataR_0.99.0
#> [13] BiocStyle_2.36.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.7-3 jsonlite_2.0.0 crayon_1.5.3
#> [4] compiler_4.5.1 BiocManager_1.30.26 rhdf5filters_1.20.0
#> [7] jquerylib_0.1.4 systemfonts_1.2.3 textshaping_1.0.1
#> [10] yaml_2.3.10 fastmap_1.2.0 lattice_0.22-7
#> [13] XVector_0.48.0 R6_2.6.1 S4Arrays_1.8.1
#> [16] knitr_1.50 htmlwidgets_1.6.4 DelayedArray_0.34.1
#> [19] bookdown_0.43 desc_1.4.3 GenomeInfoDbData_1.2.14
#> [22] bslib_0.9.0 rlang_1.1.6 cachem_1.1.0
#> [25] xfun_0.52 fs_1.6.6 sass_0.4.10
#> [28] SparseArray_1.8.1 cli_3.6.5 Rhdf5lib_1.30.0
#> [31] pkgdown_2.1.3 magrittr_2.0.3 digest_0.6.37
#> [34] grid_4.5.1 rhdf5_2.52.1 lifecycle_1.0.4
#> [37] vctrs_0.6.5 evaluate_1.0.4 ragg_1.4.0
#> [40] abind_1.4-8 rmarkdown_2.29 purrr_1.1.0
#> [43] httr_1.4.7 tools_4.5.1 htmltools_0.5.8.1
#> [46] UCSC.utils_1.4.0