This vignette demonstrates how to read and write .h5ad files using the {anndataR} package.
Check out ?anndataR for a full list of the functions provided by this package.
Example data
A great place for finding single-cell datasets as .h5ad files is the CELLxGENE website.
We will use an example file included in the package for demonstration.
library(anndataR)
h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")
Reading in memory
To read an .h5ad file, use the read_h5ad function. By default, the entire file is read into memory:
adata <- read_h5ad(h5ad_path)
The result is an in-memory AnnData object. You can then inspect its structure:
adata
#> AnnData object with n_obs × n_vars = 50 × 100
#> obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#> var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#> uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'Int', 'IntNA', 'IntScalar', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
#> obsm: 'X_pca', 'X_umap'
#> varm: 'PCs'
#> layers: 'counts', 'csc_counts', 'dense_X', 'dense_counts'
#> obsp: 'connectivities', 'distances'
Reading backed by HDF5
For large datasets that do not fit into memory, you can read the .h5ad file in “backed” mode. The data then remains on disk, and only the parts that are actively being used are loaded into memory.
To do this, set the to parameter of read_h5ad to "HDF5AnnData":
adata <- read_h5ad(h5ad_path, to = "HDF5AnnData")
The structure of the object will look similar to the in-memory representation, but the data is stored on disk.
adata
#> AnnData object with n_obs × n_vars = 50 × 100
#> obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#> var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#> uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'Int', 'IntNA', 'IntScalar', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
#> obsm: 'X_pca', 'X_umap'
#> varm: 'PCs'
#> layers: 'counts', 'csc_counts', 'dense_X', 'dense_counts'
#> obsp: 'connectivities', 'distances'
Note that any changes made to the object will be reflected in the .h5ad file!
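Because a backed object is tied to its file, one cautious pattern is to open a copy rather than the original. The sketch below assumes the to argument behaves as shown in this vignette; backed_path is an illustrative name, not part of the package.

```r
library(anndataR)

h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")

# Work on a copy so the packaged example file is never touched
# (`backed_path` is a hypothetical name chosen for this sketch)
backed_path <- tempfile(fileext = ".h5ad")
file.copy(h5ad_path, backed_path)

# Open the copy in backed mode; data are fetched from disk on access
backed <- read_h5ad(backed_path, to = "HDF5AnnData")

# The object reports its backing class and still exposes X as usual
class(backed)
dim(backed$X)
```

Opening a temporary copy keeps experiments from altering a dataset you want to preserve.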
Writing h5ad files
You can write an AnnData object to an .h5ad file by calling its write_h5ad method:
# Create a temporary file for demonstration
temp_h5ad <- tempfile(fileext = ".h5ad")
adata$write_h5ad(temp_h5ad)
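A quick way to check a written file is to read it back and compare it with the original. This is a minimal sketch using only the calls shown in this vignette; roundtrip_path, same_dims and same_obs_cols are illustrative names.

```r
library(anndataR)

# Read the packaged example file into memory
h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")
adata <- read_h5ad(h5ad_path)

# Write it to a temporary file (`roundtrip_path` is a hypothetical name)
roundtrip_path <- tempfile(fileext = ".h5ad")
adata$write_h5ad(roundtrip_path)

# Read the new file back and compare it against the original object
roundtrip <- read_h5ad(roundtrip_path)
same_dims <- identical(dim(roundtrip$X), dim(adata$X))
same_obs_cols <- identical(colnames(roundtrip$obs), colnames(adata$obs))
```

If either comparison is FALSE, inspect the two objects side by side before relying on the written file.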
Accessing AnnData slots
The AnnData object is a list-like object containing various slots. Here’s how you can access some of them:
dim(adata$X)
#> [1] 50 100
adata$obs[1:5, 1:6]
#>         Float FloatNA Int IntNA  Bool BoolNA
#> Cell000 42.42     NaN   0    NA FALSE  FALSE
#> Cell001 42.42   42.42   1    42  TRUE     NA
#> Cell002 42.42   42.42   2    42  TRUE   TRUE
#> Cell003 42.42   42.42   3    42  TRUE   TRUE
#> Cell004 42.42   42.42   4    42  TRUE   TRUE
adata$var[1:5, 1:6]
#>          String n_cells_by_counts mean_counts log1p_mean_counts
#> Gene000 String0                44        1.94          1.078410
#> Gene001 String1                42        2.04          1.111858
#> Gene002 String2                43        2.12          1.137833
#> Gene003 String3                41        1.72          1.000632
#> Gene004 String4                42        2.06          1.118415
#>         pct_dropout_by_counts total_counts
#> Gene000                    12           97
#> Gene001                    16          102
#> Gene002                    14          106
#> Gene003                    18           86
#> Gene004                    16          103
You can also access other slots like layers, uns, obsm, varm, and obsp in a similar way.
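As a sketch of what accessing these other slots can look like, the example below assumes they behave as named lists whose element names match those printed for the example object earlier in this vignette ('counts', 'X_umap').

```r
library(anndataR)

h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")
adata <- read_h5ad(h5ad_path)

# List the elements available in two of the slots
names(adata$layers)
names(adata$obsm)

# Pull out single elements by name
counts <- adata$layers[["counts"]]
umap <- adata$obsm[["X_umap"]]

# A layer has the same dimensions as X; an obsm entry has one row per observation
dim(counts)
nrow(umap)
```

The same name-based indexing should apply to uns, varm, and obsp, although the types of the stored elements differ between slots.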
Session info
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] anndataR_0.99.0 BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 cli_3.6.3 knitr_1.49
#> [4] rlang_1.1.4 xfun_0.50 purrr_1.0.2
#> [7] textshaping_0.4.1 jsonlite_1.8.9 bit_4.5.0.1
#> [10] htmltools_0.5.8.1 ragg_1.3.3 sass_0.4.9
#> [13] rmarkdown_2.29 grid_4.4.2 evaluate_1.0.3
#> [16] jquerylib_0.1.4 fastmap_1.2.0 yaml_2.3.10
#> [19] lifecycle_1.0.4 bookdown_0.42 BiocManager_1.30.25
#> [22] compiler_4.4.2 fs_1.6.5 htmlwidgets_1.6.4
#> [25] lattice_0.22-6 systemfonts_1.1.0 digest_0.6.37
#> [28] R6_2.5.1 magrittr_2.0.3 bslib_0.8.0
#> [31] Matrix_1.7-1 bit64_4.5.2 hdf5r_1.3.11
#> [34] tools_4.4.2 pkgdown_2.1.1 cachem_1.1.0
#> [37] desc_1.4.3