This vignette demonstrates how to read and write .h5ad files using the {anndataR} package.

Check out ?anndataR for a full list of the functions provided by this package.

Example data

A great place for finding single-cell datasets as .h5ad files is the CELLxGENE website.

We will use an example file included in the package for demonstration.

library(anndataR)

h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")
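Because system.file() returns an empty string when it cannot find the requested file, it can be worth checking the path before reading it. The following is a minimal sketch using only base R; the error message wording is illustrative.

```r
# Locate the example file shipped with the package (same call as above).
h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")

# system.file() returns "" when nothing is found, so check before reading.
if (!nzchar(h5ad_path) || !file.exists(h5ad_path)) {
  stop("example.h5ad not found; is anndataR installed correctly?")
}

# File size in bytes, as a quick sanity check.
file.size(h5ad_path)
```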

Reading in memory

To read an h5ad file into memory, use the read_h5ad function. By default, the data will be read entirely into memory:

adata <- read_h5ad(h5ad_path)

The result is an AnnData object. Print it to inspect its structure:

adata
#> AnnData object with n_obs × n_vars = 50 × 100
#>     obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#>     var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#>     uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'Int', 'IntNA', 'IntScalar', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
#>     obsm: 'X_pca', 'X_umap'
#>     varm: 'PCs'
#>     layers: 'counts', 'csc_counts', 'dense_X', 'dense_counts'
#>     obsp: 'connectivities', 'distances'

Reading backed by HDF5

For large datasets that do not fit into memory, you can read the .h5ad file in “backed” mode. The data then remains on disk, and only the parts that are actively being used are loaded into memory.

To do this, set the to argument of read_h5ad to "HDF5AnnData":

adata <- read_h5ad(h5ad_path, to = "HDF5AnnData")

The structure of the object will look similar to the in-memory representation, but the data is stored on disk.

adata
#> AnnData object with n_obs × n_vars = 50 × 100
#>     obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#>     var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#>     uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'Int', 'IntNA', 'IntScalar', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
#>     obsm: 'X_pca', 'X_umap'
#>     varm: 'PCs'
#>     layers: 'counts', 'csc_counts', 'dense_X', 'dense_counts'
#>     obsp: 'connectivities', 'distances'

Note that in backed mode, any changes made to the object are written to the .h5ad file on disk!
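If you want to experiment with a backed object without risking the original file, one option is to work on a copy. The sketch below copies the example file to a temporary location first and opens the copy; scratch_h5ad and backed are hypothetical names chosen for illustration, and copy.mode = FALSE is used so the copy does not inherit read-only permissions from the installed package.

```r
library(anndataR)

h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")

# Work on a throwaway copy so the file shipped with the package stays untouched.
# scratch_h5ad is a hypothetical name used only in this sketch.
scratch_h5ad <- tempfile(fileext = ".h5ad")
file.copy(h5ad_path, scratch_h5ad, copy.mode = FALSE)

# Open the copy in backed mode; the data stays on disk in scratch_h5ad.
backed <- read_h5ad(scratch_h5ad, to = "HDF5AnnData")
dim(backed$X)
```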

Writing h5ad files

You can write an AnnData object to an .h5ad file by calling its write_h5ad method:

# Create a temporary file for demonstration
temp_h5ad <- tempfile(fileext = ".h5ad")

adata$write_h5ad(temp_h5ad)
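A simple way to check that a file was written correctly is to read it back and compare it with the object you started from. This is a minimal sketch that compares only the dimensions of X and the column names of obs; roundtrip is a hypothetical name, and a thorough comparison would need to cover the other slots too.

```r
library(anndataR)

h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")
adata <- read_h5ad(h5ad_path)

# Write the object to a temporary file, as above.
temp_h5ad <- tempfile(fileext = ".h5ad")
adata$write_h5ad(temp_h5ad)

# Read the written file back and compare basic properties.
roundtrip <- read_h5ad(temp_h5ad)
identical(dim(roundtrip$X), dim(adata$X))
identical(colnames(roundtrip$obs), colnames(adata$obs))
```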

Accessing AnnData slots

The AnnData object is an R6 object whose slots are accessed with the $ operator. Here’s how you can access some of them:

dim(adata$X)
#> [1]  50 100
adata$obs[1:5, 1:6]
#>         Float FloatNA Int IntNA  Bool BoolNA
#> Cell000 42.42     NaN   0    NA FALSE  FALSE
#> Cell001 42.42   42.42   1    42  TRUE     NA
#> Cell002 42.42   42.42   2    42  TRUE   TRUE
#> Cell003 42.42   42.42   3    42  TRUE   TRUE
#> Cell004 42.42   42.42   4    42  TRUE   TRUE
adata$var[1:5, 1:6]
#>          String n_cells_by_counts mean_counts log1p_mean_counts
#> Gene000 String0                44        1.94          1.078410
#> Gene001 String1                42        2.04          1.111858
#> Gene002 String2                43        2.12          1.137833
#> Gene003 String3                41        1.72          1.000632
#> Gene004 String4                42        2.06          1.118415
#>         pct_dropout_by_counts total_counts
#> Gene000                    12           97
#> Gene001                    16          102
#> Gene002                    14          106
#> Gene003                    18           86
#> Gene004                    16          103

You can also access other slots like layers, uns, obsm, varm, and obsp in a similar way.
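As a rough illustration of the remaining slots, the sketch below assumes that layers, obsm, obsp, and uns behave as named lists, so their contents can be listed with names() and extracted with [[ ]]. The element names used here are the ones shown in the printed object above.

```r
library(anndataR)

h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")
adata <- read_h5ad(h5ad_path)

# List what the multi-element slots hold.
names(adata$layers)
names(adata$obsm)

# Pull out single elements by name with [[ ]].
dim(adata$layers[["counts"]])
dim(adata$obsm[["X_pca"]])
dim(adata$obsp[["connectivities"]])

# uns holds unstructured annotations; list the available keys.
names(adata$uns)
```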

Session info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] anndataR_0.99.0  BiocStyle_2.34.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.5         cli_3.6.3           knitr_1.49         
#>  [4] rlang_1.1.4         xfun_0.50           purrr_1.0.2        
#>  [7] textshaping_0.4.1   jsonlite_1.8.9      bit_4.5.0.1        
#> [10] htmltools_0.5.8.1   ragg_1.3.3          sass_0.4.9         
#> [13] rmarkdown_2.29      grid_4.4.2          evaluate_1.0.3     
#> [16] jquerylib_0.1.4     fastmap_1.2.0       yaml_2.3.10        
#> [19] lifecycle_1.0.4     bookdown_0.42       BiocManager_1.30.25
#> [22] compiler_4.4.2      fs_1.6.5            htmlwidgets_1.6.4  
#> [25] lattice_0.22-6      systemfonts_1.1.0   digest_0.6.37      
#> [28] R6_2.5.1            magrittr_2.0.3      bslib_0.8.0        
#> [31] Matrix_1.7-1        bit64_4.5.2         hdf5r_1.3.11       
#> [34] tools_4.4.2         pkgdown_2.1.1       cachem_1.1.0       
#> [37] desc_1.4.3