Skip to contents

Introduction

anndataR allows users to work with .h5ad files, interact with AnnData objects and convert to/from SingleCellExperiment or Seurat objects. This enables users to move data easily between the different programming languages and analysis ecosystems needed to perform single-cell data analysis.

This package builds on our experience developing and using other interoperability packages and aims to provide a first-class R AnnData experience.

Relationship to other packages

Existing packages provide similar functionality to anndataR but there are some important differences:

  • zellkonverter provides conversion of SingleCellExperiment objects to/from AnnData and reading/writing of .h5ad files. This is facilitated via reticulate using basilisk to manage Python environments (native reading of .h5ad files is also possible). In contrast, anndataR provides a native R H5AD interface, removing the need for Python dependencies. Conversion to/from Seurat objects is also supported.
  • anndata (on CRAN) is a wrapper around the Python anndata package. It provides a nicer interface from within R but still requires a Python environment. anndataR provides a pure R implementation of the AnnData data structure with different back ends and conversion to more common R objects.
  • Other interoperability packages typically also require Python dependencies, have limited features or flexibility and/or are not available from a central package repository

There is significant overlap in functionality between these packages. We anticipate that over time they will become more specialised and work better together or be deprecated in favour of anndataR.

Installation

Install anndataR using BiocManager:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("anndataR")

Usage

These sections briefly show how to use anndataR.

Read from disk

First, we fetch an example .h5ad file included in the package:

library(anndataR)

h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")

By default, a H5AD is read to an in-memory AnnData object:

adata <- read_h5ad(h5ad_path)

It can also be read as a SingleCellExperiment object (see vignette("usage_singlecellexperiment")):

sce <- read_h5ad(h5ad_path, as = "SingleCellExperiment")

Or as a Seurat object (see vignette("usage_seurat")):

obj <- read_h5ad(h5ad_path, as = "Seurat")

There is also a HDF5-backed AnnData object:

adata <- read_h5ad(h5ad_path, as = "HDF5AnnData")

See for interacting with a Python AnnData via reticulate.

Using AnnData objects

View structure:

adata
#> HDF5AnnData object with n_obs × n_vars = 50 × 100
#>     obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#>     var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#>     uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'Int', 'IntNA', 'IntScalar', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
#>     obsm: 'X_pca', 'X_umap'
#>     varm: 'PCs'
#>     layers: 'counts', 'csc_counts', 'dense_X', 'dense_counts'
#>     obsp: 'connectivities', 'distances'
#>     varp: 'test_varp'

Access AnnData slots:

dim(adata$X)
#> [1]  50 100
adata$obs[1:5, 1:6]
#>         Float FloatNA Int IntNA  Bool BoolNA
#> Cell000 42.42     NaN   0    NA FALSE  FALSE
#> Cell001 42.42   42.42   1    42  TRUE     NA
#> Cell002 42.42   42.42   2    42  TRUE   TRUE
#> Cell003 42.42   42.42   3    42  TRUE   TRUE
#> Cell004 42.42   42.42   4    42  TRUE   TRUE
adata$var[1:5, 1:6]
#>          String n_cells_by_counts mean_counts log1p_mean_counts
#> Gene000 String0                44        1.94          1.078410
#> Gene001 String1                42        2.04          1.111858
#> Gene002 String2                43        2.12          1.137833
#> Gene003 String3                41        1.72          1.000632
#> Gene004 String4                42        2.06          1.118415
#>         pct_dropout_by_counts total_counts
#> Gene000                    12           97
#> Gene001                    16          102
#> Gene002                    14          106
#> Gene003                    18           86
#> Gene004                    16          103

Subsetting AnnData objects is covered below. See ?`AnnData-usage`for more details on how to work with AnnData objects.

Interoperability

Convert the AnnData object to a SingleCellExperiment object (see vignette("usage_singlecellexperiment")):

sce <- adata$as_SingleCellExperiment()
sce
#> class: SingleCellExperiment 
#> dim: 100 50 
#> metadata(18): Bool BoolNA ... rank_genes_groups umap
#> assays(5): counts csc_counts dense_X dense_counts X
#> rownames(100): Gene000 Gene001 ... Gene098 Gene099
#> rowData names(11): String n_cells_by_counts ... dispersions
#>   dispersions_norm
#> colnames(50): Cell000 Cell001 ... Cell048 Cell049
#> colData names(11): Float FloatNA ... log1p_total_counts leiden
#> reducedDimNames(2): X_pca X_umap
#> mainExpName: NULL
#> altExpNames(0):

Convert the AnnData object to a Seurat object (see vignette("usage_seurat")):

obj <- adata$as_Seurat()
obj
#> An object of class Seurat 
#> 100 features across 50 samples within 1 assay 
#> Active assay: RNA (100 features, 0 variable features)
#>  5 layers present: counts, csc_counts, dense_X, dense_counts, X
#>  2 dimensional reductions calculated: X_pca, X_umap

Convert a SingleCellExperiment object to an AnnData object (see vignette("usage_singlecellexperiment")):

adata <- as_AnnData(sce)
adata
#> InMemoryAnnData object with n_obs × n_vars = 50 × 100
#>     obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#>     var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#>     uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'Int', 'IntNA', 'IntScalar', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
#>     obsm: 'X_pca', 'X_umap'
#>     varm: 'X_pca'
#>     layers: 'counts', 'csc_counts', 'dense_X', 'dense_counts', 'X'
#>     obsp: 'connectivities', 'distances'
#>     varp: 'test_varp'

Convert a Seurat object to an AnnData object (see vignette("usage_seurat")):

adata <- as_AnnData(obj)
adata
#> InMemoryAnnData object with n_obs × n_vars = 50 × 100
#>     obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#>     var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#>     uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'Int', 'IntNA', 'IntScalar', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'umap'
#>     obsm: 'X_pca', 'X_umap'
#>     layers: 'counts', 'csc_counts', 'dense_X', 'dense_counts', 'X'
#>     obsp: 'connectivities', 'distances'

Manually create an AnnData object

adata <- AnnData(
  X = matrix(rnorm(100), nrow = 10),
  obs = data.frame(
    cell_type = factor(rep(c("A", "B"), each = 5))
  ),
  var = data.frame(
    gene_name = paste0("gene_", 1:10)
  )
)

adata
#> InMemoryAnnData object with n_obs × n_vars = 10 × 10
#>     obs: 'cell_type'
#>     var: 'gene_name'

Write to disk

Write an AnnData object to disk:

tmpfile <- tempfile(fileext = ".h5ad")
adata$write_h5ad(tmpfile) # Alternatively, write_h5ad(adata, tmpfile)

Write a SingleCellExperiment object to disk:

tmpfile <- tempfile(fileext = ".h5ad")
write_h5ad(sce, tmpfile)

Write a Seurat object to disk:

tmpfile <- tempfile(fileext = ".h5ad")
write_h5ad(obj, tmpfile)

Subsetting AnnData objects

anndataR provides standard R subsetting methods that work with familiar bracket notation. These methods return AnnDataView objects that provide lazy evaluation for efficient memory usage.

Basic subsetting

Subset observations (rows) using logical conditions:

# Create some sample data
adata <- AnnData(
  X = matrix(rnorm(50), nrow = 10, ncol = 5),
  obs = data.frame(
    cell_type = factor(rep(c("A", "B", "C"), c(3, 4, 3))),
    score = runif(10)
  ),
  var = data.frame(
    gene_name = paste0("gene_", 1:5),
    highly_variable = c(TRUE, FALSE, TRUE, TRUE, FALSE)
  )
)

# Subset to cell type "A"
view1 <- adata[adata$obs$cell_type == "A", ]
view1
#> View of InMemoryAnnData object with n_obs × n_vars = 3 × 5
#>     obs: 'cell_type', 'score'
#>     var: 'gene_name', 'highly_variable'

Subset variables (columns) using logical conditions:

# Subset to highly variable genes
view2 <- adata[, adata$var$highly_variable]
view2
#> View of InMemoryAnnData object with n_obs × n_vars = 10 × 3
#>     obs: 'cell_type', 'score'
#>     var: 'gene_name', 'highly_variable'

Combined subsetting

Subset both observations and variables simultaneously:

# Subset to cell type "A" and highly variable genes
view3 <- adata[adata$obs$cell_type == "A", adata$var$highly_variable]
view3
#> View of InMemoryAnnData object with n_obs × n_vars = 3 × 3
#>     obs: 'cell_type', 'score'
#>     var: 'gene_name', 'highly_variable'

Using different index types

# Numeric indices
view4 <- adata[1:5, 1:3]
view4
#> View of InMemoryAnnData object with n_obs × n_vars = 5 × 3
#>     obs: 'cell_type', 'score'
#>     var: 'gene_name', 'highly_variable'

# Character names (if available)
rownames(adata) <- paste0("cell_", 1:10)
colnames(adata) <- paste0("gene_", 1:5)
view5 <- adata[c("cell_1", "cell_2"), c("gene_1", "gene_3")]
view5
#> View of InMemoryAnnData object with n_obs × n_vars = 2 × 2
#>     obs: 'cell_type', 'score'
#>     var: 'gene_name', 'highly_variable'

Working with views

Views maintain access to all original data slots:

# Access dimensions
dim(view3)
#> [1] 3 3
nrow(view3)
#> [1] 3
ncol(view3)
#> [1] 3

# Access names
rownames(view3)
#> [1] "cell_1" "cell_2" "cell_3"
colnames(view3)
#> [1] "gene_1" "gene_3" "gene_4"

# Access data matrices and metadata
view3$X
#>            gene_1      gene_3     gene_4
#> cell_1 -0.6905379  0.02229473  0.4543416
#> cell_2 -0.5585420  0.60361101 -0.8552025
#> cell_3 -0.5366633 -0.26265057 -0.2868952
view3$obs
#>        cell_type     score
#> cell_1         A 0.3492990
#> cell_2         A 0.9473183
#> cell_3         A 0.2161000
view3$var
#>        gene_name highly_variable
#> gene_1    gene_1            TRUE
#> gene_3    gene_3            TRUE
#> gene_4    gene_4            TRUE

# Views can be converted to concrete implementations
concrete <- view3$as_InMemoryAnnData()
concrete
#> InMemoryAnnData object with n_obs × n_vars = 3 × 3
#>     obs: 'cell_type', 'score'
#>     var: 'gene_name', 'highly_variable'

Citing anndataR

If you use anndataR in your work, please cite “anndataR improves interoperability between R and Python in single-cell transcriptomics”:

citation("anndataR")
#> To cite package 'anndataR' in publications use:
#> 
#>   Deconinck L, Zappia L, Cannoodt R, Morgan M, scverse core, Virshup I,
#>   Sang-aram C, Bredikhin D, Seurinck R, Saeys Y (2025). "anndataR
#>   improves interoperability between R and Python in single-cell
#>   transcriptomics." _bioRxiv_, 2025.08.18.669052.
#>   doi:10.1101/2025.08.18.669052
#>   <https://doi.org/10.1101/2025.08.18.669052>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {{anndataR} improves interoperability between R and Python in single-cell transcriptomics},
#>     author = {Louise Deconinck and Luke Zappia and Robrecht Cannoodt and Martin Morgan and {scverse core} and Isaac Virshup and Chananchida Sang-aram and Danila Bredikhin and Ruth Seurinck and Yvan Saeys},
#>     journal = {bioRxiv},
#>     year = {2025},
#>     pages = {2025.08.18.669052},
#>     doi = {10.1101/2025.08.18.669052},
#>   }

Session info

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] anndataR_0.99.5             SingleCellExperiment_1.30.1
#>  [3] SummarizedExperiment_1.38.1 Biobase_2.68.0             
#>  [5] GenomicRanges_1.60.0        GenomeInfoDb_1.44.3        
#>  [7] IRanges_2.42.0              S4Vectors_0.46.0           
#>  [9] BiocGenerics_0.54.1         generics_0.1.4             
#> [11] MatrixGenerics_1.20.0       matrixStats_1.5.0          
#> [13] SeuratObject_5.2.0          sp_2.2-0                   
#> [15] BiocStyle_2.36.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3      jsonlite_2.0.0          magrittr_2.0.4         
#>   [4] spatstat.utils_3.2-0    farver_2.1.2            rmarkdown_2.30         
#>   [7] fs_1.6.6                ragg_1.5.0              vctrs_0.6.5            
#>  [10] ROCR_1.0-11             spatstat.explore_3.5-3  htmltools_0.5.8.1      
#>  [13] S4Arrays_1.8.1          Rhdf5lib_1.30.0         SparseArray_1.8.1      
#>  [16] rhdf5_2.52.1            sass_0.4.10             sctransform_0.4.2      
#>  [19] parallelly_1.45.1       KernSmooth_2.23-26      bslib_0.9.0            
#>  [22] htmlwidgets_1.6.4       desc_1.4.3              ica_1.0-3              
#>  [25] plyr_1.8.9              plotly_4.11.0           zoo_1.8-14             
#>  [28] cachem_1.1.0            igraph_2.2.0            mime_0.13              
#>  [31] lifecycle_1.0.4         pkgconfig_2.0.3         Matrix_1.7-3           
#>  [34] R6_2.6.1                fastmap_1.2.0           GenomeInfoDbData_1.2.14
#>  [37] fitdistrplus_1.2-4      future_1.67.0           shiny_1.11.1           
#>  [40] digest_0.6.37           patchwork_1.3.2         tensor_1.5.1           
#>  [43] Seurat_5.3.0            RSpectra_0.16-2         irlba_2.3.5.1          
#>  [46] textshaping_1.0.4       progressr_0.17.0        spatstat.sparse_3.1-0  
#>  [49] polyclip_1.10-7         httr_1.4.7              abind_1.4-8            
#>  [52] compiler_4.5.1          S7_0.2.0                fastDummies_1.7.5      
#>  [55] MASS_7.3-65             DelayedArray_0.34.1     tools_4.5.1            
#>  [58] lmtest_0.9-40           httpuv_1.6.16           future.apply_1.20.0    
#>  [61] goftest_1.2-3           glue_1.8.0              nlme_3.1-168           
#>  [64] rhdf5filters_1.20.0     promises_1.3.3          grid_4.5.1             
#>  [67] Rtsne_0.17              cluster_2.1.8.1         reshape2_1.4.4         
#>  [70] spatstat.data_3.1-8     gtable_0.3.6            tidyr_1.3.1            
#>  [73] data.table_1.17.8       XVector_0.48.0          spatstat.geom_3.6-0    
#>  [76] RcppAnnoy_0.0.22        ggrepel_0.9.6           RANN_2.6.2             
#>  [79] pillar_1.11.1           stringr_1.5.2           spam_2.11-1            
#>  [82] RcppHNSW_0.6.0          later_1.4.4             splines_4.5.1          
#>  [85] dplyr_1.1.4             lattice_0.22-7          deldir_2.0-4           
#>  [88] survival_3.8-3          tidyselect_1.2.1        miniUI_0.1.2           
#>  [91] pbapply_1.7-4           knitr_1.50              gridExtra_2.3          
#>  [94] bookdown_0.45           scattermore_1.2         xfun_0.53              
#>  [97] stringi_1.8.7           UCSC.utils_1.4.0        lazyeval_0.2.2         
#> [100] yaml_2.3.10             evaluate_1.0.5          codetools_0.2-20       
#> [103] tibble_3.3.0            BiocManager_1.30.26     cli_3.6.5              
#> [106] uwot_0.2.3              xtable_1.8-4            reticulate_1.43.0      
#> [109] systemfonts_1.3.1       jquerylib_0.1.4         Rcpp_1.1.0             
#> [112] spatstat.random_3.4-2   globals_0.18.0          png_0.1-8              
#> [115] spatstat.univar_3.1-4   parallel_4.5.1          pkgdown_2.1.3          
#> [118] ggplot2_4.0.0           dotCall64_1.2           listenv_0.9.1          
#> [121] viridisLite_0.4.2       scales_1.4.0            ggridges_0.5.7         
#> [124] purrr_1.1.0             crayon_1.5.3            rlang_1.1.6            
#> [127] cowplot_1.2.0