Detailed look into the mapping between AnnData and SingleCellExperiment objects

{anndataR} allows users to convert to and from SingleCellExperiment and AnnData objects. This can be done with or without extra user input as to which fields and slots of the respective objects should be converted and should be put where. Please take into account that lossless conversion is not always possible between AnnData and SingleCellExperiment, and please inspect the object before and after inspection to ensure that all data is correctly converted.

library(anndataR)
library(SingleCellExperiment)

We first generate a sample dataset to work with.

ad <- generate_dataset(
  n_obs = 10L,
  n_var = 20L,
  x_type = "numeric_matrix",
  layer_types = c("integer_matrix", "numeric_rsparse"),
  obs_types = c("integer", "numeric", "factor"),
  var_types = c("character", "numeric", "logical"),
  obsm_types = c("numeric_matrix", "numeric_csparse"),
  varm_types = c("numeric_matrix", "numeric_csparse"),
  obsp_types = c("numeric_matrix", "numeric_csparse"),
  varp_types = c("integer_matrix", "numeric_matrix"),
  uns_types = c("vec_integer", "vec_character", "df_integer"),
  format = "AnnData"
)

# add PCA reduction
ad$obsm[["X_pca"]] <- matrix(1:50, 10, 5)
ad$varm[["PCs"]] <- matrix(1:100, 20, 5)

ad$obsm[["X_umap"]] <- matrix(1:20, 10, 2)

Convert AnnData objects to SingleCellExperiment objects

Implicit conversion

{anndataR} will try to make a reasonable guess of which AnnData slots should end up in which SingleCellExperiment slots. A SingleCellExperiment object (this is converted from an AnnData object) consists of assays, colData, rowData, metadata, reducedDims, colPairs, rowPairs and metadata.

Each of these slots can be customized by the user by providing a mapping. We will go more into detail on these user-specified mappings in the mapping section.

By default, anndataR will try to guess a reasonable mapping. If you do not want this to happen, and you want nothing to be converted to a slot, you can pass an empty list.

Here, we showcase what happens if you do not provide any mapping for the conversion.

sce <- ad$as_SingleCellExperiment()
sce
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(3): vec_integer vec_character df_integer
#> assays(3): integer_matrix numeric_rsparse X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(3): character numeric logical
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(3): integer numeric factor
#> reducedDimNames(4): numeric_matrix numeric_csparse X_pca X_umap
#> mainExpName: NULL
#> altExpNames(0):

In the following subsections, we detail how each of these implicit conversions work.

assays and x_mapping

In an AnnData object, count matrices can be present in the X slot or in the layers slot. In a SingleCellExperiment object, count matrices are stored in the assays slot, as a named list.

By default, the X slot and all the elements of the layers slot will be stored in the assays slot of the SingleCellExperiment object, with the same names as in the AnnData object.

In the below example, we will convert an AnnData object with a counts layer and two other layers to a SingleCellExperiment object. In order for the implicit conversion to work, we use the default assays_mapping = TRUE and do not set the x_mapping argument. We explicitly pass FALSE to the other mapping arguments for clarity in the resulting object. This ensures that nothing gets converted to the respective slots.

sce_layers <- ad$as_SingleCellExperiment(
  colData_mapping = FALSE,
  rowData_mapping = FALSE,
  reducedDims_mapping = FALSE,
  colPairs_mapping = FALSE,
  rowPairs_mapping = FALSE,
  metadata_mapping = FALSE
)

sce_layers
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(0):
#> assays(3): integer_matrix numeric_rsparse X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

We can see that indeed, the X slot got stored as counts and the layers got stored as assays with the same name.

reductions

A dimensionality reduction can consist of multiple parts that are stored separately in the AnnData object. Take for example the very common PCA reduction. Usually, the principal components are stored in the obsm slot (usually called X_pca), the loadings in the varm slot (usually called PCs) and the explained variance in the uns slot (usually called pca).

In a SingleCellExperiment object, the reduced dimensions are usually stored in the reducedDims slot, as either an element of a named list, or as a LinearEmbeddingMatrix. In the first case, only the reduced dimensions are stored, in the second case, the loadings and associated metadata are stored as well.

We guess the mapping of the reductions the same way as we do for the Seurat conversion: - All items in the obsm slot are stored as a matrix in the reducedDims slot - If the obsm slot contains a X_pca slot, we will also store the associated loadings (in varm) as a LinearEmbeddingMatrix

sce_dimred <- ad$as_SingleCellExperiment(
  colData_mapping = FALSE,
  rowData_mapping = FALSE,
  colPairs_mapping = FALSE,
  rowPairs_mapping = FALSE,
  metadata_mapping = FALSE
)

reducedDims(sce_dimred)
#> List of length 4
#> names(4): numeric_matrix numeric_csparse X_pca X_umap

We can see that indeed, the pca dimred got converted to a LinearEmbeddingMatrix, comprising of the information in the obsm and varm slots. The umap dimred got stored as a reducedDims slot, and consists only of the reduced dimensions in the obsm slot.

colData, rowData, colPairs, rowPairs, metadata

The other SingleCellExperiment slots are easy one-to-one mappings of AnnData slots. We will assume that all colData is stored in the obs slot, all rowData is stored in the var slot, all colPairs are stored in the obsp slot and all rowPairs are stored in the varp slot, and all metadata is stored in the uns slot.

sce_implicit <- ad$as_SingleCellExperiment(
  assays_mapping = FALSE,
  reducedDims_mapping = FALSE
)

sce_implicit
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(3): vec_integer vec_character df_integer
#> assays(1): X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(3): character numeric logical
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(3): integer numeric factor
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

Explicit conversion

Each of the conversions can be customized, up to a point, by providing a mapping. You provide this in the form of a named list, where the names are the names of the SingleCellExperiment slots and the values are the names of the AnnData slots.

We will give an example for each of the mappings.

assays_mapping

sce_assays <- ad$as_SingleCellExperiment(
  assays_mapping = c(counts = NA, layer1 = "integer_matrix", layer2 = "numeric_rsparse"),
  colData_mapping = FALSE,
  rowData_mapping = FALSE,
  reducedDims_mapping = FALSE,
  colPairs_mapping = FALSE,
  rowPairs_mapping = FALSE,
  metadata_mapping = FALSE
)
sce_assays
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(0):
#> assays(4): counts layer1 layer2 X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

You can see that the X slot got stored as counts, and the layers got stored as assays with the names layer1 and layer2. We can also provide the x_mapping argument, which will dictate where the X slot gets stored, but then we should omit it from the assays_mapping argument.

sce_assays_x <- ad$as_SingleCellExperiment(
  x_mapping = "counts",
  assays_mapping = c(layer1 = "integer_matrix", layer2 = "numeric_rsparse"),
  colData_mapping = c(),
  rowData_mapping = c(),
  reducedDims_mapping = c(),
  colPairs_mapping = c(),
  rowPairs_mapping = c(),
  metadata_mapping = c()
)
#> Warning: The `colData_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `rowData_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `reducedDims_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `colPairs_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `rowPairs_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `metadata_mapping` argument is empty, setting it to
#> FALSE
sce_assays_x
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(0):
#> assays(3): counts layer1 layer2
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

colData, rowData, colPairs, rowPairs, metadata

sce <- ad$as_SingleCellExperiment(
  assays_mapping = FALSE,
  colData_mapping = c(coldata1 = "integer", coldata2 = "numeric"),
  rowData_mapping = c(rowdata1 = "character", rowdata2 = "logical"),
  reducedDims_mapping = FALSE,
  colPairs_mapping = c(
    colPairs_dense = "numeric_matrix",
    colPairs_sparse = "numeric_csparse"
  ),
  rowPairs_mapping = c(
    rowPairs1 = "integer_matrix",
    rowPairs2 = "numeric_matrix"
  ),
  metadata_mapping = c(
    vector1 = "vec_integer",
    vector2 = "vec_character",
    df = "df_integer"
  )
)

sce
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(3): vector1 vector2 df
#> assays(1): X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(2): rowdata1 rowdata2
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(2): coldata1 coldata2
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

Here you can see that the AnnData slots (specified as values in the mapping) get stored in the corresponding SingleCellExperiment slots (the columns of which are the names of the mapping). e.g. the integer column of the AnnData obs in the AnnData object gets stored in the coldata1 column of the colData of the SingleCellExperiment object.

reductions_mapping

sce <- ad$as_SingleCellExperiment(
  x_mapping = "counts",
  assays_mapping = FALSE,
  colData_mapping = FALSE,
  rowData_mapping = FALSE,
  reducedDims_mapping = list(
    "pca" = c(sampleFactors = "X_pca", featureLoadings = "PCs"),
    "umap" = c(sampleFactors = "X_umap")
  ),
  colPairs_mapping = FALSE,
  rowPairs_mapping = FALSE,
  metadata_mapping = FALSE
)
sce
#> class: SingleCellExperiment 
#> dim: 20 10 
#> metadata(0):
#> assays(1): counts
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(2): pca umap
#> mainExpName: NULL
#> altExpNames(0):

Here, we explicitly provide a mapping for the reductions slot. We specify that we will store the reduction, characterized by the X_pca data in the obsm slot and the PCs data in the varm slot, as a LinearEmbeddingMatrix in the reducedDims slot under the name pca. We will also store the reduction characterized by the X_umap data in the obsm slot as a reducedDims slot under the name umap.

Convert SingleCellExperiment objects to AnnData objects

The reverse, converting SingleCellExperiment objects to AnnData objects works in a similar way. There’s an implicit conversion, where we attempt a standard conversion, but the user can always provide an explicit mapping as well.

ad <- as_AnnData(sce)

Implicit conversion

layers

If there is no layer_mapping or x_mapping provided, we will try to guess the mapping. We will simply map all the assays in the SingleCellExperiment object to the layers slot of the AnnData object. Watch out: if there is no x_mapping provided, none of the assays will be stored in the X slot of the AnnData object, and it will remain empty.

ad_assays <- as_AnnData(
  sce,
  obs_mapping = FALSE,
  var_mapping = FALSE,
  obsm_mapping = FALSE,
  varm_mapping = FALSE,
  obsp_mapping = FALSE,
  varp_mapping = FALSE,
  uns_mapping = FALSE
)
ad_assays
#> AnnData object with n_obs × n_vars = 10 × 20
#>     layers: 'counts'

obsm and varm

If there is no reducedDims_mapping provided, we will try to guess the mapping. This considers both the obsm_mapping and the varm_mapping arguments. By default, we will not map anything to the varm slot, as there is no direct equivalent in the SingleCellExperiment object. However, if the reducedDims slot contains a LinearEmbeddingMatrix, we will store the loadings in the varm slot.

We will store the reduced dimensions in the obsm slot, with the name of the reducedDims prepended by an X_ as the name of the obsm slot.

ad_reductions <- as_AnnData(
  sce,
  obs_mapping = FALSE,
  var_mapping = FALSE,
  obsp_mapping = FALSE,
  varp_mapping = FALSE,
  uns_mapping = FALSE
)
ad_reductions
#> AnnData object with n_obs × n_vars = 10 × 20
#>     obsm: 'pca', 'umap'
#>     varm: 'pca'
#>     layers: 'counts'

obs, var, obsp, varp, uns

The conversion of obs, var, obsp, varp and uns is straightforward: there’s a one-to-one mapping between the SingleCellExperiment slots and the AnnData slots. We assume that all colData is stored in the obs slot, all rowData is stored in the var slot, all colPairs are stored in the obsp slot and all rowPairs are stored in the varp slot, and all metadata is stored in the uns slot.

ad <- as_AnnData(
  sce
)
ad
#> AnnData object with n_obs × n_vars = 10 × 20
#>     obsm: 'pca', 'umap'
#>     varm: 'pca'
#>     layers: 'counts'

Explicit conversion

It’s also possible to provide an explicit mapping for the conversion from SingleCellExperiment to AnnData. For all of the mappings (layers_mapping, obs_mapping, var_mapping, obsp_mapping, varp_mapping, and uns_mapping), you can provide a named vector where the names are the names in the AnnData object and the values are the names in the SingleCellExperiment object.

The obsm_mapping and varm_mapping work in the same way - they’re named vectors where each name corresponds to a key in AnnData’s obsm/varm, and each value corresponds to the name of a reducedDim in SCE.

ad_obsm <- as_AnnData(
  sce,
  layers_mapping = FALSE,
  obs_mapping = FALSE,
  obsm_mapping = c(X_pca = "pca", X_umap = "umap"),
  varm_mapping = c(PCs = "pca"),
  obsp_mapping = FALSE,
  varp_mapping = FALSE,
  uns_mapping = FALSE
)

ad_obsm
#> AnnData object with n_obs × n_vars = 10 × 20
#>     obsm: 'X_pca', 'X_umap'
#>     varm: 'PCs'

Louise Deconinck

Convert AnnData objects to SingleCellExperiment objects

Implicit conversion

assays and x_mapping

reductions

colData, rowData, colPairs, rowPairs, metadata

Explicit conversion

assays_mapping

colData, rowData, colPairs, rowPairs, metadata

reductions_mapping

Convert SingleCellExperiment objects to AnnData objects

Implicit conversion

layers

obsm and varm

obs, var, obsp, varp, uns

Explicit conversion