Detailed look into the mapping between AnnData and SingleCellExperiment objects
Louise Deconinck
Source:vignettes/singlecellexperiment_mapping.Rmd
{anndataR} allows users to convert between SingleCellExperiment and AnnData objects. This can be done with or without extra user input specifying which fields and slots of one object should be converted and where they should end up in the other. Please take into account that lossless conversion is not always possible between AnnData and SingleCellExperiment, and please inspect the object before and after conversion to ensure that all data is correctly converted.
We first generate a sample dataset to work with.
ad <- generate_dataset(
n_obs = 10L,
n_var = 20L,
x_type = "numeric_matrix",
layer_types = c("integer_matrix", "numeric_rsparse"),
obs_types = c("integer", "numeric", "factor"),
var_types = c("character", "numeric", "logical"),
obsm_types = c("numeric_matrix", "numeric_csparse"),
varm_types = c("numeric_matrix", "numeric_csparse"),
obsp_types = c("numeric_matrix", "numeric_csparse"),
varp_types = c("integer_matrix", "numeric_matrix"),
uns_types = c("vec_integer", "vec_character", "df_integer"),
format = "AnnData"
)
# add PCA reduction
ad$obsm[["X_pca"]] <- matrix(1:50, 10, 5)
ad$varm[["PCs"]] <- matrix(1:100, 20, 5)
ad$obsm[["X_umap"]] <- matrix(1:20, 10, 2)
Convert AnnData objects to SingleCellExperiment objects
Implicit conversion
{anndataR} will try to make a reasonable guess of which AnnData slots should end up in which SingleCellExperiment slots. A SingleCellExperiment object converted from an AnnData object consists of assays, colData, rowData, reducedDims, colPairs, rowPairs and metadata.
Each of these slots can be customized by the user by providing a mapping. We go into more detail on these user-specified mappings in the Explicit conversion section.
By default, anndataR will try to guess a reasonable mapping. If you do not want this to happen, and you want nothing to be converted to a slot, you can pass FALSE for that mapping. Passing an empty mapping has the same effect, but triggers a warning.
Here, we showcase what happens if you do not provide any mapping for the conversion.
sce <- ad$as_SingleCellExperiment()
sce
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(3): vec_integer vec_character df_integer
#> assays(3): integer_matrix numeric_rsparse X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(3): character numeric logical
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(3): integer numeric factor
#> reducedDimNames(4): numeric_matrix numeric_csparse X_pca X_umap
#> mainExpName: NULL
#> altExpNames(0):
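Since lossless conversion is not guaranteed, it can help to compare the two objects slot by slot after converting. The following is a minimal sketch of such a check. It assumes the ad and sce objects created above and uses only accessor functions from SingleCellExperiment and the packages it attaches.

```r
library(SingleCellExperiment)

# Assumes `ad` and `sce` from the code above.
# AnnData is observations x variables, SingleCellExperiment is features x cells,
# so the dimensions should be each other's reverse.
dim(ad$X)
dim(sce)

# X and the layers should all show up as assays.
names(ad$layers)
assayNames(sce)

# obs/var columns versus colData/rowData columns.
colnames(ad$obs)
colnames(colData(sce))
colnames(ad$var)
colnames(rowData(sce))

# uns entries versus metadata entries.
names(ad$uns)
names(metadata(sce))
```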
In the following subsections, we detail how each of these implicit conversions works.
assays and x_mapping
In an AnnData object, count matrices can be present in the X slot or in the layers slot. In a SingleCellExperiment object, count matrices are stored in the assays slot, as a named list.
By default, the X slot and all the elements of the layers slot will be stored in the assays slot of the SingleCellExperiment object, with the same names as in the AnnData object.
In the example below, we convert an AnnData object with an X matrix and two layers to a SingleCellExperiment object. For the implicit conversion to work, we keep the default assays_mapping = TRUE and do not set the x_mapping argument. We explicitly pass FALSE to the other mapping arguments so that nothing gets converted to the respective slots, which keeps the resulting object easy to read.
sce_layers <- ad$as_SingleCellExperiment(
colData_mapping = FALSE,
rowData_mapping = FALSE,
reducedDims_mapping = FALSE,
colPairs_mapping = FALSE,
rowPairs_mapping = FALSE,
metadata_mapping = FALSE
)
sce_layers
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(0):
#> assays(3): integer_matrix numeric_rsparse X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
We can see that the X slot got stored as an assay named X, and the layers got stored as assays with the same names as in the AnnData object.
reductions
A dimensionality reduction can consist of multiple parts that are stored separately in the AnnData object. Take for example the very common PCA reduction. Usually, the principal components are stored in the obsm slot (usually called X_pca), the loadings in the varm slot (usually called PCs) and the explained variance in the uns slot (usually called pca).
In a SingleCellExperiment object, the reduced dimensions are usually stored in the reducedDims slot, either as an element of a named list or as a LinearEmbeddingMatrix. In the first case, only the reduced dimensions are stored; in the second case, the loadings and associated metadata are stored as well.
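To make the two storage forms concrete, here is a small sketch that builds a toy SingleCellExperiment by hand and stores one reduction as a plain matrix and one as a LinearEmbeddingMatrix. The object toy and all values in it are made up for illustration and are not part of the dataset used in this vignette.

```r
library(SingleCellExperiment)

set.seed(1)
n_cells <- 10
n_genes <- 20
n_pcs <- 5

# Toy count matrix (features x cells) with illustrative names.
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("cell", seq_len(n_cells)))
)
toy <- SingleCellExperiment(assays = list(counts = counts))

# Form 1: a plain matrix holding only the reduced dimensions (cells x components).
reducedDim(toy, "umap") <- matrix(rnorm(n_cells * 2), nrow = n_cells)

# Form 2: a LinearEmbeddingMatrix holding the reduced dimensions (sampleFactors)
# together with the loadings (featureLoadings, features x components).
reducedDim(toy, "pca") <- LinearEmbeddingMatrix(
  sampleFactors = matrix(rnorm(n_cells * n_pcs), nrow = n_cells),
  featureLoadings = matrix(rnorm(n_genes * n_pcs), nrow = n_genes)
)

reducedDimNames(toy)
is(reducedDim(toy, "pca"), "LinearEmbeddingMatrix")
dim(sampleFactors(reducedDim(toy, "pca")))
dim(featureLoadings(reducedDim(toy, "pca")))
```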
We guess the mapping of the reductions the same way as we do for the Seurat conversion:
- All items in the obsm slot are stored as a matrix in the reducedDims slot.
- If the obsm slot contains an X_pca item, we will also store the associated loadings (in varm) as a LinearEmbeddingMatrix.
sce_dimred <- ad$as_SingleCellExperiment(
colData_mapping = FALSE,
rowData_mapping = FALSE,
colPairs_mapping = FALSE,
rowPairs_mapping = FALSE,
metadata_mapping = FALSE
)
reducedDims(sce_dimred)
#> List of length 4
#> names(4): numeric_matrix numeric_csparse X_pca X_umap
Here, the PCA reduction (X_pca) got converted to a LinearEmbeddingMatrix, combining the information in the obsm and varm slots. The UMAP reduction (X_umap) got stored as a plain reducedDims entry, and consists only of the reduced dimensions in the obsm slot.
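The printed summary only lists the names of the reductions, not how each one is stored. A short loop such as the following sketch, which assumes the sce_dimred object created above, prints the class of every entry so you can check which ones ended up as a LinearEmbeddingMatrix.

```r
library(SingleCellExperiment)

# Assumes `sce_dimred` from the code above.
for (name in reducedDimNames(sce_dimred)) {
  red <- reducedDim(sce_dimred, name)
  # class(red)[1] is the most specific class, e.g. "matrix" or "LinearEmbeddingMatrix".
  cat(name, ":", class(red)[1], "\n")
}
```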
colData, rowData, colPairs, rowPairs, metadata
The other SingleCellExperiment slots are straightforward one-to-one mappings of AnnData slots: obs is converted to colData, var to rowData, obsp to colPairs, varp to rowPairs, and uns to metadata.
sce_implicit <- ad$as_SingleCellExperiment(
assays_mapping = FALSE,
reducedDims_mapping = FALSE
)
sce_implicit
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(3): vec_integer vec_character df_integer
#> assays(1): X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(3): character numeric logical
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(3): integer numeric factor
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
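The printed summary does not list colPairs and rowPairs, so these have to be inspected separately. A minimal sketch, assuming the sce_implicit object created above:

```r
library(SingleCellExperiment)

# Assumes `sce_implicit` from the code above.
colPairNames(sce_implicit)      # converted from obsp
rowPairNames(sce_implicit)      # converted from varp
names(metadata(sce_implicit))   # converted from uns
```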
Explicit conversion
Each of the conversions can be customized, up to a point, by providing a mapping. You provide this in the form of a named vector (or, for reducedDims_mapping, a named list), where the names are the names to use in the SingleCellExperiment slot and the values are the names of the corresponding items in the AnnData slot.
We will give an example for each of the mappings.
assays_mapping
sce_assays <- ad$as_SingleCellExperiment(
assays_mapping = c(counts = NA, layer1 = "integer_matrix", layer2 = "numeric_rsparse"),
colData_mapping = FALSE,
rowData_mapping = FALSE,
reducedDims_mapping = FALSE,
colPairs_mapping = FALSE,
rowPairs_mapping = FALSE,
metadata_mapping = FALSE
)
sce_assays
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(0):
#> assays(4): counts layer1 layer2 X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
You can see that the X slot got stored as counts, and the layers got stored as assays with the names layer1 and layer2. We can also provide the x_mapping argument, which dictates where the X slot gets stored, but then we should omit it from the assays_mapping argument.
sce_assays_x <- ad$as_SingleCellExperiment(
x_mapping = "counts",
assays_mapping = c(layer1 = "integer_matrix", layer2 = "numeric_rsparse"),
colData_mapping = c(),
rowData_mapping = c(),
reducedDims_mapping = c(),
colPairs_mapping = c(),
rowPairs_mapping = c(),
metadata_mapping = c()
)
#> Warning: The `colData_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `rowData_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `reducedDims_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `colPairs_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `rowPairs_mapping` argument is empty, setting it to
#> FALSE
#> Warning: The `metadata_mapping` argument is empty, setting it to
#> FALSE
sce_assays_x
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(0):
#> assays(3): counts layer1 layer2
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
colData, rowData, colPairs, rowPairs, metadata
sce <- ad$as_SingleCellExperiment(
assays_mapping = FALSE,
colData_mapping = c(coldata1 = "integer", coldata2 = "numeric"),
rowData_mapping = c(rowdata1 = "character", rowdata2 = "logical"),
reducedDims_mapping = FALSE,
colPairs_mapping = c(
colPairs_dense = "numeric_matrix",
colPairs_sparse = "numeric_csparse"
),
rowPairs_mapping = c(
rowPairs1 = "integer_matrix",
rowPairs2 = "numeric_matrix"
),
metadata_mapping = c(
vector1 = "vec_integer",
vector2 = "vec_character",
df = "df_integer"
)
)
sce
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(3): vector1 vector2 df
#> assays(1): X
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(2): rowdata1 rowdata2
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(2): coldata1 coldata2
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
Here you can see that the AnnData items (specified as values in the mapping) get stored in the corresponding SingleCellExperiment slots under the names given in the mapping. For example, the integer column of obs in the AnnData object gets stored in the coldata1 column of the colData of the SingleCellExperiment object.
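One way to check a renamed column is to look at the source and the target side by side. This sketch assumes the ad object and the sce object from the explicit conversion above, and makes no claim about whether attributes such as names are preserved exactly.

```r
library(SingleCellExperiment)

# Assumes `ad` and the `sce` created with the explicit mappings above.
head(ad$obs[["integer"]])           # source column in AnnData obs
head(colData(sce)[["coldata1"]])    # renamed column in colData

# all.equal() reports any differences in values or attributes.
all.equal(ad$obs[["integer"]], colData(sce)[["coldata1"]])
```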
reducedDims_mapping
sce <- ad$as_SingleCellExperiment(
x_mapping = "counts",
assays_mapping = FALSE,
colData_mapping = FALSE,
rowData_mapping = FALSE,
reducedDims_mapping = list(
"pca" = c(sampleFactors = "X_pca", featureLoadings = "PCs"),
"umap" = c(sampleFactors = "X_umap")
),
colPairs_mapping = FALSE,
rowPairs_mapping = FALSE,
metadata_mapping = FALSE
)
sce
#> class: SingleCellExperiment
#> dim: 20 10
#> metadata(0):
#> assays(1): counts
#> rownames(20): gene1 gene2 ... gene19 gene20
#> rowData names(0):
#> colnames(10): cell1 cell2 ... cell9 cell10
#> colData names(0):
#> reducedDimNames(2): pca umap
#> mainExpName: NULL
#> altExpNames(0):
Here, we explicitly provide a mapping for the reducedDims slot. We specify that the reduction characterized by the X_pca data in the obsm slot and the PCs data in the varm slot is stored as a LinearEmbeddingMatrix in the reducedDims slot under the name pca. The reduction characterized by the X_umap data in the obsm slot is stored in the reducedDims slot under the name umap.
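To confirm that both parts of the PCA ended up in the same entry, you can pull the entry back out and look at its components. This is a sketch that assumes the sce object from the explicit reducedDims mapping above. The check with is() is there because the code does not take the storage class for granted.

```r
library(SingleCellExperiment)

# Assumes the `sce` created with the explicit reducedDims mapping above.
pca <- reducedDim(sce, "pca")

if (is(pca, "LinearEmbeddingMatrix")) {
  # sampleFactors come from obsm (cells x components),
  # featureLoadings come from varm (features x components).
  print(dim(sampleFactors(pca)))
  print(dim(featureLoadings(pca)))
} else {
  print(dim(pca))
}

dim(reducedDim(sce, "umap"))
```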
Convert SingleCellExperiment objects to AnnData objects
The reverse, converting SingleCellExperiment objects to AnnData objects, works in a similar way. There’s an implicit conversion, where we attempt a standard conversion, but the user can always provide an explicit mapping as well.
ad <- as_AnnData(sce)
Implicit conversion
layers
If neither layers_mapping nor x_mapping is provided, we will try to guess the mapping: all the assays in the SingleCellExperiment object are mapped to the layers slot of the AnnData object. Watch out: if there is no x_mapping provided, none of the assays will be stored in the X slot of the AnnData object, and it will remain empty.
ad_assays <- as_AnnData(
sce,
obs_mapping = FALSE,
var_mapping = FALSE,
obsm_mapping = FALSE,
varm_mapping = FALSE,
obsp_mapping = FALSE,
varp_mapping = FALSE,
uns_mapping = FALSE
)
ad_assays
#> AnnData object with n_obs × n_vars = 10 × 20
#> layers: 'counts'
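If you do want one of the assays to end up in X, you can set x_mapping yourself. The sketch below assumes that x_mapping takes the name of the assay to store in X, mirroring how it is used in the other direction. That value format is an assumption here, so check the function documentation before relying on it. It also assumes the sce object from above, which has a counts assay.

```r
library(anndataR)

# Assumes `sce` from the code above, which contains an assay named "counts".
# Assumption: `x_mapping` takes the name of the assay that should become X.
ad_x <- as_AnnData(
  sce,
  x_mapping = "counts",
  obs_mapping = FALSE,
  var_mapping = FALSE,
  obsm_mapping = FALSE,
  varm_mapping = FALSE,
  obsp_mapping = FALSE,
  varp_mapping = FALSE,
  uns_mapping = FALSE
)

# X should now be filled; AnnData stores it as observations x variables.
dim(ad_x$X)
```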
obsm and varm
If there is no obsm_mapping or varm_mapping provided, we will try to guess how the reducedDims should be converted. By default, we will not map anything to the varm slot, as there is no direct equivalent in the SingleCellExperiment object. However, if the reducedDims slot contains a LinearEmbeddingMatrix, we will store its loadings in the varm slot.
We will store the reduced dimensions in the obsm slot. In the output below, the obsm keys keep the names of the reducedDims entries (pca and umap); to get AnnData-style names such as X_pca, provide an explicit obsm_mapping.
ad_reductions <- as_AnnData(
sce,
obs_mapping = FALSE,
var_mapping = FALSE,
obsp_mapping = FALSE,
varp_mapping = FALSE,
uns_mapping = FALSE
)
ad_reductions
#> AnnData object with n_obs × n_vars = 10 × 20
#> obsm: 'pca', 'umap'
#> varm: 'pca'
#> layers: 'counts'
obs, var, obsp, varp, uns
The conversion of obs, var, obsp, varp and uns is straightforward: there’s a one-to-one mapping between the SingleCellExperiment slots and the AnnData slots. colData is converted to obs, rowData to var, colPairs to obsp, rowPairs to varp, and metadata to uns.
ad <- as_AnnData(
sce
)
ad
#> AnnData object with n_obs × n_vars = 10 × 20
#> obsm: 'pca', 'umap'
#> varm: 'pca'
#> layers: 'counts'
Explicit conversion
It’s also possible to provide an explicit mapping for the conversion from SingleCellExperiment to AnnData. For all of the mappings (layers_mapping, obs_mapping, var_mapping, obsp_mapping, varp_mapping, and uns_mapping), you can provide a named vector where the names are the names in the AnnData object and the values are the names in the SingleCellExperiment object.
The obsm_mapping and varm_mapping work in the same way: they’re named vectors where each name corresponds to a key in AnnData’s obsm/varm, and each value corresponds to the name of a reducedDim in the SingleCellExperiment object.
ad_obsm <- as_AnnData(
sce,
layers_mapping = FALSE,
obs_mapping = FALSE,
obsm_mapping = c(X_pca = "pca", X_umap = "umap"),
varm_mapping = c(PCs = "pca"),
obsp_mapping = FALSE,
varp_mapping = FALSE,
uns_mapping = FALSE
)
ad_obsm
#> AnnData object with n_obs × n_vars = 10 × 20
#> obsm: 'X_pca', 'X_umap'
#> varm: 'PCs'
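As a final check, you can look at the shapes of the converted reductions. This sketch assumes the ad_obsm object created above and only reads from its obsm and varm slots.

```r
# Assumes `ad_obsm` from the code above.
names(ad_obsm$obsm)
names(ad_obsm$varm)

# obsm entries have one row per observation, varm entries one row per variable.
dim(ad_obsm$obsm[["X_pca"]])
dim(ad_obsm$obsm[["X_umap"]])
dim(ad_obsm$varm[["PCs"]])
```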