Guidelines to use scanpy layers

polklin · October 13, 2022, 12:44pm

Hello everyone,

When using scanpy, I am frequently facing issues about what exact data should I use (raw counts, CPM, log, z-score …) to apply tools / plots function. By default, these functions will apply on adata.X (or on adata.raw if is has been stored beforehand, and we select use_raw=True).

Ideally I would like to have the choice on which exact data I want to apply a function. I am starting to use layers for this purpose but I do not find documentation on how exactly they should be used ?

Let’s say I have raw counts stored in adata.X. Does it make sense to store different layers of my data, with different preprocessing strategies ?

# store CPM counts 
xd.layers["CPM"] = xd.X.copy()
sc.pp.normalize_total(xd, target_sum=1e6, layer="CPM")

# store log on CPM counts
xd.layers["log(CPM)"] = xd.layers["CPM"].copy()
sc.pp.log1p(xd, layer="log(CPM)")

Is it the right way to do ? Do you have any other suggestions or guidelines ?

Thanks you very much !
Best,
Paul

Zethson · October 17, 2022, 10:37am

Yes, this makes sense and is the current intended way to do it.

To answer the question “when to use which version of the data”, I’d encourage you to read this paper: https://www.embopress.org/doi/full/10.15252/msb.20188746

polklin · October 17, 2022, 12:50pm

Thank you very much @Zethson

natalia.nutella · February 2, 2023, 5:59am

Is this really the intended way?

What is in adata.X before running the code below? Is it the raw counts or normalized counts (CPM?)

Quoting OP’s code with some comments:

# Store CPM counts  

# My comment: (assuming `adata.X` already contains CPMs, but then why normalize again?)
adata.layers["CPM"] = adata.X.copy() 

# Normalize adata.layers['CPM'] 
# My comment: (adata.layers['CPM']  is currently exquivalent to `adata.X.`
# This will update `adata.X` to the result of `normalize_total`.
sc.pp.normalize_total(adata, target_sum=1e6, layer="CPM")

# store log on CPM counts

# My comment: This will create a new layer ["log(CPM)"] which has the exact contents of ['CPM']
adata.layers["log(CPM)"] = adata.layers["CPM"].copy()
# T# My comment: This will log transform the contents of ["log(CPM)"], which is equal to ['CPM'].
sc.pp.log1p(adata, layer="log(CPM)")

I don’t know, but this seems very redundant to and unclear to me. My guess is OP is interpreting the creation of layers incorrectly.

I think what OP is thinking is more along the lines of:

# Save untransformed counts in new layer 'counts' so we have a backup once we start processing the data below:

adata.layers['counts'] = adata.X.copy()  # Assuming adata.X is unprocessed counts.

# Normalize adata.X (counts) and save result in layer['CPM']:

anndata.layers['CPM'] = sc.pp.normalize_total(adata, target_sum=1e6, copy=True) # copy=True: do not update adata.X

# Log-transformed normalized data in `.layers['CPM']` and save result in layer['log(CPM)']:
adata.layers['log(CPM)'] = sc.pp.log1p(adata, layer='CPM', copy=True) # copy=True: do not update adata.X

Topic		Replies	Views
Does processing the data always overwrite the raw counts? scanpy	3	2257	April 22, 2024
Correct use of .copy() when working with layers scanpy	0	1000	February 2, 2023
Differences between .X, .raw.X, and .raw in anndata object anndata	6	6633	May 10, 2024
How could `adata.raw.X` contain non-integer values? scanpy anndata	2	38	July 13, 2025
Layer (counts) loss after adata.raw.to_adata() scanpy	0	945	April 13, 2023

Guidelines to use scanpy layers

Related topics