Documentation request, import formats for TPM_txt files

DRSEI · March 17, 2022, 6:52pm

I have a TPM file that I am importing in my Colab script using pandas and scanpy. The txt file loads as 1 row × 55737 columns, or the other way around. Scanpy/pandas is treating the first row as the header. How should I read this TXT file?
I have tried multiple ways to read the file, such as:

data1 = pd.read_csv('GSE120575_Sade_Feldman_melanoma_single_cells_TPM_GEO.txt', sep='\t')
1 rows × 55737 columns
test1 = pd.read_csv('GSE120575_Sade_Feldman_melanoma_single_cells_TPM_GEO.txt')

adata = sc.read_text('GSE120575_Sade_Feldman_melanoma_single_cells_TPM_GEO.txt').transpose()

adata = sc.read('GSE120575_Sade_Feldman_melanoma_single_cells_TPM_GEO.txt', ext='txt').transpose()

Warning: Total number of columns (55737) exceeds max_columns (20) limiting to first (20) columns.

The same happens with this file format too: “PP001swap.filtered.matrix.txt”
The files were downloaded from the links below:

GEO Accession viewer
GEO Accession viewer
I really appreciate your help.

I am very new to bioinformatics/scanpy

Valentine_Svensson · March 20, 2022, 4:42am

Hi,

The file you are trying to read in the screenshot is a ‘tab-separated values’ (TSV) file. Entries in the table are separated by tab characters ('\t'), while pd.read_csv() by default assumes that entries are separated by commas (','). So to pd.read_csv() it looks like there is just one table entry on each row. You can tell pd.read_csv() to split on tab characters instead, e.g. pd.read_csv(file_path, sep='\t').
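
As a minimal sketch of the difference, here is a tiny made-up table (the gene and cell names are placeholders, not taken from your file) read both ways. It assumes the first row holds cell names and the first column holds gene names; the real GEO file may have extra header lines or a different layout, in which case pd.read_csv() options such as header or skiprows would likely be needed as well.

```python
import io

import pandas as pd

# Hypothetical miniature of a tab-separated expression table:
# the first row holds cell names, the first column holds gene names.
tsv_text = (
    "gene\tcell_1\tcell_2\tcell_3\n"
    "CD3E\t0.0\t5.2\t1.1\n"
    "CD8A\t3.4\t0.0\t2.7\n"
)

# Default separator is a comma, so every line ends up in a single column.
wrong = pd.read_csv(io.StringIO(tsv_text))

# With sep="\t" the fields are split on tabs; index_col=0 uses the
# gene names as row labels instead of keeping them as a data column.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)

# Transpose so that rows are cells and columns are genes.
cells_by_genes = right.T

print(wrong.shape)           # (2, 1)
print(right.shape)           # (2, 3)
print(cells_by_genes.shape)  # (3, 2)
```

For a file on disk you would pass the path in place of io.StringIO(tsv_text). Whether a transpose is needed depends on which orientation the downstream tool expects.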

/Valentine

DRSEI · March 22, 2022, 3:38am

Thank you @Valentine_Svensson
