1

My dataset is very strange. When I create the Seurat object and load the metadata for it, all of the values in the nCount_RNA are decimal values instead of integers. How should I interpret this? Is there an issue with the data itself or something I can do to work around this? I ask because later on in my analysis, the functions can't seem to find the nCount_RNA object, and I believe the decimal values are the reason why.

Metadata for Seurat Object

Here is the code I used to create this object:

#Loading in the data ----------------------------------------------------------

filePaths = getGEOSuppFiles("GSE124395") 
tarF <- list.files(path = "./GSE124395/", pattern = "*.tar", full.names = TRUE) 
untar(tarF, exdir = "./GSE124395/") 
gzipF <- list.files(path = "./GSE124395/", pattern = "*.gz", full.names = TRUE) 
ldply(.data = gzipF, .fun = gunzip) 


# Creating the matrix -----------------------------------------------------------

P301_3_matrix <- read.delim(file = './GSE124395//GSM3531672_P301_3_CRYOMIXED11.coutt.csv') 
P301_3_matrix <- data.frame(P301_3_matrix[,-1], row.names=P301_3_matrix[,1]) 
P301_3_matrix <- as.matrix(P301_3_matrix) #<- makes the excel file into a matrix 

P301_3_colname <- read.table(file = './GSE124395//GSE124395_celseq_barcodes.192.txt', header = FALSE, row.names = 1) 
P301_3_colname <- data.frame(P301_3_colname[,-1], col=P301_3_colname[,1]) 
P301_3_colname <- as.matrix(P301_3_colname) 
colnames(P301_3_matrix) <- P301_3_colname[,1] 

colnames(P301_3_matrix) <- paste(colnames(P301_3_matrix), "CryoMixed11", sep = "_") 
P301_3_pdat <- data.frame("samples" = colnames(P301_3_matrix), "treatment" = "CryoMixed") 


#Creating the Seurat object ----------------------------------------------------

sobj<- CreateSeuratObject(counts = P301_3_matrix, min.cells = 0, min.features=1, project = "Liver_Cell_Atlas")
sobj <- saveRDS(sobj,file="JoinedMatrixNoFilters.rds")

Hopefully this isn't too vague, and thanks for reading!

4

1 回答 1

1

The file you read in, it is normalized somehow, and is definitely not the count data:

P301_3_matrix = read.delim('GSM3531672_P301_3_CRYOMIXED11.coutt.csv.gz',row.names=1)

head(colSums(P301_3_matrix))
        X1         X2         X3         X4         X5         X6 
  205.2744 22457.6142  1232.4626 14193.6406 15372.4642 18808.8838 

If you read the details on processing, it writes:

Based on binomial statistics, the number of observed UMIs were converted into transcript counts (Grün et al., 2014)

So most likely you have to ask the authors for the count table or simply convert to integer, proceed and hope nothing goes wrong.

于 2021-03-20T13:53:19.687 回答