I'm looking to read some marker data into data structures using Python. So far, I have successfully read every Marker name into a single list (there are 2,000 of those).
The data I have was originally in Excel, but I converted it into a .txt file.
The header data in the file was removed and assigned to variables using readline().
Every line with a marker name begins with a double quotation mark ("), so I was able to pick those lines out easily and store them in a list.
Each data line for a marker is indented by 2 spaces and begins with "a", "b", or "h". I want to get these lines into a data structure. I've tried both lists and strings, but both come back empty.
The data under each marker name is a block of the letters "a", "b", and "h", with each letter representing one individual in the population (there are 250). The tricky part is that the letters come in groups of 5 separated by single spaces, and those groups are separated from one another by two spaces.
Example:
"BK_12 (a,h,b) ; 1"
b a a a b a b a a a b a b a a a a a a a a a a b b a a b a h b
a a a a a a a a a a a a a a a a b a a a a h a a a a a a a a h
a a b a a a h a a a a h a h a a a a a a a a b a a a a a a h a
a a a b a a a a a a a a b a a b b a b a h a b a a a b a a a h
a a a a
That part I don't really need help with, but just included for reference of how the file looks. My ultimate goal is to use phenotype data to find markers associated with a specific phenotype.
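For the spacing described above (single spaces inside a group, two spaces between groups), one option is `str.split()` with no argument, which splits on any run of whitespace. This is only a minimal sketch: it assumes a data line contains nothing but the letters a, b, and h plus whitespace, and `sample_line` is a made-up line, not taken from the real file.

```python
# Hypothetical data line: two leading spaces, groups of letters
# separated by two spaces, letters within a group by one space.
sample_line = "  b a a a b  a b a a a  h b"

# split() with no argument treats any run of whitespace as one
# separator and ignores leading/trailing whitespace, so the
# indentation and the group boundaries both disappear.
calls = sample_line.split()
print(calls)
```

Because the grouping is dropped, each element of `calls` lines up with one individual, which should make it easier to index by position later.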
So far I have used a for loop to read the file. My code is below. EDIT: I tried indexing from position 2, rather than searching from position 0 for an empty space. I thought this would work. The else: statement was meant to tell me whether or not the elif statements were being recognized. Nothing was printed, so I'm assuming it is working in that regard, but nothing is being appended.
Markers = []
Genotype_Data = []
for line in infile:
    line = line.rstrip()
    if (line[0] == '"'):
        line = line.rstrip()
        Markers.append(line)
    elif (line[2] == 'a'):
        line = line.rstrip()
        Genotype_Data.append(line)
    elif (line[2] == 'b'):
        line = line.rstrip()
        Genotype_Data.append(line)
    elif (line[2] == 'h'):
        line = line.rstrip()
        Genotype_Data.append(line)
    else:
        print("Something isn't right!")
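For comparison, here is a minimal sketch of one way the marker names and genotype calls could be collected together. It is not necessarily the fix for the loop above. It assumes marker lines start with a double quote and data lines contain only a, b, and h, and it strips whitespace before testing each line so that it does not depend on a fixed character position. The names `parse_markers`, `genotypes`, and `current`, and the `BK_13` marker in the sample, are hypothetical.

```python
import io

def parse_markers(lines):
    """Map each marker name to a flat list of its genotype calls."""
    genotypes = {}   # marker name -> list of 'a'/'b'/'h' calls
    current = None   # marker whose data block is being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines instead of indexing into them
        if line.startswith('"'):
            current = line.strip('"')  # drop the surrounding quotes
            genotypes[current] = []
        elif current is not None and line[0] in "abh":
            # split() handles both single and double spaces
            genotypes[current].extend(line.split())
        else:
            print("Unrecognized line:", repr(raw))
    return genotypes

# Hypothetical two-marker sample standing in for the real file.
sample = io.StringIO(
    '"BK_12 (a,h,b) ; 1"\n'
    '  b a a a b  a b a a a\n'
    '  h a\n'
    '\n'
    '"BK_13 (a,h,b) ; 2"\n'
    '  a a b\n'
)
data = parse_markers(sample)
print(data["BK_12 (a,h,b) ; 1"])
```

A dictionary keeps each marker tied to its own calls, so individual i is always at index i in every list. With the real file, `parse_markers(infile)` would be called after the header lines have been consumed with readline().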