2

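I have two sets of data describing atomic positions. They are in separate files that I want to compare, with the aim of identifying matching atoms by their coordinates. The data looks like the following in both cases, with up to around 1000 entries. The files have different lengths, since they describe systems of different sizes, and have the following format: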

   1   ,    0.000000000000E+00  0.000000000000E+00    
   2   ,   0.000000000000E+00  2.468958660000E+00  
   3   ,    0.000000000000E+00 -2.468958660000E+00  
   4   ,   2.138180920454E+00 -1.234479330000E+00  
   5   ,    2.138180920454E+00  1.234479330000E+00

The first column is the entry ID, and the second column is a pair of x, y coordinates.

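What I want to do is compare the coordinates in the two data sets and identify the matches and the corresponding IDs, e.g. "entry 3 in file 1 corresponds to entry 6 in file 2". I will use this information to change the coordinate values in file 2.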

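I have read the files line by line, split each line into two entries, and put them into a list, but I am a bit stumped as to how to specify the comparison: in particular, how to tell it to compare the second entries while still being able to call up the first entry. I imagine it needs a loop?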

The code so far looks like this:

open1 = open('./3x3supercell_coord_clean','r')
openA = open('./6x6supercell_coord_clean','r')

small_list=[]

for line in open1:
    stripped_small_line = line.strip()
    column_small = stripped_small_line.split(",") 
    small_list.append(column_small)

big_list=[]

for line in openA:
    stripped_big_line = line.strip()
    column_big = stripped_big_line.split(",")
    big_list.append(column_big)

print(small_list[2][1])  # prints out coords only

4 Answers

2

Use a dictionary keyed by the coordinates.

data1 = """1   ,    0.000000000000E+00  0.000000000000E+00    
   2   ,   0.000000000000E+00  2.468958660000E+00  
   3   ,    0.000000000000E+00 -2.468958660000E+00  
   4   ,   2.138180920454E+00 -1.234479330000E+00  
   5   ,    2.138180920454E+00  1.234479330000E+00"""

# Read data1 into a list of tuples (id, x, y)
coords1 = [(int(line[0]), float(line[2]), float(line[3])) for line in
           (line.split() for line in data1.split("\n"))]

# This dictionary will map (x, y) -> id
coordsToIds = {}

# Add coords1 to this dictionary.
for id, x, y in coords1:
    coordsToIds[(x, y)] = id

# Read coords2 the same way.
# Left as an exercise to the reader.

# Look up each of coords2 in the dictionary.
for id, x, y in coords2:
    if (x, y) in coordsToIds:
        print(coordsToIds[(x, y)])  # the ID in coords1

Note that comparing floating-point numbers for exact equality is always a problem.
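One way to soften the floating-point caveat while keeping the dictionary lookup is to round each coordinate to a fixed number of decimals before using it as a key. The sketch below is only an illustration and not part of the answer above: the helper names `parse_line`, `rounded_key`, and `match_ids`, the choice of 6 decimals, and the second data set are all assumptions made for the example. Rounding can still separate two values that straddle a rounding boundary, so treat it as a heuristic.

```python
def parse_line(line):
    """Split '  4 ,  2.13E+00 -1.23E+00' into (4, 2.13, -1.23)."""
    id_text, coord_text = line.split(",")
    x_text, y_text = coord_text.split()
    return int(id_text), float(x_text), float(y_text)


def rounded_key(x, y, decimals=6):
    """Build a dictionary key from coordinates rounded to `decimals` places."""
    return (round(x, decimals), round(y, decimals))


def match_ids(lines1, lines2, decimals=6):
    """Return (id_in_1, id_in_2) pairs whose rounded coordinates agree."""
    coords_to_ids = {}
    for line in lines1:
        atom_id, x, y = parse_line(line)
        coords_to_ids[rounded_key(x, y, decimals)] = atom_id
    matches = []
    for line in lines2:
        atom_id, x, y = parse_line(line)
        key = rounded_key(x, y, decimals)
        if key in coords_to_ids:
            matches.append((coords_to_ids[key], atom_id))
    return matches


# Sample lines; the second set is invented and differs in the 12th decimal.
file1 = ["   3   ,    0.000000000000E+00 -2.468958660000E+00  ",
         "   4   ,   2.138180920454E+00 -1.234479330000E+00  "]
file2 = ["   6   ,    0.000000000000E+00 -2.468958660001E+00",
         "   7   ,   4.276361840908E+00  0.000000000000E+00"]

print(match_ids(file1, file2))  # [(3, 6)]
```

With exact float keys the pair (3, 6) would be missed, because the two y values differ slightly.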

answered 2013-05-21T14:39:52.230
1

If all you are trying to do is compare the second element of each entry in the two lists, you can compare every coordinate in one file against every coordinate in the other. This is definitely not the fastest way to go about it (it takes len(small_list) * len(big_list) comparisons), but it should get you the results you need. It scans through small_list and checks every small_entry[1] (the coordinates) against the coordinates of every entry in big_list:

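for small_entry in small_list:
    for big_entry in big_list:
        # Compare the coordinate fields token by token so differing whitespace is ignored
        if small_entry[1].split() == big_entry[1].split():
            print(small_entry[0].strip() + " matches " + big_entry[0].strip())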

Something like this?

answered 2013-05-21T14:32:01.837
0

Here is one way to do it using a dictionary:

coords = {}

with open('first.txt', 'r') as first_list:
    for i in first_list:
        # split() with no argument also drops the trailing newline
        pair = i.split()
        # reformatted coords used as key "2.138180920454E+00,-1.234479330000E+00"
        coords[','.join(pair[2:4])] = pair[0]

with open('second.txt', 'r') as second_list:
    for i in second_list:
        pair = i.split()
        # reformatted coords from second list checked for presence in keys of dictionary
        if ','.join(pair[2:4]) in coords:
            print(coords[','.join(pair[2:4])], pair[0])

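What happens here is that each coordinate in file A (which you have stated will be distinct) is stored in a dictionary as a key. The first file is then closed and the second file is opened. The coordinates of the second list are read, reformatted to match the way the dictionary keys were saved, and checked for membership. If the coordinate string from list B is in the dictionary coords, the pair exists in both lists. It then prints the IDs from the first and second list for that match.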

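Dictionary lookups are much faster: O(1) on average, versus scanning a whole list for every entry. This approach also has the advantage of not needing to hold all of the data in memory to check (only one list), and of not having to worry about type conversion, e.g. float/int conversion.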
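The trade-off behind skipping type conversion can be seen in a small sketch: string keys match only when both files spell a number identically, whereas parsed floats compare by value. The variant spellings and the helper `as_floats` below are hypothetical and chosen for illustration; they do not come from the files in the question.

```python
# Two spellings of the same coordinate pair (hypothetical variants).
a = "0.000000000000E+00,2.468958660000E+00"
b = "-0.000000000000E+00,2.46895866E+00"


def as_floats(key):
    """Parse an 'x,y' string key into a tuple of floats."""
    x_text, y_text = key.split(",")
    return (float(x_text), float(y_text))


string_lookup = {a: "3"}             # keyed by the raw text
float_lookup = {as_floats(a): "3"}   # keyed by the parsed values

print(b in string_lookup)            # False: the text differs
print(as_floats(b) in float_lookup)  # True: the values are equal
```

If both files are written by the same program in the same output format, the string keys are fine and sidestep any float rounding surprises.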

answered 2013-05-21T15:55:54.047
0

Build two dictionaries in the following way:

# do your splitting to populate two dictionaries of this format:
# mydata1[Coordinate] = ID
mydata1 = {}
mydata2 = {}

# i.e. (data1 and data2 are assumed to hold the text of each file)
for line in data1.splitlines():
    fields = line.split()  # ['1', ',', x, y]
    coord = fields[2] + ' ' + fields[3]
    mydata1[coord] = fields[0]
for line in data2.splitlines():
    fields = line.split()
    coord = fields[2] + ' ' + fields[3]
    mydata2[coord] = fields[0]

# then we can use set intersection to find all coordinates in both key sets
set1 = set(mydata1.keys())
set2 = set(mydata2.keys())
intersect = set1.intersection(set2)

for coordinate in intersect:
    print(' '.join(["Coordinate", coordinate, "found in set1 id", mydata1[coordinate], "and set2 id", mydata2[coordinate]]))
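
For a version that can be pasted and run as-is, the same idea fits in a short function. This is only a sketch under stated assumptions: `build_lookup` is a hypothetical helper, and the second data set is invented so that there is something to match against. It uses the `&` operator on dictionary key views, which in Python 3 gives the same result as building two sets and intersecting them.

```python
def build_lookup(text):
    """Map 'x y' coordinate strings to ID strings for one file's text."""
    lookup = {}
    for line in text.splitlines():
        fields = line.split()  # e.g. ['3', ',', x, y]
        if len(fields) == 4:   # skip blank or malformed lines
            lookup[fields[2] + ' ' + fields[3]] = fields[0]
    return lookup


data1 = """   1   ,    0.000000000000E+00  0.000000000000E+00
   3   ,    0.000000000000E+00 -2.468958660000E+00"""
# Invented second data set for the example.
data2 = """   6   ,    0.000000000000E+00 -2.468958660000E+00
   9   ,    4.276361840908E+00  0.000000000000E+00"""

mydata1 = build_lookup(data1)
mydata2 = build_lookup(data2)

# Dictionary key views support set operations directly.
for coordinate in sorted(mydata1.keys() & mydata2.keys()):
    print("Coordinate", coordinate, "found in set1 id", mydata1[coordinate],
          "and set2 id", mydata2[coordinate])
```
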
answered 2013-05-21T15:32:40.397