python - Associating the indices of combinations with values

Question

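I am working on a program in which I need the combinations of distances between atoms, or between arbitrary points in 3D space. Here is an example:

The file "test" contains the following information: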

Ti 1.0 1.0 1.0

O 0.0 2.0 0.0

O 0.0 0.0 0.0

Ti 1.0 3.0 4.0

O 2.0 5.0 0.0

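I want my code to compute the distances for all combinations of points (which I have done!). Then I need to count, for each atom, the number of times its distance to another atom is less than 2.2.

This is confusing in words, so I'll show you what I have so far.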

#!/usr/bin/env python
import sys, math, scipy, itertools
import numpy as np

try:
    infile = sys.argv[1]

except:
    print "Needs file name"
    sys.exit(1)

#opening files for first part
ifile = open(infile, 'r')
coordslist = []

#Creating a file of just coordinates that can be 'mathed on'
for line in ifile:
    pair = line.split()
    atom = (pair[0]); x = float(pair[1]); y = float(pair[2]); z = float(pair[3])
    coordslist += [(x,y,z)]
ifile.close()

#Define distance
def distance(p0,p1):
    return math.sqrt((p0[0] - p1[0])**2 + (p0[1] - p1[1])**2 + (p0[2] - p1[2])**2)

#Initializing for next section
dislist = []
bondslist = []

#Compute distances between all points 1-2, 1-3, 1-4, etc.
for p0, p1 in itertools.combinations(coordslist,2):
    print p0, p1, distance(p0,p1)
    dislist += [distance(p0, p1)]
    if distance(p0,p1) < 2.2:
        bondslist += [(p0, distance(p0,p1))]
print bondslist
print dislist

I'm not sure whether building these lists helps me. So far, they haven't.

The output is:

(1.0, 1.0, 1.0) (0.0, 2.0, 0.0) 1.73205080757

(1.0, 1.0, 1.0) (0.0, 0.0, 0.0) 1.73205080757

(1.0, 1.0, 1.0) (1.0, 3.0, 4.0) 3.60555127546

(1.0, 1.0, 1.0) (2.0, 5.0, 0.0) 4.24264068712

(0.0, 2.0, 0.0) (0.0, 0.0, 0.0) 2.0

(0.0, 2.0, 0.0) (1.0, 3.0, 4.0) 4.24264068712

(0.0, 2.0, 0.0) (2.0, 5.0, 0.0) 3.60555127546

(0.0, 0.0, 0.0) (1.0, 3.0, 4.0) 5.09901951359

(0.0, 0.0, 0.0) (2.0, 5.0, 0.0) 5.38516480713

(1.0, 3.0, 4.0) (2.0, 5.0, 0.0) 4.58257569496

[((1.0, 1.0, 1.0), 1.7320508075688772), ((1.0, 1.0, 1.0), 1.7320508075688772), ((0.0, 2.0, 0.0), 2.0)]

[1.7320508075688772, 1.7320508075688772, 3.605551275463989, 4.242640687119285, 2.0, 4.242640687119285, 3.605551275463989, 5.0990195135927845, 5.385164807134504, 4.58257569495584]

One thing I need from this output is the number of times each atom has a distance less than 2.2, for example:

1 2 (because atom 1 has two distances less than 2.2 associated with it)

2 2

3 2 

4 0

5 0

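I also need to see which two atoms make each distance that is less than 2.2. I am doing this to compute Pauling charges: you look at an atom, determine how many bonds it has (atoms less than 2.2 angstroms away), and then look at the atoms bonded to that atom and see how many atoms are bonded to those. It is very frustrating, but it all depends on keeping track of each individual atom, not just of the combinations. An array would probably be very useful.

I have already checked here and here for help, and I think I need to combine those methods somehow. Any help would be greatly appreciated!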

score 0 · Accepted Answer

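Before we begin, let me note that in the case of a crystal (and I rather suspect you're not dealing with a Ti2O3 molecule) you should be careful with periodic boundary conditions, i.e. the last two atoms, which are far from everyone else, might be closer to an atom in a neighbouring cell.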
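To make the periodic-boundary caveat concrete, here is a minimal sketch of the minimum image convention. It assumes an orthorhombic cell, and the edge lengths in `cell` are made-up values chosen for illustration, not data from the question. With them, the fifth atom ends up at distance 2.0 from a periodic image of the third atom, even though the two are about 5.39 apart inside a single cell.

```python
import numpy as np

allpos = np.array([[1., 1, 1],
                   [0, 2, 0],
                   [0, 0, 0],
                   [1, 3, 4],
                   [2, 5, 0]])

# hypothetical orthorhombic cell edge lengths (an assumption, not from the question)
cell = np.array([5.0, 5.0, 5.0])

# displacement vectors between every pair of atoms, shape (n, n, 3)
diff = allpos[:, None, :] - allpos[None, :, :]

# minimum image convention: shift each component by a whole number of cell lengths
diff_pbc = diff - cell * np.round(diff / cell)

dist_plain = np.sqrt((diff ** 2).sum(axis=-1))    # distances inside one cell
dist_pbc = np.sqrt((diff_pbc ** 2).sum(axis=-1))  # distances to the nearest image

print(dist_plain[2, 4], dist_pbc[2, 4])
```

This shortcut is only reliable for distances below half the shortest cell edge (2.5 here, so a threshold of 2.2 is still safe); a non-orthorhombic cell would need a different treatment.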

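What you're trying to do is very simple if you know which tools to use. You're looking for a method that gives you the pairwise distances between all points in a set. The function that does this is called pdist, scipy.spatial.distance.pdist to be precise. It can compute the pairwise distances for an arbitrary set of points in any number of dimensions, with an arbitrary distance metric. In your specific case the default Euclidean distance will do.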

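The pairwise distance matrix of a set of points (where element [i,j] tells you the distance between points i and j) is symmetric by construction, with zeros on the diagonal. For this reason, typical implementations of pdist return only the off-diagonal elements on one side of the diagonal, and scipy's version is no exception. However, there is the handy scipy.spatial.distance.squareform function, which turns an array containing this condensed form of a symmetric matrix with a zero diagonal into the full matrix. From there it is easy to post-process.

Here's what I'd do: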
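Since the question is about tying each distance back to the atoms that produced it, a short sketch of how the condensed output lines up with index pairs may help. The variable names `pairs`, `rows` and `cols` are illustrative only; the ordering fact it relies on is that pdist lists pairs as (0,1), (0,2), ..., (0,n-1), (1,2), and so on, which is the same order itertools.combinations produces.

```python
import itertools
import numpy as np
from scipy.spatial.distance import pdist, squareform

allpos = np.array([[1., 1, 1],
                   [0, 2, 0],
                   [0, 0, 0],
                   [1, 3, 4],
                   [2, 5, 0]])
n = allpos.shape[0]

condensed = pdist(allpos)  # n*(n-1)/2 distances, one per unordered pair

# index pairs in the same order as the condensed distances
pairs = list(itertools.combinations(range(n), 2))

for (i, j), d in zip(pairs, condensed):
    print(i, j, d)

# the same index pairs as two numpy arrays (upper triangle, diagonal excluded)
rows, cols = np.triu_indices(n, k=1)
full = squareform(condensed)
```

Indexing the full matrix with `full[rows, cols]` returns the condensed vector again, so either representation can be used to look up which two atoms a given distance belongs to.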

import numpy as np
import scipy.spatial as ssp

# atoms and positions:
# Ti 1.0 1.0 1.0
# O  0.0 2.0 0.0
# O  0.0 0.0 0.0
# Ti 1.0 3.0 4.0
# O  2.0 5.0 0.0

# define positions as m*n array, where n is the dimensionality (3)
allpos = np.array([[1.,1,1],  # 1. is lazy for dtype=float64
                   [0,2,0], 
                   [0,0,0],
                   [1,3,4],
                   [2,5,0]])

# compute pairwise distances
alldist_condensed = ssp.distance.pdist(allpos)       # vector of off-diagonal elements on one side
alldist = ssp.distance.squareform(alldist_condensed) # full symmetric distance matrix

# set diagonals to nan (or inf) to avoid tainting our output later
fancy_index = np.arange(alldist.shape[0])
alldist[fancy_index,fancy_index] = np.nan

# find index of "near" neighbours
thresh = 2.2
neighbslist = [np.where(alldist[k,:]<thresh)[0] for k in range(alldist.shape[0])]  # the k'th element is an array containing the indices which are "close" to atom number k

# find total number of "near" neighbours
nearnum = [neighbs.size for neighbs in neighbslist] # the k'th element is the number of atoms which are "close" to atom number k

So for your specific case, alldist contains the full distance matrix:

array([[        nan,  1.73205081,  1.73205081,  3.60555128,  4.24264069],
       [ 1.73205081,         nan,  2.        ,  4.24264069,  3.60555128],
       [ 1.73205081,  2.        ,         nan,  5.09901951,  5.38516481],
       [ 3.60555128,  4.24264069,  5.09901951,         nan,  4.58257569],
       [ 4.24264069,  3.60555128,  5.38516481,  4.58257569,         nan]])

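As you can see, I manually set the diagonal elements to np.nan. This is necessary because I intend to check which elements of this matrix are smaller than thresh, and the zeros on the diagonal would certainly qualify. In our case np.inf would have been an equally good choice for these elements, but what if you wanted the points that are farther apart than thresh? For that case -np.inf or np.nan would be acceptable (so I went with the latter).

Post-processing the near neighbours takes us out of the realm of numpy (you should stick with numpy for as long as you can; it is usually the fastest option). For each atom you want a list of the atoms that are close to it. This is not an object with the same length for every atom, so you can't store it nicely in an array. The logical conclusion is to use a list, and then you can go all python and use a list comprehension to construct it (repeated from above):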
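The effect of the diagonal filler can be checked on a small made-up matrix. The 3x3 distances below are illustrative values, and np.fill_diagonal is used here as an in-place alternative to the index-array assignment in the answer's code.

```python
import numpy as np

thresh = 2.2
d = np.array([[0.0, 1.7, 3.6],
              [1.7, 0.0, 2.0],
              [3.6, 2.0, 0.0]])

with_nan = d.copy()
np.fill_diagonal(with_nan, np.nan)  # modifies with_nan in place
with_inf = d.copy()
np.fill_diagonal(with_inf, np.inf)

near_raw = (d < thresh).sum(axis=1)         # zeros on the diagonal get counted
near_nan = (with_nan < thresh).sum(axis=1)  # diagonal excluded
far_nan = (with_nan > thresh).sum(axis=1)   # diagonal excluded here as well
far_inf = (with_inf > thresh).sum(axis=1)   # inf on the diagonal gets counted

print(near_raw, near_nan, far_nan, far_inf)
```

Any comparison with nan evaluates to False, which is why nan keeps the diagonal out of both the "closer than" and the "farther than" test.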

neighbslist = [np.where(alldist[k,:]<thresh)[0] for k in range(alldist.shape[0])]  # the k'th element is an array containing the indices which are "close" to atom number k

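Here np.where finds the indices within row k for which the distance is small enough, and the resulting 1d array of indices is stored in the k-th element of the list neighbslist. It is then trivial to check the length of these arrays for each atom, which gives you the "number of near neighbours" list. Note that we could have cast the output of np.where to a list inside the list comprehension to leave numpy entirely, but then we would have had to use len(neighbs) instead of neighbs.size in the next line.

So you have two key variables, two lists to be precise: nearnum[k] is the number of "near" neighbours of atom k (where k is in range(allpos.shape[0])), and neighbslist[k] is a 1d numpy array listing the indices of the atoms near atom k, so neighbslist[k][j] (for j in range(nearnum[k])) is a number in range(allpos.shape[0]) that is not equal to k. Come to think of it, this list-of-arrays construct is probably a bit ugly, so you should probably cast this object to a proper list of lists during construction (even if that means some overhead).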
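A minimal sketch of that list-of-lists variant follows, together with the per-atom counts in the 1-based form the question asks for and the bonded pairs themselves. It swaps in np.flatnonzero and np.fill_diagonal for the np.where and index-array calls used above, and the name `bonds` is introduced here for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

allpos = np.array([[1., 1, 1],
                   [0, 2, 0],
                   [0, 0, 0],
                   [1, 3, 4],
                   [2, 5, 0]])
thresh = 2.2

alldist = squareform(pdist(allpos))
np.fill_diagonal(alldist, np.nan)  # keep each atom out of its own neighbour list

# plain list of lists: neighbslist[k] holds the 0-based indices close to atom k
neighbslist = [np.flatnonzero(row < thresh).tolist() for row in alldist]
nearnum = [len(neighbs) for neighbs in neighbslist]

# atom number (1-based, as in the question) and its neighbour count
for k, count in enumerate(nearnum, start=1):
    print(k, count)

# every bonded pair exactly once, as 0-based (i, j) with i < j
bonds = [(i, j) for i, neighbs in enumerate(neighbslist) for j in neighbs if i < j]
print(bonds)
```

For the five atoms in the question this gives the counts 2, 2, 2, 0, 0 and the three bonded pairs (0, 1), (0, 2) and (1, 2).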

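I only noticed at the very end that your input data is in a file. No worries, that can also be read easily with numpy! Assuming those empty lines are not in your input file named test, you can call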

allpos = np.loadtxt('test',usecols=(1,2,3))

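to read the position matrix into your variable. The usecols option makes numpy ignore the first column of the data, which is not numeric and would cause problems. We don't really need that column anyway.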
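If the element symbols are wanted later, for example to tell Ti from O when counting bonds, they can be collected separately. The sketch below reads from an in-memory string instead of the file test so that it is self-contained, and the names `text` and `labels` are illustrative assumptions.

```python
import io
import numpy as np

# stand-in for the contents of the input file, without blank lines
text = """Ti 1.0 1.0 1.0
O 0.0 2.0 0.0
O 0.0 0.0 0.0
Ti 1.0 3.0 4.0
O 2.0 5.0 0.0
"""

# numeric columns only, exactly as with the file name
allpos = np.loadtxt(io.StringIO(text), usecols=(1, 2, 3))

# first column read with plain python, one label per atom
labels = [line.split()[0] for line in text.splitlines() if line.strip()]

print(allpos.shape, labels)
```

Row k of allpos and labels[k] then refer to the same atom, so the indices in the neighbour lists can be translated back into element symbols.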
