python - Python: How to extract strings from a text file to use as data

Question

This is my first time writing a Python script, and I'm having some trouble getting started. Suppose I have a text file named Test.txt that contains this information.

                                   x          y          z      Type of atom
ATOM   1     C1  GLN D  10      26.395      3.904      4.923    C
ATOM   2     O1  GLN D  10      26.431      2.638      5.002    O
ATOM   3     O2  GLN D  10      26.085      4.471      3.796    O 
ATOM   4     C2  GLN D  10      26.642      4.743      6.148    C

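What I eventually want to do is write a script that finds the center of mass of these atoms. So basically I want to sum all the x values in that text file, with each number multiplied by a given value that depends on the type of atom.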

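I know I need to define the position of each x value, but I can't figure out how to get these x values represented as numbers rather than as text in a string. I also have to keep in mind that I'll need to multiply these numbers by a value for the atom type, so I need a way to define that value for each type of atom. Can anyone push me in the right direction?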

score 1 · Accepted Answer

mass_dictionary = {'C':12.0107,
                   'O':15.999
                   #Others...?
                  }

# If your files are this structured, you can just
# hardcode some column assumptions.
coords_idxs = [6,7,8]
type_idx = 9

# Open file and get lines; the with-block closes the file automatically.
# Probably prudent to add try-except here for bad file names.
with open("Test.txt", 'r') as f_open:
    lines = f_open.readlines()

# Initialize a list to hold needed intermediate data, and the running total mass.
output_coms = []
total_mass = 0.0

# Loop through the lines of the file.
for line in lines:

    # Split the line on white space.
    line_stuff = line.split()

    # If the line is empty or fails to start with 'ATOM', skip it.
    if (not line_stuff) or (line_stuff[0] != 'ATOM'):
        continue

    # Otherwise, append the mass-weighted coordinates to a list and increment total mass.
    mass = mass_dictionary[line_stuff[type_idx]]
    output_coms.append([mass*float(line_stuff[i]) for i in coords_idxs])
    total_mass += mass

# After getting all the data, finish off the averages.
avg_x, avg_y, avg_z = [sum(column)/total_mass for column in zip(*output_coms)]


# A lot of this will be better with NumPy arrays if you'll be using this often or on
# larger files. Python Pandas might be an even better option if you want to just
# store the file data and play with it in Python.
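
As a rough illustration of the NumPy suggestion in the comment above, here is a minimal sketch. It assumes NumPy is installed and that the columns are laid out exactly as in the question. The function name center_of_mass and the inline SAMPLE text are made up for this example; np.average with its weights argument does the mass weighting.

```python
import numpy as np

# Atomic masses for the elements in the sample; extend as needed.
MASSES = {"C": 12.0107, "O": 15.999}

# Inline copy of the ATOM lines from the question (assumed layout).
SAMPLE = """\
ATOM   1     C1  GLN D  10      26.395      3.904      4.923    C
ATOM   2     O1  GLN D  10      26.431      2.638      5.002    O
ATOM   3     O2  GLN D  10      26.085      4.471      3.796    O
ATOM   4     C2  GLN D  10      26.642      4.743      6.148    C
"""

def center_of_mass(lines):
    """Return the mass-weighted mean (x, y, z) of all ATOM records."""
    coords, weights = [], []
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything that is not an ATOM record.
        if not fields or fields[0] != "ATOM":
            continue
        coords.append([float(v) for v in fields[6:9]])
        weights.append(MASSES[fields[9]])
    # Weighted average down the rows gives one value per coordinate.
    return np.average(np.array(coords), axis=0, weights=np.array(weights))

com = center_of_mass(SAMPLE.splitlines())
print(com)
```

To run this on the real file, something like center_of_mass(open("Test.txt")) should work, since a file object yields its lines one at a time.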

score 0 · Accepted Answer

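If you have pandas installed, check out the read_fwf function, which imports a fixed-width file and creates a DataFrame (a two-dimensional tabular data structure). It will save you a few lines of code on import, and it gives you a lot of data-munging functionality if you want to do any additional data manipulation.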
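
A minimal sketch of what that could look like, assuming pandas is installed and that the header line has been left out so read_fwf can infer the column boundaries from the data rows. The column names in COLUMNS and the MASSES table are invented for this example and are not part of the file.

```python
import io
import pandas as pd

# Atomic masses for the elements in the sample; extend as needed.
MASSES = {"C": 12.0107, "O": 15.999}

# Inline copy of the ATOM lines from the question (header line omitted).
SAMPLE = """\
ATOM   1     C1  GLN D  10      26.395      3.904      4.923    C
ATOM   2     O1  GLN D  10      26.431      2.638      5.002    O
ATOM   3     O2  GLN D  10      26.085      4.471      3.796    O
ATOM   4     C2  GLN D  10      26.642      4.743      6.148    C
"""

# Hypothetical column labels, chosen only for readability.
COLUMNS = ["record", "serial", "atom", "residue", "chain",
           "res_id", "x", "y", "z", "element"]

# read_fwf infers the fixed-width column boundaries from the rows.
df = pd.read_fwf(io.StringIO(SAMPLE), header=None, names=COLUMNS)

# Look up a mass per row, weight the coordinates, and normalise.
masses = df["element"].map(MASSES)
com = df[["x", "y", "z"]].mul(masses, axis=0).sum() / masses.sum()
print(com)
```

Here com is a Series indexed by "x", "y" and "z". For the real file, passing the path "Test.txt" in place of the StringIO object, together with skiprows=1 to drop the header line, is one plausible adaptation.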

score 0 · Accepted Answer

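Basically, with Python's open function you can open any file. So you can do something like the following. Note that the snippet below is not a solution to the whole problem, just one way to approach it.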

def read_file():
    with open("filename", 'r') as f:  # the file is closed automatically on exit
        for line in f:
            line_list = line.split()
            # ... work with line_list here ...

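From this point you are well set up to do whatever you want with these values. Basically, the second line just opens the file for reading. The third line defines a for loop that reads the file one line at a time, with each line going into the line variable.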

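The line.split() call in the snippet breaks the string, at each run of whitespace, into a list. So line_list[0] will be the value of your first column, and so on. From this point, if you have any programming experience, you can use if statements and the like to get the logic you want.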

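** Also keep in mind that the values stored in that list are all strings, so if you want to perform any arithmetic, such as addition, you have to be careful and convert them first, for example with float().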
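
To make the point about strings concrete, here is a small sketch using one line from the question. The variable names are only illustrative.

```python
# One ATOM line copied from the question.
line = "ATOM   1     C1  GLN D  10      26.395      3.904      4.923    C"
line_list = line.split()

# Every field comes back as a string, even the ones that look numeric.
x_text = line_list[6]

# "+" on two strings concatenates them instead of adding numbers.
joined = line_list[6] + line_list[7]

# Convert with float() before doing any arithmetic.
x, y, z = (float(value) for value in line_list[6:9])
total = x + y + z

print(repr(x_text), repr(joined), total)
```

float() raises ValueError on text that is not a number, so a header line such as the one at the top of Test.txt needs to be skipped before converting.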

*Edited for syntax corrections
