python - How do I get the molecular weight of a compound in Python?

Question

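The user enters a formula, for example: C12H2COOH

We have to calculate its molecular weight, given that C = 12.01, H = 1.008 and O = 16. We were told to be careful with elements followed by a two-digit number and elements followed by no number at all. The program also keeps asking for chemical formulas and exits when you press Enter.

I have tried using dictionaries, for loops, and while loops. I can already compute compounds where each element is followed by a single digit, such as C2H2, but it fails if I put a two-digit number, or no number, next to an element. I am also looking into splitting a string without dropping the delimiters as a possible route. How would you approach this problem? Any help would be appreciated, thank you!

Here is what I have so far. It's very messy.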

xxx = ["H", "C", "O"]
elements = set(xxx)
while(True):
    chemical_formula = input("Enter chemical formula, or enter to quit: ")
    if chemical_formula == "":
        break
    else:
        characters = list(chemical_formula)
        n = 0
        print(characters)
        for i in characters:
            if characters[n] == "C":
                c = 12.0107
                if elements.intersection(set(characters[n+1])):
                    print(c)
                else:
                    number = int(characters[n+1])
                    print(number*c)

            elif characters[n] == "H":
                h = 1.00794
                if elements.intersection(set(characters[n+1])):
                    print(h)
                else:
                    number = int(characters[n+1])
                    print(number*h)

            elif characters[n] == "O":
                o = 15.9994
                if elements.intersection(set(characters[n+1])):
                    print(c)
                else:
                    number = int(characters[n+1])
                    print(number*o) 
            else:
                numero = int(i)
                print(i*0)

            n = n+1

score 6 · Accepted Answer

The first thing I'd do is replace every letter in the input string with the same letter preceded by a '+', so

C12H2COOH => +C12+H2+C+O+O+H

Next, I'd replace every letter followed by a number with the same letter followed by a '*' and then the number

+C12+H2+C+O+O+H => +C*12+H*2+C+O+O+H

Then I'd replace every letter with the atomic weight of the element it represents

+C*12+H*2+C+O+O+H => +12.0107*12+1.00794*2+12.0107+15.9994+15.9994+1.00794

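Finally, I'd evaluate that expression. I can think of 2 or 3 ways to carry out these modifications, and since this is your homework I'll leave it to you to choose how to implement the approach, if it appeals to you. Note, though, that string manipulation with regular expressions and the evils of eval are not the only implementation options.

After that I'd start working on how to handle elements whose abbreviations are longer than one letter.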
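The three rewriting steps above can be sketched with `re.sub`. This is only one possible reading of the approach, not the answerer's own code: the names `ATOMIC_WEIGHTS`, `to_expression` and `evaluate` are made up for illustration, only single-letter symbols are handled, and the final expression is summed by splitting on '+' and '*' rather than by calling `eval`.

```python
import re

# Assumed weight table; single-letter symbols only, as in the steps above.
ATOMIC_WEIGHTS = {"H": 1.00794, "C": 12.0107, "O": 15.9994}


def to_expression(formula):
    """Rewrite a formula such as 'C12H2COOH' into an arithmetic expression."""
    # Step 1: put a '+' in front of every element letter.
    expr = re.sub(r"([A-Z])", r"+\1", formula)
    # Step 2: put a '*' between a letter and the number that follows it.
    expr = re.sub(r"([A-Z])(\d+)", r"\1*\2", expr)
    # Step 3: replace each letter with its atomic weight.
    return re.sub(r"[A-Z]", lambda m: str(ATOMIC_WEIGHTS[m.group(0)]), expr)


def evaluate(expr):
    """Sum the '+'-separated terms, each being 'weight' or 'weight*count'."""
    total = 0.0
    for term in expr.split("+"):
        if not term:
            continue  # the expression starts with '+', so the first piece is empty
        weight, _, count = term.partition("*")
        total += float(weight) * (int(count) if count else 1)
    return total


print(to_expression("C12H2COOH"))
print(round(evaluate(to_expression("C12H2COOH")), 2))  # 191.16
```

Splitting the string by hand keeps user input away from `eval`, at the cost of supporting only the two operators that the rewriting steps can produce.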

score 6 · Accepted Answer

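Edit: updated GitHub Gist

I took a grade 12 chemistry course over the summer and wanted to do this as well. I came up with a different approach; here is version 1 ('ZERO' was just a placeholder, I simply hadn't tested it with ''). I checked C12H2COOH and it gives the correct answer (191.16 g/mol). Hope this helps someone: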

__version__ = '1.2.1'
"""
=================================
Molar Mass Calculator
Author: Elijah Lopez
Version: 1.2.1
Last Updated: April 4th 2020
Created: July 8th 2017
Python Version: 3.6+
=================================
"""
MM_of_Elements = {'H': 1.00794, 'He': 4.002602, 'Li': 6.941, 'Be': 9.012182, 'B': 10.811, 'C': 12.0107, 'N': 14.0067,
                  'O': 15.9994, 'F': 18.9984032, 'Ne': 20.1797, 'Na': 22.98976928, 'Mg': 24.305, 'Al': 26.9815386,
                  'Si': 28.0855, 'P': 30.973762, 'S': 32.065, 'Cl': 35.453, 'Ar': 39.948, 'K': 39.0983, 'Ca': 40.078,
                  'Sc': 44.955912, 'Ti': 47.867, 'V': 50.9415, 'Cr': 51.9961, 'Mn': 54.938045,
                  'Fe': 55.845, 'Co': 58.933195, 'Ni': 58.6934, 'Cu': 63.546, 'Zn': 65.409, 'Ga': 69.723, 'Ge': 72.64,
                  'As': 74.9216, 'Se': 78.96, 'Br': 79.904, 'Kr': 83.798, 'Rb': 85.4678, 'Sr': 87.62, 'Y': 88.90585,
                  'Zr': 91.224, 'Nb': 92.90638, 'Mo': 95.94, 'Tc': 98.9063, 'Ru': 101.07, 'Rh': 102.9055, 'Pd': 106.42,
                  'Ag': 107.8682, 'Cd': 112.411, 'In': 114.818, 'Sn': 118.71, 'Sb': 121.760, 'Te': 127.6,
                  'I': 126.90447, 'Xe': 131.293, 'Cs': 132.9054519, 'Ba': 137.327, 'La': 138.90547, 'Ce': 140.116,
                  'Pr': 140.90465, 'Nd': 144.242, 'Pm': 146.9151, 'Sm': 150.36, 'Eu': 151.964, 'Gd': 157.25,
                  'Tb': 158.92535, 'Dy': 162.5, 'Ho': 164.93032, 'Er': 167.259, 'Tm': 168.93421, 'Yb': 173.04,
                  'Lu': 174.967, 'Hf': 178.49, 'Ta': 180.9479, 'W': 183.84, 'Re': 186.207, 'Os': 190.23, 'Ir': 192.217,
                  'Pt': 195.084, 'Au': 196.966569, 'Hg': 200.59, 'Tl': 204.3833, 'Pb': 207.2, 'Bi': 208.9804,
                  'Po': 208.9824, 'At': 209.9871, 'Rn': 222.0176, 'Fr': 223.0197, 'Ra': 226.0254, 'Ac': 227.0278,
                  'Th': 232.03806, 'Pa': 231.03588, 'U': 238.02891, 'Np': 237.0482, 'Pu': 244.0642, 'Am': 243.0614,
                  'Cm': 247.0703, 'Bk': 247.0703, 'Cf': 251.0796, 'Es': 252.0829, 'Fm': 257.0951, 'Md': 258.0951,
                  'No': 259.1009, 'Lr': 262, 'Rf': 267, 'Db': 268, 'Sg': 271, 'Bh': 270, 'Hs': 269, 'Mt': 278,
                  'Ds': 281, 'Rg': 281, 'Cn': 285, 'Nh': 284, 'Fl': 289, 'Mc': 289, 'Lv': 292, 'Ts': 294, 'Og': 294,
                  '': 0}


def molar_mass(compound: str, decimal_places=None) -> float:
    is_polyatomic = end = multiply = False
    polyatomic_mass, m_m, multiplier = 0, 0, 1
    element = ''

    for e in compound:
        if is_polyatomic:
            if end:
                is_polyatomic = False
                m_m += int(e) * polyatomic_mass if e.isdigit() else polyatomic_mass + MM_of_Elements[e]
            elif e.isdigit():
                multiplier = int(str(multiplier) + e) if multiply else int(e)
                multiply = True
            elif e.islower():
                element += e
            elif e.isupper():
                polyatomic_mass += multiplier * MM_of_Elements[element]
                element, multiplier, multiply = e, 1, False
            elif e == ')':
                polyatomic_mass += multiplier * MM_of_Elements[element]
                element, multiplier = '', 1
                end, multiply = True, False
        elif e == '(':
            m_m += multiplier * MM_of_Elements[element]
            element, multiplier = '', 1
            is_polyatomic, multiply = True, False
        elif e.isdigit():
            multiplier = int(str(multiplier) + e) if multiply else int(e)
            multiply = True
        elif e.islower():
            element += e
        elif e.isupper():
            m_m += multiplier * MM_of_Elements[element]
            element, multiplier, multiply = e, 1, False
    m_m += multiplier * MM_of_Elements[element]
    if decimal_places is not None:
        return round(m_m, decimal_places)
    return m_m
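The function above tracks a parenthesized group with a handful of flags. As an alternative sketch, and not the author's method, the same idea can be written with an explicit stack, which also copes with nested groups, several groups in one formula, and multi-digit group multipliers. `MASSES` is a small illustrative subset of the table above, and `molar_mass_nested` is a hypothetical name.

```python
import re

# Illustrative subset of the MM_of_Elements table above.
MASSES = {"H": 1.00794, "C": 12.0107, "N": 14.0067, "O": 15.9994,
          "P": 30.973762, "S": 32.065, "Ca": 40.078}

# A token is an element symbol, a run of digits, or a parenthesis.
TOKEN = re.compile(r"[A-Z][a-z]?|\d+|[()]")


def molar_mass_nested(formula):
    """Molar mass of a formula that may contain nested parentheses."""
    if TOKEN.sub("", formula):
        raise ValueError(f"unexpected characters in {formula!r}")
    stack = [0.0]  # one running total per currently open group
    last = 0.0     # mass of the most recent element or closed group
    for token in TOKEN.findall(formula):
        if token == "(":
            stack.append(0.0)
            last = 0.0
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            last = stack.pop()
            stack[-1] += last
        elif token.isdigit():
            # 'last' was already added once, so add the remaining copies.
            stack[-1] += last * (int(token) - 1)
            last = 0.0
        else:
            last = MASSES[token]  # KeyError for symbols missing from the table
            stack[-1] += last
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


print(round(molar_mass_nested("Ca3(PO4)2"), 2))  # 310.18
```

Each unit is added once when it is read, and a following count adds the remaining copies, so no look-ahead is needed.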

score 3 · Accepted Answer

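Your code is a mess: for example, you needlessly convert the input string to a list and then iterate over it, yet still use a numeric index to access the characters. Looking at each character in isolation as you go is also not much use, because it obviously breaks on numbers with more than one digit. Finally, you output the weight of each encountered element separately; shouldn't you output the sum?

The following code uses a small state machine to parse the input string and output the combined weight. It assumes that every formula starts with an element, that all encountered elements are in the weights dictionary, and that no element name is longer than a single character: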

#use a dictionary to map elements to their weights
weights = {"H": 1.00794, "C": 12.0107, "O": 15.9994}

def getInt(clist):
    """helper for parsing a list of chars as an int (returns 1 for empty list)"""
    if not clist: return 1
    return int(''.join(clist))

def getWeight(formula):
    """ get the combined weight of the formula in the input string """
    formula = list(formula)
    #initialize the weight to zero, and a list as a buffer for numbers
    weight = 0
    num_buffer = []
    #get the first element weight
    el_weight = weights[formula.pop(0)]
    while formula:
        char = formula.pop(0)
        if char in weights:
            #this character is an element, add current element weight to total
            weight += el_weight * getInt(num_buffer)
            #get the new element's weight
            el_weight = weights[char]
            #clear the number buffer
            num_buffer = []
        else:
            #this character is not an element -> it is a digit, append to buffer
            num_buffer.append(char)
    #add the last element's weight and return the value
    return weight + el_weight * getInt(num_buffer)

while 1:
    #main loop
    chemical_formula = input("Enter chemical formula, or enter to quit: ")
    if not chemical_formula:
        break
    print("Combined weight is %s" % getWeight(chemical_formula))

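This can easily be extended to handle multi-character elements by changing the conditional in the while loop of getWeight: if the character is a digit, append it to the int buffer; otherwise append it to a string holding the current element name. Then, if the name is in the weights dictionary, fetch the weight and reset the name to ''.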
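A minimal sketch of that extension follows. It departs from the description in one respect: instead of testing whether the name so far is in the dictionary, it uses letter case to decide where a symbol ends, because a symbol such as C is also a prefix of Cl and would otherwise be consumed too early. The function name `get_weight` and the extra entries in `weights` are assumptions added for illustration.

```python
# Assumed weight table with a few two-letter symbols added.
weights = {"H": 1.00794, "C": 12.0107, "O": 15.9994,
           "Na": 22.98976928, "Cl": 35.453}


def get_weight(formula):
    """Combined weight of a formula with one- or two-letter element symbols."""
    total = 0.0
    name = ""    # element symbol currently being read
    digits = ""  # digits of the count that follows that symbol

    def flush():
        # Add the pending element (if any) to the total, then reset the buffers.
        nonlocal total, name, digits
        if name:
            total += weights[name] * (int(digits) if digits else 1)
        name, digits = "", ""

    for ch in formula:
        if ch.isdigit():
            digits += ch
        elif ch.isupper():
            flush()      # an uppercase letter starts a new symbol
            name = ch
        elif ch.islower():
            name += ch   # a lowercase letter continues the current symbol
        else:
            raise ValueError(f"unexpected character {ch!r}")
    flush()
    return total


print(round(get_weight("NaCl"), 2))  # 58.44
```

A symbol that is missing from `weights` raises `KeyError`, which matches the assumption stated above that every element in the formula is in the dictionary.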

score 3 · Accepted Answer

First:

pip install molmass

Then:

from molmass import Formula

Formula('H2O').isotope.mass
>> 18.01056468403   #  monoisotopic mass

Formula('H2O').mass  
>> 18.015287        # molecular mass

score 1 · Accepted Answer

Here is a molecular weight Python script that parses the formula with regular expressions.

It includes some debugging code.

import re

#some element data

elements ={}
elements["H"] = 1
elements["C"] = 12
elements["O"] = 16
elements["Cl"] = 35.45


#DDT (1,1,1-trichloro-2,2-di(4-chlorophenyl)ethane)
formula = "(ClC6H4)2CH(CCl3)"

sFormula = formula

print("Original Formula: ", sFormula)

#Search data inside ()

myRegEx = re.compile(r"(\()(\w*)(\))(\d*)",re.I)

myMatches = myRegEx.findall(sFormula)

while myMatches:
    myMatches = myRegEx.findall(sFormula)
    for match in myMatches:
        print (match[1], match[3])
        count = match[3]
        text =""
        if (count == ""):
            count = 1
        else:
            count = int(match[3])
        while (count >= 1):
            text = text + match[1]
            count -= 1
            print(text)
        sFormula = sFormula.replace('(' + match[1] + ')' + match[3], text)
        print("Replaced formula: ",sFormula)

myRegEx = re.compile(r"(C[laroudsemf]?|Os?|N[eaibdpos]?|S[icernbmg]?|P[drmtboau]?|H[eofgas]?|A[lrsgutcm]|B[eraik]?|Dy|E[urs]|F[erm]?|G[aed]|I[nr]?|Kr?|L[iaur]|M[gnodt]|R[buhenaf]|T[icebmalh]|U|V|W|Xe|Yb?|Z[nr])(\d*)")

myMatches = myRegEx.findall(sFormula)

molecularFormula =""
MW = 0
text =""

for match in myMatches:
    #Search symbol
    symbol = match[0]
    #Search numbers
    number = match[1]
    print(symbol,number)
    if (number == ""):
        number = 1
    else:
        number = int(match[1])
    MW = MW + float(elements[symbol])*number
    while (number >=1):
        molecularFormula = molecularFormula + symbol
        number -= 1 
print(molecularFormula)
print("formula: " + formula + " MW = " + str(MW))
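The same two-pass idea (expand the parenthesized groups first, then match symbol and count pairs) can be packaged into functions. This is a hedged sketch rather than a drop-in replacement for the script: `expand`, `count_atoms` and `molecular_weight` are hypothetical names, and the generic pattern `[A-Z][a-z]?` is swapped in for the explicit list of symbols, so an unknown symbol only fails at lookup time.

```python
import re
from collections import Counter

# Innermost group: parentheses containing no further parentheses, plus a count.
GROUP = re.compile(r"\(([A-Za-z0-9]*)\)(\d*)")
# Element symbol (one uppercase letter, optional lowercase letter) plus a count.
SYMBOL = re.compile(r"([A-Z][a-z]?)(\d*)")


def expand(formula):
    """Repeat the contents of each parenthesized group until none are left."""
    while True:
        expanded = GROUP.sub(
            lambda m: m.group(1) * int(m.group(2) or 1), formula)
        if expanded == formula:
            return expanded
        formula = expanded


def count_atoms(formula):
    """Count the atoms of each element in the expanded formula."""
    counts = Counter()
    for symbol, number in SYMBOL.findall(expand(formula)):
        counts[symbol] += int(number or 1)
    return counts


def molecular_weight(formula, weights):
    """Sum weight * count; raises KeyError for symbols missing from weights."""
    return sum(weights[s] * n for s, n in count_atoms(formula).items())


elements = {"H": 1, "C": 12, "O": 16, "Cl": 35.45}
print(expand("(ClC6H4)2CH(CCl3)"))  # ClC6H4ClC6H4CHCCl3
print(count_atoms("(ClC6H4)2CH(CCl3)"))
```

Because `GROUP` only matches groups without inner parentheses, each pass expands the innermost level, and the loop repeats until nothing changes.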

score 0 · Accepted Answer

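I had a similar requirement and wrote pure Python code for it. It supports arbitrary combinations of parentheses and element counts of up to 2 digits.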

git clone https://github.com/stardustcafe/molecularstats

Sample code

from molstats.molstats import Molecule
f1=Molecule('CH3CH4')
f1.getMolecularWeight()
31.07698
f1.getNumElements()
9
