I am trying to enumerate a large compound library with RDKit and write the results to a CSV file as a single column of SMILES strings. The following code works:

import os
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
prods = AllChem.EnumerateLibraryFromReaction(rxn,[rct1,rct2])
prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
import csv
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for item in prods2:
        writer.writerow([item])

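However, memory usage is very high. To reduce it, I tried to enumerate iteratively: take one molecule from "reactants_1", react it with every molecule in "reactants_2", write the resulting compounds to the CSV file, and repeat: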

import os
import csv
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
with open('output.csv', 'w', newline='') as f:
    for compound in rct1:
        prods = AllChem.EnumerateLibraryFromReaction(rxn,[compound,rct2])
        prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
        writer = csv.writer(f)
        for item in prods2:
            writer.writerow([item])

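In this case, however, the line "prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]" raises the following error: "TypeError: 'Mol' object is not iterable". In the first version I could iterate over the "Mol" objects without any problem. Any ideas on how to fix this, or is there another way to significantly reduce RAM usage when enumerating large compound sets?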

1 Answer

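EnumerateLibraryFromReaction expects each reactant set to be a list (or another iterable) of molecules. In your loop the first set is a bare Mol, which cannot be iterated, so it has to be wrapped in a list.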
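To illustrate where the TypeError comes from, here is a minimal pure-Python sketch. It is not RDKit's source: `FakeMol` and `combine` are hypothetical stand-ins, and the only assumption is that the enumeration loops over every reactant set to build the combinations.

```python
from itertools import product


class FakeMol:
    """Hypothetical stand-in for an RDKit Mol: a single, non-iterable object."""

    def __init__(self, name):
        self.name = name


def combine(reactant_sets):
    """Yield one tuple of names per combination, looping over every reactant set."""
    for combo in product(*reactant_sets):
        yield tuple(mol.name for mol in combo)


single = FakeMol("A1")
others = [FakeMol("B1"), FakeMol("B2")]

try:
    # The first "set" is a bare object, so looping over it fails.
    list(combine([single, others]))
    error = None
except TypeError as exc:
    error = str(exc)

# Wrapping the single object in a list makes the first set iterable.
pairs = list(combine([[single], others]))
```

The first call fails for the same reason as in the question: a single object was passed where a collection was needed. The second call succeeds because the one-element list can be looped over.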

So this should work:

import os
import csv
os.chdir('xxx')
from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import AllChem
rxn = rdChemReactions.ReactionFromSmarts('xxx')
rct1 = Chem.SDMolSupplier('reactants_1.sdf')
rct2 = Chem.SDMolSupplier('reactants_2.sdf')
with open('output.csv', 'w', newline='') as f:
    for compound in rct1:
        compound = [compound] # put the mol into a list
        prods = AllChem.EnumerateLibraryFromReaction(rxn,[compound,rct2])
        prods2 = [Chem.MolToSmiles(x[0]) for x in list(prods)]
        writer = csv.writer(f)
        for item in prods2:
            writer.writerow([item])
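
On the memory part of the question, a possible variant is sketched below. It writes each product as soon as it is produced instead of first collecting everything with list(prods). The helper names `write_smiles` and `enumerate_to_csv` are made up for this sketch. It also assumes that the second reactant set fits in memory as a list and that records RDKit cannot parse (returned as None) should be skipped. The RDKit imports are deferred into the driver function so the small check at the end runs without RDKit installed.

```python
import csv
import io


def write_smiles(product_sets, to_smiles, handle):
    """Write the first product of each product set as one CSV row, one at a time."""
    writer = csv.writer(handle)
    count = 0
    for products in product_sets:
        writer.writerow([to_smiles(products[0])])
        count += 1
    return count


def enumerate_to_csv(rxn_smarts, sdf_1, sdf_2, out_path):
    """Hypothetical driver: react each molecule of sdf_1 with all of sdf_2."""
    from rdkit import Chem
    from rdkit.Chem import AllChem, rdChemReactions

    rxn = rdChemReactions.ReactionFromSmarts(rxn_smarts)
    # Assumption: the second reactant set is small enough to hold in memory.
    # Records that failed to parse come back as None and are skipped.
    rct2 = [m for m in Chem.SDMolSupplier(sdf_2) if m is not None]
    total = 0
    with open(out_path, 'w', newline='') as f:
        for compound in Chem.SDMolSupplier(sdf_1):
            if compound is None:
                continue
            # Wrap the single molecule in a list, as in the answer above.
            prods = AllChem.EnumerateLibraryFromReaction(rxn, [[compound], rct2])
            total += write_smiles(prods, Chem.MolToSmiles, f)
    return total


# Quick check of the streaming helper with plain strings instead of molecules.
buf = io.StringIO()
n_rows = write_smiles(iter([("CCO",), ("CCN", "unused")]), str, buf)
```

With this layout only one product tuple is alive at a time, so peak memory should depend mainly on the size of the second reactant set rather than on the size of the full library.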
answered 2022-01-29T19:07:46.237