I'm a complete Python beginner.

I'm trying to compute the metaphone codes for a list of names. I'll compare these codes later to find names that might be similar.

The jellyfish module fits my needs. When I define the list in the script, I can get the metaphone codes like this:

import jellyfish
names = ['alexander','algoma','angel','antler']
for name in names:
    print(name, "metaphone value =", jellyfish.metaphone(name))

##OUTPUT: 
alexander metaphone value = ALKSNTR
algoma metaphone value = ALKM
angel metaphone value = ANJL
antler metaphone value = ANTLR
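Since the goal is to compare these codes later to find names that might be similar, one simple approach is to bucket names by their code. Any bucket holding more than one name is a candidate group. The sketch below assumes that an exact code match is a good enough similarity test. `group_by_code` is a hypothetical helper. The encoder is passed in as a parameter, so you can supply `jellyfish.metaphone` or another jellyfish encoder such as `jellyfish.soundex`.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_code(names: Iterable[str], encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Bucket names by phonetic code and keep only codes shared by 2+ names.

    Hypothetical helper: `encode` is any str -> str phonetic encoder,
    e.g. jellyfish.metaphone. Exact code equality is assumed to mean
    "potentially similar".
    """
    buckets: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        buckets[encode(name)].append(name)
    # A code held by a single name has no potential match, so drop it.
    return {code: group for code, group in buckets.items() if len(group) > 1}
```

You would call it as `group_by_code(names, jellyfish.metaphone)`. The four sample names above all have different codes, so for them the result is an empty dict.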

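However, I need metaphone codes for a list of about 3000 names. I created a .csv with the column headers I need and the existing list of names. It looks like this: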

RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,,
1240,ABBEY,ABBEY,,
2133,ACES,ACES,,
362,ADAMS,ADAMS,,

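So, ideally, FirstWordMeta should hold the metaphone code of the word in each row's FirstWord column, and StMeta should hold the metaphone code of each row's ST_NAME. I'd like the output .csv to look like this: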

RID *,ST_NAME,FirstWord,FirstWordMeta,StMeta
742,A F JOHNSON,A,A,A F JNSN
1240,ABBEY,ABBEY,AB,AB
2133,ACES,ACES,SS,SS
362,ADAMS,ADAMS,ATMS,ATMS

I've tried the csv module, but I don't understand how to reference specific columns when calling jellyfish.metaphone().

2 Answers

You can use the pandas module:

import pandas as pd
import jellyfish

data = pd.read_csv("test.csv")  # Your filename here

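# Compute the metaphone for every row of each column at once.
# Assigning data["col"][i] row by row is chained assignment, which
# pandas may apply to a temporary copy instead of to `data`.
data["FirstWordMeta"] = data["FirstWord"].apply(jellyfish.metaphone)
data["StMeta"] = data["ST_NAME"].apply(jellyfish.metaphone)

# Save to csv without writing pandas' row index as an extra column
data.to_csv("result.csv", index=False)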
answered 2019-08-05T21:33:48.870

You can try this:

import csv
import jellyfish

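# newline='' is what the csv module expects; on output it also
# prevents blank lines between rows on Windows
with open('input.csv', newline='') as inputfile:
    reader = csv.reader(inputfile)
    headers = next(reader)
    inputdata = list(reader)

with open('output.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(headers)

    for row in inputdata:
        # Keep RID *, ST_NAME, FirstWord, then append the two codes
        outputrow = row[:3] + [
            jellyfish.metaphone(row[2]),  # FirstWordMeta from FirstWord
            jellyfish.metaphone(row[1])   # StMeta from ST_NAME
        ]
        writer.writerow(outputrow)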
answered 2019-08-05T21:36:26.543
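The csv answer finds columns by position (`row[1]`, `row[2]`), and the other answer relies on pandas. If you would rather reference columns by their header names, as the question asks, the standard library's `csv.DictReader` and `csv.DictWriter` can do that. This is a minimal sketch that assumes the header row shown in the question. `add_meta_columns` is a hypothetical name, and the encoder is a parameter so you can pass `jellyfish.metaphone`.

```python
import csv
from typing import Callable


def add_meta_columns(in_path: str, out_path: str, encode: Callable[[str], str]) -> None:
    """Fill FirstWordMeta and StMeta from FirstWord and ST_NAME, by column name.

    Hypothetical helper: `encode` is any str -> str encoder, e.g. jellyfish.metaphone.
    """
    # newline="" is what the csv module expects for files it reads and writes.
    with open(in_path, newline="") as infile, open(out_path, "w", newline="") as outfile:
        reader = csv.DictReader(infile)
        # Reuse the input header so the column order is preserved.
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row["FirstWordMeta"] = encode(row["FirstWord"])
            row["StMeta"] = encode(row["ST_NAME"])
            writer.writerow(row)
```

After `import jellyfish`, you would call `add_meta_columns("input.csv", "output.csv", jellyfish.metaphone)`. Because each row is a dict keyed by header, the code keeps working if the columns are reordered in the input file.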