
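I have a CSV file with two columns ("IdNo", "skillsList"). When I read the file, the whole thing comes in as a single string. IdNo holds a serial number, and skillsList holds a list of skills specified by the user. I want to find the word frequency of the skills.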

My problem is how to convert the data into an accessible form. My data looks like this:

>>> a1

'IdNo, skillsList\nBAXA0000206_NEENA_TIWARI_0.htm,"[u\'Training\', u\'E-Learning\', u\'PowerPoint\', u\'Teaching\', u\'Accounting\', u\'Team Management\', u\'Team Building\', u\'Microsoft Excel\', u\'Microsoft Office\', u\'Financial Accounting\', u\'Microsoft Word\', u\'Customer Service\']"\nBAXA0000227_ABDUR_RAZZAQUE_0.htm,"[u\'Telecommunications\', u\'Data Center\', u\'ISO 27001\', u\'Management\', u\'BS25999\', u\'Technology\', u\'Information Technology...\', u\'Certified PMP\\xae\', u\'Certified BS25999 Lead...\']"\nBAXA0000261_Priya _ Lobo_0.htm,"[u\'Market Research\', u\'Segmentation\', u\'Marketing Strategy\', u\'Consumer Behavior\', u\'Experience Working with...\']"

Any help is appreciated. Thanks.


1 Answer


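This is my general routine for handling data that arrives as a string. It may not fit your case well (your string contains a lot of symbols), but it doesn't hurt to take a look, right?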

The split() function splits a string into a list of strings, for example:

>>> a1 = 'id1, skill1\nid2, skill2\nid3, skill3'
>>> a2 = a1.split('\n')
>>> a2
['id1, skill1', 'id2, skill2', 'id3, skill3']

Here a2 is the list of rows. To separate the two columns as well:

>>> a3 = [row.split(', ') for row in a2]
>>> a3
[['id1', 'skill1'], ['id2', 'skill2'], ['id3', 'skill3']]
>>> for row in a3:
...     for col in row:
...             print col,
...     print ''
...
id1 skill1
id2 skill2
id3 skill3
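
One caveat for the data in the question: the skills column itself contains commas, so a plain split(', ') would cut each row into many pieces. A minimal sketch of one alternative, assuming Python 3 and the standard csv module, is to let the CSV parser honor the double quotes around the list. The string `sample` and the names `reader`, `header`, and `rows` are made up for illustration; they are not from the question.

```python
import csv
import io

# Hypothetical two-row sample shaped like the data in the question.
sample = (
    'IdNo, skillsList\n'
    'a.htm,"[u\'Training\', u\'E-Learning\']"\n'
    'b.htm,"[u\'Management\', u\'Training\']"'
)

# csv.reader accepts any iterable of lines; StringIO wraps the string like a file.
# skipinitialspace drops the blank that follows the comma in the header row.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
header = next(reader)   # first row holds the column names
rows = list(reader)     # remaining rows: [id, skill-list-as-text]

print(header)
print(rows[0])
```

Each row then has exactly two fields, and the second one is still the text of the list, not a list yet.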

To access all the elements of each column, use the zip() function like this:

>>> a4 = zip(*a3)
>>> a4
[('id1', 'id2', 'id3'), ('skill1', 'skill2', 'skill3')]
>>> for col in a4:
...     for row in col:
...             print row,
...     print ''
...
id1 id2 id3
skill1 skill2 skill3 
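
For the frequency count itself, here is a rough sketch, assuming Python 3 and that each skills field is the printed form of a Python list, as it appears in the question. ast.literal_eval turns that text back into a real list, and collections.Counter does the tallying. The list `skill_columns` is invented sample data, and `skill_counts` is just an illustrative name.

```python
import ast
from collections import Counter

# Hypothetical skill-list strings, as they would appear in the second column.
skill_columns = [
    "[u'Training', u'E-Learning', u'Team Management']",
    "[u'Management', u'Training']",
]

skill_counts = Counter()
for text in skill_columns:
    # literal_eval only evaluates literals, so it is safer than eval here.
    skills = ast.literal_eval(text)
    # Count each whole skill once per occurrence.
    skill_counts.update(skills)

print(skill_counts.most_common(2))
```

This counts whole skills such as 'Team Management'. To count individual words instead, one option is to update the counter with skill.split() for each skill.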
answered 2013-08-23T15:44:54.640