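I don't know whether you've already found a solution you like, but it took me about 10 minutes to write the script below. You should be able to run it right away, like this: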
python process_data.py --file=data.txt
Output:
G",'32','0','1.960000.2%',"1E0 ||||| G", '32', '0','2.000000.2%', "1E0 ||||| A", '48', '47', '195.840000.2%', "7.6145E-27 G", '32', '0', '24.000000.2%', "1E0 ||||| G", '32', '0','6.000000.2%', "1E0 ||||| A", '1', '47', '195.800000.2%', "7.6145E-27 G", '32', '0', '0.000000.2%', "1E0 ||||| G", '32', '32','0.000000.2%', "1E0 ||||| A", '1', '47', '19.840000.2%', "7.6145E-27
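I'm only printing the recoded lines, but you could just as easily use csv.writer or file.write as you iterate over each line. The nice thing about this script is that, as long as you know the cell separator and the position of the value you want, it doesn't matter how many cells are on a line: you can put as many cells on a single line as you like.
Stripping the % is a bit of a hack, but if you know the data format you should be safe. You could also wrap the float() conversion in a try/except block and keep track of how many errors you hit during a run.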
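To make the try/except idea concrete, here is a minimal sketch of a conversion that counts failures instead of crashing. The names double_percent and recode_cells are made up for this illustration and are not part of the script, and the sample cells are invented. It also assumes you want two decimal places ('%.2f'), which differs from the script's format string.

```python
def double_percent(value, suffix='%'):
    """Parse a quoted percentage such as "'0.98%'", double it, re-quote it."""
    # Drop padding spaces and the surrounding quotes, then the suffix.
    text = value.replace(' ', '').strip("'\"")
    if text.endswith(suffix):
        text = text[:-len(suffix)]
    return "'%.2f%s'" % (float(text) * 2, suffix)


def recode_cells(cells, col_index=3, col_sep=','):
    """Recode one column in every cell; return (new_cells, error_count)."""
    new_cells, errors = [], 0
    for cell in cells:
        cols = cell.split(col_sep)
        try:
            cols[col_index] = double_percent(cols[col_index])
        except (ValueError, IndexError):
            # Bad number or too few columns: keep the cell as-is and count it.
            errors += 1
        new_cells.append(col_sep.join(cols))
    return new_cells, errors


# Invented sample: the second cell holds a value that cannot be parsed.
cells = ["G\",'32','0','0.98%',\"1E0 ", " G\", '32', '0','oops%', \"1E0"]
new_cells, errors = recode_cells(cells)
print(new_cells)
print('errors:', errors)
```

Cells that fail to parse are passed through unchanged, so one bad value does not abort the whole file and the error count tells you whether the output needs a second look.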
#!/usr/bin/python
import os
import sys
# NOTE: optparse is deprecated; new code should use argparse.ArgumentParser.
from optparse import OptionParser

__version__ = '$Id$'

DEFAULT_CELL_SEP = '|||||'
DEFAULT_CELL_COL_SEP = ','

parser = OptionParser(version=__version__,
                      usage='Usage: %prog --file=data_file [--csep=cell_separator --ccsep=cell_col_separator]')
parser.add_option('-f', '--file', dest='data_file',
                  help='File to process')
parser.add_option('--csep', dest='cell_sep', default=DEFAULT_CELL_SEP,
                  help='Cell Separator')
parser.add_option('--ccsep', dest='cell_col_sep', default=DEFAULT_CELL_COL_SEP,
                  help='Column Separator for each cell')


def process_col_four(cell, cell_col_sep=DEFAULT_CELL_COL_SEP, suffix='%'):
    cols = cell.split(cell_col_sep)
    # Do what you need to do...
    # The fourth column looks like '0.98%' (quotes included); drop padding spaces.
    col_4 = cols[3].replace(' ', '')
    # Strip the leading quote and the trailing %' before converting.
    col_4 = float(col_4[1:-2])
    col_4 = col_4 * 2
    # '%f.2' prints six decimals followed by a literal '.2', which is what the
    # output shown above contains; use '%.2f' if you want two decimal places.
    cols[3] = "'%f.2%s'" % (col_4, suffix)
    return cell_col_sep.join(cols)


def main(data_file, cell_sep=DEFAULT_CELL_SEP, cell_col_sep=DEFAULT_CELL_COL_SEP):
    # Path for a recoded copy (e.g. data_recode.txt). It is not used while
    # the results are only printed.
    data_dir = os.path.dirname(data_file)
    output_file, ext = os.path.splitext(os.path.basename(data_file))
    output_file = output_file + '_recode' + ext
    output_path = os.path.join(data_dir, output_file)
    with open(data_file, 'r') as data_reader:
        for line in data_reader:
            cells = line.strip().split(cell_sep)
            # Pass the column separator through so --ccsep takes effect.
            new_cells = [process_col_four(cell, cell_col_sep) for cell in cells]
            new_line = cell_sep.join(new_cells)
            print(new_line)


if __name__ == '__main__':
    (options, args) = parser.parse_args()
    if not options.data_file:
        parser.print_usage()
        sys.exit(1)
    main(options.data_file, options.cell_sep, options.cell_col_sep)
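If you would rather write the recoded lines to a file than print them, something along these lines should work. This is only a sketch: recode_path and write_recoded are hypothetical helper names, and recode_line stands in for whatever per-line function you use. I went with plain write() rather than csv.writer here because the cells carry their own mixed quoting, which csv.writer would try to escape.

```python
import os


def recode_path(data_file, tag='_recode'):
    """Return e.g. 'dir/data_recode.txt' for 'dir/data.txt'."""
    root, ext = os.path.splitext(data_file)
    return root + tag + ext


def write_recoded(data_file, recode_line):
    """Apply recode_line to each line of data_file and write the results
    next to the input file; return the path that was written."""
    output_path = recode_path(data_file)
    with open(data_file, 'r') as reader, open(output_path, 'w') as writer:
        for line in reader:
            # Strip only the newline so the cell contents stay untouched.
            writer.write(recode_line(line.rstrip('\n')) + '\n')
    return output_path
```

Calling write_recoded('data.txt', some_function) would create data_recode.txt in the same directory as the input, mirroring the '_recode' naming the script already computes.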