python - Parsing a text file in Python and deciding on an approach

Question

I'm a bit stuck on Python logic.
I'd like some advice on how to tackle a problem I'm having in Python and on my approach to parsing data.

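I've spent some time reading the Python reference documentation and browsing this site. I understand there are several ways to do what I want to achieve, and this is the path I've gone down.
I'm reformatting some text files that contain data generated by satellite hardware so that I can upload them into a MySQL database.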

This is the raw data

TP N: 1   
Frequency: 12288.635 Mhz   
Symbol rate: 3000 KS  
Polarization: Vertical  
Spectrum: Inverted  
Standard/Modulation: DVB-S2/QPSK  
FEC: 1/2  
RollOff: 0.20  
Pilot: on  
Coding mode: ACM/VCM  
Short frame  
Transport stream
Single input stream  
RF-Level: -49 dBm  
Signal/Noise: 6.3 dB  
Carrier width: 3.600 Mhz  
BitRate: 2.967 Mbit/s

The section above, starting with TP N, repeats for each transponder on the satellite.
I'm using this script to extract the data I need

strings = ("Frequency", "Symbol", "Polar", "Mod", "FEC", "RF", "Signal", "Carrier", "BitRate")  
sat_raw = open('/BLScan/reports/1520.txt', 'r') 
sat_out = open('1520out.txt', 'w') 
for line in sat_raw: 
    if any(s in line for s in strings): 
        for word in line.split(): 
            if ':' in word:
                sat_out.write(line.split(':')[-1])
sat_raw.close()
sat_out.close()

The output data then looks like this before it is sent to the database

12288.635 Mhz
 3000 KS
 Vertical
 DVB-S2/QPSK
 1/2
 -49 dBm
 6.3 dB  
 3.600 Mhz
 2.967 Mbit/s

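The script works fine, but for some features I want to implement on MySQL I need to edit the output further.

Remove the decimal point, the 3 digits after it, and the Mhz unit from the first line, "Frequency".
Remove all trailing measurement units: KS, dBm, dB, Mhz, Mbit.
Join the 9 fields into a comma-separated string so that each transponder (about 30 per file) is on its own line.

I'm not sure whether to continue down this path and add to the existing script (I'm stuck at the point where the output file is written), or to rethink the way I process the raw file.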

score 1 · Accepted Answer

import math

strings = ("Frequency", "Symbol", "Polar", "Mod", "FEC", "RF", "Signal", "Carrier", "BitRate")

files = ['/BLScan/reports/1520.txt']
combineOutput = []
for myfile in files:
    singleOutput = []
    # The with block closes every input file, not only the last one
    with open(myfile, 'r') as sat_raw:
        for line in sat_raw:
            if any(s in line for s in strings):
                marker = line.split(':')[1]
                try:
                    # Numeric value: drop the fractional part
                    data = str(int(math.floor(float(marker.split()[0]))))
                except ValueError:
                    # Non-numeric value such as "Vertical" or "1/2": keep as is
                    data = marker.split()[0]
                singleOutput.append(data)
    combineOutput.append(",".join(singleOutput))

# Write one comma-separated line per input file
with open('1520out.txt', 'w') as sat_out:
    for rec in combineOutput:
        sat_out.write("%s\n" % rec)

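Add all the files you want to parse to the list files. The output of each file is written as a separate line, with the fields separated by commas, so all transponders in one file end up on the same line.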
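The question asks for one line per transponder rather than one line per file. A minimal sketch of that variant follows, assuming every transponder block begins with a line starting with "TP N" as in the sample data. The helper name parse_transponders is hypothetical and not part of the answer above; the value conversion is the same one the answer uses.

```python
import math

# Same substrings the answer uses to pick out the wanted lines
STRINGS = ("Frequency", "Symbol", "Polar", "Mod", "FEC", "RF",
           "Signal", "Carrier", "BitRate")


def parse_transponders(lines):
    """Return one comma-separated string per transponder block.

    Hypothetical helper: a new record starts at every line beginning
    with "TP N", which is an assumption based on the sample data.
    """
    records = []
    current = []
    for line in lines:
        if line.startswith("TP N"):
            # A new transponder begins: flush the previous one, if any
            if current:
                records.append(",".join(current))
            current = []
        elif ":" in line and any(s in line for s in STRINGS):
            parts = line.split(":", 1)[1].split()
            if not parts:
                continue  # Label without a value, nothing to record
            token = parts[0]
            try:
                # Same conversion as the answer: numbers lose their fraction
                token = str(int(math.floor(float(token))))
            except ValueError:
                pass  # Non-numeric values such as "Vertical" stay as text
            current.append(token)
    if current:
        # Do not lose the last transponder in the file
        records.append(",".join(current))
    return records
```

Because a file object yields its lines when iterated, the function can be called directly on a file opened with `with open(myfile) as sat_raw:`, and each returned string can then be written to the output file followed by a newline.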

score 1 · Accepted Answer

My solution is crude and may not work in corner cases, but it's a good start.

import re
import csv

strings = ("Frequency", "Symbol", "Polar", "Mod", "FEC", "RF", "Signal", "Carrier", "BitRate")  
sat_raw = open('/BLScan/reports/1520.txt', 'r') 
sat_out = open('1520out.txt', 'w')
csv_writer = csv.writer(sat_out)
csv_output = []
for line in sat_raw:
    if any(s in line for s in strings): 
        m = re.match(r'^.*:\s+(\S+)', line)
        if m is None:
            continue # No "label: value" pair on this line, skip it
        value = m.groups()[0]
        try:
            # Attempt to convert to int, thus removing the decimal part
            value = int(float(value))
        except ValueError:
            pass # Not a number (e.g. "Vertical"), keep the string
        csv_output.append(value)
    elif line.startswith('TP N'):
        # Before we start a new set of values, write out the old set
        if csv_output:
            csv_writer.writerow(csv_output)
            csv_output=[]

# If we reach the end of the file, don't miss the last set of values
if csv_output:
    csv_writer.writerow(csv_output)

sat_raw.close()
sat_out.close()

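Discussion

The csv package helps with CSV output.
The re (regular expression) module helps parse each line and extract the value from it.
In the line that reads value = int(...), we try to convert the string value to an integer, which removes the dot and the digits after it.
When the code encounters a line that starts with "TP N", that signals a new set of values, so we write the old set to the CSV file.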
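Note that the integer conversion described above applies to every numeric field, so 6.3 becomes 6 and 3.600 becomes 3. The question only asks to drop the decimals from the Frequency line and to strip the units from the rest. A minimal sketch of that stricter formatting follows, assuming each line has already been split at the colon into a label and a raw value. The names format_field and format_record are hypothetical helpers, not part of the answer's code.

```python
def format_field(label, raw_value):
    """Format one value according to the rules listed in the question.

    Hypothetical helper: `label` is the text before the colon and
    `raw_value` the text after it, for example " 12288.635 Mhz".
    """
    parts = raw_value.split()
    if not parts:
        return ""
    # Keeping only the first token drops the trailing unit (KS, dBm, ...)
    token = parts[0]
    if label.startswith("Frequency"):
        # Only the frequency loses its decimal point and following digits
        token = token.split(".")[0]
    return token


def format_record(pairs):
    """Join already-selected (label, raw_value) pairs into one CSV line."""
    return ",".join(format_field(label, raw) for label, raw in pairs)
```

Working on strings instead of converting to float keeps values such as 3.600 exactly as they appear in the report, which may or may not be what the MySQL column expects.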
