python - Parsing a lisp file with Python

Question

I have the following lisp file, which is from the UCI machine learning database. I would like to convert it into a flat text file using python. A typical line looks like this:

(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))

I would like to parse this into a text file like:

time pitch duration keysig timesig fermata
8    67    4        1      12      0
12   67    8        1      12      0

Is there a python module to intelligently parse this? This is my first time seeing lisp.

score 23 · Accepted Answer

As shown in this answer, pyparsing appears to be the right tool for this:

inputdata = '(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'

from pyparsing import OneOrMore, nestedExpr

data = OneOrMore(nestedExpr()).parseString(inputdata)
print(data)

# [['1', [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']], [['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]]

For completeness, this is how to format the results (using texttable):

from texttable import Texttable

tab = Texttable()
for row in data.asList()[0][1:]:
    row = dict(row)
    tab.header(row.keys())
    tab.add_row(row.values())
print(tab.draw())

+---------+--------+----+-------+-----+---------+
| timesig | keysig | st | pitch | dur | fermata |
+=========+========+====+=======+=====+=========+
| 12      | 1      | 8  | 67    | 4   | 0       |
+---------+--------+----+-------+-----+---------+
| 12      | 1      | 12 | 67    | 8   | 0       |
+---------+--------+----+-------+-----+---------+

To convert that data back to lisp notation:

def lisp(x):
    return '(%s)' % ' '.join(lisp(y) for y in x) if isinstance(x, list) else x

# asList() turns the ParseResults into plain nested lists, which lisp() expects
d = lisp(data.asList()[0])
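If installing pyparsing is not an option, the same nested-list structure can be built with the standard library alone. The following is a minimal sketch, not how pyparsing works internally: the names `tokenize`, `parse`, and `to_lisp` are made up for this example, and it assumes the file contains only parentheses and whitespace-separated atoms (no strings, quotes, or comments).

```python
import re

def tokenize(text):
    # Split into "(", ")" and bare atoms (runs without whitespace or parens).
    return re.findall(r"[()]|[^\s()]+", text)

def parse(tokens):
    """Turn a flat token list into nested Python lists of strings."""
    stack = [[]]  # stack[0] collects the top-level expressions
    for tok in tokens:
        if tok == "(":
            stack.append([])  # start a new nested list
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]

def to_lisp(x):
    # Inverse direction: nested lists back to lisp notation.
    return "(%s)" % " ".join(to_lisp(y) for y in x) if isinstance(x, list) else x

line = ("(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))"
        "((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))")
tree = parse(tokenize(line))
chorale = tree[0]          # the single top-level expression on this line
print(chorale[0])          # chorale number, as a string
print(chorale[1])          # first note as a list of [name, value] pairs
print(to_lisp(chorale))    # back to lisp notation
```

All atoms stay strings here, as they do in the pyparsing result above; convert with `int()` where numbers are needed.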

score 2 · Accepted Answer

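If you know that the data is correct and uniformly formatted (at first glance it seems to be), and if you only need this data and do not need to solve the general problem... then why not just replace every non-numeric character with a space and then split?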

import re
data = open("chorales.lisp").read().split("\n")
data = [re.sub("[^-0-9]+", " ", x) for x in data]
for L in data:
    L = list(map(int, L.split()))  # list() so len() and slicing also work on Python 3
    i = 1  # first element is chorale number
    while i < len(L):
        st, pitch, dur, keysig, timesig, fermata = L[i:i+6]
        i += 6
        # ... your processing goes here ...
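To show where that approach ends up, here is a small sketch that takes one line and prints the flat table asked for in the question. It is a variant of the idea above (it pulls the integers out with `re.findall` instead of substituting and splitting), the helper name `chorale_rows` is invented for this example, and it assumes every note lists its six fields in the same order as the sample line.

```python
import re

HEADER = ["time", "pitch", "duration", "keysig", "timesig", "fermata"]

def chorale_rows(line):
    """Return (chorale_number, rows) for one line; rows are lists of 6 ints."""
    numbers = [int(n) for n in re.findall(r"-?\d+", line)]
    if not numbers:
        return None, []  # blank or non-data line
    chorale, values = numbers[0], numbers[1:]
    if len(values) % 6 != 0:
        raise ValueError("expected 6 fields per note, got %d values" % len(values))
    # Chop the flat list of values into groups of six, one group per note.
    rows = [values[i:i + 6] for i in range(0, len(values), 6)]
    return chorale, rows

sample = ("(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))"
          "((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))")
number, rows = chorale_rows(sample)
print(" ".join(HEADER))
for row in rows:
    print(" ".join(str(v) for v in row))
```

Because the field names are thrown away, this silently produces wrong columns if a note ever lists its fields in a different order; the parser-based answers do not have that weakness.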

score 1 · Accepted Answer

Split it into pairs with a regular expression:

In [1]: import re

In [2]: txt = '(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'

In [3]: [p.split() for p in re.findall(r'\w+\s+\d+', txt)]
Out[3]: [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0'], ['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]

Then make it into a dictionary:

# data is the list of [name, value] pairs shown in Out[3] above
dct = {}
for p in data:
    if p[0] not in dct:
        dct[p[0]] = [p[1]]
    else:
        dct[p[0]].append(p[1])

The result:

In [10]: dct
Out[10]: {'timesig': ['12', '12'], 'keysig': ['1', '1'], 'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'], 'fermata': ['0', '0']}

Printing:

print('time pitch duration keysig timesig fermata')
for t in range(len(dct['st'])):
    print(dct['st'][t], dct['pitch'][t], dct['dur'][t],
          dct['keysig'][t], dct['timesig'][t], dct['fermata'][t])

Proper formatting is left as an exercise for the reader...
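For the formatting step, one possible sketch is to pad each column to the width of its header (or its widest value) with `str.ljust`. The `layout` list and the variable names are assumptions made for this example; the header labels are the ones requested in the question.

```python
import re

txt = ("(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))"
       "((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))")
pairs = [p.split() for p in re.findall(r"\w+\s+\d+", txt)]

# Collect the values of each field into its own column.
columns = {}
for name, value in pairs:
    columns.setdefault(name, []).append(value)

# (header label, key in the data), in the column order the question asks for.
layout = [("time", "st"), ("pitch", "pitch"), ("duration", "dur"),
          ("keysig", "keysig"), ("timesig", "timesig"), ("fermata", "fermata")]

# Each column is as wide as its label or its longest value, whichever is larger.
widths = [max([len(label)] + [len(v) for v in columns[key]])
          for label, key in layout]

lines = [" ".join(label.ljust(w) for (label, _), w in zip(layout, widths)).rstrip()]
for t in range(len(columns["st"])):
    cells = (columns[key][t].ljust(w) for (_, key), w in zip(layout, widths))
    lines.append(" ".join(cells).rstrip())

print("\n".join(lines))
```

Writing `"\n".join(lines)` to a file instead of printing it gives the flat text file described in the question.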

score 1 · Accepted Answer

Since the data is already in Lisp, use lisp itself:

(let ((input '(1 ((ST 8) (PITCH 67) (DUR 4) (KEYSIG 1) (TIMESIG 12) (FERMATA 0))
            ((ST 12) (PITCH 67) (DUR 8) (KEYSIG 1) (TIMESIG 12) (FERMATA 0)))))

       (let ((row-headers (mapcar 'car (second input)))
          (row-data (mapcar (lambda (row) (mapcar 'second row)) (cdr input))))

     (format t "~{~A~^ ~}~%" row-headers)
     (format t "~{~{~A~^ ~}~^ ~%~}" row-data)))
