
I have a LibreOffice Writer document that contains text snippets of the form prefix<...>. In Writer I can find them easily with a regular-expression search.


Now I want to build a Python list of all these occurrences, using pyuno from a standalone Python script that runs outside LibreOffice.

The code I have pieced together from various sources looks like this, and it seems to work so far:

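import subprocess
import time

import uno

SOCKET = 'socket,host=localhost,port=2002;urp;'
file = '/home/jochen/Dokumente/regexp_find_test.odt'
# Start LibreOffice with the document and a UNO listener on port 2002.
# Passing a list avoids shell quoting issues with the ';' in SOCKET.
office_proc = subprocess.Popen(['/usr/lib/libreoffice/program/soffice', file,
                                '--accept=' + SOCKET + 'StarOffice.ServiceManager'])
time.sleep(3)  # give soffice time to open the listener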

localContext = uno.getComponentContext()
resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)

try:
    context = resolver.resolve('uno:' + SOCKET + 'StarOffice.ComponentContext')
except Exception as exc:
    raise RuntimeError("failed to connect to LibreOffice with socket {}".format(SOCKET)) from exc
loffice_desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)
comp = loffice_desktop.getCurrentComponent()
search_descr = comp.createSearchDescriptor()
search_descr.SearchRegularExpression = True
search_descr.setSearchString('prefix<[a-z_]+>')
res = comp.findAll(search_descr)
print(len(res))
for n in range(len(res)):
    print(40*'-')
    print(res[n].Text.getText().getString())

The output surprised me, since I used the same expression as in Writer:

12
----------------------------------------
prefix<vorname> prefix<name>
prefix<ort> prefix<strasse> prefix<haus_nummer>

Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. prefix<name> Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in prefix<ort> culpa qui officia deserunt mollit anim id est laborum.

Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis prefix<vorname> dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.

Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo prefix<vorname> consequat. Duis autem vel prefix<name> eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.

prefix<name> prefix<unterschrift>
----------------------------------------
prefix<vorname> prefix<name>
prefix<ort> prefix<strasse> prefix<haus_nummer>

I expected something like this:

12
----------------------------------------
prefix<vorname>
----------------------------------------
prefix<name>
----------------------------------------
prefix<ort>
[...]

Apparently the expression behaves very greedily. Any suggestions on how to get around this, or am I doing something wrong?


1 Answer


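It is not greediness, just a simple mistake in handling the search results. findAll() does return one text range per match (hence the count of 12), but res[n].Text is the text object the range belongs to, i.e. the whole document body, so calling getString() on it prints the complete document every time. The matched substring is the String of the range itself.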
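The pattern itself cannot over-match. As a quick sanity check, here it is run with Python's re module on the first two lines of the sample document. This is an illustration only: LibreOffice uses its own ICU-based regex engine rather than Python's, but for a plain character class like [a-z_] both should behave the same.

```python
import re

# The pattern from the question. [a-z_]+ matches only lowercase letters and
# underscores, so a match can never run past the closing ">" or across the
# space into the next placeholder.
PATTERN = re.compile(r"prefix<[a-z_]+>")

sample = "prefix<vorname> prefix<name>\nprefix<ort> prefix<strasse> prefix<haus_nummer>"

matches = PATTERN.findall(sample)
print(matches)
# ['prefix<vorname>', 'prefix<name>', 'prefix<ort>', 'prefix<strasse>', 'prefix<haus_nummer>']
```

Each placeholder comes back as its own match, which is consistent with findAll() reporting 12 separate ranges.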

The line

print(res[n].Text.getText().getString())

has to be changed to

print(res[n].String)

answered 2021-04-15 at 11:13