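I want to use PDFregex to extract the indexes from PDF documents as reference entries in a database. The indexes (as you would expect) all start with the word "Index", but (mostly) end with a double carriage return. What regular expression can I use?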
Try this:
Index(.|\s)*?(?=(?:\r\n|\n|\r(?!\n)){2})
# Index --> Find `Index`
# (.|\s)*? --> Followed by any string, including line breaks (the ? makes it non-greedy)
# (?=(?:\r\n|\n|\r(?!\n)){2}) --> Stop at the first double line break:
# (?=) --> Positive lookahead: matches if what precedes it is followed by its contents, without consuming them.
# (?:) --> Non-capturing group.
# \r\n|\n|\r(?!\n) --> One line break: Windows (CRLF), Unix (LF) or a lone CR. The (?!\n) keeps the \r of a CRLF from counting as a break on its own, so a single Windows line break is never mistaken for two.
# {2} --> Exactly 2 of the previous.
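A minimal Python sketch of applying the pattern, assuming the PDF text has already been extracted into a string (the PDF library is not specified here). It makes two assumed changes: `(.|\s)` is written as `[\s\S]`, which matches the same characters but has no capturing group and no overlapping alternatives (with `(.|\s)` a space can be matched two ways, so a failed search can backtrack exponentially), and line breaks are matched with `\r\n|\n|\r(?!\n)`. The helper name `extract_indexes` is hypothetical.

```python
import re

# Assumed variant of the pattern: "[\s\S]" instead of "(.|\s)", and
# "\r(?!\n)" so the "\r" of a Windows "\r\n" never counts as a break by itself.
INDEX_RE = re.compile(r"Index[\s\S]*?(?=(?:\r\n|\n|\r(?!\n)){2})")


def extract_indexes(text: str) -> list[str]:
    """Hypothetical helper: return each block that starts with 'Index'
    and ends just before the first blank line (double line break)."""
    return INDEX_RE.findall(text)


unix_text = "Intro\n\nIndex\nfoo, 12\nbar, 34\n\nChapter 1"
windows_text = unix_text.replace("\n", "\r\n")

print(extract_indexes(unix_text))     # ['Index\nfoo, 12\nbar, 34']
print(extract_indexes(windows_text))  # ['Index\r\nfoo, 12\r\nbar, 34']
```

An index at the very end of the text, with no blank line after it, is not matched; adding `|\Z` inside the lookahead would cover that case. If you keep `(.|\s)` as written, note that `re.findall` returns the captured group (only its last character) rather than the whole match; use `(?:.|\s)` or `m.group(0)` from `re.finditer` instead.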
Make sure your code enables the option that makes `.` match line breaks as well.
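That option matters when the pattern uses a plain `.` rather than `(.|\s)`. In Python it is `re.DOTALL` (other engines call it single-line mode or the `s` flag). A small comparison sketch, assuming a terminator written as `(?:\r\n|\n|\r(?!\n)){2}`:

```python
import re

# Assumed terminator: stop before the first double line break.
TERMINATOR = r"(?=(?:\r\n|\n|\r(?!\n)){2})"

# With re.DOTALL, "." also matches "\n", so ".*?" can run across lines.
with_dotall = re.compile(r"Index.*?" + TERMINATOR, re.DOTALL)

# Without it, "." stops at "\n", so an index spanning several lines never matches.
without_dotall = re.compile(r"Index.*?" + TERMINATOR)

sample = "Index\nalpha, 1\nbeta, 2\n\nBody text"

print(with_dotall.findall(sample))     # ['Index\nalpha, 1\nbeta, 2']
print(without_dotall.findall(sample))  # []
```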
answered 2013-01-11T14:22:41.777