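I'm not entirely sure what you mean by "combined together", but you can certainly create annotations that span the content of each "page". Assuming you have a PageNumber annotation over each of the "Page-1", "Page-2", etc. markers, you could use something like the following to create an annotation running from one PageNumber annotation to the next. I'm using a JAPE grammar with control = once to do this; you could equivalently use a Groovy script or a custom PR.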
Imports: { import static gate.Utils.*; }
Phase: PageSpans
Input: PageNumber
Options: control = once
Rule: PageSpan
({PageNumber})
-->
{
  try {
    List<Annotation> numbers = inDocumentOrder(inputAS.get("PageNumber"));
    for(int i = 0; i < numbers.size(); i++) {
      outputAS.add(start(numbers.get(i)), // from start of this PageNumber, to...
        (i+1 < numbers.size()
          ? start(numbers.get(i+1)) // start of the next number, or...
          : end(doc) // ...if no more PageNumbers then end of document
        ),
        "Page",
        // store the text under the PageNumber as a feature of Page
        featureMap("id", stringFor(doc, numbers.get(i))));
    }
  } catch(InvalidOffsetException e) {
    throw new JapeException("Invalid offset from existing annotation", e);
  }
}
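If the offset arithmetic in that rule is hard to follow, here is a rough stand-alone sketch of the same idea in plain Java. It does not use the GATE API at all; the class name PageSpanSketch, the method pageSpans and the sample offsets are made up for illustration, and the marker offsets are assumed to be in document order already.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the span logic; not part of GATE.
class PageSpanSketch {
    // markerStarts: start offsets of the page markers, in document order.
    // docEnd: offset of the end of the document.
    // Returns one {start, end} pair per marker.
    static List<long[]> pageSpans(long[] markerStarts, long docEnd) {
        List<long[]> spans = new ArrayList<>();
        for (int i = 0; i < markerStarts.length; i++) {
            long start = markerStarts[i];
            // A page ends where the next marker starts, or at the end of
            // the document if this is the last marker.
            long end = (i + 1 < markerStarts.length) ? markerStarts[i + 1] : docEnd;
            spans.add(new long[] {start, end});
        }
        return spans;
    }

    public static void main(String[] args) {
        // Made-up offsets: three markers in a 500-character document.
        for (long[] s : pageSpans(new long[] {0, 120, 310}, 500)) {
            System.out.println(s[0] + ".." + s[1]);
        }
    }
}
```

Note that with this scheme any text before the first marker is not covered by any page, which matches what the JAPE rule does.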
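In your comment you asked about moving all the annotations under each "page" into a separate annotation set. Once you have done the above this is relatively straightforward, provided you have the page number as a feature on the Page annotation, as I've done with the "id" feature. You can then define another JAPE grammar along these lines: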
Imports: { import static gate.Utils.*; }
Phase: SetPerPage
Input: Age X Y // and whatever other annotation types you want to copy
Options: control = all
Rule: MoveToPageSet
({Age}|{X}|{Y}):entity
-->
:entity {
  try {
    for(Annotation e : entityAnnots) {
      // find the (only) Page annotation that covers this entity
      Annotation thePage = getOnlyAnn(getCoveringAnnotations(inputAS, e, "Page"));
      // get the corresponding annotation set
      AnnotationSet pageSet = doc.getAnnotations(
        (String)thePage.getFeatures().get("id"));
      // and copy the annotation into it
      pageSet.add(start(e), end(e), e.getType(), e.getFeatures());
    }
  } catch(InvalidOffsetException e) {
    throw new JapeException("Invalid offset from existing annotation", e);
  }
  // optionally remove from input set
  // inputAS.removeAll(entityAnnots);
}
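To make the "find the covering page, then file the entity under that page's id" step concrete, here is a rough stand-alone sketch in plain Java. Again it does not touch the GATE API: PerPageSketch, coveringPage, groupByPage and the sample offsets are all hypothetical, and pages and entities are modelled as bare {start, end} offset pairs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model of the per-page grouping; not part of GATE.
class PerPageSketch {
    // Returns the id of the single page whose span fully covers [start, end].
    // Throws if no page, or more than one page, covers it.
    static String coveringPage(Map<String, long[]> pages, long start, long end) {
        String found = null;
        for (Map.Entry<String, long[]> p : pages.entrySet()) {
            long[] span = p.getValue();
            if (span[0] <= start && end <= span[1]) {
                if (found != null) {
                    throw new IllegalArgumentException("more than one covering page");
                }
                found = p.getKey();
            }
        }
        if (found == null) {
            throw new IllegalArgumentException("no covering page");
        }
        return found;
    }

    // Groups entity spans by the id of the page that covers them,
    // the analogue of one annotation set per page.
    static Map<String, List<long[]>> groupByPage(Map<String, long[]> pages,
                                                 List<long[]> entities) {
        Map<String, List<long[]>> sets = new LinkedHashMap<>();
        for (long[] e : entities) {
            String id = coveringPage(pages, e[0], e[1]);
            sets.computeIfAbsent(id, k -> new ArrayList<>()).add(e);
        }
        return sets;
    }

    public static void main(String[] args) {
        // Made-up page spans and entity offsets.
        Map<String, long[]> pages = new LinkedHashMap<>();
        pages.put("Page-1", new long[] {0, 120});
        pages.put("Page-2", new long[] {120, 310});
        List<long[]> entities = new ArrayList<>();
        entities.add(new long[] {10, 14});
        entities.add(new long[] {130, 140});
        entities.add(new long[] {200, 205});
        groupByPage(pages, entities).forEach(
            (id, es) -> System.out.println(id + ": " + es.size() + " entities"));
    }
}
```

One thing the sketch makes visible: an entity that straddles a page boundary is not covered by any single page, so the lookup fails. As far as I recall getOnlyAnn in the JAPE rule behaves similarly when the set does not contain exactly one annotation, so you may want to handle that case explicitly if your entities can cross pages.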