The diff format is more or less the de facto standard for representing differences between texts and is widely used by programmers to distribute source code changes. Most version control systems can output diffs, and diffs are used to discuss proposed changes to text (e.g. source code) because they show exactly what changed.

However, I would often like to simply comment on a text without changing it, and would like a data format that can represent annotations to text as precisely as diff represents changes. A typical use case would be a code review where I want to comment on the code but not (yet) propose any changes. Another use case would be to annotate an article with my own thoughts and reminders. In Word, I can annotate text by selecting it and creating a comment balloon beside it. But Word is cumbersome in other ways: I would like to keep just the annotations in a separate file and leave the originals as they are.

What data formats exist that can represent annotations to text in a way that is as exact as a diff is for changes?

I'm not looking for general answers like "XML". I'm looking for formats that explicitly represent annotations to text. (Perhaps no such format exists except the application-specific formats of certain programs like Word.)

1 Answer

Good question.

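Most people would throw XML or subsets such as HTML into the discussion. Markup languages store their (data) attributes inside the original text. But that is not what you are looking for, so I am leaving out XML/HTML, RDF, and microformats.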

In general

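You need to keep the original text, clone it, and then add the annotations using a custom markup language. This allows a text diff between the original text and the annotated text. It is important to store the revisions of the original text and the revisions of the annotated text separately.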

This allows several kinds of diff:

  • a diff between the "original text" and "annotated text rev 1..n"
  • a diff between "annotated text rev n" and "annotated text rev n+1"

This is quite powerful.
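
The workflow above can be illustrated with Python's standard `difflib` module. This is a minimal sketch under stated assumptions: the `[[note: ...]]` inline markup and the file names are hypothetical, chosen only for illustration. It produces both kinds of diff from the list, and the helper `only_adds` checks that annotating never removed or altered an original line.

```python
import difflib

# The original text is stored once and never edited.
original = [
    "def area(r):\n",
    "    return 3.14 * r * r\n",
]

# Annotated revision 1: a clone of the original plus one review note,
# written in a hypothetical inline markup "[[note: ...]]".
annotated_rev1 = [
    "def area(r):\n",
    "    [[note: use math.pi instead of 3.14?]]\n",
    "    return 3.14 * r * r\n",
]

# Annotated revision 2: revision 1 plus a second note.
annotated_rev2 = [
    "[[note: missing docstring]]\n",
    "def area(r):\n",
    "    [[note: use math.pi instead of 3.14?]]\n",
    "    return 3.14 * r * r\n",
]


def annotation_diff(old, new, old_name, new_name):
    """Return the unified diff between two revisions as one string."""
    return "".join(difflib.unified_diff(old, new, fromfile=old_name, tofile=new_name))


def only_adds(diff_text):
    """True if the diff only adds lines, i.e. the original lines are intact."""
    lines = diff_text.splitlines()[2:]  # skip the '---' / '+++' file headers
    return not any(line.startswith("-") for line in lines)


# Diff 1: "original text" vs. "annotated text rev 1".
d1 = annotation_diff(original, annotated_rev1, "original.txt", "annotated.rev1.txt")
# Diff 2: "annotated text rev 1" vs. "annotated text rev 2".
d2 = annotation_diff(annotated_rev1, annotated_rev2, "annotated.rev1.txt", "annotated.rev2.txt")
print(d1)
print(d2)
print(only_adds(d1), only_adds(d2))
```

Because annotations in this scheme are pure insertions, a diff against the original consists of `+` lines only; any `-` line signals that the original text itself was changed.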

What data formats exist?

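In the standoff representation, the text of a document is kept separate from its annotations, and the annotations are linked to specific text ranges by character offsets. Annotations are associated with their text through a file naming convention in which both files share the same base name (the file name without its suffix): for example, the file PMID-1000.a1 contains the annotations for the file PMID-1000.txt.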

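As you can see, the association between annotations and text is purely filename-based. There is plenty of room for academic research and improvement here.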
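
The standoff representation described above can be sketched in Python. The answer only specifies character offsets and a shared base name; the line layout `ID<TAB>Type Start End<TAB>Text` used here is an assumption modeled on the brat/BioNLP text-bound annotation style, and the sample text and offsets are invented for illustration. The sketch pairs the files by base name, parses the offsets, and checks each span against the unchanged original.

```python
from pathlib import Path
from typing import NamedTuple


class Span(NamedTuple):
    ann_id: str   # annotation ID, e.g. "T1"
    label: str    # annotation type, e.g. "Protein"
    start: int    # offset of the first covered character
    end: int      # offset one past the last covered character
    covered: str  # copy of the covered text, kept as a sanity check


def annotation_path(text_path: Path, suffix: str = ".a1") -> Path:
    """Find the annotation file that shares the text file's base name."""
    return text_path.with_suffix(suffix)  # PMID-1000.txt -> PMID-1000.a1


def parse_standoff(ann_source: str) -> list[Span]:
    """Parse lines of the assumed form 'ID<TAB>Type Start End<TAB>Text'."""
    spans = []
    for line in ann_source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        ann_id, info, covered = line.split("\t", 2)
        label, start, end = info.split()
        spans.append(Span(ann_id, label, int(start), int(end), covered))
    return spans


def resolve(text: str, spans: list[Span]) -> list[tuple[Span, str]]:
    """Slice each span out of the untouched original and check it still matches."""
    resolved = []
    for span in spans:
        fragment = text[span.start:span.end]
        if fragment != span.covered:
            # The original text changed after annotation: offsets are stale.
            raise ValueError(
                f"{span.ann_id}: offsets point at {fragment!r}, expected {span.covered!r}"
            )
        resolved.append((span, fragment))
    return resolved


text = "IL-2 activates STAT5 in T cells."                # invented PMID-1000.txt
ann = "T1\tProtein 0 4\tIL-2\nT2\tProtein 15 20\tSTAT5\n"  # invented PMID-1000.a1

print(annotation_path(Path("PMID-1000.txt")))  # PMID-1000.a1
for span, fragment in resolve(text, parse_standoff(ann)):
    print(span.ann_id, span.label, span.start, span.end, fragment)
# T1 Protein 0 4 IL-2
# T2 Protein 15 20 STAT5
```

Storing a copy of the covered text next to the offsets is a design choice of this sketch: it adds some redundancy, but it lets a tool detect when the original has drifted and the offsets no longer point where they should.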

answered 2014-10-05 16:05:48