通常的做法是将字符存储在一个字符串中,但是因为在编写文本时,很多时候用户在文本中间删除或添加字符,也许最好使用std::list<char>
包含字符,然后添加字符在列表中间不是昂贵的操作。
4 回答
First word-processing do much more than string manipulation. You will need a rich-text data structure. If you need pagination you will also need some meta-data like page setup. Do some research on Word, you will have answer.
For the rich-text part, your data structure have to save two things: characters and attributes. In other words, you have to have some kind of markup language. HTML/DOM is a choice. But in most of time it's a overkill because of complexity.
There are many data structure can handle the character part: Rope, Gap Buffer, and Piece Table. But none of them provide attribute support directly. You have to build it by you self.
AbiWord using list based Piece Table before, but now using tree based Piece Table now. Go to the Wiki page of AbiWord you will find more.
OpenOffice use a different way. Basically, it keeps a list of paragraph, and inside the paragraph it keep a string (or other more effective data structure) and list of attributes. I prefer this way because Paragraph is a naturally small enough unit to edit, it's much easier than tree based piece table.
以下论文总结了文字处理器中使用的数据结构:http ://www.cs.unm.edu/~crowley/papers/sds.pdf
文本序列的数据结构。查尔斯克劳利,新墨西哥大学,1998 年
用于维护字符序列的数据结构是文本编辑器的重要组成部分。本文研究并评估了文本序列可能的数据结构范围。检查文本编辑器的文本序列组件的 ADT 接口。检查了六种常见的序列数据结构(数组、间隙、列表、行指针、固定大小的缓冲区和片段表),然后提出了包含所有六种结构的序列数据结构的通用模型。详细解释了计件表法,并介绍了它的优点。检查了序列数据结构的设计空间,并提出了上面列出的几种变体。这些序列数据结构经过实验比较,并根据许多标准进行评估。实验比较是通过在编辑模拟器中实现每个数据结构并使用数千个编辑的合成负载对其进行测试来完成的。我们还报告了结果对用于生成合成编辑负载的参数变化的敏感性的实验。
SGI STL 有一个 Rope 类,你可能想看看: http ://www.sgi.com/tech/stl/Rope.html
使用std::list<char>
每个字符需要比使用多九倍的存储空间std::string
。这可能不是一个好的权衡。我的第一个倾向是使用 a std::vector<std::string>
,其中每个string
对象都包含一个段落的文本。段落中的插入和删除将足够快。