unicode - ټ,ګ,ځ,څ,ڼ,ښ,ډ,ۍ,ړ,ې 在普什图语中的语境形式的 Unicode

Question

我正在开发一个程序，它给出正确的文本格式，例如如果我写سلام所以它给FEB3, FEE0, FE8E and FEE2女巫是سـ, ـلـ,ﺎ,ـم的 Unicode ，然后如果我写ټول字符 ټ 的 Unicode是，067C但是有不是字符ټـ的 Unicode，它是初始上下文形式。

所以我在维基百科中找到了 ټ,ګ,ځ,څ,ڼ,ښ,ډ,ۍ,ړ,ې 的Unicode，但我找不到上下文形式的Unicode。

例如ټـ ,ـټـ,ـټ的Unicode。

如果有人知道这个问题的解决方案，我正在等待回复。谢谢...

score 4 · Accepted Answer

Unicode 字符旨在抽象，因为它没有特定的表示形式。显示像阿拉伯语这样的草书脚本的首选方式是存储标准的、非上下文形式，并在显示时将它们转换为草书形式——也就是说，作为操作系统中文本显示系统的最后阶段之一，或者文字处理器。

草书形式通常作为字体中的字形提供，并使用字体文件中体现上下文规则的表格中的信息进行选择。

Unicode 存储了大量的阿拉伯语上下文形式，但只是为了与较旧的编码和传统的金属类型兼容，只能提供有限数量的物理字形。不幸的是，出于您的目的，这些上下文形式并未涵盖阿拉伯语以外的语言中使用的所有扩展字符，例如您给出的示例，即 U+067C ARABIC LETTER TEH WITH RING，用于普什图语。

在我看来，不太可能添加更多的语境阿拉伯语形式。因此，至少根据其当前设计，您提出的程序无法运行。

score 0 · Accepted Answer

Earlier Unicode versions included separate codes for the different forms of Arabic letters for all letters except some. Arabic letters are used to write Pashto, Farsi, Urdu, and few other languages. The letters that were used in Arabic, Farsi, and may be a couple more languages were assigned different codes for each form of the their letters. However, the letters used only by less taught languages like Pashto, which you are asking about, were assigned codes for only the isolated forms. In the later versions of the Unicode, it was decided to only assign a single code to each letter, leaving Pashto only letters to have codes for only the isolated forms. Actually there was no need to have a separate code for each form which was a bad decision made by the earlier Unicode versions. A rendering engine (editors, and other programs that deal with plain text) should account for the different forms of each letter and display the correct form according to its position.

unicode - ټ,ګ,ځ,څ,ڼ,ښ,ډ,ۍ,ړ,ې 在普什图语中的语境形式的 Unicode

2 回答 2

Related

Reference