javascript - JavaScript: How to check if a character is RTL?

Question

How can I programmatically check, in JavaScript, whether the browser treats certain characters as RTL?

Maybe by creating some transparent DIV and looking at where the text gets placed?

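A bit of context: Unicode 5.2 added support for the Avestan alphabet. So if a browser supports Unicode 5.2, it treats characters such as U+10B00 as RTL (currently only Firefox does). Otherwise it treats them as LTR, since that is the default.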
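Because U+10B00 lies outside the Basic Multilingual Plane, a JavaScript string stores it as a surrogate pair, i.e. two UTF-16 code units. The snippet below (plain ES2015, runnable in Node or a browser console; the variable name is illustrative) makes that visible. It matters for any check that inspects a string one code unit at a time, which is what a regular expression without the `u` flag does.

```javascript
// U+10B00 AVESTAN LETTER A is in the Avestan block (U+10B00..U+10B3F),
// outside the Basic Multilingual Plane.
const avestanA = '\u{10B00}'; // ES2015 code point escape

console.log(avestanA.length);                      // 2 (UTF-16 code units)
console.log(avestanA.charCodeAt(0).toString(16));  // "d802" (high surrogate)
console.log(avestanA.charCodeAt(1).toString(16));  // "df00" (low surrogate)
console.log(avestanA.codePointAt(0).toString(16)); // "10b00"
console.log([...avestanA].length);                 // 1 (string iteration goes by code point)
```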

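How can I check this programmatically? I'm writing an Avestan input script, and I want to override the bidi direction if the browser is too dumb. However, if the browser does support Unicode, the bidi settings should not be overridden (leaving them alone allows Avestan and Cyrillic to be mixed).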

Currently I do this:

var ua = navigator.userAgent.toLowerCase();

if (ua.match('webkit') || ua.match('presto') || ua.match('trident')) {
    var input = document.getElementById('orig');
    if (input) {
        input.style.direction = 'rtl';
        input.style.unicodeBidi = 'bidi-override';
    }
}

But obviously, once Chrome and Opera start supporting Unicode 5.2, this will make the script less usable.

score 31 · Accepted Answer

function isRTL(s){           
    var ltrChars    = 'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF'+'\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF',
        rtlChars    = '\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC',
        rtlDirCheck = new RegExp('^[^'+ltrChars+']*['+rtlChars+']');

    return rtlDirCheck.test(s);
};
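To see what this function actually decides: the two lists are disjoint, and the leading `[^ltrChars]*` skips anything in neither list (digits, spaces, punctuation), so the result is true exactly when the first character belonging to either list is an RTL one. All ranges are BMP code units, so a supplementary character such as Avestan U+10B00 is seen as its high surrogate U+D802, which falls inside the `\u2C00-\uFB1C` "LTR" range. The calls below reuse the function from the answer; the sample strings are my own.

```javascript
// Copied from the answer above.
function isRTL(s) {
    var ltrChars    = 'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF' + '\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF',
        rtlChars    = '\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC',
        rtlDirCheck = new RegExp('^[^' + ltrChars + ']*[' + rtlChars + ']');

    return rtlDirCheck.test(s);
}

console.log(isRTL('\u05E9\u05DC\u05D5\u05DD'));           // true: Hebrew letters are in rtlChars
console.log(isRTL('123 \u0645\u0631\u062D\u0628\u0627')); // true: digits and the space are skipped
console.log(isRTL('Hello \u05E9\u05DC\u05D5\u05DD'));     // false: 'H' is the first listed character
console.log(isRTL('\u{10B00}'));                          // false: high surrogate U+D802 is in the LTR range
```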

Playground page

score 8 · Accepted Answer

I realize this is quite a while after the original question was asked and answered, but I found vsync's update rather useful and just wanted to add some observations. I would have added this as a comment on his answer, but my reputation is not high enough yet.

Instead of a regular expression that matches, from the start of the string, zero or more non-LTR characters followed by one RTL character, wouldn't it make more sense to match zero or more weak/neutral characters followed by one RTL character? Otherwise the pattern can needlessly run over many RTL characters. I would welcome a more thorough examination of my weak/neutral character group, as I merely used the negation of the combined LTR and RTL character groups.

Additionally, shouldn't characters such as LTR/RTL marks, embeddings and overrides be included in the appropriate character groups?

I would think, then, that the final code should look something like this:

function isRTL(s){           
    var weakChars       = '\u0000-\u0040\u005B-\u0060\u007B-\u00BF\u00D7\u00F7\u02B9-\u02FF\u2000-\u200D\u2010-\u2029\u202C\u202F-\u2BFF',
        rtlChars        = '\u0591-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC',
        rtlDirCheck     = new RegExp('^['+weakChars+']*['+rtlChars+']');

    return rtlDirCheck.test(s);
};
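A few sample calls help check the weak/neutral approach. This sketch assumes a `weakChars` whose first general-punctuation range ends at U+200D, so that LRM (U+200E), LRE (U+202A) and LRO (U+202D) are not counted as weak and stop the scan like any other non-RTL strong character. Compiling the regex once outside the function is my own choice, not part of the answer.

```javascript
// Weak/neutral: ASCII and Latin-1 controls, digits, spaces, punctuation and symbols,
// modifier letters U+02B9..U+02FF, and U+2000..U+2BFF except the directional controls.
var weakChars = '\u0000-\u0040\u005B-\u0060\u007B-\u00BF\u00D7\u00F7\u02B9-\u02FF' +
                '\u2000-\u200D\u2010-\u2029\u202C\u202F-\u2BFF';
// Strong RTL ranges plus RLM (U+200F), RLE (U+202B) and RLO (U+202E).
var rtlChars = '\u0591-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC';
// Compiled once and reused on every call.
var rtlDirCheck = new RegExp('^[' + weakChars + ']*[' + rtlChars + ']');

function isRTL(s) {
    return rtlDirCheck.test(s);
}

console.log(isRTL('(123) \u05D0')); // true: only weak characters precede the Hebrew letter
console.log(isRTL('\u200Fabc'));    // true: a leading RLM counts as RTL
console.log(isRTL('\u200E\u05D0')); // false: a leading LRM is not weak, so the match fails there
console.log(isRTL('abc \u05D0'));   // false: 'a' is neither weak nor RTL
```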

Update

There may be ways to speed up the regular expression above. Using a negated character class with a lazy quantifier seems to improve speed (tested on http://regexhero.net/tester/?id=6dab761c-2517-4d20-9652-6d801623eeec; the site requires Silverlight 5).

Additionally, if the directionality of the string is unknown, my guess is that in most cases the string will be LTR rather than RTL, so an isLTR function would return results faster in that case. But since the OP asked for isRTL, here is an isRTL function:

function isRTL(s){           
    var rtlChars        = '\u0591-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC',
        rtlDirCheck     = new RegExp('^[^'+rtlChars+']*?['+rtlChars+']');

    return rtlDirCheck.test(s);
};
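Worth checking before adopting the faster pattern: `^[^rtl]*?[rtl]` never looks at LTR or weak characters, so it succeeds whenever an RTL character appears anywhere in the string. It is therefore equivalent to an unanchored `[rtl]` search rather than to the first-strong-character test with `weakChars`. The comparison below uses illustrative variable names and a `weakChars` ending its first punctuation range at U+200D.

```javascript
var rtlChars = '\u0591-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC';
var weakChars = '\u0000-\u0040\u005B-\u0060\u007B-\u00BF\u00D7\u00F7\u02B9-\u02FF' +
                '\u2000-\u200D\u2010-\u2029\u202C\u202F-\u2BFF';

// The lazy variant from the update.
var lazyCheck = new RegExp('^[^' + rtlChars + ']*?[' + rtlChars + ']');
// A plain "contains any RTL character" search.
var anyRtl = new RegExp('[' + rtlChars + ']');
// The weak/neutral variant: the first non-weak character must be RTL.
var firstStrongCheck = new RegExp('^[' + weakChars + ']*[' + rtlChars + ']');

var samples = ['Hello', '\u05E9\u05DC\u05D5\u05DD', 'Hello \u05E9\u05DC\u05D5\u05DD'];
samples.forEach(function (s) {
    console.log(JSON.stringify(s), lazyCheck.test(s), anyRtl.test(s), firstStrongCheck.test(s));
});
// "Hello"       -> false false false
// Hebrew only   -> true  true  true
// "Hello " + He -> true  true  false  (lazy version disagrees with the weak/neutral one)
```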

score 3 · Accepted Answer

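Testing for Hebrew and Arabic (the only modern languages/character sets I know of that flow right to left, apart from anything Persian-related, which I haven't researched):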

/[\u0590-\u06FF]/.test(textarea.value)

Further research suggests:

/[\u0590-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC]/.test(textarea.value)
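The ranges in the second regex correspond to whole Unicode blocks plus a few individual control characters. Labelling them makes the coverage easier to review; for example, Persian is written in the Arabic script, so its letters fall in the Arabic block. The helper name and labels below are illustrative, not from the answer, and the helper compares UTF-16 code units just as the regex does.

```javascript
// Ranges from the second regex, labelled with the Unicode block or character they cover.
var RTL_RANGES = [
    ['Hebrew',                           0x0590, 0x05FF],
    ['Arabic',                           0x0600, 0x06FF],
    ['Syriac',                           0x0700, 0x074F],
    ['Arabic Supplement',                0x0750, 0x077F],
    ['Thaana',                           0x0780, 0x07BF],
    ['NKo',                              0x07C0, 0x07FF],
    ['RIGHT-TO-LEFT MARK',               0x200F, 0x200F],
    ['RIGHT-TO-LEFT EMBEDDING',          0x202B, 0x202B],
    ['RIGHT-TO-LEFT OVERRIDE',           0x202E, 0x202E],
    ['Hebrew/Arabic presentation forms', 0xFB1D, 0xFDFD],
    ['Arabic Presentation Forms-B',      0xFE70, 0xFEFC]
];

// Return the label of the first RTL code unit in `text`, or null if there is none.
function firstRtlRange(text) {
    for (var i = 0; i < text.length; i++) {
        var cu = text.charCodeAt(i); // UTF-16 code unit, as a non-`u` regex sees it
        for (var j = 0; j < RTL_RANGES.length; j++) {
            if (cu >= RTL_RANGES[j][1] && cu <= RTL_RANGES[j][2]) {
                return RTL_RANGES[j][0];
            }
        }
    }
    return null;
}

console.log(firstRtlRange('Persian: \u0641\u0627\u0631\u0633\u06CC')); // "Arabic"
console.log(firstRtlRange('\u05E2\u05D1\u05E8\u05D9\u05EA'));          // "Hebrew"
console.log(firstRtlRange('English'));                                 // null
```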

score 2 · Accepted Answer

First, to address the question in the title:

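There is no facility in JavaScript for accessing the Unicode properties of characters. You would need to find a library or service for that purpose (which I'm afraid may be difficult if you need something reliable), or extract the relevant information from the Unicode Character Database (a collection of text files in a specific format) and write your own code to use it.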
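Since this answer was written, JavaScript has gained Unicode property escapes (ES2018, used with the `u` flag). ECMAScript exposes `Script` and `General_Category` but not `Bidi_Class`, so a script-based check only approximates directionality. It also reflects the JavaScript engine's Unicode data, which is not necessarily what the browser's text layout uses, so on its own it does not answer the Avestan rendering question. A sketch with illustrative names and a non-exhaustive script list:

```javascript
// Some scripts written right to left (not an exhaustive list).
var RTL_SCRIPT = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}\p{Script=Nko}\p{Script=Avestan}]/u;
// Any letter (General_Category L); with the `u` flag a match is a whole code point.
var FIRST_LETTER = /\p{L}/u;

// Approximate "first strong character is RTL" by the script of the first letter.
function firstLetterIsRtl(text) {
    var m = FIRST_LETTER.exec(text);
    return m !== null && RTL_SCRIPT.test(m[0]);
}

console.log(firstLetterIsRtl('\u{10B00}'));    // true: Avestan letter, outside the BMP
console.log(firstLetterIsRtl('123 \u05D0'));   // true: digits are not letters
console.log(firstLetterIsRtl('Hello \u05D0')); // false
console.log(firstLetterIsRtl('123'));          // false: no letters at all
```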

Then the question in the body of the message:

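This seems even more hopeless. But since this probably concerns a limited number of knowledgeable users who know Avestan, it might not be too bad to display a string of Avestan characters in the proper direction alongside an image of them, and ask the user to click a button if the order is wrong. You could save this choice in a cookie so that the user only has to do it once (per browser; though it should be a relatively short-lived cookie, since the browser may be updated).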
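The cookie part of this suggestion could look roughly like the sketch below. The cookie name, the helper functions and the 30-day lifetime are all hypothetical; `max-age` (in seconds) is the standard cookie attribute that keeps the stored choice short-lived. The string handling is kept separate from `document.cookie` so it can be tried outside a browser.

```javascript
// Hypothetical cookie name, not part of the answer.
var BIDI_COOKIE = 'avestanBidiOverride';

// Build a string to assign to `document.cookie`; max-age keeps the choice short-lived
// so the user is asked again after the browser has had time to update.
function buildBidiCookie(needsOverride, maxAgeDays) {
    var maxAge = Math.round(maxAgeDays * 24 * 60 * 60);
    return BIDI_COOKIE + '=' + (needsOverride ? '1' : '0') + '; max-age=' + maxAge + '; path=/';
}

// Read the stored choice from a `document.cookie` string:
// true = override needed, false = browser is fine, null = user not asked yet.
function readBidiChoice(cookieString) {
    var parts = cookieString.split(';');
    for (var i = 0; i < parts.length; i++) {
        var pair = parts[i].trim().split('=');
        if (pair[0] === BIDI_COOKIE) {
            return pair[1] === '1';
        }
    }
    return null;
}

// In a browser, roughly:
//   var choice = readBidiChoice(document.cookie);
//   if (choice === null) { /* show the sample and its image, ask the user */ }
//   document.cookie = buildBidiCookie(userSaidOrderIsWrong, 30); // hypothetical flag
```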

score 2 · Accepted Answer

Thanks for the comments, but it looks like I've worked it out myself:

function is_script_rtl(t) {
    var d, s1, s2, bodies;

    //If the browser doesn’t support this, it probably doesn’t support Unicode 5.2
    if (!("getBoundingClientRect" in document.documentElement))
        return false;

    //Set up a testing DIV
    d = document.createElement('div');
    d.style.position = 'absolute';
    d.style.visibility = 'hidden';
    d.style.width = 'auto';
    d.style.height = 'auto';
    d.style.fontSize = '10px';
    d.style.fontFamily = "'Ahuramzda'";
    d.appendChild(document.createTextNode(t));

    s1 = document.createElement("span");
    s1.appendChild(document.createTextNode(t));
    d.appendChild(s1);

    s2 = document.createElement("span");
    s2.appendChild(document.createTextNode(t));
    d.appendChild(s2);

    d.appendChild(document.createTextNode(t));

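    bodies = document.getElementsByTagName('body');
    //getElementsByTagName always returns a (possibly empty) collection, so check its length
    if (bodies.length > 0) {
        var body = bodies[0], r1, r2;

        body.appendChild(d);
        r1 = s1.getBoundingClientRect();
        r2 = s2.getBoundingClientRect();
        body.removeChild(d);

        //In RTL layout the second span ends up to the left of the first one
        return r1.left > r2.left;
    }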

    return false;   
}

Usage example:

Avestan is <script>document.write(is_script_rtl('') ? "RTL" : "LTR")</script>,
Arabic is <script>document.write(is_script_rtl('العربية') ? "RTL" : "LTR")</script>,
English is <script>document.write(is_script_rtl('English') ? "RTL" : "LTR")</script>.

It seems to work. :)
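This layout probe can replace the user-agent check from the question. In the sketch below, `applyAvestanBidi` is a hypothetical helper, and the probe is passed in as a parameter (for example `is_script_rtl` above) so the decision logic can be exercised without a DOM. The style properties are the same ones the question sets.

```javascript
// Force RTL on the input only when the browser does not already lay the sample out RTL.
// `detectRtl` is any layout probe returning true for RTL rendering (e.g. is_script_rtl).
function applyAvestanBidi(input, detectRtl, sample) {
    if (!input) {
        return false;
    }
    if (detectRtl(sample)) {
        return false; // Unicode 5.2-aware browser: keep normal bidi so scripts can be mixed
    }
    input.style.direction = 'rtl'; // older browser: force the direction
    input.style.unicodeBidi = 'bidi-override';
    return true;
}

// In a browser:
//   applyAvestanBidi(document.getElementById('orig'), is_script_rtl, '\u{10B00}');
```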
