
I've gotten lost in an edge case of sorts. I'm working on a conversion of some old plaintext documentation to reST/Sphinx format, with the intent of outputting to a few formats (including HTML and text) from there. Some of the documented functions are for dealing with bitstrings, and a common case within these is a sentence like the following:

    Starting character is the blank " " which has the value 0.

I tried writing this as an inline literal the following ways:

    Starting character is the blank `` `` which has the value 0.

or

    Starting character is the blank :literal:` ` which has the value 0.

but there are a few problems with how these end up working:

  1. reST syntax objects to whitespace immediately inside the literal's delimiters, so the literal doesn't get recognized.
  2. The above can be "fixed"--it looks correct in the HTML () and plaintext (" ") output--with a non-breaking space character inside the literal, but technically this is a lie in our case, and if a user copied this character, they wouldn't be copying what they expect.
  3. The space can be wrapped in regular quotes, which allows the literal to be properly recognized, and while the output in HTML is probably fine (" "), in plaintext it ends up double-quoted as "" "".
  4. In both cases 2 and 3 above, if the literal falls on the wrap boundary, the plaintext writer (which uses textwrap) will gladly wrap inside the literal and trim the space because it's at the start/end of the line.
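
Problem 4 can be reproduced with the standard library alone. The following is a minimal sketch, assuming plain `textwrap.wrap` rather than Sphinx's own wrapper subclass; the sample sentence and the width of 11 are illustrative, chosen only so that the break lands inside the quoted space.

```python
import textwrap

# Illustrative sample: the width is picked so that the first line fills up
# exactly at the opening quote.
text = 'the blank " " which'
lines = textwrap.wrap(text, width=11)

# The wrapper breaks at the space between the quotes and then discards it,
# because whitespace at the start of a wrapped line is dropped.
print(lines)

rejoined = "\n".join(lines)
print('" "' in rejoined)
```

The quoted space no longer appears anywhere in the wrapped output, which is the behaviour described in item 4.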

I feel like I'm missing something; is there a good way to handle this?


2 Answers


Try using Unicode character codes. If I understand your question, this should work.

Here is a "|space|" and a non-breaking space (|nbspc|)

.. |space| unicode:: U+0020 .. space
.. |nbspc| unicode:: U+00A0 .. non-breaking space

You should see:

Here is a " " and a non-breaking space ( )
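
The two substituted characters look alike once rendered, but they are distinct code points, which matters if a reader copies them out of the output. A small sketch using only the standard `unicodedata` module (the variable names are illustrative) makes the difference visible:

```python
import unicodedata

space = "\u0020"
nbsp = "\u00a0"

# Print the code point and official Unicode name of each character.
for ch in (space, nbsp):
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# They render alike but do not compare equal.
same = space == nbsp
print(same)
```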

answered 2015-07-10T03:35:59.083

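I was hoping to get out of this without needing custom code to handle it, but, alas, I haven't found a way to do so. I'll wait a few more days before I accept this answer in case someone has a better idea. The code below isn't complete, nor am I sure it's "done" (we'll sort out exactly what it should look like during our review process), but the basics are intact.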

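The approach has two main components:

  1. Introduce a char role which expects the Unicode name of a character as its argument, and which produces an inline description of the character while wrapping the character itself in an inline literal node.
  2. Modify the text wrapper Sphinx uses so that it won't break at the space.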

Here's the code:

import re
import unicodedata

from docutils import nodes
# Assumed import: the pattern below is modeled on the separator regex of
# Sphinx's own text wrapper, so that is taken to be the base class here.
from sphinx.writers.text import TextWrapper


class TextWrapperDeux(TextWrapper):
    """Text wrapper that won't break on whitespace next to a backtick."""

    _wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'                       # whitespace with no backtick on either side
        r'(?<=\s)(?::[a-z-]+:)`\S+|'              # interpreted text start
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash

    @property
    def wordsep_re(self):
        return self._wordsep_re


def char_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
    """Describe a character given by unicode name.

    e.g., :char:`SPACE` -> "char:` `(U+00020 SPACE)"
    """
    try:
        character = unicodedata.lookup(text)
    except KeyError:
        # Unknown character name: report it and mark the role as problematic.
        msg = inliner.reporter.error(
            ':char: argument %s must be valid unicode name at line %d' % (text, lineno))
        prb = inliner.problematic(rawtext, rawtext, msg)
        return [prb], [msg]
    describe_char = "(U+%05X %s)" % (ord(character), text)
    # Outer inline node: "char:" label, the character as a literal, then the description.
    char = nodes.inline("char:", "char:", nodes.literal(character, character))
    char += nodes.inline(describe_char, describe_char)
    return [char], []


def setup(app):
    app.add_role('char', char_role)

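The code above lacks some glue, such as actually forcing use of the new TextWrapper. When the full version is finished, I may try to find a meaningful way to republish it; if so, I'll link it here.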
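
As a rough illustration of the wrapping half, here is a sketch that applies the same separator pattern to the standard library's `textwrap.TextWrapper` on Python 3 instead of Sphinx's wrapper. The class name `BacktickAwareWrapper`, the sample sentence, and the width of 16 are hypothetical; this is not the missing Sphinx glue, only a check that the pattern keeps the literal in one piece.

```python
import re
import textwrap


class BacktickAwareWrapper(textwrap.TextWrapper):
    # Hypothetical stand-in for the Sphinx subclass: same pattern, set as the
    # class attribute the stdlib wrapper reads when break_on_hyphens is True.
    wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'
        r'(?<=\s)(?::[a-z-]+:)`\S+|'
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')


# Illustrative sample; the width makes the default wrapper break inside ` `.
text = "the blank char:` `(U+00020 SPACE) which"

default_lines = textwrap.TextWrapper(width=16).wrap(text)
custom_lines = BacktickAwareWrapper(width=16).wrap(text)

print(default_lines)
print(custom_lines)
```

With the default wrapper the space between the backticks is where the line breaks, so it is dropped; with the modified pattern the whole literal moves to the next line intact.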

Markup:

    Starting character is the :char:`SPACE` which has the value 0.

It produces plaintext output like this:

    Starting character is the char:` `(U+00020 SPACE) which has the value 0.

and HTML output like:

    Starting character is the <span>char:<code class="docutils literal"> </code><span>(U+00020 SPACE)</span></span> which has the value 0.

The HTML output ends up looking roughly like: Starting character is the char: (U+00020 SPACE) which has the value 0.
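
The description string in the plaintext output can be reproduced without docutils. This is a minimal sketch of just the lookup and formatting step, assuming the same "(U+%05X %s)" format as the role; the helper name `describe_char` is hypothetical and no nodes are built.

```python
import unicodedata


def describe_char(name):
    """Return the plain-text form of the description for a Unicode name."""
    character = unicodedata.lookup(name)  # raises KeyError for unknown names
    return "char:`%s`(U+%05X %s)" % (character, ord(character), name)


print(describe_char("SPACE"))

# An unknown name fails the lookup, which is the case the role reports as an error.
try:
    describe_char("NOT A REAL CHARACTER NAME")
except KeyError as exc:
    print("lookup failed:", exc)
```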

answered 2015-07-12T16:59:10.510