
I am reading the Sizzle source code, and the regular expression for character encoding confuses me. In the source code, characterEncoding is defined as follows:

characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+"

It looks like it tries to match \\. or \w- or ^\x00-\xa0. I know [\w-] means \ or w or -, and I also know [^\x00-\xa0] means anything not in \x00-\xa0. Can anyone tell me what \\ means, and what \x00-\xa0 means?

Thanks


I think I know what it is now. characterEncoding is a string, so if we assign it as follows:

characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+"

then the value of characterEncoding is:

(?:\\.|[\w-]|[^\x00-\xa0])+

So if I build a regular expression from that value, it means:

[\w-] // A symbol of Latin alphabet or a digit or an underscore '_' or '-'
[^\x00-\xa0] // ISO 10646 characters U+00A1 and higher
\\. // '\' and '.'
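
The unescaping step described above can be checked directly. This is a minimal sketch: the anchored identifier regex is a hypothetical name introduced only for illustration, and it is not necessarily how Sizzle itself combines the pattern.

```javascript
// The string literal exactly as quoted from the Sizzle source.
var characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+";

// After JavaScript string unescaping, each "\\" pair is one backslash,
// so this prints: (?:\\.|[\w-]|[^\x00-\xa0])+
console.log(characterEncoding);

// Hypothetical anchored regex, for illustration only.
var identifier = new RegExp("^" + characterEncoding + "$");

console.log(identifier.test("my-class_1")); // true
console.log(identifier.test("a b"));        // false: the space lies in \x00-\xa0
```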

So this time my question is: when does the \\. part of this pattern come into play?


2 Answers


The variable would be better named css3Identifier or something similar.

Convert [\w-]|[^\x00-\xa0] to an equivalent form that reads closer to the specification:

[a-zA-Z0-9_-]|[\u00A1-\uFFFF]

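Keep in mind that A1 is 161 in hexadecimal, _ is the underscore, and - is the dash, then read:

In CSS3, identifiers (including element names, classes, and IDs in selectors (see [SELECT] [or is this still true])) can contain only the characters [A-Za-z0-9] and ISO 10646 characters 161 and higher, plus the hyphen (-) and the underscore (_)

The "and higher" part is expressed by -\uFFFF.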
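
The claimed equivalence can be checked by brute force over every UTF-16 code unit. This is a rough sketch; the variable names are made up for the example, and it only covers single code units (the Basic Multilingual Plane), which is all that \uFFFF reaches in a regex without the u flag.

```javascript
// Sizzle-style character alternatives (without the escape alternative).
var sizzleStyle = /^(?:[\w-]|[^\x00-\xa0])$/;
// The rewritten, spec-style form from the answer.
var specStyle = /^(?:[a-zA-Z0-9_-]|[\u00A1-\uFFFF])$/;

// Collect every code unit on which the two patterns disagree.
var mismatches = [];
for (var code = 0; code <= 0xFFFF; code++) {
  var ch = String.fromCharCode(code);
  if (sizzleStyle.test(ch) !== specStyle.test(ch)) {
    mismatches.push(code);
  }
}

console.log(mismatches.length); // 0
```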


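"\\\\." matches any single character preceded by a backslash. For example, in \7B it matches \7, and then B is picked up by the middle alternative. It also matches \n, \r, and \t.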
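
A small sketch of that behaviour follows. The identifier and noEscapes regexes are hypothetical names used only here; the test strings are the examples from this thread and from the CSS specification.

```javascript
var characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+";
var identifier = new RegExp("^" + characterEncoding + "$");

// The same pattern with the escape alternative removed, for comparison.
var noEscapes = /^(?:[\w-]|[^\x00-\xa0])+$/;

// "B\\&W\\?" is the JavaScript spelling of the CSS identifier B\&W\?
console.log(identifier.test("B\\&W\\?")); // true
console.log(noEscapes.test("B\\&W\\?"));  // false: a bare backslash is not allowed

// For \7B, the escape alternative consumes \7 and [\w-] consumes B.
var pieces = "\\7B".match(/\\.|[\w-]/g);
console.log(pieces.length); // 2
```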

answered 2013-08-04T12:55:11.600

It is just a regex for the valid format of CSS identifiers, classes, tags, and attributes. A link is also given in a comment in the source code. The rules follow, including the possible uses of backslashes, which might answer your question:

4.1. Characters and case

The following rules always hold:

  • All CSS style sheets are case-insensitive, except for parts that are not under the control of CSS. For example, the case-sensitivity of values of the HTML attributes "id" and "class", of font names, and of URIs lies outside the scope of this specification. Note in particular that element names are case-insensitive in HTML, but case-sensitive in XML.

  • In CSS3, identifiers (including element names, classes, and IDs in selectors (see [SELECT] [or is this still true])) can contain only the characters [A-Za-z0-9] and ISO 10646 characters 161 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit or a hyphen followed by a digit. They can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F". (See [UNICODE310] and [ISO10646].)

  • In CSS3, a backslash (\) character indicates three types of character escapes.

    First, inside a string (see [CSS3VAL]), a backslash followed by a newline is ignored (i.e., the string is deemed not to contain either the backslash or the newline).

    Second, it cancels the meaning of special CSS characters. Any character (except a hexadecimal digit) can be escaped with a backslash to remove its special meaning. For example, "\"" is a string consisting of one double quote. Style sheet preprocessors must not remove these backslashes from a style sheet since that would change the style sheet's meaning.

    Third, backslash escapes allow authors to refer to characters they can't easily put in a style sheet. In this case, the backslash is followed by at most six hexadecimal digits (0..9A..F), which stand for the ISO 10646 ([ISO10646]) character with that number. If a digit or letter follows the hexadecimal number, the end of the number needs to be made clear. There are two ways to do that:

    1. with a space (or other whitespace character): "\26 B" ("&B"). In this case, user agents should treat a "CR/LF" pair (13/10) as a single whitespace character.
    2. by providing exactly 6 hexadecimal digits: "\000026B" ("&B")

    In fact, these two methods may be combined. Only one whitespace character is ignored after a hexadecimal escape. Note that this means that a "real" space after the escape sequence must itself either be escaped or doubled.

  • Backslash escapes are always considered to be part of an identifier or a string (i.e., "\7B" is not punctuation, even though "{" is, and "\32" is allowed at the start of a class name, even though "2" is not).

http://www.w3.org/TR/css3-syntax/#characters
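
To make the second and third escape types concrete, here is a minimal sketch of a decoder for escapes in an identifier. The function unescapeCssIdentifier is a hypothetical helper written for this example and is not part of Sizzle. It assumes that code points above U+10FFFF should become U+FFFD, and it does not implement the string-only rule about a backslash followed by a newline.

```javascript
// Hypothetical helper, not part of Sizzle: decode CSS escapes in an identifier.
function unescapeCssIdentifier(text) {
  // Either 1-6 hex digits plus one optional whitespace character
  // (a CR/LF pair counts as one), or any single escaped character.
  var escapePattern = /\\(?:([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?|([\s\S]))/g;

  return text.replace(escapePattern, function (match, hex, literal) {
    if (hex !== undefined) {
      var codePoint = parseInt(hex, 16);
      // Assumption: out-of-range values map to the replacement character.
      return codePoint > 0x10FFFF ? "\uFFFD" : String.fromCodePoint(codePoint);
    }
    // A non-hex escape simply stands for the escaped character itself.
    return literal;
  });
}

console.log(unescapeCssIdentifier("B\\&W\\?"));    // B&W?
console.log(unescapeCssIdentifier("B\\26 W\\3F")); // B&W?
console.log(unescapeCssIdentifier("\\000026B"));   // &B
```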

answered 2013-08-04T06:23:12.500