In Rascal, how can I match a non-breaking space in a syntax definition? I found some in code I am trying to analyze and would like to recognize them as layout.

However, just pasting the character into the Rascal file does not work (with or without a backslash escape), and neither does "\uc2a0" or "\uC2A0", even though "c2a0" really is how the character is encoded in the file I am reading (according to the Hex Viewer plugin for Sublime Text 3, that is).

1 Answer

What you are looking at in the file is the UTF-8 encoding of the character (the two bytes C2 A0), not its Unicode code point. If you read those bytes as if they were the code point U+C2A0, you get the Hangul syllable "슠":

rascal>"\uc2a0"
str: "슠"

The code point of the non-breaking space is U+00A0, which in Rascal's Unicode escape notation is \u00A0.

I found this out by starting a console and pasting the character from Sublime Text between quotes:

rascal>" "
str: "\u00a0"

You can match it with a regular expression:

rascal>/\u00a0/ := " "
bool: true

You can also use the non-breaking space character literally in the pattern:

rascal>/ / := " "
bool: true

Or using a syntax definition:

rascal>lexical WS = [\u00a0];
ok
rascal>parse(#WS, " ")
WS: (WS) ` `
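
Since the question asks about layout specifically, the same escape can go into a layout definition. The following is a minimal sketch, not taken from the original code: the module name NbspLayout and the toy Id and Program nonterminals are hypothetical and only there to exercise the layout.

```rascal
module NbspLayout

import ParseTree;

// Layout that treats the non-breaking space (U+00A0) like ordinary whitespace.
// The follow restriction (!>>) forces the layout to consume all adjacent
// whitespace, which avoids ambiguity between consecutive layout positions.
layout Whitespace = [\t\n\r\ \u00a0]* !>> [\t\n\r\ \u00a0];

// Hypothetical toy grammar: one or more lowercase identifiers.
lexical Id = [a-z]+ !>> [a-z];
start syntax Program = Id+;
```

With this module imported in the console, a call such as parse(#start[Program], "foo\u00a0bar") should accept the input, because the non-breaking space between the two identifiers is consumed as layout.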
Answered 2016-06-30T22:05:31.530