
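I'm using ParseKit to parse units of measure. To do that, I have to supply a grammar. I tried Googling, but that didn't get me very far. This is an interesting exercise for me, but I want to make sure I get it right. ParseKit expects a BNF grammar like this: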

@start  = number units;
units = unit+ | unit+ / unit+;
unit = prefix baseUnit | baseUnit;
prefix = '' | 'milli' | 'micro' | 'pico';
baseUnit = 'm' | 'meter' | 'g' | 'gram'

I'd like to support input such as:

25 m²
25 m^-3
25 m**-5/kg**-2
25 m/s squared
25 mm² per second
25 m/s
5 kg meters per second squared
3 m-kg/s^2
3 m kilograms

2 Answers


I found this grammar at unidata.ucar.edu. It looks official, though unwieldy, and it does not include prefixes or unit names.

 Unit-Spec: one of
         nothing
         Shift-Spec

 Shift-Spec: one of
         Product-Spec
         Product-Spec SHIFT REAL
         Product-Spec SHIFT INT
         Product-Spec SHIFT Timestamp

 Product-Spec: one of
         Power-Spec
         Product-Spec Power-Spec
         Product-Spec MULTIPLY Power-Spec
         Product-Spec DIVIDE Power-Spec

 Power-Spec: one of
         Basic-Spec
         Basic-Spec INT
         Basic-Spec EXPONENT
         Basic-Spec RAISE INT

 Basic-Spec: one of
         ID
         "(" Shift-Spec ")"
         LOGREF Product-Spec ")"
         Number

 Number: one of
         INT
         REAL

 Timestamp: one of
         DATE
         DATE CLOCK
         DATE CLOCK CLOCK
         DATE CLOCK INT
         DATE CLOCK ID
         TIMESTAMP
         TIMESTAMP INT
         TIMESTAMP ID

 SHIFT:
         <space>* <shift_op> <space>*

 <shift_op>: one of
         "@"
         "after"
         "from"
         "since"
         "ref"

 REAL:
         the usual floating-point format

 INT:
         the usual integer format

 MULTIPLY: one of
         "-"
         "."
         "*"
         <space>+
         <centered middot>

 DIVIDE:
         <space>* <divide_op> <space>*

 <divide_op>: one of
         per
         PER
         "/"

 EXPONENT:
         ISO-8859-9 or UTF-8 encoded exponent characters

 RAISE: one of
         "^"
         "**"

 ID: one of
         <id>
         "%"
         "'"
         "\""
         degree sign
         greek mu character

 <id>:
         <alpha> <alphanum>*

 <alpha>:
         [A-Za-z_]
         ISO-8859-1 alphabetic characters
         non-breaking space

 <alphanum>: one of
         <alpha>
         <digit>

 <digit>:
         [0-9]

 LOGREF:
         <log> <space>* <logref>

 <log>: one of
         "log"
         "lg"
         "ln"
         "lb"

 <logref>:
         "(" <space>* <re> ":"? <space>*

 DATE:
         <year> "-" <month> ("-" <day>)?

 <year>:
         [+-]?[0-9]{1,4}

 <month>:
         "0"?[1-9]|1[0-2]

 <day>:
         "0"?[1-9]|[1-2][0-9]|"30"|"31"

 CLOCK:
         <hour> ":" <minute> (":" <second>)?

 TIMESTAMP:
         <year> (<month> <day>?)? "T" <hour> (<minute> <second>?)?

 <hour>:
         [+-]?[0-1]?[0-9]|2[0-3]

 <minute>:
         [0-5]?[0-9]

 <second>:
         (<minute>|60) (\.[0-9]*)?
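
To make the Product-Spec and Power-Spec rules above more concrete, here is a minimal Python sketch of a parser for a small subset of them. It is an illustration under stated assumptions, not the UDUNITS implementation: the names tokenize and parse_product are hypothetical, identifiers are restricted to ASCII letters and underscores (so that "m2" and "s-2" split into an ID followed by an INT), and numbers, parentheses, SHIFT, LOGREF, timestamps, and superscript EXPONENT characters such as ² are not handled.

```python
import re

# One token per match: an identifier, a signed integer, or an operator.
# Assumption: identifiers are ASCII letters/underscores only (no digits).
TOKEN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]+)|(?P<int>[+-]?\d+)|(?P<op>\*\*|\^|/|\*|\.|-))"
)

DIVIDE = {("op", "/"), ("id", "per"), ("id", "PER")}
MULTIPLY = {("op", "*"), ("op", "."), ("op", "-")}
RAISE = {("op", "^"), ("op", "**")}


def tokenize(text):
    """Split a unit string into (kind, value) pairs."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_product(text):
    """Return {unit_id: exponent} for a Product-Spec built from IDs only.

    Lenient sketch: it does not reject repeated or trailing operators.
    """
    tokens = tokenize(text)
    exponents = {}
    i, sign = 0, 1
    while i < len(tokens):
        kind, value = tokens[i]
        if (kind, value) in DIVIDE:
            # DIVIDE applies only to the Power-Spec that follows it.
            sign, i = -1, i + 1
            continue
        if (kind, value) in MULTIPLY:
            i += 1
            continue
        if kind != "id":
            raise ValueError(f"expected a unit identifier, got {value!r}")
        i += 1
        power = 1
        if i < len(tokens) and tokens[i] in RAISE:
            i += 1
            if i >= len(tokens) or tokens[i][0] != "int":
                raise ValueError("RAISE must be followed by an integer")
        if i < len(tokens) and tokens[i][0] == "int":
            # Covers both "Basic-Spec INT" and "Basic-Spec RAISE INT".
            power = int(tokens[i][1])
            i += 1
        exponents[value] = exponents.get(value, 0) + sign * power
        sign = 1  # whitespace between units is an implicit MULTIPLY
    return exponents
```

For example, parse_product("kg m/s^2") yields exponents of 1 for kg, 1 for m, and -2 for s. Unit names are not validated here, which mirrors the grammar above: it describes structure only and leaves prefixes and unit names to a separate lookup.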
answered 2013-02-26T04:08:47.867

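Developer of ParseKit here.

I haven't looked closely at your example input to determine whether your grammar is semantically correct.

However, I do see two significant syntactic problems with your existing grammar.


First, this line contains Left Recursion (and also a syntax error: the unquoted /):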

units = unit+ | unit+ / unit+;  // Incorrect. Will not work.

You must change this line to remove the left recursion, to something like the following:

units = unit ('/' unit)*;
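
As an illustration only, the rewritten rule can be read as "one unit, then zero or more repetitions of a slash followed by a unit", which in a hand-written parser becomes a plain loop. The Python sketch below shows that reading. It is not how ParseKit works internally; parse_units is a hypothetical name, and the input is assumed to be a list of already-split string tokens.

```python
BASE_UNITS = {"m", "meter", "g", "gram"}  # taken from the question's baseUnit rule


def parse_units(tokens):
    """Match  units = unit ('/' unit)*  against a list of string tokens.

    Returns the matched unit tokens in order; raises ValueError on a mismatch.
    """

    def expect_unit(pos):
        if pos >= len(tokens) or tokens[pos] not in BASE_UNITS:
            raise ValueError(f"expected a unit at position {pos}")
        return tokens[pos]

    # The leading 'unit' in the rule.
    units = [expect_unit(0)]
    pos = 1
    # The ('/' unit)* repetition: a loop, with no recursive call needed.
    while pos < len(tokens) and tokens[pos] == "/":
        units.append(expect_unit(pos + 1))
        pos += 2
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return units
```

With this sketch, parse_units(["m", "/", "g"]) returns the two unit tokens, while parse_units(["m", "g"]) raises ValueError, because the rule as written requires a slash between units.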

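For more information on eliminating left recursion in ParseKit grammars, see my earlier answer.


Second, I believe this line is trying to allow an "empty" match by using '':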

prefix = '' | 'milli' | 'micro' | 'pico';   // Incorrect. Will not work.

That syntax is not supported in ParseKit grammars. The feature itself is fully supported, but the syntax is Empty, as follows:

prefix = Empty | 'milli' | 'micro' | 'pico';
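
Conceptually, an Empty alternative makes the prefix optional: the parser consumes a prefix token if one is present and otherwise matches nothing and moves on. The Python sketch below illustrates that behavior for the rule unit = prefix baseUnit. It is an assumption-laden illustration rather than ParseKit code: parse_unit is a hypothetical name, and prefix and base unit are assumed to arrive as separate tokens, so a single token such as "mm" is not split.

```python
PREFIXES = {"milli", "micro", "pico"}     # from the prefix rule
BASE_UNITS = {"m", "meter", "g", "gram"}  # from the question's baseUnit rule


def parse_unit(tokens, pos=0):
    """Match  unit = prefix baseUnit  where prefix may be Empty.

    Returns (prefix, base_unit, next_pos); prefix is "" when Empty matched.
    """
    prefix = ""
    # Try the literal alternatives first; fall back to Empty (consume nothing).
    if pos < len(tokens) and tokens[pos] in PREFIXES:
        prefix = tokens[pos]
        pos += 1
    if pos >= len(tokens) or tokens[pos] not in BASE_UNITS:
        raise ValueError(f"expected a base unit at position {pos}")
    return prefix, tokens[pos], pos + 1
```

Because Empty always succeeds, the single alternative prefix baseUnit already covers the bare baseUnit case in this sketch.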

Hope that helps.

answered 2013-02-26T22:49:42.207