I have the following rule (taken from the SMTP specification, RFC 5321):
!path : "<" [ a_d_l ":" ] mailbox ">"
When I try to parse this line:
<test.com:test.test@testtest.com>
I get the following error:
No terminal defined for ':'
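One reading worth checking: in RFC 5321 (section 4.1.2), `A-d-l = At-domain *( "," At-domain )` and `At-domain = "@" Domain`, and the grammar mirrors this with `at_domain : "@" domain`. That would mean the source-route part has to start with "@", so `test.com:` cannot match `a_d_l`. The parser would then try to read `test.com` as the start of a mailbox, where ":" is not allowed. The sketch below approximates the Path rule with Python regular expressions to illustrate the idea. It is not Lark and not the full grammar: it assumes domains are plain dot-separated letter/digit/hyphen labels (no address literals) and local parts are dot-strings only (no quoted strings). The names `parse_path`, `DOMAIN`, `A_D_L` and so on are hypothetical.

```python
import re

# Simplified building blocks (assumptions, not the full RFC 5321 grammar):
# domains are dot-separated letter/digit/hyphen labels, local parts are
# dot-strings only. The atext class is copied from the question's grammar.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DOMAIN = LABEL + r"(?:\." + LABEL + r")*"
ATEXT = r"[\x21|\x23-\x27|\x2A|\x2B|\x2D|\x2F-\x39|\x3D|\x3F|\x41-\x5A|\x5E-\x7E]"
DOT_STRING = ATEXT + r"+(?:\." + ATEXT + r"+)*"
MAILBOX = DOT_STRING + "@" + DOMAIN

# At-domain = "@" Domain ; A-d-l = At-domain *( "," At-domain )
AT_DOMAIN = "@" + DOMAIN
A_D_L = AT_DOMAIN + "(?:," + AT_DOMAIN + ")*"

# Path = "<" [ A-d-l ":" ] Mailbox ">"
# Group 1 captures the optional source route, group 2 the mailbox.
PATH = re.compile("<(?:(" + A_D_L + "):)?(" + MAILBOX + ")>")


def parse_path(text):
    """Return (source_route, mailbox), or None if text is not a Path."""
    match = PATH.fullmatch(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_path("<test.com:test.test@testtest.com>"))   # None
print(parse_path("<@test.com:test.test@testtest.com>"))  # ('@test.com', 'test.test@testtest.com')
print(parse_path("<test.test@testtest.com>"))            # (None, 'test.test@testtest.com')
```

Under these assumptions, the input only matches the source-route branch once the domain is written as `@test.com`.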
What's unusual is that if I simply change the ":" to "_", it somehow works:
!path : "<" [ a_d_l "_" ] mailbox ">"
<test.com_test.test@testtest.com>
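A likely reason the underscore variant "works": `_` is 0x5F, which falls inside the `\x5E-\x7E` range of the `atext` terminal. So `test.com_test.test` is already a valid `dot_string` local part, and the parser can skip the optional `[ a_d_l "_" ]` branch entirely (that branch would still need a leading "@"). The colon (0x3A) is in none of the listed ranges. The sketch below checks this with Python's `re`, reusing the question's `atext` class verbatim. Inside a character class, `|` is a literal character, not alternation. `is_atext` and `is_dot_string` are hypothetical helper names.

```python
import re

# atext class copied verbatim from the question's grammar. Inside [...]
# the "|" characters are literal members of the class, not alternatives.
ATEXT = r"[\x21|\x23-\x27|\x2A|\x2B|\x2D|\x2F-\x39|\x3D|\x3F|\x41-\x5A|\x5E-\x7E]"

# dot_string : atom ("." atom)*  with  atom : atext+
DOT_STRING = re.compile(ATEXT + r"+(?:\." + ATEXT + r"+)*")


def is_atext(ch):
    """True if the single character ch is in the atext class."""
    return re.fullmatch(ATEXT, ch) is not None


def is_dot_string(text):
    """True if text as a whole matches the dot_string rule."""
    return DOT_STRING.fullmatch(text) is not None


print(is_atext("_"), is_atext(":"))         # True False
print(is_dot_string("test.com_test.test"))  # True
print(is_dot_string("test.com:test.test"))  # False
```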
What also works is an input that omits the part [ a_d_l ":" ] (which is optional, as indicated by the brackets []):
!path : "<" [ a_d_l ":" ] mailbox ">"
<test.test@testtest.com>
I also tried defining a terminal for the colon, but this did not work either:
!path : "<" [ a_d_l COLON ] mailbox ">"
COLON : ":"
<test.test@testtest.com>
Minimal reproducible example (as requested in the comments):
import traceback

from lark import Lark
grammar = r'''
!path : "<" [ a_d_l ":" ] mailbox ">"
a_d_l : at_domain ( "," at_domain )*
at_domain : "@" domain
domain : sub_domain ("." sub_domain)*
sub_domain : let_dig [ldh_str]
let_dig : ALPHA | DIGIT
!ldh_str : ( ALPHA | DIGIT | "-" )* let_dig
address_literal : "[" ( ipv4_address_literal | ipv6_address_literal | general_address_literal ) "]"
ipv4_address_literal : snum ("." snum)~3
snum : DIGIT~1..3
ipv6_address_literal : "ipv6:" ipv6_addr
ipv6_addr : ipv6_full | ipv6_comp | ipv6v4_full | ipv6v4_comp
ipv6_full : ipv6_hex (":" ipv6_hex)~7
ipv6_hex : HEXDIG~1..4
!ipv6_comp : [ipv6_hex (":" ipv6_hex)~0..5] "::" [ipv6_hex (":" ipv6_hex)~0..5]
!ipv6v4_full : ipv6_hex (":" ipv6_hex)~5 ":" ipv4_address_literal
!ipv6v4_comp : [ipv6_hex (":" ipv6_hex)~0..3] "::" [ipv6_hex (":" ipv6_hex)~0..3 ":"] ipv4_address_literal
!general_address_literal : standardized_tag ":" dcontent+
standardized_tag : ldh_str
dcontent : /[\x21-\x5A|\x5E-\x7E]/
mailbox : local_part /[\x40]/ ( domain | address_literal )
local_part : dot_string | quoted_string
dot_string : atom ("." atom)*
atom : atext+
quoted_string : /[\x22]/ qcontentsmtp* /[\x22]/
qcontentsmtp : qtextsmtp | quoted_pairsmtp
quoted_pairsmtp : /[\x5C\x5C]/ /[\x20-\x7E]/
qtextsmtp : /[\x20-\x21|\x23-\[\]-\x7E]/
atext : /[\x21|\x23-\x27|\x2A|\x2B|\x2D|\x2F-\x39|\x3D|\x3F|\x41-\x5A|\x5E-\x7E]/
command : [ path ]
%import common.WS -> SP
%import common.NEWLINE -> CRLF
%import common.DIGIT
%import common.LETTER -> ALPHA
%import common.HEXDIGIT -> HEXDIG'''
input_text = "<test.com:test.test@testtest.com>"


def parse(text):
    try:
        result = Lark(grammar, start="command").parse(text)
    except Exception as ex:
        print('####### Parsing Failed')
        print(ex)
        traceback.print_exc()
        result = None
    return result


result = parse(input_text)