
I'm very new to writing regular expressions. Can anyone help me build a single regular expression that works for the following two strings?

  1. offset=0&fileGuid=1014fc48-6eb3-4fff-8242-96f5a496b0ee
  2. file=Arnold+P+Gleit+Contract+Information.pdf&param=CommandType=1.1.2.1~ProjectID=13979~FolderID=1344972~UserID=13395&fileGuid=51c01e15-ac3c-4d2d-bdc9-4e63251a0364&location=CommandType=1.1.2.1~ProjectID=13979~FolderID=1344972~UserID=13395&size=28357151&title=&desc=&searchtags=&OnDuplicateAction=2&offset=1703936&first=True&last=False

My intention is to extract the values of "offset" and "fileGuid" and load them into a Hive table. I tried the following regular expression:

"input.regex" = "offset=([0-9]+).*\\&fileGuid=([a-zA-Z0-9]+\\-[a-zA-Z0-9]+\\-[a-zA-Z0-9]+\\-[a-zA-Z0-9]+\\-[a-zA-Z0-9]+)"

This works for the first string but fails for the second, where fileGuid appears before offset.

Thanks in advance.


1 Answer


This should do the trick: /offset=([^&$]+)|fileGuid=([^&$]+)/

Basically it matches offset=<string to next & or end of line> or fileGuid=<string to next & or end of line>, so the order of the two parameters does not matter.

Example: http://regexr.com?34t57

It may also be handy to add the offset/fileGuid strings to the matches so it is easier to see which is which. To do that, write it as (offset)=([^&$]+)|(fileGuid)=([^&$]+).
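As a rough check of the labeled pattern, here is a minimal Python sketch that runs it over both strings from the question. The `extract` helper is a hypothetical name used only for illustration, and this runs outside Hive, so it does not show whether the same pattern can be used directly as `input.regex`. One detail: inside the character class, `$` is a literal dollar sign rather than an end-of-line anchor; the match still ends at the end of the string simply because there is nothing left to consume.

```python
import re

# Labeled pattern from the answer: either key, then everything up to the next '&'.
PATTERN = re.compile(r"(offset)=([^&$]+)|(fileGuid)=([^&$]+)")


def extract(query: str) -> dict:
    """Collect offset and fileGuid values from a query string, in any order."""
    found = {}
    for m in PATTERN.finditer(query):
        # Groups 1/2 are set for an offset match, groups 3/4 for a fileGuid match.
        key = m.group(1) or m.group(3)
        value = m.group(2) or m.group(4)
        found[key] = value
    return found


# The two sample strings from the question.
first = "offset=0&fileGuid=1014fc48-6eb3-4fff-8242-96f5a496b0ee"
second = (
    "file=Arnold+P+Gleit+Contract+Information.pdf"
    "&param=CommandType=1.1.2.1~ProjectID=13979~FolderID=1344972~UserID=13395"
    "&fileGuid=51c01e15-ac3c-4d2d-bdc9-4e63251a0364"
    "&location=CommandType=1.1.2.1~ProjectID=13979~FolderID=1344972~UserID=13395"
    "&size=28357151&title=&desc=&searchtags=&OnDuplicateAction=2"
    "&offset=1703936&first=True&last=False"
)

print(extract(first))
print(extract(second))
```

Because each alternative is matched independently, it does not matter whether fileGuid comes before or after offset in the input.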

Answered 2013-05-16T13:42:14.413