I'm trying to parse Firefox bookmarks (the JSON export) with the following commands:
cat boo.json | grep '\"uri\"\:\"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}\"'
cat boo.json | grep '"uri"\:"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}'
cat boo.json | grep '"uri"\:"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}"'
and a few other variations, but they all failed. The JSON bookmarks file looks like this:
.........."uri":"http://www.google.com/?"......"uri":"http://stackoverflow.com/"
So the output should look like this:
"uri":"http://www.google.com/?"
"uri":"http://stackoverflow.com/"
What is my regular expression missing?
Update:
URLs in the bookmarks file end with one of the following special characters:
/ , e.g.: "uri":"http://stackoverflow.com/"
" , e.g.: "uri":"http://stackoverflow.com/questions/13148794/parsing-firefox-bookmarks-using-regular-expression"
} , e.g.: "uri":"https://fr.add-ons.mozilla.com/fr/firefox/bookmarks/"}
Using this modified regular expression:
$ egrep -o "(http|https)://([^ ]*).(*\/)" boo.json
Result:
http://fr.fxfeeds.mozilla.com/fr/firefox/headlines.xml"},{"name":"livemark/siteURI","flags":0,"expires":4,"mimeType":null,"type":3,"value":"http://www.lemonde.fr/"}],"type":"text/x-moz-place-container","children":[]}]},{"index":2,"title":"Tags","id":4,"parent":1,"dateAdded":1344432674984000,"lastModified":1344432674984000,"type":"text/
http://stackoverflow.com/questions/13148794/parsing-firefox-bookmarks-using-regular-expression","charset":"UTF-8"},{"index":29,"title":"adrusi/
http://stackoverflow.com/
...
But this still doesn't give me only the URLs.
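For illustration, here is a minimal sketch of one possible direction, not a definitive fix. In the modified expression, `[^ ]*` matches any run of non-space characters, and since the exported JSON sits on one line with few spaces, the match runs past the closing quote up to a later `/`. The sketch instead stops at the next double quote. It assumes a URL value never contains a literal double quote, and it uses a small temporary sample file (hypothetical, standing in for boo.json) that mirrors the structure shown above.

```shell
#!/bin/sh
# Hypothetical sample standing in for boo.json: one line, no spaces.
sample=$(mktemp)
printf '%s' '{"title":"a","uri":"http://www.google.com/?"},{"title":"b","uri":"http://stackoverflow.com/"}' > "$sample"

# -o prints only the matched text. [^"]* cannot cross a double quote,
# so each match ends at the quote that closes the URL value.
grep -o '"uri":"[^"]*"' "$sample"

# URLs only: start at http:// or https:// and stop before the next quote.
# Note this matches every such URL in the file, not just "uri" values.
grep -oE 'https?://[^"]+' "$sample"
```

Against the real file the same patterns would be run as `grep -o '"uri":"[^"]*"' boo.json`. Matching JSON with a regular expression is fragile, so a JSON-aware tool may be the safer choice if the file structure varies.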