json - Parsing Firefox bookmarks using regular expressions

Question

I tried to parse my Firefox bookmarks (the JSON export) with the following:

cat boo.json | grep '\"uri\"\:\"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}\"'
cat boo.json | grep '"uri"\:"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}'
cat boo.json | grep '"uri"\:"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}"'

and several others, but they all failed. The JSON bookmark file looks like this:

.........."uri":"http://www.google.com/?"......"uri":"http://stackoverflow.com/"

So the output should look like this:

"uri":"http://www.google.com/?"
"uri":"http://stackoverflow.com/"

What is my regex missing?
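For comparison, a minimal sketch of a pattern that should produce the expected output above, assuming a grep that supports `-E` and `-o` (GNU and BSD grep both do). A few things work against the attempts above: plain `grep` uses basic regular expressions, where `+` and `{2,3}` are ordinary characters; a `^` in the middle of a pattern is either a literal caret (basic syntax) or a start-of-line anchor (extended syntax), so it cannot match inside the line either way; `[a-zA-Z]{2,3}"` requires the closing quote right after the top-level domain, which `http://www.google.com/?"` does not satisfy; and without `-o`, grep prints each matching line in full rather than only the match. The function name `extract_uri_pairs` and the sample string are hypothetical.

```shell
#!/bin/sh
# Print every "uri":"http(s)://..." pair on its own line.
# -E: extended syntax, so + and ? need no backslashes.
# -o: print only the matched text, not the whole line.
# [^"]* stops at the closing quote, so "/?", "/" or a longer path all survive.
extract_uri_pairs() {
  grep -Eo '"uri":"https?://[^"]*"' "$@"
}

# Hypothetical one-line fragment shaped like the excerpt in the question.
sample='{"a":1,"uri":"http://www.google.com/?","b":2,"uri":"http://stackoverflow.com/"}'
printf '%s\n' "$sample" | extract_uri_pairs
```

On the real file this would be `extract_uri_pairs boo.json`.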

Update:

The URLs in the bookmark file end with one of the following special characters:

/, for example: "uri":"http://stackoverflow.com/"

", for example: "uri":"http://stackoverflow.com/questions/13148794/parsing-firefox-bookmarks-using-regular-expression"

}, for example: "uri":"https://fr.add-ons.mozilla.com/fr/firefox/bookmarks/"}

Using this modified regex:

$ egrep -o "(http|https)://([^ ]*).(*\/)"  boo.json

Result:

http://fr.fxfeeds.mozilla.com/fr/firefox/headlines.xml"},{"name":"livemark/siteURI","flags":0,"expires":4,"mimeType":null,"type":3,"value":"http://www.lemonde.fr/"}],"type":"text/x-moz-place-container","children":[]}]},{"index":2,"title":"Tags","id":4,"parent":1,"dateAdded":1344432674984000,"lastModified":1344432674984000,"type":"text/
http://stackoverflow.com/questions/13148794/parsing-firefox-bookmarks-using-regular-expression","charset":"UTF-8"},{"index":29,"title":"adrusi/
http://stackoverflow.com/
...

But this still does not give me only the URLs.
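A likely reason, going by the result: `[^ ]*` is greedy and only stops at a space, so each match runs on to the last `/` before the next space. In the JSON export every URL is a quoted string, so excluding `"` instead of a space stops the match at the end of the URL, which covers all three endings listed in the update (the `}` comes after the quote). The sketch below assumes GNU or BSD grep and sed; it anchors on `"uri":"` so that URLs stored in other fields (such as the livemark `value` in the result) are skipped, then strips that prefix. The function name and sample are hypothetical.

```shell
#!/bin/sh
# Match only values of "uri" keys, stopping at the closing double quote,
# then remove the '"uri":"' prefix so just the bare URL remains.
extract_urls() {
  grep -Eo '"uri":"https?://[^"]*' "$@" | sed 's/^"uri":"//'
}

# Hypothetical fragment with the three endings from the update plus a
# livemark "value" URL that should not be picked up.
sample='[{"uri":"http://stackoverflow.com/"},{"uri":"http://stackoverflow.com/questions/13148794/parsing-firefox-bookmarks-using-regular-expression","charset":"UTF-8"},{"uri":"https://fr.add-ons.mozilla.com/fr/firefox/bookmarks/"},{"name":"livemark/siteURI","value":"http://www.lemonde.fr/"}]'
printf '%s\n' "$sample" | extract_urls
```

Dropping the `"uri":"` prefix from the grep pattern (and the sed step) would instead list every http(s) URL in the file, whatever key it sits under.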

score 0 · Accepted Answer


Have you tried JSON.sh? It works great!

https://github.com/dominictarr/JSON.sh

answered 2012-10-30T22:59:57.973
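The point of this answer is to read the file with a real JSON parser instead of a regex. As an illustration of that idea, here is a sketch that swaps in a different tool, jq, rather than JSON.sh; it assumes jq is installed and that bookmark URLs live under `uri` keys at any depth of the nested `children` arrays, as in the excerpts in the question. The function name and sample are hypothetical.

```shell
#!/bin/sh
# Swapped-in tool: jq (not JSON.sh). Requires jq to be installed.
# '..'         visits every value in the tree, including nested folders.
# '.uri?'      reads the key where present; '?' suppresses the error on
#              arrays and scalars, which cannot be indexed by a string.
# '// empty'   drops objects that have no uri (where .uri is null).
# -r           prints raw strings without the surrounding JSON quotes.
bookmark_uris() {
  jq -r '.. | .uri? // empty' "$@"
}

# Hypothetical nested fragment: a folder containing two bookmarks.
sample='{"title":"","children":[{"title":"Menu","children":[{"uri":"http://www.google.com/?"},{"uri":"http://stackoverflow.com/"}]}]}'
printf '%s\n' "$sample" | bookmark_uris
```

Because the parser understands string boundaries, this does not depend on which character follows a URL.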

score 0 · Accepted Answer

I use this regex to extract URLs, and it works well:

cat *.html | grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" | sort | uniq
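The command above targets HTML files. A sketch of the same pattern pointed at JSON input, to show how its character class behaves there: because `"` is not in the class, each match ends at the closing quote, but `&`, `%`, `#`, `:` and `~` are not in it either, so a URL with such characters in its query string is cut short. The function name and the `example.com` URL are hypothetical.

```shell
#!/bin/sh
# The answer's pattern and pipeline, applied to JSON-shaped input.
# sort | uniq removes duplicate URLs (sort -u would do the same).
answer_urls() {
  grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" "$@" | sort | uniq
}

# Hypothetical fragment: a duplicate URL and a query string containing '&'.
sample='{"uri":"http://stackoverflow.com/"},{"uri":"http://example.com/search?q=a&lang=en"},{"uri":"http://stackoverflow.com/"}'
printf '%s\n' "$sample" | answer_urls
```

Here the second URL comes out as `http://example.com/search?q=a`, because the match stops at `&`.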

score -1 · Accepted Answer

Jeff Atwood wrote a blog post on the problem with URLs, and with the regex he proposed there I managed to extract all the URLs from my Firefox bookmarks:

egrep -o "\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]"  my-bookmark.json
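As written, this pattern only matches `http://`, so `https://` bookmarks such as the mozilla.com one in the question are skipped. A sketch with one assumed change, `https?` in place of `http`, and assuming GNU grep (`\b` is a GNU extension). The final character class omits `.`, `,`, `;`, `:`, `!` and `?`, so a URL cannot end with one of them; that suits URLs followed by punctuation in prose, but it also trims a genuine trailing `?`. The function name and sample are hypothetical.

```shell
#!/bin/sh
# Pattern from the answer with 'https?' instead of 'http' (assumed change).
# \(?  lets a URL wrapped in parentheses keep its opening paren.
# The last bracket expression decides which characters may end a match.
atwood_urls() {
  grep -Eo '\(?\bhttps?://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]' "$@"
}

# Hypothetical fragment: one http and one https bookmark.
sample='{"uri":"http://www.google.com/?"},{"uri":"https://fr.add-ons.mozilla.com/fr/firefox/bookmarks/"}'
printf '%s\n' "$sample" | atwood_urls
```

With this input, the Google bookmark from the question comes out as `http://www.google.com/`, without its trailing `?`.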
