
I have many files saved by the Firefox Session Manager add-on, named *.session, and I want to export the URLs and titles from them. I wrote this regular expression:

(?<=entries":\[{"url":"(?<link>.*?(?="))","title":"(?<content>.*?)(?=",")) 

But it doesn't work well: it matches too much.

Part of a file:

[SessionManager v2]
name=jjjjjjjjjjjjjjjjjj
timestamp=1368030038170
autosave=false  count=1/49  screensize=1366x768
{"windows":[{"tabs":[{"entries":[{"url":"http://blog.csdn.net/gisfarmer/article/details/4135975?1357376310","title":"图像相似度算法的C#实现及测评 - 老骆驼空间站 - 博客频道 - CSDN.NET","ID":1673113085,"docshellID":36,"referrer":"http://blog.csdn.net/gisfarmer/article/details/4135975","docIdentifier":80,"children":[{"url":"about:blank","ID":1673113086,"docshellID":34,"docIdentifier":81},{"url":"about:blank","ID":1673113087,"docshellID":168,"docIdentifier":82},{"url":"about:blank","ID":1673113088,"docshellID":55,"docIdentifier":83},{"url":"about:blank","ID":1673113089,"docshellID":37,"owner_b64":"CbflmEkNQj+opi5sTsh3UAAAAAAAAAAAwAAAAAAAAEYB3pRy0IA0EdOTmQAQS6D9QDlf4EV9GErbo/2vmMihrxEAAAAC/////wAAAFABAAAAQWh0dHA6Ly9ibG9nLmNzZG4ubmV0L2dpc2Zhcm1lci9hcnRpY2xlL2RldGFpbHMvNDEzNTk3NT8xMzU3Mzc2MzEwAAAAAAAAAAQAAAAHAAAADQAAAAf/////AAAAB/////8AAAAHAAAADQAAABQAAAAtAAAAFAAAACIAAAAUAAAAGwAAAC8AAAAHAAAAL/////8AAAAA/////wAAADcAAAAKAAAAFP////8BAAAAAAAAAAAAAQAAAAAAAA==","docIdentifier":84},{"url":"about:blank","ID":1673113090,"docshellID":31,"owner_b64":"CbflmEkNQj+opi5sTsh3UAAAAAAAAAAAwAAAAAAAAEYB3pRy0IA0EdOTmQAQS6D9QDlf4EV9GErbo/2vmMihrxEAAAAC/////wAAAFABAAAAQWh0dHA6Ly9ibG9nLmNzZG4ubmV0L2dpc2Zhcm1lci9hcnRpY2xlL2RldGFpbHMvNDEzNTk3NT8xMzU3Mzc2MzEwAAAAAAAAAAQAAAAHAAAADQAAAAf/////AAAAB/////8AAAAHAAAADQAAABQAAAAtAAAAFAAAACIAAAAUAAAAGwAAAC8AAAAHAAAAL/////8AAAAA/////wAAADcAAAAKAAAAFP////8BAAAAAAAAAAAAAQAAAAAAAA==","docIdentifier":85},{"url":"about:blank","ID":1673113091,"docshellID":63,"owner_b64":"CbflmEkNQj+opi5sTsh3UAAAAAAAAAAAwAAAAAAAAEYB3pRy0IA0EdOTmQAQS6D9QDlf4EV9GErbo/2vmMihrxEAAAAC/////wAAAFABAAAAQWh0dHA6Ly9ibG9nLmNzZG4ubmV0L2dpc2Zhcm1lci9hcnRpY2xlL2RldGFpbHMvNDEzNTk3NT8xMzU3Mzc2MzEwAAAAAAAAAAQAAAAHAAAADQAAAAf/////AAAAB/////8AAAAHAAAADQAAABQAAAAtAAAAFAAAACIAAAAUAAAAGwAAAC8AAAAHAAAAL/////8AAAAA/////wAAADcAAAAKAAAAFP////8BAAAAAAAAAAAAAQAAAAAAAA==","docIdentifier":86},{"url":"about:blank","ID":1673113092,"docshellID":22,"owner_b64":"CbflmEkNQj+opi5sTsh3UAAAAAAAAAAAwAAAAAAAAEYB3pRy0IA0EdOTmQAQS6D9QDlf4EV9GErbo/
2vmMihrxEAAAAC/////wAAAFABAAAAQWh0dHA6Ly9ibG9nLmNzZG4ubmV0L2dpc2Zhcm1lci9hcnRpY2xlL2RldGFpbHMvNDEzNTk3NT8xMzU3Mzc2MzEwAAAAAAAAAAQAAAAHAAAADQAAAAf/////AAAAB/////8AAAAHAAAADQAAABQAAAAtAAAAFAAAACIAAAAUAAAAGwAAAC8AAAAHAAAAL/////8AAAAA/////wAAADcAAAAKAAAAFP////8BAAAAAAAAAAAAAQAAAAAAAA==","docIdentifier":87},{"url":"about:blank","ID":1673113093,"docshellID":118,"owner_b64":"CbflmEkNQj+opi5sTsh3UAAAAAAAAAAAwAAAAAAAAEYB3pRy0IA0EdOTmQAQS6D9QDlf4EV9GErbo/2vmMihrxEAAAAC/////wAAAFABAAAAQWh0dHA6Ly9ibG9nLmNzZG4ubmV0L2dpc2Zhcm1lci9hcnRpY2xlL2RldGFpbHMvNDEzNTk3NT8xMzU3Mzc2MzEwAAAAAAAAAAQAAAAHAAAADQAAAAf/////AAAAB/////8AAAAHAAAADQAAABQAAAAtAAAAFAAAACIAAAAUAAAAGwAAAC8AAAAHAAAAL/////8AAAAA/////wAAADcAAAAKAAAAFP////8BAAAAAAAAAAAAAQAAAAAAAA==","docIdentifier":88},{"url":"about:blank","ID":1673113094,"docshellID":59,"owner_b64":"CbflmEkNQj+opi5sTsh3UAAAAAAAAAAAwAAAAAAAAEYB3pRy0IA0EdOTmQAQS6D9QDlf4EV9GErbo/2vmMihrxEAAAAC/////wAAAFABAAAAQWh0dHA6Ly9ibG9nLmNzZG4ubmV0L2dpc2Zhcm1lci9hcnRpY2xlL2RldGFpbHMvNDEzNTk3NT8xMzU3Mzc2MzEwAAAAAAAAAAQAAAAHAAAADQAAAAf/////AAAAB/////8AAAAHAAAADQAAABQAAAAtAAAAFAAAACIAAAAUAAAAGwAAAC8AAAAHAAAAL/////8AAAAA/////wAAADcAAAAKAAAAFP////8BAAAAAAAAAAAAAQAAAAAAAA==","docIdentifier":89},{"url":"about:blank","ID":1673113095,"docshellID":137,"owner_b64":"CbflmEkNQj+opi5sTsh3UAAAAAAAAAAAwAAAAAAAAEYB3pRy0IA0EdOTmQAQS6D9QDlf4EV9GErbo/2vmMihrxEAAAAC/////wAAAFABAAAAQWh0dHA6Ly9ibG9nLmNzZG4ubmV0L2dpc2Zhcm1lci9hcnRpY2xlL2RldGFpbHMvNDEzNTk3NT8xMzU3Mzc2MzEwAAAAAAAAAAQAAAAHAAAADQAAAAf/////AAAAB/////8AAAAHAAAADQAAABQAAAAtAAAAFAAAACIAAAAUAAAAGwAAAC8AAAAHAAAAL/////8AAAAA/////wAAADcAAAAKAAAAFP////8BAAAAAAAAAAAAAQAAAAAAAA==","docIdentifier":90},{"url":"about:blank","ID":1673113096,"docshellID":254,"owner_b64":"CbflmEkNQj+opi5sTsh3UAAAAAAAAAAAwAAAAAAAAEYB3pRy0IA0EdOTmQAQS6D9QDlf4EV9GErbo/2vmMihrxEAAAAC/////wAAAFABAAAAQW

And the result: [screenshot]

Can anybody help?


2 Answers


Why not parse the JSON and loop over it instead of using a regular expression?
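A minimal sketch of that approach in Python, assuming the layout shown in the question's sample: a few header lines followed by a JSON document whose first line starts with `{`, with `windows`, `tabs`, and `entries` keys. The function name `extract_entries` and the sample data are hypothetical, and nested `children` entries are deliberately ignored.

```python
import json

def extract_entries(session_text):
    """Return (url, title) pairs for the top-level entries of every tab."""
    lines = session_text.splitlines()
    # Assumption: the JSON starts at the first line beginning with "{";
    # everything before it is the Session Manager header.
    first = next(i for i, line in enumerate(lines) if line.startswith("{"))
    # Joining without a separator also tolerates JSON wrapped over several lines.
    data = json.loads("".join(lines[first:]))
    pairs = []
    for window in data.get("windows", []):
        for tab in window.get("tabs", []):
            for entry in tab.get("entries", []):
                # Some entries (e.g. about:blank) have no title.
                pairs.append((entry.get("url"), entry.get("title", "")))
    return pairs

# Hypothetical, shortened sample in the same shape as the question's file.
sample = (
    "[SessionManager v2]\n"
    "name=example\n"
    "timestamp=1368030038170\n"
    "autosave=false\tcount=1/49\tscreensize=1366x768\n"
    '{"windows":[{"tabs":[{"entries":[{"url":"http://example.com/a",'
    '"title":"Page A","children":[{"url":"about:blank","ID":1}]}]}]}]}'
)

print(extract_entries(sample))
```

Because the parser understands string escapes and nesting, a title containing quotes or a child frame without a title cannot shift the match the way it can with a regular expression.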

Answered 2013-07-09T04:32:30.367

Matt Bryant's approach seems best. As for your regex problem, you can simply use:

"url":"(?<link>[^"]+)","title":"(?<content>[^"]+)
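A plausible cause of the over-matching is that a lazy `.*?` may still run across quotes until `","title":"` finally appears, for example when an entry without a title precedes one with a title. The sketch below compares a lazy pattern with the `[^"]+` pattern above in Python, which spells named groups `(?P<name>...)`. The sample string is hypothetical.

```python
import re

# Hypothetical, shortened sample: an entry without a title comes first.
text = (
    '{"url":"about:blank","ID":1},'
    '{"url":"http://example.com/a","title":"Page A","ID":2}'
)

# Lazy version: ".*?" is allowed to cross quotes to reach "title".
lazy = re.compile(r'"url":"(?P<link>.*?)","title":"(?P<content>.*?)"')
# Negated character class: a captured value can never contain a quote.
strict = re.compile(r'"url":"(?P<link>[^"]+)","title":"(?P<content>[^"]+)')

lazy_link = lazy.search(text).group("link")
strict_match = strict.search(text)

print(lazy_link)
print(strict_match.group("link"), "|", strict_match.group("content"))
```

The lazy pattern captures everything from `about:blank` through the second URL as one link, while the negated class skips the title-less entry and captures only `http://example.com/a` and `Page A`.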

Or, to be safer (this also allows escaped quotes inside the values):

"url":"(?<link>(?>[^"]+|(?<=\\)")+)","title":"(?<content>(?>[^"]+|(?<=\\)")+)
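The same idea can be sketched in Python. This is not the pattern above: atomic groups are only available in Python's `re` from version 3.11, so the sketch swaps in the common "non-quote, non-backslash character, or backslash plus any character" formulation. The name `STRING_BODY` and the sample string are hypothetical, and unlike the pattern above this variant also accepts empty values.

```python
import re

# Body of a JSON string: any character that is neither a quote nor a
# backslash, or a backslash followed by any single character.
STRING_BODY = r'(?:[^"\\]|\\.)*'

pattern = re.compile(
    r'"url":"(?P<link>' + STRING_BODY + r')",'
    r'"title":"(?P<content>' + STRING_BODY + r')"'
)

# Hypothetical sample whose title contains escaped quotes.
text = r'{"url":"http://example.com/a","title":"He said \"hi\"","ID":2}'

m = pattern.search(text)
print(m.group("link"))
print(m.group("content"))  # still contains the backslash escapes
```

The captured title keeps its raw escapes (`\"`), so it would still need decoding, which is one more reason parsing the JSON is the simpler route.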
Answered 2013-07-09T04:42:17.600