I'm using HttpWebRequest/HttpWebResponse. The problem is that on some sites HttpWebResponse doesn't recognize the cookies. This is what response.Headers returns:

 Cookie1=1;domain=subdomain.host.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT
 Cookie2= ; HTTPOnly= ; domain=subdomain.host.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,
 Cookie5= ; domain=.host.com;path=/;HTTPOnly= ;version=1
 Cookie3=2; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.host.com;path=/;HTTPOnly= ;version=1
 Cookie4=3; domain=.host.com;path=/;version= 

Raw (the cookies from response.Headers all arrive in a single-line string):

 Cookie1=1;domain=subdomain.host.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,Cookie2= ; HTTPOnly= ; domain=subdomain.host.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,Cookie5= ; domain=.host.com;path=/;HTTPOnly= ;version=1,Cookie3=2; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.host.com;path=/;HTTPOnly= ;version=1,Cookie4=3; domain=.host.com;path=/;version= 

The following regex would work perfectly:

(.*?)=(.*?);

But I also need to scrape the domain and expiration date, and the domain and 'expires' attributes appear in varying positions. How can I scrape all the cookies and get the domain and expiration fields? Thanks!

1 Answer

You need something like the following:

@"Cookie(?<index>\d+)\s*=\s*((domain\s*=\s*(?<domain>.*?)[;,])|(expires\s*=\s*(?<expires>.*?GMT))|(.(?!Cookie\d+=)))*"

with the following options:

RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture
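
As a minimal sketch of how the pattern and options above might be applied, the snippet below runs the regex over the combined header string and collects the index, domain, and expires value of each cookie. The class name `SetCookieScraper` and the method `Parse` are hypothetical names chosen for illustration, and the pattern only works when cookie names follow the `Cookie<N>` form shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET framework.
public static class SetCookieScraper
{
    // Pattern from the answer; assumes every cookie is named "Cookie<N>".
    private static readonly Regex CookiePattern = new Regex(
        @"Cookie(?<index>\d+)\s*=\s*((domain\s*=\s*(?<domain>.*?)[;,])|(expires\s*=\s*(?<expires>.*?GMT))|(.(?!Cookie\d+=)))*",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Returns one (index, domain, expires) tuple per cookie.
    // An attribute that is absent from a cookie is reported as null.
    public static List<Tuple<string, string, string>> Parse(string header)
    {
        var result = new List<Tuple<string, string, string>>();
        foreach (Match m in CookiePattern.Matches(header))
        {
            string domain = m.Groups["domain"].Success ? m.Groups["domain"].Value : null;
            string expires = m.Groups["expires"].Success ? m.Groups["expires"].Value : null;
            result.Add(Tuple.Create(m.Groups["index"].Value, domain, expires));
        }
        return result;
    }
}
```

Assuming the combined header is read with `response.Headers["Set-Cookie"]`, a call such as `SetCookieScraper.Parse(response.Headers["Set-Cookie"])` should yield one entry per cookie. Note that the domain alternative requires a trailing `;` or `,`, so a domain that is the very last attribute in the string would not be captured.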

Depending on whether your times are all in GMT, you may want to use something more sophisticated to capture the 'expires' part.
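
One possible way to drop the GMT assumption, sketched here as an illustration rather than a tested replacement, is to capture the expires value up to the next `;`, the next `,Cookie<N>=` boundary, or the end of the string. The class `ExpiresCapture` and its `Extract` method are hypothetical names, and the sketch still assumes cookie names of the form `Cookie<N>`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET framework.
public static class ExpiresCapture
{
    // Lazily capture everything after "expires=" and stop before the next ';',
    // before a ",Cookie<N>=" boundary, or at the end of the input.
    private static readonly Regex ExpiresPattern = new Regex(
        @"expires\s*=\s*(?<expires>[^;]*?)\s*(?=;|,\s*Cookie\d+\s*=|$)",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Returns the raw expires strings in the order they appear in the header.
    public static List<string> Extract(string header)
    {
        var values = new List<string>();
        foreach (Match m in ExpiresPattern.Matches(header))
        {
            values.Add(m.Groups["expires"].Value);
        }
        return values;
    }
}
```

Because the comma inside the date (`Thu, 30-...`) is not followed by a cookie name, the lookahead skips it and only stops at a real cookie boundary. The captured strings are left unparsed; converting them to dates is a separate step that depends on the formats the server actually sends.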

Answered 2013-10-30T08:55:13.867