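I can read and download the list of .jpg files on a page using this regular expression:
MatchCollection match = Regex.Matches(htmlText, @"http://.*?\b\.jpg\b", RegexOptions.RightToLeft);
Sample output: http://somefiles.jpg, taken from this line in the HTML:
<img src="http://somefiles.jpg"/>
Question: how can I read files in this kind of format?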
<a href="download/datavoila-setup.exe" id="button_download" title="Download your copy of DataVoila!" onclick="pageTracker._trackPageview('/download/datavoila-setup.exe')"></a>
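I only want to extract the .exe files on the page, so in the example above I only want to get the datavoila-setup.exe file. Sorry, I'm a newbie and quite confused about how to do this T_T. Thanks in advance to anyone who can help. :)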
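One way to pull out only the .exe links, sketched here with a plain regular expression rather than a full HTML parser: match each href attribute value, keep the ones ending in .exe, resolve them against the page address, and take the last path segment as the file name. The names ExeLinkFinder, FindExeLinks and FileNameOf, and the address http://example.com/, are made up for illustration. The pattern assumes double-quoted href attributes and will miss other forms.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class ExeLinkFinder
{
    // Matches href="..." and captures the quoted value (double quotes only).
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // Returns the absolute URL of every href in the HTML that ends in ".exe".
    public static List<Uri> FindExeLinks(string html, Uri pageAddress)
    {
        var links = new List<Uri>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            string href = m.Groups[1].Value;
            if (href.EndsWith(".exe", StringComparison.OrdinalIgnoreCase))
            {
                // new Uri(base, relative) resolves a relative href against the page address.
                links.Add(new Uri(pageAddress, href));
            }
        }
        return links;
    }

    // File name part of a link, taken from the last segment of its path.
    public static string FileNameOf(Uri link)
    {
        return Path.GetFileName(link.LocalPath);
    }
}
```

With the anchor tag from the question and a page address of http://example.com/, FindExeLinks yields a single link, http://example.com/download/datavoila-setup.exe, and FileNameOf gives datavoila-setup.exe for it. The path inside the onclick attribute is ignored because only href values are captured.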
Here is my updated code, but I get a "no source available" error at the HtmlDocument doc = new HtmlDocument(); part, and I end up with a null list :(
protected void Button2_Click(object sender, EventArgs e)
{
    //Get the url given by the user
    string urls;
    urls = txtSiteAddress.Text;
    StringBuilder result = new StringBuilder();
    //Give request to the url given
    HttpWebRequest requesters = (HttpWebRequest)HttpWebRequest.Create(urls);
    requesters.UserAgent = "";
    //Check for the web response
    WebResponse response = requesters.GetResponse();
    Stream streams = response.GetResponseStream();
    //reads the url as html codes
    StreamReader readers = new StreamReader(streams);
    string htmlTexts = readers.ReadToEnd();
    HtmlDocument doc = new HtmlDocument();
    doc.Load(streams);
    var list = doc.DocumentNode.SelectNodes("//a[@href]")
        .Select(p => p.Attributes["href"].Value)
        .Where(x => x.EndsWith("exe"))
        .ToList();
    doc.Save("list");
}
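A likely cause of the empty result in the code above is that readers.ReadToEnd() has already consumed the response stream, so doc.Load(streams) has nothing left to parse. SelectNodes returns null when no node matches, and the chained .Select then fails. Below is a minimal sketch, assuming the HtmlAgilityPack package is referenced, that parses the string that was already read instead of the stream. The names ExeLinkParser and GetExeLinks are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original code.
public static class ExeLinkParser
{
    // Returns the href value of every <a> element whose href ends in ".exe".
    public static List<string> GetExeLinks(string html)
    {
        var doc = new HtmlDocument();
        // Parse the HTML text directly; the response stream is not read again.
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return new List<string>();
        }

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .Where(href => href.EndsWith(".exe", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

In Button2_Click this would be called as GetExeLinks(htmlTexts) after ReadToEnd(). Note that doc.Save("list") writes the parsed HTML document to a file named list; it does not save the list of links. If the project also imports System.Windows.Forms, HtmlDocument may need to be written as HtmlAgilityPack.HtmlDocument to avoid a name clash.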
This is Flipbed's answer. It works, but I'm not getting a clean capture :( I think something needs to be changed where the HTML is split into text.
protected void Button2_Click(object sender, EventArgs e)
{
    //Get the url given by the user
    string urls;
    urls = txtSiteAddress.Text;
    StringBuilder result = new StringBuilder();
    //Give request to the url given
    HttpWebRequest requesters = (HttpWebRequest)HttpWebRequest.Create(urls);
    requesters.UserAgent = "";
    //Check for the web response
    WebResponse response = requesters.GetResponse();
    Stream streams = response.GetResponseStream();
    //reads the url as html codes
    StreamReader readers = new StreamReader(streams);
    string htmlTexts = readers.ReadToEnd();
    WebClient webclient = new WebClient();
    string checkurl = webclient.DownloadString(urls);
    List<string> list = new List<string>();
    //Split the HTML on double quotes (") into text fragments
    string[] parts = htmlTexts.Split(new string[] { "\"" },
        StringSplitOptions.RemoveEmptyEntries);
    //Compares the split text with valid file extension
    foreach (string part in parts)
    {
        if (part.EndsWith(".exe"))
        {
            list.Add(part);
            //Download the data into a Byte array
            byte[] fileData = webclient.DownloadData(this.txtSiteAddress.Text + '/' + part);
            //Create FileStream that will write the byte array to
            FileStream file =
                File.Create(this.txtDownloadPath.Text + "\\" + list);
            //Write the full byte array to the file
            file.Write(fileData, 0, fileData.Length);
            //Download message complete
            lblMessage.Text = "Download Complete!";
            //Clears the textfields content
            txtSiteAddress.Text = "";
            txtDownloadPath.Text = "";
            //Close the file so other processes can access it
            file.Close();
            break;
        }
    }