I need to extract a number from a web page using Ant. I have already downloaded the page with an Ant task. The page is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Index of .......</TITLE>
</HEAD>
<BODY>
<H1>Index of .....</H1>
<PRE><IMG SRC="/icons/blank.gif" ALT=" "> <A HREF="?N=A">Name</A> <A HREF="?M=D">Last modified</A> <A HREF="?S=A">Size</A> <A HREF="?D=A">Description</A>
<HR>
<IMG SRC="/icons/back.gif" ALT="[DIR]"> <A HREF="/projects/i/">Parent Directory</A> 19-Dec-2012 11:39 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="20120114-1731/">20120114-1731/</A> 14-Feb-2012 17:40 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="20120115-1055/">20120115-1055/</A> 15-Feb-2012 11:04 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="20120115-1336/">20120115-1336/</A> 15-Feb-2012 13:44 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="20120115-1656/">20120115-1656/</A> 15-Feb-2012 17:05 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="20120115-2157/">20120115-2157/</A> 15-Feb-2012 22:06 -
</PRE><HR>
<ADDRESS>Apache/1.3.41 Server at romgsa.ibm.com Port 443</ADDRESS>
</BODY></HTML>
From a line such as <IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="20120114-1731/">20120114-1731/</A> I want to extract "20120114-1731".
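Here is a minimal sketch of one way to do this in plain Ant. It assumes the download task saved the page as `index.html` in the project's base directory. Both that file name and the `dir.names` property are hypothetical. The build reads the file with `<loadfile>` and keeps only lines whose link target looks like `YYYYMMDD-HHMM/`. It then replaces each of those lines with the captured name, using a `<tokenfilter>` with `<containsregex>`:

```xml
<project name="extract-dir-names" default="extract">
  <target name="extract">
    <!-- Assumption: the downloaded listing was saved as index.html -->
    <loadfile property="dir.names" srcfile="index.html">
      <filterchain>
        <!-- tokenfilter processes the file line by line by default -->
        <tokenfilter>
          <!-- Drop lines without a link like HREF="20120114-1731/";
               replace each matching line with the captured name -->
          <containsregex pattern='.*HREF="(\d{8}-\d{4})/".*'
                         replace="\1"/>
        </tokenfilter>
      </filterchain>
    </loadfile>
    <echo message="${dir.names}"/>
  </target>
</project>
```

For the sample page above, this should leave the five timestamped directory names in `dir.names`, one per line. The "Parent Directory" link and the column-sorting links do not match the pattern. To keep only the first name (20120114-1731), add a `<headfilter lines="1"/>` inside the same `<filterchain>`, after the `<tokenfilter>`. The pattern relies on each `<A HREF=...>` sitting on a single line, which holds for this Apache listing.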