I'm using wget (from Perl) to get web pages from a site. I'm really only interested in the html, htm, php, asp, and aspx file types. However, at least one site has supplied links using file names with no extension/suffix. I need those too.
My command:
wget -A html,htm,php,asp,aspx
works great, except for the links with no suffix.
I've tried a number of regex strings to get the no-suffix pages, but to no avail: wget returns just the main page. So far, the only way to get these files is to accept all file types (which isn't terrible for this website, but would be terrible for others).
Is there a regex, or some regular wget option, that lets me specify that I also want links with no suffix?
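
To make the goal concrete, here is a rough sketch of the kind of pattern being asked about. wget 1.14 and later has an --accept-regex option that is matched against the complete URL, so one possible direction is a pattern that accepts either a wanted extension or a last path segment containing no dot. The pattern itself, the accepts helper, and the example.com URLs are illustrative assumptions, not something tested against the real site. The grep -E check only shows which URLs the pattern would match.

```shell
#!/bin/sh
# Hypothetical pattern (an assumption, POSIX extended regex):
#   - URL ends in .htm, .html, .php, .asp or .aspx, or
#   - the last path segment has no dot at all (no suffix, or a directory).
pattern='(\.(html?|php|aspx?)|/[^./]*)$'

# Helper (hypothetical name): succeeds if the URL matches the pattern.
accepts() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Placeholder URLs, only to show what the pattern accepts and rejects.
for url in \
    'http://example.com/docs/page.html' \
    'http://example.com/about' \
    'http://example.com/img/logo.png'
do
    if accepts "$url"; then
        echo "accept $url"
    else
        echo "reject $url"
    fi
done

# The wget call might then look like this (assumes a recursive fetch,
# with the regex used in place of the -A list):
# wget -r --accept-regex "$pattern" http://example.com/
```

Note that a URL with a query string (for example one ending in .php?id=3) would not match this pattern, so it would need adjusting for sites that use them.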