You will need to parse the DOM tree using DOMDocument:
<?php
function GetTitle($url)
{
    $dom = new DOMDocument;
    @$dom->loadHTMLFile($url); // @ suppresses warnings

    // try to get meta application-name
    foreach ($dom->getElementsByTagName("meta") as $meta)
    {
        $metaName = $meta->attributes->getNamedItem("name");
        // skip <meta> tags without a name attribute (e.g. <meta charset="...">)
        if ($metaName != NULL && strtolower($metaName->nodeValue) == "application-name")
        {
            $metaContent = $meta->attributes->getNamedItem("content");
            if ($metaContent != NULL)
                return $metaContent->nodeValue;
        }
    }

    // title fallback:
    foreach ($dom->getElementsByTagName("title") as $title)
        return $title->nodeValue;

    return NULL;
}
print(GetTitle("http://www.nytimes.com/"));
?>
First, GetTitle() looks for a <meta name="application-name"> tag. If none is found, it falls back to the page title and returns that instead.
Additionally, you should pass the base URL. For example, if you have this URL: http://stackoverflow.com/questions/16185145/how-to-retrieve-website-names/16185654#16185654, you should strip everything except http://stackoverflow.com using parse_url:
$parsedUrl = parse_url($url);
GetTitle($parsedUrl["scheme"] . "://" . $parsedUrl["host"]); // PHP concatenates strings with ".", not "+"
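Note that parse_url() omits the keys for components that are absent, and returns false for seriously malformed URLs, so indexing "scheme" and "host" directly can fail on input such as "example.com/page". A minimal sketch of a guard follows; GetBaseUrl is a hypothetical helper name and not part of the answer above:

```php
<?php
// Hypothetical helper (an assumption for illustration): reduce a full URL
// to "scheme://host", or return null when either part is missing.
function GetBaseUrl($url)
{
    $parts = parse_url($url);

    // parse_url() returns false on seriously malformed URLs and leaves out
    // keys for components that are not present in the input.
    if ($parts === false || !isset($parts["scheme"], $parts["host"]))
        return null;

    return $parts["scheme"] . "://" . $parts["host"];
}

// prints "http://stackoverflow.com"
print(GetBaseUrl("http://stackoverflow.com/questions/16185145/how-to-retrieve-website-names/16185654#16185654"));
```

The result can then be passed to GetTitle() once it has been checked for null.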