A website has linked to my site incorrectly, adding a full stop at the end of the link:
http://www.example.com/hello-world.
I would have expected this to go to a 404 page. Instead, it loads the correct page without redirecting to the valid URL.
This could create duplicate content issues in search engines.
Looking at a few other WordPress sites, this seems to be a common issue: if you add any number of full stops or hyphens (just the couple of characters I tried), the page still loads the correct content:
http://www.example.com/hello-------world.......
http://www.example.com/hello-....world-----
Has anyone else come across this issue and found a solution?
I could set up a redirect from the linked URL to the correct URL, but ideally I'd like to find a solution so this won't happen in the future.
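Rather than adding one redirect per bad inbound link, one option is to 301-redirect any request whose path doesn't match the served post's canonical path. Below is a minimal standalone sketch of just that decision (PHP 7.1+). The helper `canonical_redirect_target` is hypothetical, not a WordPress API, and it assumes you already know the canonical path of the post that was served. In a WordPress site you might call something like this from a `template_redirect` hook, comparing against `get_permalink()`; that wiring is not shown here.

```php
<?php
// Hypothetical helper (not a WordPress API): given the request URI and the
// canonical path of the post that was actually served, return the path to
// 301-redirect to, or null when the request is already canonical.
function canonical_redirect_target(string $requestUri, string $canonicalPath): ?string
{
    // Compare paths only; anything after "?" is ignored.
    $requestedPath = explode('?', $requestUri, 2)[0];

    // Treat "/hello-world" and "/hello-world/" as the same URL.
    if (rtrim($requestedPath, '/') === rtrim($canonicalPath, '/')) {
        return null;
    }
    return $canonicalPath;
}

// The malformed inbound link from the question.
$target = canonical_redirect_target('/hello-world.', '/hello-world/');
if ($target !== null) {
    // In a real request you would send the header before any output, e.g.
    // header('Location: ' . $target, true, 301);
    echo 'Redirect (301) to ', $target, PHP_EOL;
}
```

A permanent (301) redirect signals to search engines which URL is the real one, which should address the duplicate content concern. Note that this sketch drops any query string from the redirect target, which you may not want.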
UPDATE
I've found that the issue seems to be due to the sanitize_title_with_dashes function in /wp-includes/formatting.php (line 954):
function sanitize_title_with_dashes($title, $raw_title = '', $context = 'display') {
    // echo "1: " . $title . "<br />"; // debug output I added while investigating (not in core)
    $title = strip_tags($title);
    // Preserve escaped octets.
    $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
    // Remove percent signs that are not part of an octet.
    $title = str_replace('%', '', $title);
    // Restore octets.
    $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
    if (seems_utf8($title)) {
        if (function_exists('mb_strtolower')) {
            $title = mb_strtolower($title, 'UTF-8');
        }
        $title = utf8_uri_encode($title, 200);
    }
    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title); // kill entities
    $title = str_replace('.', '-', $title); // full stops become hyphens
    // echo "2: " . $title . "<br />"; // debug output I added while investigating (not in core)
    if ( 'save' == $context ) {
        // Convert nbsp, ndash and mdash to hyphens
        $title = str_replace( array( '%c2%a0', '%e2%80%93', '%e2%80%94' ), '-', $title );
        // Strip these characters entirely
        $title = str_replace( array(
            // iexcl and iquest
            '%c2%a1', '%c2%bf',
            // angle quotes
            '%c2%ab', '%c2%bb', '%e2%80%b9', '%e2%80%ba',
            // curly quotes
            '%e2%80%98', '%e2%80%99', '%e2%80%9c', '%e2%80%9d',
            '%e2%80%9a', '%e2%80%9b', '%e2%80%9e', '%e2%80%9f',
            // copy, reg, deg, hellip and trade
            '%c2%a9', '%c2%ae', '%c2%b0', '%e2%80%a6', '%e2%84%a2',
        ), '', $title );
        // Convert times to x
        $title = str_replace( '%c3%97', 'x', $title );
    }
    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    $title = preg_replace('/\s+/', '-', $title);
    $title = preg_replace('|-+|', '-', $title); // collapse runs of hyphens
    $title = trim($title, '-'); // trim hyphens from both ends
    return $title;
}
It seems to be replacing full stops with hyphens, then collapsing runs of hyphens into a single hyphen, then trimming hyphens from both ends of the slug.
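To check that reading, here is a cut-down standalone sketch that keeps only the punctuation steps from the function above (no octet, entity or UTF-8 handling) and applies them to the slugs from the URLs earlier. The name `collapse_slug_punctuation` is hypothetical, and the sketch assumes, as the update suggests, that the requested slug goes through this sanitizing before the post lookup.

```php
<?php
// Hypothetical, cut-down re-creation of the slug steps discussed above.
// It only covers punctuation handling, not octets, entities or UTF-8.
function collapse_slug_punctuation(string $title): string
{
    $title = strtolower($title);
    // Full stops become hyphens.
    $title = str_replace('.', '-', $title);
    // Drop anything that is not a lowercase letter, digit, %, space, underscore or hyphen.
    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    // Whitespace runs become a single hyphen.
    $title = preg_replace('/\s+/', '-', $title);
    // Runs of hyphens collapse to one.
    $title = preg_replace('|-+|', '-', $title);
    // Hyphens are trimmed from both ends.
    return trim($title, '-');
}

// Each line prints "<slug> => hello-world".
foreach (['hello-world.', 'hello-------world.......', 'hello-....world-----'] as $slug) {
    echo $slug, ' => ', collapse_slug_punctuation($slug), PHP_EOL;
}
```

All three malformed slugs reduce to the same `hello-world`, which would explain why each URL resolves to the same post instead of returning a 404.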
UPDATE
This doesn't seem to be an issue with categories. I wonder why page/post titles are sanitized to that level when categories aren't...