A website has linked to my site incorrectly, adding a full stop at the end of the link:

http://www.example.com/hello-world.

I would have expected this to return a 404 page; instead, it loads the correct page without redirecting to the valid URL.

This is likely to create duplicate content issues in search engines.

Looking at a few other WordPress sites, this seems to be a common issue: if you add any number of full stops or hyphens (just the two characters I tried), the page still loads the correct content:

http://www.example.com/hello-------world.......
http://www.example.com/hello-....world-----

Has anyone else come across this issue and found a solution?

I could set up a redirect from the linked URL to the correct URL, but ideally I'd like to find a solution so this won't occur in the future.

UPDATE

I've found that the issue seems to be due to the sanitize_title_with_dashes function in /wp-includes/formatting.php (line 954):

function sanitize_title_with_dashes($title, $raw_title = '', $context = 'display') {

    echo "1: " . $title . "<br />";

    $title = strip_tags($title);
    // Preserve escaped octets.
    $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
    // Remove percent signs that are not part of an octet.
    $title = str_replace('%', '', $title);
    // Restore octets.
    $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

    if (seems_utf8($title)) {
        if (function_exists('mb_strtolower')) {
            $title = mb_strtolower($title, 'UTF-8');
        }
        $title = utf8_uri_encode($title, 200);
    }

    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title); // kill entities
    $title = str_replace('.', '-', $title);

    echo "2: " . $title . "<br />";

    if ( 'save' == $context ) {
        // Convert nbsp, ndash and mdash to hyphens
        $title = str_replace( array( '%c2%a0', '%e2%80%93', '%e2%80%94' ), '-', $title );

        // Strip these characters entirely
        $title = str_replace( array(
            // iexcl and iquest
            '%c2%a1', '%c2%bf',
            // angle quotes
            '%c2%ab', '%c2%bb', '%e2%80%b9', '%e2%80%ba',
            // curly quotes
            '%e2%80%98', '%e2%80%99', '%e2%80%9c', '%e2%80%9d',
            '%e2%80%9a', '%e2%80%9b', '%e2%80%9e', '%e2%80%9f',
            // copy, reg, deg, hellip and trade
            '%c2%a9', '%c2%ae', '%c2%b0', '%e2%80%a6', '%e2%84%a2',
        ), '', $title );

        // Convert times to x
        $title = str_replace( '%c3%97', 'x', $title );
    }

    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    $title = preg_replace('/\s+/', '-', $title);
    $title = preg_replace('|-+|', '-', $title);
    $title = trim($title, '-');

    return $title;
}

It seems to be replacing full stops with hyphens, then collapsing runs of hyphens into a single hyphen, then trimming hyphens from both ends of the slug.
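
To make those steps concrete, here is a minimal standalone sketch that applies only the relevant lines from the function above to the example slugs. The name slug_tail_sketch() is hypothetical (it is not a WordPress function), and the sketch leaves out the tag stripping, octet handling, and UTF-8 encoding, so treat it as an illustration rather than a replacement for the core function.

```php
<?php
// Hypothetical helper: reproduces only the dot/hyphen handling from
// sanitize_title_with_dashes() as quoted above. Not part of WordPress.
function slug_tail_sketch($title) {
    $title = strtolower($title);
    $title = str_replace('.', '-', $title);                // full stops become hyphens
    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);  // drop any other unexpected characters
    $title = preg_replace('/\s+/', '-', $title);           // whitespace becomes hyphens
    $title = preg_replace('|-+|', '-', $title);            // collapse runs of hyphens into one
    return trim($title, '-');                              // trim hyphens from both ends
}

// The slugs from the example URLs in the question.
$requested = array(
    'hello-world.',
    'hello-------world.......',
    'hello-....world-----',
);

foreach ($requested as $slug) {
    echo $slug . ' => ' . slug_tail_sketch($slug) . "\n";
}
```

Under these assumptions, all three inputs reduce to hello-world, which would explain why each malformed URL resolves to the same post.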

UPDATE

This doesn't seem to be an issue with categories. I wonder why page and post titles are sanitized to that level when categories aren't...

2 Answers

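That's because of the way the URL is being rewritten.

You might want to experiment with the rewrite rules to fix the issue. They can be found in the .htaccess file in the website's document root.

answered 2012-11-13T16:47:14.537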

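Search engines won't index any of these links unless you or someone else deliberately links to them. Basically, I wouldn't worry about it too much.

Reason:

When it receives the URL slug, WordPress probably sanitizes the variable and strips unwanted characters. I doubt it has anything to do with the actual .htaccess file.

answered 2012-11-13T16:51:47.963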