0

I am managing a website that has only about 20-50 pages (articles, links and etc.). Somehow, Google indexed over 1000 links (duplicates, same page with different string in the URL). I found that those links contain ?date= in url. I already blocked by writing Disallow: *date* in robots.txt, made an XML map (which I did not had before) placed it into root folder and imported to Google Webmaster Tools. But the problem still stays: links are (and probably will be) in search results. I would easily remove URLs in GWT, but they can only remove one link at the time, and removing >1000 one by one is not an option.

The question: Is it possible to make dynamic 301 redirects from every page that contains $date= in url to the original one, and how? I am thinking that Google will re-index those pages, redirect to original ones, and delete those numerous pages from search results.

Example:

bad page: www.website.com/article?date=1961-11-1 and n same pages with different "date"

good page: www.website.com/article

automatically redirect all bad pages to good ones.

I have spent whole work day trying to solve this problem, would be nice to get some support. Thank you!

P.S. As far as I think this coding question is the right one to ask in stackoverflow, but if I am wrong (forgive me) redirect me to right place where I can ask this one.

4

1 回答 1

1

You're looking for the canonical link element, that's the way Google suggests to solve this problem (here's the Webmasters help page about it), and it's used by most if not all search engines. When you place an element like

<link rel='canonical' href='http://www.website.com/article'>

in the header of the page, the URI in the href attribute will be considered the 'canonical' version of the page, the one to be indexed and so on.

For the record: if the duplicate content is not a html page (say, it's a dynamically generated image), and supposing you're using Apache, you can use .htaccess to redirect to the canonical version. Unfortunately the Redirect and RedirectMatch directives don't handle query strings (they're strictly for URIs), but you could use mod_rewrite to strip parts of the query string. See, for example, this answer for a way to do it.

于 2013-01-23T01:23:07.977 回答