Use nofollow on internal links to unwanted facets
Posted: Mon Jan 27, 2025 3:32 am
Using the noindex tag
The second suggestion is to set a noindex directive for any parameter-based page that adds no SEO value. The tag is easy to implement and prevents search engines from indexing the page, thus avoiding duplicate content issues.
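For reference, this is the standard form of the tag, placed in the <head> of each filtered page; the equivalent HTTP header can be used for non-HTML resources (how you inject either one depends on your platform):

<!-- in the <head> of every parameter-based page you want kept out of the index -->
<meta name="robots" content="noindex">

<!-- or, as an HTTP response header -->
X-Robots-Tag: noindex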
However, this method also has its cons:
It is not interpreted as a directive, but as a suggestion.
It does not prevent search engines from crawling URLs.
It does not prevent search engines from wasting resources when crawling.
It does not consolidate the PageRank of the pages.
One way to solve the problem of wasted crawl resources is to mark all internal links to unwanted facets with rel=nofollow (a minimal example follows below). For example, with this markup we could prevent Google from visiting any page with two or more filters checked and from transferring PageRank to it. Be careful though: if those links appear elsewhere without nofollow, or inside the sitemap.xml, that URL can still be crawled and indexed.
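As a minimal sketch, assuming a facet URL built from hypothetical color and size parameters, such a link could look like this:

<!-- internal link to a two-filter facet; the URL and anchor text are illustrative only -->
<a href="/shoes?color=red&size=42" rel="nofollow">Red shoes in size 42</a>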
Unfortunately, however, "nofollow" does not completely solve the problem:
It does not prevent search engines from indexing duplicate content.
If the URL is linked from elsewhere (e.g. in the sitemap.xml) it will still be crawled and indexed.
To learn more, read the post: Dofollow vs nofollow links: what they are and when to use them
Using robots.txt
The robots.txt file tells search engines which areas of the site you want to prevent or allow them to crawl. You can block crawler access to some (or all) parameter-based URLs, making more efficient use of the crawl budget and avoiding the emergence of duplicate content.
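As a sketch only, assuming the facets are driven by hypothetical color and size query parameters (wildcard matching like this is honoured by Google and Bing, but not guaranteed for every crawler), the rules could look like this:

# keep crawlers out of the filtered facet URLs
User-agent: *
Disallow: /*?*color=
Disallow: /*?*size=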
This solution, of course, has some drawbacks:
It is not interpreted as a directive, but as a suggestion.
Even if bots "respect" the instruction not to crawl specific pages, those pages could still be indexed, just without a description.
It does not allow search engines to read the tags (e.g. canonical and noindex) applied to the page.
Here's what John Mueller said in a tweet about it: