If you’re creating a news aggregator website or curating content in other ways, duplicate content issues are probably at the front of your mind.
It’s good to think about duplicate content issues – it is something you need to pay attention to. However, it’s not something that needs to scare you because there is no such thing as a Google duplicate content penalty (you don’t have to believe us – we’ll let Google explain for us!).
As long as you follow some basic best practices, you can safely curate content with no risk to your site or your site’s SEO.
In this post, we’ll dig into the myth of the Google duplicate content penalty on SEO. Then, we’ll share some actionable tips that you can implement with WP RSS Aggregator to safely curate and aggregate content on your WordPress website.
Is There a Google Duplicate Content Penalty?
Let’s start with the elephant in the room:
Does Google penalize duplicated content?
The answer is no (with one caveat, explained below).
However, there’s a lot of misinformation in the digital marketing sphere when it comes to content duplication, so let’s turn to a few choice quotes from Google (or Google search team members).
First, there’s the official Google help doc on duplicate content, which states the following:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.
In a 2014 live session, John Mueller of Google stated that:
We don’t have a duplicate content penalty. It’s not that we would demote a site for having a lot of duplicate content.
And in 2017, Gary Illyes, a Webmaster Trends Analyst, posted on Twitter that:
Google doesn’t have a duplicate content penalty.
Over the years, Google has been pretty clear and vocal that your site will not be penalized for the mere fact of having content that’s the same.
And this makes sense, when you think about it. Duplicate content is common across a wide range of online publishing, for entirely legitimate reasons. News syndication is the example that immediately springs to mind, where the same original source might be quoted at length across different websites.
But perhaps more widespread is the example of eCommerce sites. Many different websites often sell the same products, and it's rarely practical for store owners to write brand-new copy describing a product's attributes simply so their site can show Google a different version of the description.
Product pages, therefore, frequently copy content directly from the manufacturer, resulting in a lot of identical content across different sites. Online retail would become extremely complicated if every single site had to write slightly different descriptions for each product just to avoid a search engine's duplicate content penalty.
However, just because Google won't penalize you for having duplicate content, that doesn't mean Google will rank word-for-word duplicate content at the top of its search results. In other words, it can still affect your site's SEO. As Neil Patel explains in his video on the topic:
So if you're just jacking someone's content, and you're just taking it word for word, and you're not providing any other value, you're not going to get penalized, you just won't rank that high for that article.
So let's sum it up so you can draw the right conclusions:
- If you’re curating content on your site, you don’t need to worry about your overall site getting penalized because the duplicate content penalty is a myth.
- However, if you want your curated content to do well in Google’s search engine rankings, you’ll probably need to add some value because Google won’t rank different pages above the original source if you have a word-for-word copy.
Again, it's important to note that you're not being penalized in the second scenario – Google would just rather display the original page if you're not adding any value of your own. Your site's other, non-duplicate content is unaffected.
Google Will Penalize Deceptive Manipulation, Though
So what is the one caveat? Google might penalize your site if it thinks you're duplicating content specifically to manipulate its search rankings – which is true of any optimization tactic whose sole intent is to manipulate Google's results.
This isn’t really a duplicate content penalty – it’s a manipulating Google penalty.
Here’s the text straight from Google:
Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.
If you follow the best practices in this post, though, you will stay squarely on Google’s good side and have nothing to worry about.
How to Optimize WP RSS Aggregator to Avoid Duplicate Content Problems
To finish things out, let’s look at some actionable tactics that you can implement using the WP RSS Aggregator plugin to handle identical content in an optimal way.
For these tips, we’ll assume you’re using the Feed to Post add-on to import feed items as actual WordPress posts.
1. Use the rel=canonical Setting
As you learned earlier, Google understands that there are plenty of valid reasons for duplicate content to exist. If you're transparent about your content and where it came from, you won't run into issues or accusations of plagiarism.
To help add this transparency, Google (and other search engines) support the canonical link tag, rel=canonical.
Essentially, rel=canonical lets you credit the original source so that search engines know that you’re curating content (as opposed to stealing it). It’s a small code snippet that goes in the <head> section of your site. This means that human visitors won’t see it, but search engines and other bots will be able to detect it.
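For example, the canonical link on a curated post might look like this (the URL below is a hypothetical placeholder, not actual output from the plugin):

```html
<!-- Added inside the <head> of the curated post -->
<!-- The href points to the original article, telling search engines which version is canonical -->
<link rel="canonical" href="https://original-source.example.com/original-article/">
```

Search engines that encounter this tag will treat the URL in the href as the authoritative version of the content.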
To enable rel=canonical for feed items in WP RSS Aggregator, make sure to check the Canonical Link box in the Feed to Post – General settings:
When this setting is enabled, WP RSS Aggregator will automatically add a canonical link pointing to the original feed item's URL in the <head> section of your site.
2. Add a Link to the Original Source
With rel=canonical, you add a behind-the-scenes credit to the original source.
You’ll also want to go one step further and credit the source on the front-end, too.
Again, beyond the fact that it’s just generally good to credit the creator of content that you’ve curated, this also shows that you’re not trying to pass the content off as your own.
With WP RSS Aggregator, you can automatically add a credit either before or after the curated content using the Feed to Post – Prepend to Content or Feed to Post – Append to Content settings, respectively.
In the settings box, you can use the URL placeholders to dynamically insert the original post URL, source, title, and so on. This way, each imported post automatically credits its own source URL – you don't need to set a different link manually for every item.
3. Use Word Trimming to Not Show the Full Text
Another strategy to avoid issues is to limit the amount of content that you curate.
That is, rather than curating the full text on your site, you can include only an excerpt and then direct readers to the full post via the credit link.
To set this up, you can use WP RSS Aggregator’s Word Trimming feature.
You can control word trimming on a feed-by-feed basis using the Feed to Post – Word Trimming box in the sidebar:
4. Use the noindex Tag to Flag Web Pages You Don't Want to Be Indexed
This applies more to internal links within your own site, but can also be used where you have an agreement with an original source for your syndicated articles that they should have priority in search engines.
You're chiefly likely to want to use this when you have duplicate pages on your own site, such as the standard web version and the print version of an article stored on separate pages. If you don't want Google searches to turn up the print version by accident, you need the noindex tag.
You can implement noindex in two different ways: as a meta tag, or as an HTTP response header. Both have the same effect, so it's up to you which works best for your site.
noindex as a meta tag
You can prevent most search engine crawlers from indexing any page on your site by placing the following meta tag into the <head> section of that page's HTML:
<meta name="robots" content="noindex">
You can also prevent specific web crawlers from indexing your page; for example, like this for Google:
<meta name="googlebot" content="noindex">
This will prevent Google from indexing your page but let other search engines, such as Bing, index it as usual.
noindex as an HTTP response header
You can alternatively return an X-Robots-Tag HTTP header with a value of either noindex or none in your response. For example:
HTTP/1.1 200 OK
X-Robots-Tag: noindex
5. Save as a Draft and Make Some Manual Modifications (Optional)
This last tip is optional because it requires some manual effort. But another beneficial tactic is to add some of your own unique content or commentary to each post that you curate.
By adding this value, you can increase the chance that Google actually ranks the content that you curate in its SERPs.
If you want to add your own unique content, you should make sure to save new feed items as drafts, rather than importing them right away. You can control this using the Post Status setting in the Feed to Post – General settings:
Once you have a piece of content as a draft, here are two ways to enhance it.
First, you can change the title of the imported post. This will instantly help differentiate your post because it will no longer have the exact same title as the original source.
Second, you can add some original content or internal links before or after the content that you’ve curated. It doesn’t have to be long – even just a few sentences setting up the article or summing up key points can make a difference.
When you’re finished, you can publish the post to make it live on your own site.
You Don’t Need to Worry About Duplicate Content
Unless you’re specifically trying to use duplicate content to manipulate Google rankings, you don’t need to worry about duplicate text when you’re aggregating content.
Over the years, Google has made it abundantly clear that there’s no such thing as a duplicate content penalty.
Google fully understands that there are a wide variety of reasons why similar content might exist. Going back to Neil Patel’s video, he says this:
Here’s the thing with duplicate content – they don’t care so much about duplicate content, they more so care about the user experience.
In the end, that’s a good way to think about duplicate content.
As long as you credit the original source and follow the other best practices in this post, you don't need to fear duplicate pages and posts hurting your SEO.
However, if you want to rank your duplicate content in Google, you will want to focus on creating value in some way, whether that’s adding your own unique content or doing something else.
Not only does WP RSS Aggregator make it easy to import RSS feeds as WordPress posts, but it also includes all the features that you need to avoid issues with duplicate content. Purchase the Feed to Post add-on to get started today!