If you’re creating a news aggregator website or curating content in other ways, duplicate content issues are probably at the front of your mind.
It’s good to think about duplicate content issues – it is something you need to pay attention to. However, it’s not something that needs to scare you because there is no such thing as a Google duplicate content penalty (you don’t have to believe us – we’ll let Google explain for us!).
As long as you follow some basic best practices, you can safely curate content with no risk to your site or your site’s SEO.
In this post, we’ll dig into the myth of the Google duplicate content penalty on SEO. Then, we’ll share some actionable tips that you can implement with WP RSS Aggregator to safely curate and aggregate content on your WordPress website.
Is There a Google Duplicate Content Penalty?
Let’s start with the elephant in the room:
Does Google penalize duplicated content?
The answer is no (with one caveat, explained below).
However, there’s a lot of misinformation in the digital marketing sphere when it comes to content duplication, so let’s turn to a few choice quotes from Google (or Google search team members).
First, there’s the official Google help doc on duplicate content, which states the following:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.
In a 2014 live session, John Mueller of Google stated that:
We don’t have a duplicate content penalty. It’s not that we would demote a site for having a lot of duplicate content.
And in 2017, Gary Illyes, a Webmaster Trends Analyst, posted on Twitter that:
Google doesn’t have a duplicate content penalty.
Over the years, Google has been pretty clear and vocal that your site will not be penalized for the mere fact of having content that’s the same.
And this makes sense, when you think about it. Duplicate content is common across a wide range of online publishing, for entirely legitimate reasons. News syndication is the example that immediately springs to mind, where the same original source might be quoted at length across different websites.
But perhaps more widespread is the example of eCommerce sites. Many different websites often sell the same products, and it's rarely practical for store owners to write brand-new copy describing a product's attributes simply so their site can show Google a different version of the description.
Product pages, therefore, frequently copy content directly from the manufacturer, resulting in a lot of identical content across different sites. Online retail would become extremely complicated if every single site had to write slightly different descriptions for each product just to avoid a search engine's duplicate content penalty.
However, just because Google won't penalize you for having duplicate content, that doesn't mean Google will rank word-for-word duplicate content at the top of its search results. In other words, it can still affect your site's SEO. As Neil Patel explains in his video on the topic:
So if you're just jacking someone's content, and you're just taking it word for word, and you're not providing any other value, you're not going to get penalized, you just won't rank that high for that article.
So let's sum it up so you can draw the right conclusions:
- If you’re curating content on your site, you don’t need to worry about your overall site getting penalized because the duplicate content penalty is a myth.
- However, if you want your curated content to do well in Google’s search engine rankings, you’ll probably need to add some value because Google won’t rank different pages above the original source if you have a word-for-word copy.
Again, it's important to note that you're not being penalized in the second scenario – Google would just rather display the original page if you're not adding any value of your own. Your site's other, non-duplicate content is unaffected.
Google Will Penalize Deceptive Manipulation, Though
So what is the one caveat? Google might penalize your site if it thinks you're duplicating content specifically to manipulate its search rankings – which is true of any optimization tactic whose sole intent is to manipulate Google's results.
This isn’t really a duplicate content penalty – it’s a manipulating Google penalty.
Here’s the text straight from Google:
Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.
If you follow the best practices in this post, though, you will stay squarely on Google’s good side and have nothing to worry about.
How to Optimize WP RSS Aggregator to Avoid Duplicate Content Problems
To finish things out, let’s look at some actionable tactics that you can implement using the WP RSS Aggregator plugin to handle identical content in an optimal way.
For these tips, we’ll assume you’re using the Feed to Post add-on to import feed items as actual WordPress posts.
1. Use the rel=canonical Setting
As you learned earlier, Google understands that there are plenty of valid reasons for duplicate content to exist. If you're transparent about your content and where it came from, you won't run into issues or accusations of plagiarism.
To help add this transparency, Google (and other search engines) support the canonical link tag, rel=canonical.
Essentially, rel=canonical lets you credit the original source so that search engines know that you’re curating content (as opposed to stealing it). It’s a small code snippet that goes in the <head> section of your site. This means that human visitors won’t see it, but search engines and other bots will be able to detect it.
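For example, the canonical link on a curated post might look like this (the URL below is a hypothetical placeholder, not actual output from the plugin):

```html
<!-- Added inside the <head> of the curated post -->
<!-- The href points to the original article, telling search engines which version is canonical -->
<link rel="canonical" href="https://original-source.example.com/original-article/">
```

Search engines that encounter this tag will treat the URL in the href as the authoritative version of the content.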
To enable rel=canonical for feed items in WP RSS Aggregator, make sure to check the Canonical Link box in the Feed to Post – General settings:
When this setting is enabled, WP RSS Aggregator will automatically add a canonical link pointing to the original feed item's URL in the <head> section of your site.
2. Add a Link to the Original Source
With rel=canonical, you add a behind-the-scenes credit to the original source.
You’ll also want to go one step further and credit the source on the front-end, too.
Again, beyond the fact that it’s just generally good to credit the creator of content that you’ve curated, this also shows that you’re not trying to pass the content off as your own.
With WP RSS Aggregator, you can automatically add a credit either before or after the curated content using the Feed to Post – Prepend to Content or Feed to Post – Append to Content settings, respectively.
In the settings box, you can use the URL placeholders to dynamically insert the original post URL, source, title, and so on. This way, each imported post automatically credits its own source URL – you don't need to set a different link manually for every item.
3. Use Word Trimming to Not Show the Full Text
Another strategy to avoid issues is to limit the amount of content that you curate.
That is, rather than curating the full text on your site, you can include only an excerpt and then direct readers to the full post via the credit link.
To set this up, you can use WP RSS Aggregator’s Word Trimming feature.
You can control word trimming on a feed-by-feed basis using the Feed to Post – Word Trimming box in the sidebar:
4. Use the noindex Tag to Flag Web Pages You Don't Want to Be Indexed
This applies more to internal links within your own site, but can also be used where you have an agreement with an original source for your syndicated articles that they should have priority in search engines.
You're chiefly likely to want to use this when you have duplicate pages on your own site, such as the standard web version and the print version of an article stored on separate pages. If you don't want Google searches to turn up the print version by accident, you need the noindex tag.
You can implement noindex in two different ways: as a meta tag, or as an HTTP response header. Both have the same effect, so it's up to you which works best for your site.
noindex as a meta tag
You can prevent most search engine crawlers from indexing any page on your site by placing the following meta tag into the <head> section of that page's HTML:
<meta name="robots" content="noindex">
You can also prevent specific web crawlers from indexing your page; for example, like this for Google:
<meta name="googlebot" content="noindex">
This will prevent Google from indexing your page but let other search engines, such as Bing, index it as usual.
noindex as an HTTP response header
You can alternatively return an X-Robots-Tag HTTP header with a value of either noindex or none in your response. For example:
HTTP/1.1 200 OK
X-Robots-Tag: noindex
5. Save as a Draft and Make Some Manual Modifications (Optional)
This last tip is optional because it requires some manual effort. But another beneficial tactic is to add some of your own unique content or commentary to each post that you curate.
By adding this value, you can increase the chance that Google actually ranks the content that you curate in its SERPs.
If you want to add your own unique content, you should make sure to save new feed items as drafts, rather than importing them right away. You can control this using the Post Status setting in the Feed to Post – General settings:
Once you have a piece of content as a draft, here are two ways to enhance it.
First, you can change the title of the imported post. This will instantly help differentiate your post because it will no longer have the exact same title as the original source.
Second, you can add some original content or internal links before or after the content that you’ve curated. It doesn’t have to be long – even just a few sentences setting up the article or summing up key points can make a difference.
When you’re finished, you can publish the post to make it live on your own site.
You Don’t Need to Worry About Duplicate Content
Unless you’re specifically trying to use duplicate content to manipulate Google rankings, you don’t need to worry about duplicate text when you’re aggregating content.
Over the years, Google has made it abundantly clear that there’s no such thing as a duplicate content penalty.
Google fully understands that there are a wide variety of reasons why similar content might exist. Going back to Neil Patel’s video, he says this:
Here’s the thing with duplicate content – they don’t care so much about duplicate content, they more so care about the user experience.
In the end, that’s a good way to think about duplicate content.
As long as you credit the original source and follow the other best practices in this post, you don't need to fear duplicate pages and posts hurting your SEO.
However, if you want to rank your duplicate content in Google, you will want to focus on creating value in some way, whether that’s adding your own unique content or doing something else.
Not only does WP RSS Aggregator make it easy to import RSS feeds as WordPress posts, but it also includes all the features that you need to avoid issues with duplicate content. Purchase the Feed to Post add-on to get started today!