What is duplicate content?

Maxi Maxhuni


Duplicate content on the web

Duplicate content can negatively impact your website's ranking in search engine results. When two URLs contain the same information, search engines don't know which one should appear higher in the results list and may lower the priority of both URLs. As a result, other websites will take precedence over yours, which can lead to reduced visibility for your own site(s).

Advertising can also lead to duplicate content, for example, if an ad containing the same content is published on multiple websites. It's important to ensure these ads are marked with a canonical tag to prevent them from being treated as duplicate content and negatively impacting your website's search engine ranking.

In this article, we focus on the technical causes of duplicate content in search engines and their solutions. If you'd like a deeper understanding of duplicate content and how it relates to plagiarized or scraped content as well as keyword cannibalization, see our separate article on that topic.

To make this more tangible, let's use an example:

Example of duplicate content:

Duplicate content can be compared to a crossroads where two signs indicate the same destination but point in different directions. Which way should you go?

To complicate matters further, the two routes differ slightly even though they lead to the same destination. This might not matter to you as a reader, but search engines have to decide which page to display in the search results, because showing the same content twice would be inefficient.


What if your article about, for example, "maple tree" is available under two different URLs, e.g.:

http://www.beispiel.ch/ahornbaum

and

http://www.beispiel.ch/artikel-kategorie/ahornbaum

This is not an uncommon occurrence; in fact, it's quite frequent in modern content management systems (CMS). Now imagine that bloggers have picked up on the article, with some linking to the first URL and others linking to the second.

This is the most annoying aspect of search engine optimization:
Duplicate content hinders your SEO efforts because these links are split across multiple URLs, which reduces your chances of ranking in the top search results when someone enters the keyword "maple tree".

However, if all your links point to one URL, these concerns are eliminated and higher rankings are much easier to achieve.

At what point does content become duplicate content?


It's important to know at what point Google treats content as the same. If your website is available in multiple languages, the translations do not count as duplicate content. Even text that you quote verbatim is NOT considered duplicate content, as long as it is marked up semantically correctly as a quotation in the source code.

If the metadata of two pages is identical, the search engine treats it as duplicated. This is logical: each page must have its own summary in the form of a title and description.

Why should you avoid duplicate content?


Duplicate content can seriously harm your ranking and puts all duplicate pages at risk of being ranked lower, because search engines cannot reliably decide which page to show users when several options exist. And that is the best-case scenario when it comes to duplicate content; it could be even worse!

Duplicating content can have a disastrous impact on your website, and if there's a blatant problem with thin or literally copied content, it could lead to manual action from Google. Therefore, you need to ensure that every page offers high-quality and unique content if you want to succeed in search engine rankings.

Duplicate content isn't just a problem for search engines; it can also be incredibly frustrating for your users. When they browse your website, they expect to find the page or information they're looking for, and anything that hinders this experience could cause them to abandon your site.

As with many other aspects of SEO, fixing duplicate content issues is therefore crucial not only for better search engine rankings, but also for an improved user experience!

Causes of Duplicate Content


Duplicate content is rarely the result of a conscious decision. Rather, it is usually due to glitches in the technical implementation or an unintentional error during cloning, which leads to exact copies of posts appearing on multiple websites without any indication of which version is the original.

This can be difficult for those striving to create unique and creative online content. After all, nobody wants their hard work to be duplicated!

Many technical glitches occur because developers don't think like a browser or a user, let alone a search engine spider.

Google's measures against duplicate content


Google will not take action unless there is concrete evidence that the duplication is intended to deceive. If a website does cross that line, it may be ranked lower or have its URLs removed from Google's index. Without such intent, however, Google does not impose a penalty for duplicate content as such.

How can search engines determine that multiple URLs carry the same content?
It's quite simple. The search engine uses a special algorithm to detect identical content across the entire website. For example, each text is broken down into its constituent parts, down to individual words, and these parts are compared. This makes it quick and easy to find copied material.
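
One common textbook way to compare texts along these lines is to split each text into overlapping word sequences ("shingles") and measure how many of them two texts share. The following Python sketch only illustrates that general idea; it is not Google's actual algorithm, and the function names and the shingle size of 3 are assumptions chosen for this example.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)
```

A value close to 1.0 suggests near-identical texts, a value close to 0.0 suggests unrelated texts. Where to draw the line between the two is a judgment call that this sketch does not answer.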

What is internal duplicate content?

Internal duplicate content is content that appears more than once within the same domain. It is problematic especially for e-commerce companies. It's safe to say that both online stores and content management systems (CMS) are particularly vulnerable to this type of duplicate content.

Google is good at prioritizing URLs and easily removing duplicate internal content, meaning that this type of repeated material is usually not a major problem. It all comes down to presentation.

What is external duplicate content?

External duplication occurs when the same or similar content appears on more than one domain. This type of copying can be more concerning for Google than internal duplication within a single domain, for example when identical information exists on different domains, subdomains, or TLDs (top-level domains).

Misunderstanding of the concept of a URL

For developers, the unique identifier of an article is the corresponding ID in the database, while search engines recognize URLs as a unique marker of content.

Duplicate Content: Session IDs

Would you like to keep track of your website visitors and give them the option to save products they want to buy in an online shopping cart?

For this to work, it's important that all visitors have a session. Sessions are short recordings that show what users have done on your website, such as adding products to their shopping cart.

To track an individual's path from page to page, we need to store a unique identifier for each session, the so-called "session ID." The most effective way to do this is by using cookies. Unfortunately, search engines generally do not store cookie data.

In this case, some systems fall back on using session IDs in the URL. This means that every internal link on the website adds this session ID to its URL, and because this session ID is unique for that session, a new URL is created, resulting in duplicate content.
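
To make the effect visible, here is a minimal Python sketch that removes a session ID from a URL so that all session variants collapse back into one address. The parameter names in SESSION_PARAMS are assumptions for illustration; your own system may use a different name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; check which one your own system uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL without any session-ID query parameter."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With this, http://www.beispiel.ch/ahornbaum?sid=abc123 and http://www.beispiel.ch/ahornbaum?sid=def456 both map to http://www.beispiel.ch/ahornbaum, which is the single URL a search engine should see.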

What do scrapers do?

The more popular your website becomes, the more scrapers will try to copy and use your content without permission. This leads to a duplicate content issue, because search engines can't always tell which version of the article was published first, especially when the copy doesn't link back to you at all!

The presence of duplicate content shows how popular your content has become, but also increases the difficulty of managing these duplicates over time.

Print-friendly pages

It's important to decide which version of your page you want Google to show. Your content management system can create print-friendly versions, but if these aren't blocked from indexing, the search engine will find them alongside the regular pages with their additional ads and surrounding material. While both deliver the same content from a user's perspective, from an SEO standpoint they count as duplicates, and that makes a significant difference.

www or no www

A classic, but still widespread problem: duplicate WWW and non-WWW content when both versions of your page are available.

Another, less common problem is HTTP versus HTTPS, when the same content is accessible via both protocols. Search engines struggle with this issue, so it's important to take steps to resolve it.
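
The idea of folding all protocol and host variants into one address can be sketched in a few lines of Python. The preference for HTTPS and the www host, as well as the names PREFERRED_SCHEME, PREFERRED_HOST and KNOWN_HOSTS, are assumptions made for this example; in practice the redirect is usually configured in the web server rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preference for this example: HTTPS with the www host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.beispiel.ch"
KNOWN_HOSTS = {"beispiel.ch", "www.beispiel.ch"}


def preferred_url(url: str) -> str:
    """Map any protocol/host variant of the site to the preferred one."""
    parts = urlsplit(url)
    if parts.hostname not in KNOWN_HOSTS:
        return url  # not our site; leave untouched
    # Note: replacing netloc also drops any explicit port number.
    return urlunsplit(parts._replace(scheme=PREFERRED_SCHEME, netloc=PREFERRED_HOST))
```

All four variants (http or https, with or without www) then resolve to the same target, which is where a permanent redirect should send visitors and crawlers.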

Conceptual solution: a "canonical" URL

The solution to the problem of multiple URLs leading to corresponding content is simple:

Anyone working at a publication can specify which URL should be the "correct" one for a particular article. However, if you ask three people from the same company, sometimes each will give a different answer.

This is a problem that needs to be solved, as no more than one URL can be considered the "correct" choice for a given piece of content. Search engines refer to this as the "canonical" URL.

Identifying Duplicate Content Problems


Do you know if your website or content has a problem with duplicate content?

This is easy to identify using Google, which offers several search operators that are helpful in such cases. To find all URLs on your site that contain "Keyword X" in their title, simply enter this query into Google:

site:example.com intitle:"Keyword X"

Google will then display all pages on example.com that contain this keyword in their title. The more specific you make the intitle part of the query, the easier it is to spot duplicate content.

The same method can also be used to identify copies of your material elsewhere on the internet. For example, if the full title of your article is "Keyword X – and why it is beautiful", simply enter:

intitle:"Keyword X – and why it is beautiful"

To track down websites that have copied your content, search Google for the title of your article. You should even include one or two full sentences from the article, as some scrapers modify the titles. In some cases, Google displays a notice on the last page of results saying that very similar entries have been omitted.

This shows that Google is filtering out duplicate results, but the situation is still not ideal. It's therefore worth clicking the link in that notice, looking at the other results that appear, and identifying which ones could be improved.

Once you have located the copied content, you can ask the author of the copy or the website's webmaster to remove it. If this doesn't work, there are other options for getting the copied content taken down.

Practical solution for duplicate content

Once you've determined the canonical URL for your content, it's time to start the canonicalization process. This means that search engines need to recognize which page is authoritative and be able to find it quickly. There are four solutions, in order of preference:

 

  • Avoid creating duplicate content in the first place.

  • Redirect all duplicate pages to the specified canonical URL.

  • Add a canonical link element to the duplicate pages to point search engines to the preferred page.

  • Add an HTML link from the duplicate page to the canonical page. This guides visitors in the right direction and prevents confusion when they navigate your website.
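
The redirect and the canonical link element can be sketched in Python as follows. This is a framework-independent illustration, not a finished implementation: the CANONICAL mapping and both function names are hypothetical, and the example reuses the maple tree URLs from earlier in this article.

```python
from html import escape

# Hypothetical mapping of duplicate paths to their canonical URL.
CANONICAL = {
    "/artikel-kategorie/ahornbaum": "http://www.beispiel.ch/ahornbaum",
}


def redirect_for(path: str):
    """Return (status, headers) for a permanent redirect, or None."""
    target = CANONICAL.get(path)
    if target is None:
        return None  # path is not a known duplicate
    return 301, {"Location": target}


def canonical_link(canonical_url: str) -> str:
    """Build the canonical link element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'
```

A 301 status tells search engines that the move is permanent. Where a redirect is not possible, the output of canonical_link() belongs in the head section of the duplicate page.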

Avoiding duplicate content


Fortunately, some of the causes of duplicate content are easily avoidable.

  • Want to get rid of session IDs in your URLs? You can quickly and easily disable them in the system settings.

 

  • Are you using duplicate printer-friendly pages? If so, they're completely unnecessary! Use a print style sheet instead.

 

  • Are you using comment pagination in WordPress? It is strongly recommended that this feature (under Settings > Discussion) be disabled on almost all websites.

 

  • If the order of the parameters is not consistent, you should instruct your programmer to write a script that keeps the parameters in a specific order (this type of URL encoding is often referred to as a URL factory).

 

  • Are your tracking links causing trouble? Using hash-based campaign tracking (campaign data after the # in the URL) instead of parameter-based campaign tracking is usually the solution.

 

  • Are you having trouble with WWW versus non-WWW? Make a decision and stick to it by redirecting one to the other. Google Webmaster Tools (now Google Search Console) used to let you set a preferred domain as well, provided you had first verified both versions of your domain name.

 

  • Even if it proves somewhat difficult, it's worth the effort to find ways to prevent duplicate content. Preventing it from the start is the best solution to this problem, and to many others as well!
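
Two of the points above, consistent parameter order and hash-based campaign tracking, can be sketched in Python. The function names are hypothetical, and utm_campaign is only an assumed parameter name for the example; use whatever your analytics tool expects.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_url(base: str, params: dict[str, str]) -> str:
    """A tiny 'URL factory': always emit parameters in alphabetical order."""
    parts = urlsplit(base)
    query = urlencode(sorted(params.items()))
    return urlunsplit(parts._replace(query=query))


def with_hash_tracking(url: str, campaign: str) -> str:
    """Put campaign data in the fragment instead of the query string."""
    parts = urlsplit(url)
    fragment = urlencode({"utm_campaign": campaign})  # assumed parameter name
    return urlunsplit(parts._replace(fragment=fragment))
```

Because every internal link is built through build_url(), the same parameters always produce the same URL. The fragment added by with_hash_tracking() is not sent to the server, so the page address stays the same for search engines; your analytics tool has to be set up to read it on the client side.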

Our blog post outlines all the important steps for planning your content!

Link to the original content


If you are unable to take any of the above measures, for example because you cannot control the part of the website where your content appears, it is always a good idea to add a link to the original article at the top or bottom of the article.

It might be helpful to do this in your RSS feed as well, by adding a link to the content. Some scrapers will remove this link, but others might leave it. If Google encounters multiple links pointing to your original article, it will soon recognize this as the actual canonical version of your content, thus eliminating duplicate content.

When is duplicate content harmful?

Google acknowledges that in some cases content is deliberately duplicated in order to manipulate search engine rankings or deceive users. Google also lists several examples of non-malicious duplicate content.

However, if internet content has been copied using manipulative techniques (so-called scraping, search engine spam, or similar methods), the affected pages may be removed from Google's search results.

It is important that the content on your website is unique and authentic to ensure that your site is not wrongly penalized for fraudulent or manipulated content.

 

In summary, it's safe to say that duplicate content is a problem that affects virtually every website with more than 1000 pages, but it is possible to solve this problem. If you take the time to remove duplicate content from your website, the results can be absolutely exceptional!

Your hard-earned, high-quality content will climb the rankings and generate even more attention for your business. Don't wait any longer – take action today and watch your website grow rapidly by removing duplicate content!

Learn more about your content and how to optimize it! Our SEO experts are happy to advise you in a free 30-minute call!

Book an appointment now and increase your visibility today!

Mik Group Team

Written by:

Maxi Maxhuni


Maxi is an expert in digital marketing and SEO with a special focus on sustainable customer acquisition strategies. With years of experience...
