What is Robots.txt?

Robots.txt is a file that tells search engine spiders which pages or sections of a website they should not crawl.

Most major search engines (including Google, Bing, and Yahoo) recognize and honor robots.txt requests.


Why is Robots.txt important?

Most websites do not require a robots.txt file.

This is because Google's "robots" can usually find and index all the important pages of your website.

And they do NOT automatically index unimportant pages or duplicate versions of other pages.

However, there are three main reasons why you might want to use a robots.txt file.

Block non-public pages: Sometimes you have pages on your website that you do not want to be indexed.

For example, you might have a staging version of a page.

Or a login page.

These pages must exist.

But you don't want random people landing on them.

In this case, you would use robots.txt to block these pages from search engine crawlers and bots.

Maximize crawl budget: If you are having trouble getting all your pages indexed, you may have a crawl budget problem.

By blocking unimportant pages with robots.txt, Googlebot can spend more of your crawl budget on the pages that actually matter.

Prevent resource indexing: Using meta directives can work just as well as robots.txt to prevent pages from being indexed.

However, meta directives do not work well for multimedia resources such as PDFs and images.

This is where Robots.txt comes into play.

The bottom line?

Robots.txt instructs search engine crawlers not to crawl certain pages of your website.

You can check how many pages you have indexed in Google Search Console.


If the number matches the number of pages you want to index, you don't need to worry about a robots.txt file.

However, if this number is higher than expected (and you notice indexed URLs that shouldn't be indexed), it's time to create a robots.txt file for your website.

Robots.txt Tips & Tricks

Create a robots.txt file.

Your first step is to create your robots.txt file.

Since it is a text file, you can actually create a file using Windows Notepad.

And no matter how you ultimately create your robots.txt file, the format is exactly the same:

User-agent: X

Disallow: Y

User-agent specifies the bot you are addressing.

Everything that appears after "Disallow" is a page or section that you want to block.

Here is an example:

User-agent: googlebot

Disallow: /images

This rule instructs Googlebot not to crawl your website's image folder.

You can also use an asterisk (*) to address all bots that visit your website.

Here is an example:

User-agent: *

Disallow: /images

The asterisk (*) instructs all spiders NOT to crawl your image folder.

This is just one of many ways to use a robots.txt file.

You can find more information about the different rules you can use to block bots from crawling different pages of your website in this helpful guide from Google.
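To make this more concrete, here is a sketch of a robots.txt file combining several common directives. The paths and sitemap URL are hypothetical examples; note that wildcards like * and $ inside Disallow paths are supported by Google and Bing but are not part of the original robots.txt standard:

```
# Block all crawlers from the staging area (hypothetical path)
User-agent: *
Disallow: /staging/

# Block only Googlebot from PDF files ($ anchors the match to the end of the URL)
User-agent: Googlebot
Disallow: /*.pdf$

# Point crawlers to your sitemap (hypothetical URL)
Sitemap: https://beispiel.ch/sitemap.xml
```

Each User-agent line starts a new group of rules, and a bot follows the most specific group that matches its name.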


Check for errors and mistakes.

It is REALLY important that your file is set up correctly.

One mistake, and your entire website could get deindexed.

Fortunately, you don't have to hope that your code is set up correctly.

Google has a handy robots.txt testing tool that you can use:


It shows you your file… and all errors and warnings that are found.

As you can see, we block spiders from crawling our WP-Admin page.

We also use robots.txt to block crawling of WordPress's automatically generated tag pages (to limit duplicate content).
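If you prefer to sanity-check your rules locally before uploading, Python's standard library ships a robots.txt parser. The rules and URLs below are hypothetical examples:

```python
# Sanity-check robots.txt rules with Python's built-in parser.
from urllib.robotparser import RobotFileParser

# Hypothetical rules, similar to the WP-Admin and tag-page blocks described above
rules = """
User-agent: *
Disallow: /wp-admin/
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A blocked admin URL: crawlers should stay out
print(parser.can_fetch("*", "https://beispiel.ch/wp-admin/options.php"))  # False

# A regular blog page: crawling is allowed
print(parser.can_fetch("*", "https://beispiel.ch/blog/robots-txt-guide"))  # True
```

This only checks path matching, not whether real crawlers actually obey the file, so Google's testing tool is still the final word.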

Make your robots.txt file easy to find.

Once you have created your file, it's time to make it live.

Technically, you could place your robots.txt file in any main directory of your site.

To increase the likelihood of your file being found, we recommend placing it in the following location:

https://beispiel.ch/robots.txt

(Note that the file name is case-sensitive. Make sure to use a lowercase "r" in "robots.txt".)

Robots.txt vs. Meta Directives

Why use robots.txt at all when you can block pages at the page level with the "noindex" meta tag?

As mentioned previously, implementing the noindex tag in multimedia resources such as videos and PDFs is difficult.

If you want to block thousands of pages, it is sometimes easier to block an entire section of your site with robots.txt than to manually add a noindex tag to each individual page.

There are also edge cases where you don't want Google to waste crawl budget landing on pages that carry the noindex tag.

That said:

Outside of these three edge cases, we recommend using meta directives instead of robots.txt.

They are easier to implement.

And the risk of a catastrophe (e.g., accidentally blocking your entire site) is lower.
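For reference, here is what a page-level noindex directive looks like. This is the standard meta tag recognized by the major search engines:

```html
<!-- Place inside the <head> of any HTML page you want kept out of the index -->
<meta name="robots" content="noindex">
```

For non-HTML resources like PDFs, where there is no &lt;head&gt; to put a meta tag in, the equivalent is the X-Robots-Tag HTTP response header (e.g., `X-Robots-Tag: noindex`) sent by your server, which is exactly why robots.txt is often the simpler option for multimedia files.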
