Drupal 7 RobotsTxt Module

By shane
Sat, 2013-03-09 22:58
Daily Dose of Drupal Episode #124

The Drupal 7 RobotsTxt module makes it easy to control your Drupal site's robots.txt file from the Drupal admin. This is especially useful for multisite Drupal installations in which a different robots.txt file is needed on each site.

In this episode you will learn:

  • How to install the RobotsTxt module and configure the robots.txt file on a Drupal multisite installation

Thanks to Drupalize.me for sponsoring this episode of the Daily Dose of Drupal.

Hello everyone and welcome to another Daily Dose of Drupal; today we’re on Episode Number 123. We’re going to be going over the RobotsTxt module. The RobotsTxt module is useful if you’re running multiple sites (a multisite) from a single Drupal core codebase. Basically, if you need a different robots.txt file for each site, this is the module you’re going to want to use.

If you’re not familiar with what a robots.txt file is, it’s basically a way for you to tell search engines such as Google, Bing and others which pages on your site they should not crawl. So if you have pages on your site that you don’t want search engines to know about, index, or make searchable, you can add them to your robots.txt file, and the search engines that follow the robots.txt standard will stay away from them.
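To make the idea concrete, here is a small sketch of how a crawler that honors robots.txt might evaluate a few rules, using Python’s standard-library urllib.robotparser. The rules are a short, hypothetical excerpt in the style of Drupal’s default file (not the full file), and example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, abridged robots.txt in the style of Drupal's default file.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
Disallow: /user/login/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks each URL before fetching it.
for url in ("https://example.com/node/1", "https://example.com/admin/config"):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "blocked")
# https://example.com/node/1 allowed
# https://example.com/admin/config blocked
```

Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honor it.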

Before we get started: as always, I’m Shane Thomas. You can follow me on Twitter at smthomas3, you can find me on Google+, and you can also sign up for the codekarate.com newsletter. Today’s episode is sponsored by Drupalize.me. Drupalize.me is one of the best ways to learn about Drupal. They have tons of great videos; you can simply click Browse, search through all their free videos, and see if you find ones you like.

You can then of course become a member and learn a whole lot more. If you do decide to become a member, use the coupon code CK20FEB to get 20% off and to let them know that I sent you their way. Let’s go ahead and get started. I installed the RobotsTxt module on two test sites I have here in a multisite: test3.codekarate.com and test4.codekarate.com.

So as you can see, it’s installed on both sites. One thing you need to do when you’re using this module is go into your Drupal core installation and either delete or rename the original robots.txt file. This is so the web server doesn’t serve that static robots.txt file directly, and instead passes the request to your Drupal site, which looks up the robots.txt content you enter from the database.
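If you prefer to script this step, a minimal Python sketch might look like the following. The Drupal root path is an assumption, the robotsold.txt backup name simply mirrors the one used in this episode, and disable_static_robots is a hypothetical helper, not part of the RobotsTxt module.

```python
from pathlib import Path
from typing import Optional


def disable_static_robots(drupal_root: str, backup_name: str = "robotsold.txt") -> Optional[Path]:
    """Rename the static robots.txt in the Drupal root (hypothetical helper).

    Once the static file is gone, the web server no longer serves it and the
    /robots.txt request can fall through to Drupal. Returns the backup path,
    or None if there was no static robots.txt to rename.
    """
    root = Path(drupal_root)
    static_file = root / "robots.txt"
    if not static_file.is_file():
        return None  # Nothing to rename.
    backup = root / backup_name
    if backup.exists():
        # Avoid clobbering an earlier backup.
        raise FileExistsError(f"{backup} already exists; refusing to overwrite it")
    static_file.rename(backup)
    return backup
```

For example, disable_static_robots("/var/www/drupal") (a hypothetical path) renames /var/www/drupal/robots.txt to robotsold.txt. Because all sites in a multisite share one codebase, this only needs to be done once.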

So I’m just going to rename this to robotsold.txt, and now we’re going to simply come in and configure the robots.txt on both sites. As you can see, this is the standard robots.txt that comes out of the box with Drupal. Just to show that this one is going to take effect, I’m going to change the crawl delay to 20 on this one and save it.

This is on the test3 site. On the test4 site I’m going to come in and, just as an example, enter Disallow: /, which should tell search engines not to crawl the entire site. Now if I go to the robots.txt page on both of these sites, you can see that on test3 the robots.txt page still has the crawl delay at 10, so I have to look at… well, I think it updated, but you can see it’s a normal robots.txt file. On test4, however, all of the other Disallow statements have been removed and it just has Disallow with a /. There we go, the crawl delay on test3 has been updated now.
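One way to check the two configurations is to run each site’s robots.txt through Python’s urllib.robotparser. The file bodies below are assumptions reconstructed from the demo, not copies of the actual files: test3 keeps a shortened default-style file with the crawl delay raised to 20, and test4 disallows everything.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt bodies served by each site in the multisite demo.
SITES = {
    "test3.codekarate.com": "User-agent: *\nCrawl-delay: 20\nDisallow: /admin/\n",
    "test4.codekarate.com": "User-agent: *\nDisallow: /\n",
}


def load(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


for host, body in SITES.items():
    parser = load(body)
    url = f"http://{host}/node/1"
    print(host, "crawl delay:", parser.crawl_delay("*"), "| /node/1 allowed:", parser.can_fetch("*", url))
# test3.codekarate.com crawl delay: 20 | /node/1 allowed: True
# test4.codekarate.com crawl delay: None | /node/1 allowed: False
```

The same content page is crawlable on one site and blocked on the other, which is exactly the per-site control the module provides.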

I just had to save it again, and really that’s all there is to it. If you need to change your robots.txt file on multiple sites inside a Drupal multisite installation, simply rename or remove your original robots.txt file and then install the RobotsTxt module on every site in the multisite. You’ll then have full control through Drupal’s admin UI over exactly how your robots.txt file functions on each site.

So if you have specific pages you want to disallow, you can simply add them at the bottom of your robots.txt file. Say you have a page that, for some reason, you don’t want Google or Bing to find or index, and you don’t want people to be able to search for it: you simply add a Disallow statement at the bottom, save it, and it’ll only be applied to this site.
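Here is a minimal sketch of that edit, assuming (as in the demo) that the file ends with a single User-agent: * group so an appended line joins that group. add_disallow and is_allowed are hypothetical helpers, and /private-page is a made-up path.

```python
from urllib.robotparser import RobotFileParser


def add_disallow(robots_txt: str, path: str) -> str:
    """Append a Disallow line at the bottom of a robots.txt body.

    Trailing blank lines are stripped first, because a blank line ends the
    current User-agent group and the new rule would otherwise be ignored.
    """
    return robots_txt.rstrip() + f"\nDisallow: {path}\n"


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check a URL against a robots.txt body using the standard library."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


original = "User-agent: *\nCrawl-delay: 10\nDisallow: /admin/\n\n"
updated = add_disallow(original, "/private-page")  # hypothetical page path

print(is_allowed(original, "https://example.com/private-page"))  # True
print(is_allowed(updated, "https://example.com/private-page"))   # False
```

Disallow values are matched as path prefixes, so /private-page also covers URLs such as /private-page/edit.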

Of course, if you come back and rename or drop in the old robots.txt file that we renamed at the beginning, then when you go to this page you will get that static file instead of the robots.txt you’ve configured in the web UI.

So it’s important to make sure you rename or remove that file. That’s all there is for today. Thanks for watching the Daily Dose of Drupal, thanks to Drupalize.me for sponsoring this episode, and we’ll see you next time.