Friday 1 March 2013

Privacy Issues When Using Robots.txt & The Robots Meta Tag?


Understanding the difference between the robots.txt file and Robots <META> Tag is critical for search engine optimization and security. It can have a profound impact on the privacy of your website and customers as well. The first thing to know is what robots.txt files and Robots <META> Tags are.

Robots.txt

Robots.txt is a file you place in your website’s top level directory, the same folder in which a static homepage would go. Inside robots.txt, you can instruct search engines not to crawl content by disallowing file names or directories. Each robots.txt record has two parts: a User-agent line and one or more Disallow lines.


The user-agent specifies one or all Web crawlers or spiders. When we think of Web crawlers we tend to think Google and Bing; however, a spider can come from anywhere, not just search engines, and there are many of them crawling the Internet.

Here is a simple robots.txt file telling all Web crawlers that it is okay to spider every page:

User-agent: *
Disallow:

To disallow all search engines from crawling an entire website, use:

User-agent: *
Disallow: /

The difference is the forward slash after Disallow:, signifying the root folder and everything in it, including sub-folders and files.
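One way to see how a crawler reads these rules is to feed them to Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the bot name `ExampleBot`, the `example.com` URLs, and the `/private/` folder are assumptions, not anything taken from a real site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty Disallow value allows everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# A single forward slash blocks the root folder and everything below it.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# Hypothetical example: block one folder only.
BLOCK_DIR = "User-agent: *\nDisallow: /private/\n"

print(is_allowed(ALLOW_ALL, "https://example.com/page.html"))            # True
print(is_allowed(BLOCK_ALL, "https://example.com/page.html"))            # False
print(is_allowed(BLOCK_DIR, "https://example.com/private/report.html"))  # False
print(is_allowed(BLOCK_DIR, "https://example.com/public/index.html"))    # True
```

Keep in mind that this only models a well-behaved crawler. Nothing forces a spider to consult robots.txt at all.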

Is Robots.txt A Security Or Privacy Risk?

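Using robots.txt to hide sensitive or private files is a security risk. Search engines may still index a disallowed URL if other pages link to it, and the file itself is public: anyone can open it in a browser and read the paths you listed. It is like giving a treasure map to pirates.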
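Because robots.txt is served from a public URL, anyone can read it, and pulling out the listed paths takes only a few lines. The following is a minimal sketch of that idea; the sample file and its folder names are hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow path from robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt that tries to "hide" sensitive folders.
SAMPLE = """User-agent: *
Disallow: /admin/
Disallow: /customer-exports/  # hypothetical sensitive folder
"""

print(disallowed_paths(SAMPLE))  # ['/admin/', '/customer-exports/']
```

Every path in that output is a hint about where to look, which is why private content needs real access control, such as a login, rather than a Disallow line.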

Use Robots <META> Tag To Keep Files Out Of The Search Index

Because robots.txt does not keep files out of the search indexes, Google and Bing honor a mechanism that does exactly that: the Robots <META> tag, placed in the <head> of each page you want excluded.

<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
</head>

The robots <META> tag provides two instructions:

  1. index or noindex
  2. follow or nofollow


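Index or noindex instructs search engines whether or not to index a page. When you select index, they may or may not choose to include a webpage in the index. If you select noindex, search engines that honor the tag, such as Google and Bing, will leave the page out. The crawler has to fetch the page to read the tag, so do not also disallow that page in robots.txt, or the noindex will never be seen.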

Follow or nofollow instructs Web crawlers whether or not to follow the links on a page. Nofollow is like adding a rel="nofollow" attribute to every link on the page. It evaporates PageRank, the raw search engine ranking authority passed from page to page via links. Even if you noindex a page, it is probably a bad idea to nofollow it. Let PageRank flow through to its final conclusion. Otherwise, you could be pouring perfectly good link juice down the drain.
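To check which directives a page actually carries, you can read its Robots <META> tag with Python's standard `html.parser` module. This is a rough sketch rather than a full crawler: the class and function names are made up for illustration, and it only looks at the generic `robots` tag, not at tags aimed at one specific crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


PAGE = """<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
</head>
"""

print(sorted(robots_directives(PAGE)))  # ['follow', 'noindex']
```

A page with no Robots <META> tag yields an empty set, which search engines treat the same as index, follow.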

When you want to exclude a page from the search engine indexes, do this:

<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
</head>


#1 Search Engine Marketing © 2014 | Distributed By Seo Company India | Designed By VikasKumarRaghav.com