Friday, 1 March 2013

Privacy Issues When Using Robots.txt & The Robots Meta Tag?

Understanding the difference between the robots.txt file and the Robots <META> tag is critical for search engine optimization and security. It can also have a profound impact on the privacy of your website and your customers. The first thing to know is what robots.txt files and Robots <META> tags are.

Robots.txt

Robots.txt is a file you place in your website’s top-level directory, the same folder in which a static homepage would go. Inside robots.txt, you can instruct search engines not to crawl content by disallowing file names or directories. A robots.txt record has two parts: a user-agent line and one or more disallow instructions.

The user-agent specifies one or all Web crawlers or spiders. When we think of Web crawlers we tend to think Google and Bing; however, a spider can come from anywhere, not just search engines, and there are many of them crawling the Internet.

Here is a simple robots.txt file telling all Web crawlers that it is okay to spider every page:

User-agent: *
Disallow:

To disallow all search engines from crawling an entire website, use:

User-agent: *
Disallow: /

The difference is the forward slash after Disallow:, signifying the root folder and everything in it, including sub-folders and files.
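To see how a crawler might read the two files above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name can_crawl, the agent name ExampleBot, and the example.com URL are illustrative assumptions, and real search engine crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two files from the article: an empty Disallow allows everything,
# while "Disallow: /" blocks the root folder and everything in it.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical URL used only for illustration.
URL = "https://example.com/private/report.html"

print(can_crawl(ALLOW_ALL, URL))  # True
print(can_crawl(BLOCK_ALL, URL))  # False
```

Note that this only models a well-behaved crawler. Nothing in robots.txt prevents a spider from ignoring the file entirely.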

Is Robots.txt A Security Or Privacy Risk?

Using robots.txt to hide sensitive or private files is a security risk. Search engines may still index a disallowed URL if other pages link to it, and robots.txt itself is publicly readable, so listing your private paths there is like giving a treasure map to pirates.
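The treasure-map problem is easy to demonstrate: anyone who downloads a site's robots.txt can list the paths its owner wanted hidden. The sketch below does this for a sample file; the function name disallowed_paths and every path in SAMPLE are hypothetical, chosen only to illustrate the point.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow path from robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt that tries to "hide" sensitive locations.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /customer-exports/
Disallow: /backup.zip
"""

print(disallowed_paths(SAMPLE))
# ['/admin/', '/customer-exports/', '/backup.zip']
```

A few lines of code turn the file into a list of places worth probing, which is why private content should be protected by authentication rather than by a Disallow line.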

Use Robots <META> Tag To Keep Files Out Of The Search Index

Because robots.txt does not keep files out of the search indexes, Google and Bing honor a separate mechanism that does: the Robots <META> tag, which is placed inside a page's <head> section.

<html>
<head>
<meta name="robots" content="index, follow">
</head>

The robots <META> tag provides two instructions:

index or noindex
follow or nofollow

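Index or noindex instructs search engines whether or not to index a page. When you select index, they may or may not choose to include the webpage in the index. If you select noindex, search engines that honor the tag will not include it, provided they are allowed to crawl the page and read the tag. A page that is disallowed in robots.txt is never fetched, so its noindex is never seen.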

Follow or nofollow instructs Web crawlers whether or not to follow the links on a page. Nofollow is like adding a rel="nofollow" attribute to every link on the page. It evaporates PageRank, the raw search engine ranking authority passed from page to page via links. Even if you noindex a page, it is probably a bad idea to nofollow it. Let PageRank flow through to its final conclusion. Otherwise, you could be pouring perfectly good link juice down the drain.
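As a rough illustration of how the two instructions combine, the following sketch reads the Robots <META> tag from a page with Python's standard html.parser module and reports the resulting policy. The class RobotsMetaParser, the function robots_policy, and the sample page are assumptions made for this example. It treats a missing instruction as index and follow, which is the usual default, and it ignores other directives that real search engines also support.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_policy(html: str) -> dict:
    """Return whether a compliant crawler may index the page and follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


# Sample page: keep it out of the index, but let crawlers follow its links.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(robots_policy(PAGE))  # {'index': False, 'follow': True}
```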

When you want to exclude a page from the search engine indexes, do this:

<html>
<head>
<meta name="robots" content="noindex, follow">
</head>


