Best Answer
Disadvantages of Web Robots

Network Performance

Robots traditionally get a bad press in discussions of bandwidth, even though some well-written and ethical robots ultimately serve to conserve it.

There are points to consider on the bandwidth front, since robots can span relatively large portions of Web-space over short periods. Bottlenecks can arise locally through high bandwidth consumption, particularly if the robot is in frequent or permanent use, or if it is used during network peak times. The problem is exacerbated if the frequency of requests for resources is unregulated.

Server-side Concerns

So-called "rapid-fire" requests (successive HTTP requests to a single server without delays) have been shown to be very resource consuming for a server under current HTTP implementations (in fact, this is the basis of several "denial of service" attacks). Here again, an unregulated robot can cause problems. Suitable delays and an ethical traversal algorithm can help resolve this.

The skewing of server logs is another issue that causes concern. A robot that indexes an entire site will distort the logs if it does not supply a recognised "user-agent" header, since its requests can then be hard to distinguish from those of regular users.
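
A common remedy is for the robot to identify itself with a descriptive User-Agent header on every request; a minimal sketch follows (the bot name and contact URL are made-up placeholders):

```python
import urllib.request

# A descriptive User-Agent lets server operators recognise and filter this
# robot's traffic in their logs; the name and URL below are placeholders.
USER_AGENT = "ExampleCrawler/1.0 (+https://example.org/bot-info)"

def fetch_identified(url):
    """Fetch a URL while announcing who the robot is."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```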

Unethical Robots

A small number of rogue robots are in use, performing tasks that make them particularly unwelcome to server operators. Such tasks include email culling (harvesting addresses into large lists that can be sold to advertisers) and copyright violation through the copying of entire sites.

Additionally, robots can contribute to a site's "hit quota" and consume bandwidth that the site may pay for.

Q: What are the disadvantages of web crawlers?

Related questions

What is the part of the search engine responsible for collecting data on the web?

Web crawlers are charged with the responsibility of visiting webpages and reporting what they find to the search engines. Google has its own web crawlers (aka robots), which it calls Googlebots. Web crawlers have also been referred to as spiders, although I think that term is now more commonly replaced with "bots".
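
As a hedged illustration of how a site might spot such crawler visits in its logs, the snippet below checks a request's User-Agent string against a few well-known crawler tokens (a small assumed sample, not a complete list):

```python
# Tokens that commonly appear in the User-Agent strings of search-engine
# crawlers; an illustrative sample only.
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")  # Slurp is Yahoo's crawler

def is_search_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like a known search crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_CRAWLER_TOKENS)

# Example:
# is_search_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
# -> True
```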


What web crawlers use PHP?

PHPCrawl and PHP Parallel Web Scraper; I'm sure there are many others.


What search engine uses web crawlers?

The most well-known are Google, Bing, and Yahoo.


What are crawlers and for what purpose are they used?

A crawler is a computer program that visits web sites and does something with the information it finds there. Many crawlers crawl for search engines, indexing whatever page they visit; such crawlers often return several times per day to check for updates. Another use is to gather information, such as email addresses, for whatever purpose suits the crawler's owner. Crawlers of this kind check all the links on a page and visit them after collecting its information, and in this way never stop but keep crawling over (the public parts of) the Web. A minimal sketch of this link-following loop appears below.
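
The sketch uses only Python's standard library and is an assumed, simplified version of the behaviour described above; a real crawler would add politeness delays, robots.txt handling, and persistent storage:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect the href targets of anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=20):
    """Breadth-first crawl: visit a page, queue its links, repeat."""
    seen, queue, visited = {seed_url}, deque([seed_url]), 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue                      # skip pages that fail to load or parse
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen
```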


When was The Crawlers created?

The Crawlers was created in 1954.


What is a Google Sitemap?

Google Sitemaps is an experiment in Web crawling by using Sitemaps to inform and direct Google search crawlers. Webmasters can place a Sitemap-formatted file on their Web server which enables Google crawlers to find out what pages are present and which have recently changed, and to crawl your site accordingly. Google Sitemaps is intended for all web site owners, from those with a single web page to companies with millions of ever-changing pages.
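
To illustrate the mechanism described above, the sketch below fetches a Sitemap-formatted file and pulls out the listed URLs and their last-modified dates (the sitemap location in the example is an assumption; sites may publish it elsewhere or not at all):

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by the Sitemap protocol (sitemaps.org).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def read_sitemap(sitemap_url):
    """Return (url, lastmod) pairs listed in a Sitemap-formatted file."""
    with urllib.request.urlopen(sitemap_url, timeout=10) as resp:
        root = ET.fromstring(resp.read())
    entries = []
    for url_elem in root.findall("sm:url", NS):
        loc = url_elem.findtext("sm:loc", default="", namespaces=NS)
        lastmod = url_elem.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod))
    return entries

# Example (assumed location):
# read_sitemap("https://example.org/sitemap.xml")
```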


When was Creepy Crawlers created?

Creepy Crawlers was created in 1964.


When was The Sky Crawlers created?

The Sky Crawlers was created in 2001.


How does one raise night crawlers?

How do you raise night crawlers


What are disadvantages of web directories?

nuffin'


How does Lycos fetch submitted documents?

Lycos fetches submitted documents by sending out automated web crawlers, also known as spiders, to systematically browse and index content from publicly accessible web pages. These crawlers follow links from one page to another, collecting information to be stored in the search engine's database for retrieval in response to user queries.
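
As a rough sketch of the "store for retrieval" step (not Lycos's actual implementation), the snippet below builds a tiny inverted index mapping words to the URLs of the pages that contain them, which is the basic structure a search engine queries:

```python
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping URL -> plain text of the fetched page.
    Returns an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs whose pages contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```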