robot-id: abcdatos
robot-name: ABCdatos BotLink
robot-cover-url: http://www.abcdatos.com/
robot-details-url: http://www.abcdatos.com/botlink/
robot-owner-name: ABCdatos
robot-owner-url: http://www.abcdatos.com/
robot-owner-email: botlink@abcdatos.com
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: windows
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent: BotLink
robot-noindex: no
robot-host: 217.126.39.167
robot-from: no
robot-useragent: ABCdatos BotLink/1.0.2 (test links)
robot-language: basic
robot-description: This robot verifies the availability of the ABCdatos directory entries (http://www.abcdatos.com) using HTTP HEAD requests. It runs twice a week. On an HTTP 5xx error response or a failed connection, it repeats the check some hours later to verify whether the problem was temporary.
robot-history: This robot was developed by the ABCdatos team to help with directory maintenance.
robot-environment: commercial
modified-date: Thu, 29 May 2003 01:00:00 GMT
modified-by: ABCdatos

robot-id: Acme.Spider
robot-name: Acme.Spider
robot-cover-url: http://www.acme.com/java/software/Acme.Spider.html
robot-details-url: http://www.acme.com/java/software/Acme.Spider.html
robot-owner-name: Jef Poskanzer - ACME Laboratories
robot-owner-url: http://www.acme.com/
robot-owner-email: jef@acme.com
robot-status: active
robot-purpose: indexing maintenance statistics
robot-type: standalone
robot-platform: java
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-language: java
robot-description: A Java utility class for writing your own robots.
robot-history:
robot-environment:
modified-date: Wed, 04 Dec 1996 21:30:11 GMT
modified-by: Jef Poskanzer

robot-id: ahoythehomepagefinder
robot-name: Ahoy! The Homepage Finder
robot-cover-url: http://www.cs.washington.edu/research/ahoy/
robot-details-url: http://www.cs.washington.edu/research/ahoy/doc/home.html
robot-owner-name: Marc Langheinrich
robot-owner-url: http://www.cs.washington.edu/homes/marclang
robot-owner-email: marclang@cs.washington.edu
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: UNIX
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: ahoy
robot-noindex: no
robot-host: cs.washington.edu
robot-from: no
robot-useragent: 'Ahoy! The Homepage Finder'
robot-language: Perl 5
robot-description: Ahoy! is an ongoing research project at the University of Washington for finding personal Homepages.
robot-history: Research project at the University of Washington in 1995/1996
robot-environment: research
modified-date: Fri June 28 14:00:00 1996
modified-by: Marc Langheinrich

robot-id: Alkaline
robot-name: Alkaline
robot-cover-url: http://www.vestris.com/alkaline
robot-details-url: http://www.vestris.com/alkaline
robot-owner-name: Daniel Doubrovkine
robot-owner-url: http://cuiwww.unige.ch/~doubrov5
robot-owner-email: dblock@vestris.com
robot-status: development active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix windows95 windowsNT
robot-availability: binary
robot-exclusion: yes
robot-exclusion-useragent: AlkalineBOT
robot-noindex: yes
robot-host: *
robot-from: no
robot-useragent: AlkalineBOT
robot-language: c++
robot-description: Unix/NT internet/intranet search engine
robot-history: Vestris Inc.
  search engine designed at the University of Geneva
robot-environment: commercial research
modified-date: Thu Dec 10 14:01:13 MET 1998
modified-by: Daniel Doubrovkine

robot-id: anthill
robot-name: Anthill
robot-cover-url: http://www.anthill.org/index.html
robot-details-url: http://www.anthill.org/index.html
robot-owner-name: Torsten Kaubisch
robot-owner-url: http://www.anthill.org/index.html
robot-owner-email: info@anthill.org
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: independent
robot-availability: not yet
robot-exclusion: no (soon in V1.2)
robot-exclusion-useragent: anthill
robot-noindex: no
robot-host: anywhere
robot-from: no
robot-useragent: AnthillV1.1
robot-language: java
robot-description: Anthill is used to gather price information automatically from online stores. Support for international versions.
robot-history: This is a research project at the University of Mannheim in Germany, professorship Prof. Martin Schader, assistant Dr. Stefan Kuhlins
robot-environment: research
modified-date: Thu, 6 Dec 2001 01:55:00 GMT
modified-by: Torsten Kaubisch

robot-id: appie
robot-name: Walhello appie
robot-cover-url: www.walhello.com
robot-details-url: www.walhello.com/aboutgl.html
robot-owner-name: Aimo Pieterse
robot-owner-url: www.walhello.com
robot-owner-email: aimo@walhello.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: windows98
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: appie
robot-noindex: yes
robot-host: 213.10.10.116, 213.10.10.117, 213.10.10.118
robot-from: yes
robot-useragent: appie/1.1
robot-language: Visual C++
robot-description: The appie-spider is used to collect and index web pages for the Walhello search engine
robot-history: The spider was built in March/April 2000
robot-environment: commercial
modified-date: Thu, 20 Jul 2000 22:38:00 GMT
modified-by: Aimo Pieterse

robot-id: arachnophilia
robot-name: Arachnophilia
robot-cover-url:
robot-details-url:
robot-owner-name: Vince Taluskie
robot-owner-url: http://www.ph.utexas.edu/people/vince.html
robot-owner-email: taluskie@utpapa.ph.utexas.edu
robot-status:
robot-purpose:
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: halsoft.com
robot-from:
robot-useragent: Arachnophilia
robot-language:
robot-description: The purpose (undertaken by HaL Software) of this run was to collect approximately 10k html documents for testing automatic abstract generation
robot-history:
robot-environment:
modified-date:
modified-by:

robot-id: arale
robot-name: Arale
robot-cover-url: http://web.tiscali.it/_flat
robot-details-url: http://web.tiscali.it/_flat
robot-owner-name: Flavio Tordini
robot-owner-url: http://web.tiscali.it/_flat
robot-owner-email: flaviotordini@tiscali.it
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix, windows, windows95, windowsNT, os2, mac, linux
robot-availability: source, binary
robot-exclusion: no
robot-exclusion-useragent: arale
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: no
robot-language: java
robot-description: A java multithreaded web spider. Download entire web sites or specific resources from the web. Render dynamic sites to static pages.
robot-history: This is brand new.
robot-environment: hobby
modified-date: Thu, 09 Jan 2001 17:28:52 GMT
modified-by: Flavio Tordini

robot-id: araneo
robot-name: Araneo
robot-cover-url: http://esperantisto.net
robot-details-url: http://esperantisto.net/araneo/
robot-owner-name: Arto Sarle
robot-owner-url: http://esperantisto.net
robot-owner-email: araneo@esperantisto.net
robot-status: development
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: Linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: araneo
robot-noindex: yes
robot-nofollow: yes
robot-host: *.esperantisto.net
robot-from: yes
robot-useragent: Araneo/0.7 (araneo@esperantisto.net; http://esperantisto.net)
robot-language: Python, Java
robot-description: Araneo is a web robot developed for crawling and indexing web pages written in the international language Esperanto. The database will be used to build a web search engine and auxiliary services to be published at esperantisto.net.
robot-history: (The name Araneo means "spider" in Esperanto.)
robot-environment: hobby, research
modified-date: Fri, 16 Nov 2001 08:30:00 GMT
modified-by: Arto Sarle

robot-id: araybot
robot-name: AraybOt
robot-cover-url: http://www.araykoo.com/
robot-details-url: http://www.araykoo.com/araybot.html
robot-owner-name: Guti
robot-owner-url: http://www.araykoo.com/
robot-owner-email: robot@araykoo.com
robot-status: active
robot-purpose: indexing maintenance
robot-type: standalone
robot-platform: Linux
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: AraybOt
robot-noindex: yes
robot-host: *
robot-from: no
robot-useragent: AraybOt/1.0 (+http://www.araykoo.com/araybot.html)
robot-language: perl5
robot-description: AraybOt is the agent software of AraykOO! which crawls web sites listed in http://dmoz.org/Adult/, in order to build an adult search engine.
robot-history:
robot-environment: service
modified-date: Sat, 19 Jun 2004 20:25:00 GMT+1
modified-by: Guti

robot-id: architext
robot-name: ArchitextSpider
robot-cover-url: http://www.excite.com/
robot-details-url:
robot-owner-name: Architext Software
robot-owner-url: http://www.atext.com/spider.html
robot-owner-email: spider@atext.com
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: *.atext.com
robot-from: yes
robot-useragent: ArchitextSpider
robot-language: perl 5 and c
robot-description: Its purpose is to generate a Resource Discovery database, and to generate statistics. The ArchitextSpider collects information for the Excite and WebCrawler search engines.
robot-history:
robot-environment:
modified-date: Tue Oct 3 01:10:26 1995
modified-by:

robot-id: aretha
robot-name: Aretha
robot-cover-url:
robot-details-url:
robot-owner-name: Dave Weiner
robot-owner-url: http://www.hotwired.com/Staff/userland/
robot-owner-email: davew@well.com
robot-status:
robot-purpose:
robot-type:
robot-platform: Macintosh
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: A crude robot built on top of Netscape and Userland Frontier, a scripting system for Macs
robot-history:
robot-environment:
modified-date:
modified-by:

robot-id: ariadne
robot-name: ARIADNE
robot-cover-url: (forthcoming)
robot-details-url: (forthcoming)
robot-owner-name: Mr. Matthias H.
  Gross
robot-owner-url: http://www.lrz-muenchen.de/~gross/
robot-owner-email: Gross@dbs.informatik.uni-muenchen.de
robot-status: development
robot-purpose: statistics, development of focused crawling strategies
robot-type: standalone
robot-platform: java
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: ariadne
robot-noindex: no
robot-host: dbs.informatik.uni-muenchen.de
robot-from: no
robot-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-language: java
robot-description: The ARIADNE robot is a prototype of an environment for testing focused crawling strategies.
robot-history: This robot is part of a research project at the University of Munich (LMU), started in 2000.
robot-environment: research
modified-date: Mo, 13 Mar 2000 14:00:00 GMT
modified-by: Mr. Matthias H. Gross

robot-id: arks
robot-name: arks
robot-cover-url: http://www.dpsindia.com
robot-details-url: http://www.dpsindia.com
robot-owner-name: Aniruddha Choudhury
robot-owner-url:
robot-owner-email: aniruddha.c@usa.net
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: PLATFORM INDEPENDENT
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: arks
robot-noindex: no
robot-host: dpsindia.com
robot-from: no
robot-useragent: arks/1.0
robot-language: Java 1.2
robot-description: The Arks robot is used to build the database for the dpsindia/lawvistas.com search service.
  The robot runs weekly, and visits sites in a random order
robot-history: finds its root from s/w development project for a portal
robot-environment: commercial
modified-date: 6th November 2000
modified-by: Aniruddha Choudhury

robot-id: aspider
robot-name: ASpider (Associative Spider)
robot-cover-url:
robot-details-url:
robot-owner-name: Fred Johansen
robot-owner-url: http://www.pvv.ntnu.no/~fredj/
robot-owner-email: fredj@pvv.ntnu.no
robot-status: retired
robot-purpose: indexing
robot-type:
robot-platform: unix
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host: nova.pvv.unit.no
robot-from: yes
robot-useragent: ASpider/0.09
robot-language: perl4
robot-description: ASpider is a CGI script that searches the web for keywords given by the user through a form.
robot-history:
robot-environment: hobby
modified-date:
modified-by:

robot-id: atn.txt
robot-name: ATN Worldwide
robot-details-url:
robot-cover-url:
robot-owner-name: All That Net
robot-owner-url: http://www.allthatnet.com
robot-owner-email: info@allthatnet.com
robot-status: active
robot-purpose: indexing
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent: ATN_Worldwide
robot-noindex:
robot-nofollow:
robot-host: www.allthatnet.com
robot-from:
robot-useragent: ATN_Worldwide
robot-language:
robot-description: The ATN robot is used to build the database for the AllThatNet search service operated by All That Net. The robot runs weekly, and visits sites in a random order.
robot-history:
robot-environment:
modified-date: July 09, 2000 17:43 GMT

robot-id: atomz
robot-name: Atomz.com Search Robot
robot-cover-url: http://www.atomz.com/help/
robot-details-url: http://www.atomz.com/
robot-owner-name: Mike Thompson
robot-owner-url: http://www.atomz.com/
robot-owner-email: mike@atomz.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: service
robot-exclusion: yes
robot-exclusion-useragent: Atomz
robot-noindex: yes
robot-host: www.atomz.com
robot-from: no
robot-useragent: Atomz/1.0
robot-language: c
robot-description: Robot used for web site search service.
robot-history: Developed for Atomz.com, launched in 1999.
robot-environment: service
modified-date: Tue Jul 13 03:50:06 GMT 1999
modified-by: Mike Thompson

robot-id: auresys
robot-name: AURESYS
robot-cover-url: http://crrm.univ-mrs.fr
robot-details-url: http://crrm.univ-mrs.fr
robot-owner-name: Mannina Bruno
robot-owner-url: ftp://crrm.univ-mrs.fr/pub/CVetud/Etudiants/Mannina/CVbruno.htm
robot-owner-email: mannina@crrm.univ-mrs.fr
robot-status: robot actively in use
robot-purpose: indexing,statistics
robot-type: Standalone
robot-platform: Aix, Unix
robot-availability: Protected by Password
robot-exclusion: Yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: crrm.univ-mrs.fr, 192.134.99.192
robot-from: Yes
robot-useragent: AURESYS/1.0
robot-language: Perl 5.001m
robot-description: AURESYS is used to build a personal database for someone searching for information. The database is structured to be analysed. AURESYS can find new servers by incrementing IP addresses. It generates statistics...
robot-history: This robot finds its roots in a research project at the University of Marseille in 1995-1996
robot-environment: used for Research
modified-date: Mon, 1 Jul 1996 14:30:00 GMT
modified-by: Mannina Bruno

robot-id: backrub
robot-name: BackRub
robot-cover-url:
robot-details-url:
robot-owner-name: Larry Page
robot-owner-url: http://backrub.stanford.edu/
robot-owner-email: page@leland.stanford.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: *.stanford.edu
robot-from: yes
robot-useragent: BackRub/*.*
robot-language: Java.
robot-description:
robot-history:
robot-environment:
modified-date: Wed Feb 21 02:57:42 1996.
modified-by:

robot-id:
robot-name: bayspider
robot-cover-url: http://www.baytsp.com/
robot-details-url: http://www.baytsp.com/
robot-owner-name: BayTSP.com,Inc
robot-owner-url:
robot-owner-email: marki@baytsp.com
robot-status: Active
robot-purpose: Copyright Infringement Tracking
robot-type: Stand Alone
robot-platform: NT
robot-availability: 24/7
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from:
robot-useragent: BaySpider
robot-language: English
robot-description:
robot-history:
robot-environment:
modified-date: 1/15/2001
modified-by: Marki@baytsp.com

robot-id: bbot
robot-name: BBot
robot-cover-url: http://www.otthon.net/search
robot-details-url: http://www.otthon.net/search/bbot
robot-owner-name: Istvan Fulop
robot-owner-url: http://www.otthon.net
robot-owner-email: poluf1 at yahoo dot co dot uk
robot-status: development
robot-purpose: indexing, maintenance
robot-type: standalone
robot-platform: windows
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: bbot
robot-noindex: yes
robot-nofollow: yes
robot-host: *.netcologne.de
robot-from: yes
robot-useragent: bbot/0.100
robot-language: perl
robot-description: Mainly intended for site level search, sometimes
  set loose.
robot-history: Started project in 11/2000. Called BBot since 24/04/2003.
robot-environment: hobby
modified-date: Sun, 04 May 2003 10:15:00 GMT
modified-by: Istvan Fulop

robot-id: bigbrother
robot-name: Big Brother
robot-cover-url: http://pauillac.inria.fr/~fpottier/mac-soft.html.en
robot-details-url:
robot-owner-name: Francois Pottier
robot-owner-url: http://pauillac.inria.fr/~fpottier/
robot-owner-email: Francois.Pottier@inria.fr
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: mac
robot-availability: binary
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: not as of 1.0
robot-useragent: Big Brother
robot-language: c++
robot-description: Macintosh-hosted link validation tool.
robot-history:
robot-environment: shareware
modified-date: Thu Sep 19 18:01:46 MET DST 1996
modified-by: Francois Pottier

robot-id: bjaaland
robot-name: Bjaaland
robot-cover-url: http://www.textuality.com
robot-details-url: http://www.textuality.com
robot-owner-name: Tim Bray
robot-owner-url: http://www.textuality.com
robot-owner-email: tbray@textuality.com
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: Bjaaland
robot-noindex: no
robot-host: barry.bitmovers.net
robot-from: no
robot-useragent: Bjaaland/0.5
robot-language: perl5
robot-description: Crawls sites listed in the ODP (see http://dmoz.org)
robot-history: None, yet
robot-environment: service
modified-date: Monday, 19 July 1999, 13:46:00 PDT
modified-by: tbray@textuality.com

robot-id: blackwidow
robot-name: BlackWidow
robot-cover-url: http://140.190.65.12/~khooghee/index.html
robot-details-url:
robot-owner-name: Kevin Hoogheem
robot-owner-url:
robot-owner-email: khooghee@marys.smumn.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex:
robot-host: 140.190.65.*
robot-from: yes
robot-useragent: BlackWidow
robot-language: C, C++.
robot-description: Started as a research project and now is used to find links for a random link generator. Also is used to research the growth of specific sites.
robot-history:
robot-environment:
modified-date: Fri Feb 9 00:11:22 1996.
modified-by:

robot-id: blindekuh
robot-name: Die Blinde Kuh
robot-cover-url: http://www.blinde-kuh.de/
robot-details-url: http://www.blinde-kuh.de/robot.html (german language)
robot-owner-name: Stefan R. Mueller
robot-owner-url: http://www.rrz.uni-hamburg.de/philsem/stefan_mueller/
robot-owner-email: maschinist@blinde-kuh.de
robot-status: development
robot-purpose: indexing
robot-type: browser
robot-platform: unix
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: minerva.sozialwiss.uni-hamburg.de
robot-from: yes
robot-useragent: Die Blinde Kuh
robot-language: perl5
robot-description: The robot is used for indexing and checking the registered URLs in the German-language search engine for kids. It's a non-commercial one-woman project of Birgit Bachmann, living in Hamburg, Germany.
robot-history: The robot was developed by Stefan R. Mueller to help with the manual checking of registered links.
robot-environment: hobby
modified-date: Mon Jul 22 1998
modified-by: Stefan R.
  Mueller

robot-id: Bloodhound
robot-name: Bloodhound
robot-cover-url: http://web.ukonline.co.uk/genius/bloodhound.htm
robot-details-url: http://web.ukonline.co.uk/genius/bloodhound.htm
robot-owner-name: Dean Smart
robot-owner-url: http://web.ukonline.co.uk/genius/bloodhound.htm
robot-owner-email: genius@ukonline.co.uk
robot-status: active
robot-purpose: Web Site Download
robot-type: standalone
robot-platform: Windows95, WindowsNT, Windows98, Windows2000
robot-availability: Executable
robot-exclusion: No
robot-exclusion-useragent: Ukonline
robot-noindex: No
robot-host: *
robot-from: No
robot-useragent: None
robot-language: Perl5
robot-description: Bloodhound will download a whole web site depending on the number of links to follow specified by the user.
robot-history: First version was released on 1 July 2000
robot-environment: Commercial
modified-date: 1 July 2000
modified-by: Dean Smart

robot-id: borg-bot
robot-name: Borg-Bot
robot-cover-url:
robot-details-url: http://www.skunkfarm.com/borgbot.htm
robot-owner-name: James Bragg
robot-owner-url: http://www.skunkfarm.com
robot-owner-email: botdev@skunkfarm.com
robot-status: development
robot-purpose: indexing statistics
robot-type: standalone
robot-platform: Linux Windows2000
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: borg-bot/0.9
robot-noindex: yes
robot-host: 24.11.13.173
robot-from: yes
robot-useragent: borg-bot/0.9
robot-language: python
robot-description: Developmental crawler to feed a search engine
robot-history:
robot-environment: research service
modified-date: Sat, 20 Oct 2001 04:00:00 GMT
modified-by: Sat, 20 Oct 2001 04:00:00 GMT

robot-id: boxseabot
robot-name: BoxSeaBot
robot-cover-url: http://www.boxsea.com/crawler
robot-details-url: http://www.boxsea.com/crawler
robot-owner-name: BoxSea Search Engine
robot-owner-url: http://www.boxsea.com
robot-owner-email: boxseasearch@yahoo.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: linux
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent: boxseabot
robot-noindex:
robot-host:
robot-from:
robot-useragent: BoxSeaBot/0.5 (http://boxsea.com/crawler)
robot-language: java
robot-description: This robot is used to find pages for building the BoxSea search engine indices.
robot-history: The robot code uses Nutch. Earlier experimental crawls were done under various user agent names such as NutchCVS(boxsea)
robot-environment:
modified-date: Fri, 23 Jul 2004 11:58:00 PST
modified-by: BoxSeaBot

robot-id: brightnet
robot-name: bright.net caching robot
robot-cover-url:
robot-details-url:
robot-owner-name:
robot-owner-url:
robot-owner-email:
robot-status: active
robot-purpose: caching
robot-type:
robot-platform:
robot-availability: none
robot-exclusion: no
robot-noindex:
robot-host: 209.143.1.46
robot-from: no
robot-useragent: Mozilla/3.01 (compatible;)
robot-language:
robot-description:
robot-history:
robot-environment:
modified-date: Fri Nov 13 14:08:01 EST 1998
modified-by: brian d foy

robot-id: bspider
robot-name: BSpider
robot-cover-url: not yet
robot-details-url: not yet
robot-owner-name: Yo Okumura
robot-owner-url: not yet
robot-owner-email: okumura@rsl.crl.fujixerox.co.jp
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: bspider
robot-noindex: yes
robot-host: 210.159.73.34, 210.159.73.35
robot-from: yes
robot-useragent: BSpider/1.0 libwww-perl/0.40
robot-language: perl
robot-description: BSpider is crawling inside of Japanese domain for indexing.
robot-history: Starts Apr 1997 in a research project at Fuji Xerox Corp. Research Lab.
robot-environment: research
modified-date: Mon, 21 Apr 1997 18:00:00 JST
modified-by: Yo Okumura

robot-id: cactvschemistryspider
robot-name: CACTVS Chemistry Spider
robot-cover-url: http://schiele.organik.uni-erlangen.de/cactvs/spider.html
robot-details-url:
robot-owner-name: W. D.
  Ihlenfeldt
robot-owner-url: http://schiele.organik.uni-erlangen.de/cactvs/
robot-owner-email: wdi@eros.ccc.uni-erlangen.de
robot-status:
robot-purpose: indexing.
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: utamaro.organik.uni-erlangen.de
robot-from: no
robot-useragent: CACTVS Chemistry Spider
robot-language: TCL, C
robot-description: Locates chemical structures in Chemical MIME formats on WWW and FTP servers and downloads them into database searchable with structure queries (substructure, fullstructure, formula, properties etc.)
robot-history:
robot-environment:
modified-date: Sat Mar 30 00:55:40 1996.
modified-by:

robot-id: calif
robot-name: Calif
robot-details-url: http://www.tnps.dp.ua/calif/details.html
robot-cover-url: http://www.tnps.dp.ua/calif/
robot-owner-name: Alexander Kosarev
robot-owner-url: http://www.tnps.dp.ua/~dark/
robot-owner-email: kosarev@tnps.net
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: calif
robot-noindex: yes
robot-host: cobra.tnps.dp.ua
robot-from: yes
robot-useragent: Calif/0.6 (kosarev@tnps.net; http://www.tnps.dp.ua)
robot-language: c++
robot-description: Used to build searchable index
robot-history: In development stage
robot-environment: research
modified-date: Sun, 6 Jun 1999 13:25:33 GMT

robot-id: cassandra
robot-name: Cassandra
robot-cover-url: http://post.mipt.rssi.ru/~billy/search/
robot-details-url: http://post.mipt.rssi.ru/~billy/search/
robot-owner-name: Mr.
  Oleg Bilibin
robot-owner-url: http://post.mipt.rssi.ru/~billy/
robot-owner-email: billy168@aha.ru
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: crossplatform
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: www.aha.ru
robot-from: no
robot-useragent:
robot-language: java
robot-description: Cassandra search robot is used to create and maintain indexed database for widespread Information Retrieval System
robot-history: Master of Science degree project at Moscow Institute of Physics and Technology
robot-environment: research
modified-date: Wed, 3 Jun 1998 12:00:00 GMT

robot-id: cgireader
robot-name: Digimarc Marcspider/CGI
robot-cover-url: http://www.digimarc.com/prod_fam.html
robot-details-url: http://www.digimarc.com/prod_fam.html
robot-owner-name: Digimarc Corporation
robot-owner-url: http://www.digimarc.com
robot-owner-email: wmreader@digimarc.com
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: 206.102.3.*
robot-from:
robot-useragent: Digimarc CGIReader/1.0
robot-language: c++
robot-description: Similar to Digimarc Marcspider, Marcspider/CGI examines image files for watermarks but is more focused on CGI URLs. In order to not waste internet bandwidth with yet another crawler, we have contracted with one of the major crawlers/search engines to provide us with a list of specific CGI URLs of interest to us. If a URL is to a page of interest (via CGI), then we access the page to get the image URLs from it, but we do not crawl to any other pages.
robot-history: First operation in December 1997
robot-environment: service
modified-date: Fri, 5 Dec 1997 12:00:00 GMT
modified-by: Dan Ramos

robot-id: checkbot
robot-name: Checkbot
robot-cover-url: http://www.xs4all.nl/~graaff/checkbot/
robot-details-url:
robot-owner-name: Hans de Graaff
robot-owner-url: http://www.xs4all.nl/~graaff/checkbot/
robot-owner-email: graaff@xs4all.nl
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix,WindowsNT
robot-availability: source
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: Checkbot/x.xx LWP/5.x
robot-language: perl 5
robot-description: Checkbot checks links in a given set of pages on one or more servers. It reports links which returned an error code
robot-history:
robot-environment: hobby
modified-date: Tue Jun 25 07:44:00 1996
modified-by: Hans de Graaff

robot-id: christcrawler
robot-name: ChristCrawler.com
robot-cover-url: http://www.christcrawler.com/search.cfm
robot-details-url: http://www.christcrawler.com/index.cfm
robot-owner-name: Jeremy DeYoung
robot-owner-url: http://www.christcentral.com/aboutus/index.cfm
robot-owner-email: jeremy.deyoung@christcentral.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Windows NT 4.0 SP5
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: christcrawler
robot-noindex: yes
robot-host: 64.51.218.*, 64.51.219.*, 12.107.236.*, 12.107.237.*
robot-from: yes
robot-useragent: Mozilla/4.0 (compatible; ChristCrawler.com, ChristCrawler@ChristCENTRAL.com)
robot-language: Cold Fusion 4.5
robot-description: A Christian internet spider that searches web sites to find Christian Related material
robot-history: Developed because of the growing need for a more God influence on the Internet.
robot-environment: service
modified-date: Fri, 27 Jun 2001 00:53:12 CST
modified-by: Jeremy DeYoung

robot-id: churl
robot-name: churl
robot-cover-url: http://www-personal.engin.umich.edu/~yunke/scripts/churl/
robot-details-url:
robot-owner-name: Justin Yunke
robot-owner-url: http://www-personal.engin.umich.edu/~yunke/
robot-owner-email: yunke@umich.edu
robot-status:
robot-purpose: maintenance
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: A URL checking robot, which stays within one step of the local server
robot-history:
robot-environment:
modified-date:
modified-by:

robot-id: cienciaficcion
robot-name: cIeNcIaFiCcIoN.nEt
robot-cover-url: http://www.cienciaficcion.net/
robot-details-url: http://www.cienciaficcion.net/
robot-owner-name: David Fernández
robot-owner-url: http://www.cyberdark.net/
robot-owner-email: root@cyberdark.net
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: linux
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: yes
robot-host: epervier.cqhost.net
robot-from: no
robot-useragent: cIeNcIaFiCcIoN.nEt Spider (http://www.cienciaficcion.net)
robot-language: php,perl
robot-description: Robot in charge of indexing the pages for www.cienciaficcion.net
robot-history: Alcorkón (Madrid) - Europe 2000/2001
robot-environment: hobby
modified-date: Sat, 18 Aug 2001 00:38:52 GMT
modified-by: David Fernández

robot-id: cmc
robot-name: CMC/0.01
robot-details-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=robot
robot-cover-url: http://www2.next.ne.jp/music/
robot-owner-name: Shinobu Kubota.
robot-owner-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=profile
robot-owner-email: shinobu@po.next.ne.jp
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: CMC/0.01
robot-noindex: no
robot-host: haruna.next.ne.jp, 203.183.218.4
robot-from: yes
robot-useragent: CMC/0.01
robot-language: perl5
robot-description: This CMC/0.01 robot collects the information of the page that was registered to the music specialty searching service.
robot-history: This CMC/0.01 robot was made for the computer music center on November 4, 1997.
robot-environment: hobby
modified-date: Sat, 23 May 1998 17:22:00 GMT

robot-id: Collective
robot-name: Collective
robot-cover-url: http://web.ukonline.co.uk/genius/collective.htm
robot-details-url: http://web.ukonline.co.uk/genius/collective.htm
robot-owner-name: Dean Smart
robot-owner-url: http://web.ukonline.co.uk/genius/collective.htm
robot-owner-email: genius@ukonline.co.uk
robot-status: development
robot-purpose: Collective is a highly configurable program designed to interrogate online search engines and online databases. It ignores web pages that lie about their content and dead URLs, and it can be super strict: it searches each web page it finds for your search terms to ensure those terms are present. Any positive URLs are added to an HTML file for you to view at any time, even before the program has finished. Collective can wander the web for days if required.
robot-type: standalone
robot-platform: Windows95, WindowsNT, Windows98, Windows2000
robot-availability: Executable
robot-exclusion: No
robot-exclusion-useragent:
robot-noindex: No
robot-host: *
robot-from: No
robot-useragent: LWP
robot-language: Perl5 (With Visual Basic front-end)
robot-description: Collective is the cleverest Internet search engine, with all found URLs guaranteed to contain your search terms.
robot-history:Development started on August, 03, 2000 robot-environment:Commercial modified-date:August, 03, 2000 modified-by:Dean Smart robot-id: combine robot-name: Combine System robot-cover-url: http://www.ub2.lu.se/~tsao/combine.ps robot-details-url: http://www.ub2.lu.se/~tsao/combine.ps robot-owner-name: Yong Cao robot-owner-url: http://www.ub2.lu.se/ robot-owner-email: tsao@munin.ub2.lu.se robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: combine robot-noindex: no robot-host: *.ub2.lu.se robot-from: yes robot-useragent: combine/0.0 robot-language: c, perl5 robot-description: An open, distributed, and efficient harvester. robot-history: A complete re-design of the NWI robot (w3index) for DESIRE project. robot-environment: research modified-date: Tue, 04 Mar 1997 16:11:40 GMT modified-by: Yong Cao robot-id: confuzzledbot robot-name: ConfuzzledBot robot-cover-url: http://www.blue.lu/ robot-details-url: http://bot.confuzzled.lu/ robot-owner-name: Britz Thibaut robot-owner-url: http://www.confuzzled.lu/ robot-owner-email: bot@confuzzled.lu robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: Linux,Freebsd robot-availability: none robot-exclusion: yes robot-exclusion-useragent: confuzzledbot robot-noindex: yes robot-nofollow: yes robot-host: *.ion.lu robot-from: no robot-useragent: Confuzzledbot/X.X (+http://www.confuzzled.lu/bot/) robot-language: perl5 robot-description: The robot is used to build a searchable database for Luxembourgish sites. It only indexes .lu domains and Luxembourgish sites added to the directory. robot-history: Developed 2000-2002.
Only minor changes recently robot-environment: hobby modified-date: Tue, 11 May 2004 17:45:00 CET modified-by: Britz Thibaut robot-id: coolbot robot-name: CoolBot robot-cover-url: www.suchmaschine21.de robot-details-url: www.suchmaschine21.de robot-owner-name: Stefan Fischerlaender robot-owner-url: www.suchmaschine21.de robot-owner-email: info@suchmaschine21.de robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: CoolBot robot-noindex: yes robot-host: www.suchmaschine21.de robot-from: no robot-useragent: CoolBot robot-language: perl5 robot-description: The CoolBot robot is used to build and maintain the directory of the German search engine Suchmaschine21. robot-history: none so far robot-environment: service modified-date: Wed, 21 Jan 2001 12:16:00 GMT modified-by: Stefan Fischerlaender robot-id: core robot-name: Web Core / Roots robot-cover-url: http://www.di.uminho.pt/wc robot-details-url: robot-owner-name: Jorge Portugal Andrade robot-owner-url: http://www.di.uminho.pt/~cbm robot-owner-email: wc@di.uminho.pt robot-status: robot-purpose: indexing, maintenance robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: shiva.di.uminho.pt, from www.di.uminho.pt robot-from: no robot-useragent: root/0.1 robot-language: perl robot-description: Parallel robot developed at Minho University in Portugal to catalog relations among URLs and to support a special navigation aid. robot-history: First versions since October 1995. robot-environment: modified-date: Wed Jan 10 23:19:08 1996.
modified-by: robot-id: cosmos robot-name: XYLEME Robot robot-cover-url: http://xyleme.com/ robot-details-url: robot-owner-name: Mihai Preda robot-owner-url: http://www.mihaipreda.com/ robot-owner-email: preda@xyleme.com robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes robot-exclusion-useragent: cosmos robot-noindex: no robot-nofollow: no robot-host: robot-from: yes robot-useragent: cosmos/0.3 robot-language: c++ robot-description: index XML, follow HTML robot-history: robot-environment: service modified-date: Fri, 24 Nov 2000 00:00:00 GMT modified-by: Mihai Preda robot-id: cruiser robot-name: Internet Cruiser Robot robot-cover-url: http://www.krstarica.com/ robot-details-url: http://www.krstarica.com/eng/url/ robot-owner-name: Internet Cruiser robot-owner-url: http://www.krstarica.com/ robot-owner-email: robot@krstarica.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Internet Cruiser Robot robot-noindex: yes robot-host: *.krstarica.com robot-from: no robot-useragent: Internet Cruiser Robot/2.1 robot-language: c++ robot-description: Internet Cruiser Robot is Internet Cruiser's prime index agent. 
robot-history: robot-environment: service modified-date: Fri, 17 Jan 2001 12:00:00 GMT modified-by: tech@krstarica.com robot-id: cusco robot-name: Cusco robot-cover-url: http://www.cusco.pt/ robot-details-url: http://www.cusco.pt/ robot-owner-name: Filipe Costa Clerigo robot-owner-url: http://www.viatecla.pt/ robot-owner-email: clerigo@viatecla.pt robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: any robot-availability: none robot-exclusion: yes robot-exclusion-useragent: cusco robot-noindex: yes robot-host: *.cusco.pt, *.viatecla.pt robot-from: yes robot-useragent: Cusco/3.2 robot-language: Java robot-description: The Cusco robot is part of the CUCE indexing system. It gathers information from several sources: HTTP, databases or the filesystem. At this moment, its universe is the .pt domain, and the information it gathers is available at the Portuguese search engine Cusco http://www.cusco.pt/. robot-history: The Cusco search engine started in the company ViaTecla as a project to demonstrate our development capabilities and to fill the need for a Portuguese-specific search engine. Now, we are developing new functionalities that cannot be found in any other on-line search engines.
robot-environment:service, research modified-date: Mon, 21 Jun 1999 14:00:00 GMT modified-by: Filipe Costa Clerigo robot-id: cyberspyder robot-name: CyberSpyder Link Test robot-cover-url: http://www.cyberspyder.com/cslnkts1.html robot-details-url: http://www.cyberspyder.com/cslnkts1.html robot-owner-name: Tom Aman robot-owner-url: http://www.cyberspyder.com/ robot-owner-email: amant@cyberspyder.com robot-status: active robot-purpose: link validation, some html validation robot-type: standalone robot-platform: windows 3.1x, windows95, windowsNT robot-availability: binary robot-exclusion: user configurable robot-exclusion-useragent: cyberspyder robot-noindex: no robot-host: * robot-from: no robot-useragent: CyberSpyder/2.1 robot-language: Microsoft Visual Basic 4.0 robot-description: CyberSpyder Link Test is intended to be used as a site management tool to validate that HTTP links on a page are functional and to produce various analysis reports to assist in managing a site. robot-history: The original robot was created to fill a widely seen need for an easy-to-use link checking program.
robot-environment: commercial modified-date: Tue, 31 Mar 1998 01:02:00 GMT modified-by: Tom Aman robot-id: cydralspider robot-name: CydralSpider robot-cover-url: http://www.cydral.com/ robot-details-url: http://en.cydral.com/help.html robot-owner-name: Cydral robot-owner-url: http://www.cydral.com/ robot-owner-email: cydral@cydral.com robot-status: active robot-purpose: gather Web content for image search engine service robot-type: standalone robot-platform: unix; windows robot-availability: none robot-exclusion: yes robot-exclusion-useragent: cydralspider robot-noindex: yes robot-host: *.cydral.com robot-from: yes robot-useragent: CydralSpider/X.X (Cydral Web Image Search; http://www.cydral.com/) robot-language: c++ robot-description: Advanced image spider for www.cydral.com robot-history: Developed in 2003, the robot uses new methods to discover Web sites and index images robot-environment: commercial modified-date: Tue, 17 Jun 2004, 11:50:30 GMT modified-by: cydral@cydral.com robot-id: desertrealm robot-name: Desert Realm Spider robot-cover-url: http://www.desertrealm.com robot-details-url: http://spider.desertrealm.com robot-owner-name: Brian B. robot-owner-url: http://www.desertrealm.com robot-owner-email: spider@desertrealm.com robot-status: robot actively in use robot-purpose: indexing robot-type: standalone robot-platform: cross platform robot-availability: none robot-exclusion: yes robot-exclusion-useragent: desertrealm, desert realm robot-noindex: yes robot-nofollow: yes robot-host: * robot-from: no robot-useragent: DesertRealm.com; 0.2; [J]; robot-language: java 1.3, java 1.4 robot-description: The spider indexes fantasy and science fiction sites by using a customizable keyword algorithm. Only home pages are indexed, but all pages are looked at for links. Pages are visited randomly to limit impact on any one webserver. robot-history: The spider originally was created to learn more about how search engines work.
robot-environment: hobby modified-date: Fri, 19 Sep 2003 08:57:52 GMT modified-by: Brian B. robot-id: deweb robot-name: DeWeb(c) Katalog/Index robot-cover-url: http://deweb.orbit.de/ robot-details-url: robot-owner-name: Marc Mielke robot-owner-url: http://www.orbit.de/ robot-owner-email: dewebmaster@orbit.de robot-status: robot-purpose: indexing, mirroring, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: deweb.orbit.de robot-from: yes robot-useragent: Deweb/1.01 robot-language: perl 4 robot-description: Its purpose is to generate a Resource Discovery database, perform mirroring, and generate statistics. Uses a combination of Informix(tm) Database and WN 1.11 server software for indexing/resource discovery, fulltext search, text excerpts. robot-history: robot-environment: modified-date: Wed Jan 10 08:23:00 1996 modified-by: robot-id: dienstspider robot-name: DienstSpider robot-cover-url: http://sappho.csi.forth.gr:22000/ robot-details-url: robot-owner-name: Antonis Sidiropoulos robot-owner-url: http://www.csi.forth.gr/~asidirop robot-owner-email: asidirop@csi.forth.gr robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: sappho.csi.forth.gr robot-from: robot-useragent: dienstspider/1.0 robot-language: C robot-description: Indexing and searching the NCSTRL (Networked Computer Science Technical Report Library) and ERCIM Collection robot-history: Version 1.0 was the developer's master's thesis project robot-environment: research modified-date: Fri, 4 Dec 1998 0:0:0 GMT modified-by: asidirop@csi.forth.gr robot-id: digger robot-name: Digger robot-cover-url: http://www.diggit.com/ robot-details-url: robot-owner-name: Benjamin Lipchak robot-owner-url: robot-owner-email: admin@bulldozersoftware.com robot-status: active robot-purpose: indexing
robot-type: standalone robot-platform: unix, windows robot-availability: none robot-exclusion: yes robot-exclusion-useragent: digger robot-noindex: yes robot-host: robot-from: yes robot-useragent: Digger/1.0 JDK/1.3.0 robot-language: java robot-description: indexing web sites for the Diggit! search engine robot-history: robot-environment: service modified-date: modified-by: robot-id: diibot robot-name: Digital Integrity Robot robot-cover-url: http://www.digital-integrity.com/robotinfo.html robot-details-url: http://www.digital-integrity.com/robotinfo.html robot-owner-name: Digital Integrity, Inc. robot-owner-url: robot-owner-email: robot@digital-integrity.com robot-status: Production robot-purpose: WWW Indexing robot-type: robot-platform: unix robot-availability: none robot-exclusion: Conforms to robots.txt convention robot-exclusion-useragent: DIIbot robot-noindex: Yes robot-host: digital-integrity.com robot-from: robot-useragent: DIIbot robot-language: Java/C robot-description: robot-history: robot-environment: modified-date: modified-by: robot-id: directhit robot-name: Direct Hit Grabber robot-cover-url: www.directhit.com robot-details-url: http://www.directhit.com/about/company/spider.html robot-status: active robot-description: Direct Hit Grabber indexes documents and collects Web statistics for the Direct Hit Search Engine (available at www.directhit.com and our partners' sites) robot-purpose: Indexing and statistics robot-type: standalone robot-platform: unix robot-language: C++ robot-owner-name: Direct Hit Technologies, Inc. 
robot-owner-url: www.directhit.com robot-owner-email: DirectHitGrabber@directhit.com robot-exclusion: yes robot-exclusion-useragent: grabber robot-noindex: yes robot-host: *.directhit.com robot-from: yes robot-useragent: grabber robot-environment: service modified-by: grabber@directhit.com robot-id: dnabot robot-name: DNAbot robot-cover-url: http://xx.dnainc.co.jp/dnabot/ robot-details-url: http://xx.dnainc.co.jp/dnabot/ robot-owner-name: Tom Tanaka robot-owner-url: http://xx.dnainc.co.jp robot-owner-email: tomatell@xx.dnainc.co.jp robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix, windows, windows95, windowsNT, mac robot-availability: data robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: xx.dnainc.co.jp robot-from: yes robot-useragent: DNAbot/1.0 robot-language: java robot-description: A search robot in 100% Java, with its own built-in database engine and web server. Currently in Japanese. robot-history: Developed by DNA, Inc. (Niigata City, Japan) in 1998.
robot-environment: commercial modified-date: Mon, 4 Jan 1999 14:30:00 GMT modified-by: Tom Tanaka robot-id: download_express robot-name: DownLoad Express robot-cover-url: http://www.jacksonville.net/~dlxpress robot-details-url: http://www.jacksonville.net/~dlxpress robot-owner-name: DownLoad Express Inc robot-owner-url: http://www.jacksonville.net/~dlxpress robot-owner-email: dlxpress@mediaone.net robot-status: active robot-purpose: graphic download robot-type: standalone robot-platform: win95/98/NT robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: downloadexpress robot-noindex: no robot-host: * robot-from: no robot-useragent: robot-language: visual basic robot-description: automatically downloads graphics from the web robot-history: robot-environment: commercial modified-date: Wed, 05 May 1998 modified-by: DownLoad Express Inc robot-id: dragonbot robot-name: DragonBot robot-cover-url: http://www.paczone.com/ robot-details-url: robot-owner-name: Paul Law robot-owner-url: robot-owner-email: admin@paczone.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: DragonBot robot-noindex: no robot-host: *.paczone.com robot-from: no robot-useragent: DragonBot/1.0 libwww/5.0 robot-language: C++ robot-description: Collects web pages related to East Asia robot-history: robot-environment: service modified-date: Mon, 11 Aug 1997 00:00:00 GMT modified-by: robot-id: dwcp robot-name: DWCP (Dridus' Web Cataloging Project) robot-cover-url: http://www.dridus.com/~rmm/dwcp.php3 robot-details-url: http://www.dridus.com/~rmm/dwcp.php3 robot-owner-name: Ross Mellgren (Dridus Norwind) robot-owner-url: http://www.dridus.com/~rmm robot-owner-email: rmm@dridus.com robot-status: development robot-purpose: indexing, statistics robot-type: standalone robot-platform: java robot-availability: source, binary, data robot-exclusion: yes
robot-exclusion-useragent: dwcp robot-noindex: no robot-host: *.dridus.com robot-from: dridus@dridus.com robot-useragent: DWCP/2.0 robot-language: java robot-description: The DWCP robot is used to gather information for Dridus' Web Cataloging Project, which is intended to catalog domains and urls (no content). robot-history: Developed from scratch by Dridus Norwind. robot-environment: hobby modified-date: Sat, 10 Jul 1999 00:05:40 GMT modified-by: Ross Mellgren robot-id: e-collector robot-name: e-collector robot-cover-url: http://www.thatrobotsite.com/agents/ecollector.htm robot-details-url: http://www.thatrobotsite.com/agents/ecollector.htm robot-owner-name: Dean Smart robot-owner-url: http://www.thatrobotsite.com robot-owner-email: smarty@thatrobotsite.com robot-status: Active robot-purpose: email collector robot-type: Collector of email addresses robot-platform: Windows 9*/NT/2000 robot-availability: Binary robot-exclusion: No robot-exclusion-useragent: ecollector robot-noindex: No robot-host: * robot-from: No robot-useragent: LWP:: robot-language: Perl5 robot-description: e-collector in the simplest terms is an e-mail address collector, thus the name e-collector. So what? Have you ever wanted to have the email addresses of as many companies as possible that sell or supply, for example, "dried fruit"? I personally don't, but this is just an example. Those of you who may use this type of robot will know exactly what you can do with the information; first, don't spam with it. For those still not sure what this type of robot will do for you, take this example: you're an international distributor of "dried fruit" and your boss has told you that if you raise sales by 10% he will buy you a new car (wish I had a boss like that). There are thousands of shops, distributors, etc. that you could be doing business with, but you don't know who they are because they're in other countries, or in the nearest town but you have never heard of them before.
Has the penny dropped yet? No? Well, now you have the opportunity to find out who they are, with an internet address and a person to contact in that company, just by downloading and running e-collector. Plus it's free; you don't have to do any leg work, just run the program and sit back and watch your potential customers arriving. robot-history: - robot-environment: Service modified-date: Weekly modified-by: Dean Smart robot-id:ebiness robot-name:EbiNess robot-cover-url:http://sourceforge.net/projects/ebiness robot-details-url:http://ebiness.sourceforge.net/ robot-owner-name:Mike Davis robot-owner-url:http://www.carisbrook.co.uk/mike robot-owner-email:mdavis@kieser.net robot-status:Pre-Alpha robot-purpose:statistics robot-type:standalone robot-platform:unix(Linux) robot-availability:Open Source robot-exclusion:yes robot-exclusion-useragent:ebiness robot-noindex:no robot-host: robot-from:no robot-useragent:EbiNess/0.01a robot-language:c++ robot-description:Used to build a url relationship database, to be viewed in 3D robot-history:Dreamed it up over some beers robot-environment:hobby modified-date:Mon, 27 Nov 2000 12:26:00 GMT modified-by:Mike Davis robot-id: eit robot-name: EIT Link Verifier Robot robot-cover-url: http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html robot-details-url: robot-owner-name: Jim McGuire robot-owner-url: http://www.eit.com/people/mcguire.html robot-owner-email: mcguire@eit.COM robot-status: robot-purpose: maintenance robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: robot-useragent: EIT-Link-Verifier-Robot/0.2 robot-language: robot-description: Combination of an HTML form and a CGI script that verifies links from a given starting point (with some controls to prevent it going off-site or limitless) robot-history: Announced on 12 July 1994 robot-environment: modified-date: modified-by: robot-id: elfinbot robot-name:ELFINBOT
robot-cover-url:http://letsfinditnow.com robot-details-url:http://letsfinditnow.com/elfinbot.html robot-owner-name:Lets Find It Now Ltd robot-owner-url:http://letsfinditnow.com robot-owner-email:admin@letsfinditnow.com robot-status:Active robot-purpose:Indexing for the Lets Find It Now search Engine robot-type:Standalone robot-platform:Unix robot-availability:None robot-exclusion: yes robot-exclusion-useragent:elfinbot robot-noindex:yes robot-host:*.letsfinditnow.com robot-from:no robot-useragent:elfinbot robot-language:Perl5 robot-description:ELFIN is used to index and add data to the "Lets Find It Now Search Engine" (http://letsfinditnow.com). The robot runs every 30 days. robot-history: robot-environment: modified-date: modified-by: robot-id: emacs robot-name: Emacs-w3 Search Engine robot-cover-url: http://www.cs.indiana.edu/elisp/w3/docs.html robot-details-url: robot-owner-name: William M. Perry robot-owner-url: http://www.cs.indiana.edu/hyplan/wmperry.html robot-owner-email: wmperry@spry.com robot-status: retired robot-purpose: indexing robot-type: browser robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: yes robot-useragent: Emacs-w3/v[0-9\.]+ robot-language: lisp robot-description: Its purpose is to generate a Resource Discovery database. This code has not been looked at in a while, but will be spruced up for the Emacs-w3 2.2.0 release sometime this month. It will honor the /robots.txt file at that time.
robot-history: robot-environment: modified-date: Fri May 5 16:09:18 1995 modified-by: robot-id: emcspider robot-name: ananzi robot-cover-url: http://www.empirical.com/ robot-details-url: robot-owner-name: Hunter Payne robot-owner-url: http://www.psc.edu/~hpayne/ robot-owner-email: hpayne@u-media.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: bilbo.internal.empirical.com robot-from: yes robot-useragent: EMC Spider robot-language: java robot-description: This spider is still in the development stages, but it will be hitting sites while I finish debugging it. robot-history: robot-environment: modified-date: Wed May 29 14:47:01 1996. modified-by: robot-id: esculapio robot-name: esculapio robot-cover-url: http://esculapio.cype.com robot-details-url: http://esculapio.cype.com/details.htm robot-owner-name: CYPE Ingenieros robot-owner-url: http://www.cype.com robot-owner-email: imasd@cype.com robot-status: active robot-purpose: link validation robot-type: standalone robot-platform: linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: esculapio robot-noindex: yes robot-host: 80.34.92.45 robot-from: yes robot-useragent: esculapio/1.1 robot-language: C++ robot-description: Checks the integrity of the links between several domains. robot-history: First, a research project. Now, an internal tool. Next, ???.
robot-environment: research, service modified-date: Mon, 6 Jun 2004 08:25 +1 GMT modified-by: robot-id: esther robot-name: Esther robot-details-url: http://search.falconsoft.com/ robot-cover-url: http://search.falconsoft.com/ robot-owner-name: Tim Gustafson robot-owner-url: http://www.falconsoft.com/ robot-owner-email: tim@falconsoft.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix (FreeBSD 2.2.8) robot-availability: data robot-exclusion: yes robot-exclusion-useragent: esther robot-noindex: no robot-host: *.falconsoft.com robot-from: yes robot-useragent: esther robot-language: perl5 robot-description: This crawler is used to build the search database at http://search.falconsoft.com/ robot-history: Developed by FalconSoft. robot-environment: service modified-date: Tue, 22 Dec 1998 00:22:00 PST robot-id: evliyacelebi robot-name: Evliya Celebi robot-cover-url: http://ilker.ulak.net.tr/EvliyaCelebi robot-details-url: http://ilker.ulak.net.tr/EvliyaCelebi robot-owner-name: Ilker TEMIR robot-owner-url: http://ilker.ulak.net.tr robot-owner-email: ilker@ulak.net.tr robot-status: development robot-purpose: indexing Turkish content robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: N/A robot-noindex: no robot-nofollow: no robot-host: 193.140.83.* robot-from: ilker@ulak.net.tr robot-useragent: Evliya Celebi v0.151 - http://ilker.ulak.net.tr robot-language: perl5 robot-history: robot-description: crawls pages under the ".tr" domain or having Turkish character encoding (iso-8859-9 or windows-1254) robot-environment: hobby modified-date: Fri Mar 31 15:03:12 GMT 2000 robot-id: nzexplorer robot-name: nzexplorer robot-cover-url: http://nzexplorer.co.nz/ robot-details-url: robot-owner-name: Paul Bourke robot-owner-url: http://bourke.gen.nz/paul.html robot-owner-email: paul@bourke.gen.nz robot-status: active robot-purpose: indexing, statistics robot-type: standalone
robot-platform: UNIX robot-availability: source (commercial) robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: bitz.co.nz robot-from: no robot-useragent: explorersearch robot-language: c++ robot-history: Started in 1995 to provide a comprehensive index to WWW pages within New Zealand. Now also used in Malaysia and other countries. robot-environment: service modified-date: Tues, 25 Jun 1996 modified-by: Paul Bourke robot-id: fastcrawler robot-name: FastCrawler robot-cover-url: http://www.1klik.dk/omos/ robot-details-url: http://www.1klik.dk/omos/ robot-owner-name: 1klik.dk A/S robot-owner-url: http://www.1klik.dk robot-owner-email: crawler@1klik.dk robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Windows 2000 Adv. Server robot-availability: none robot-exclusion: yes robot-exclusion-useragent: fastcrawler robot-noindex: yes robot-host: 1klik.dk robot-from: yes robot-useragent: FastCrawler 3.0.X (crawler@1klik.dk) - http://www.1klik.dk robot-language: C++ robot-description: FastCrawler is used to build the databases for search engines used by 1klik.dk and its partners robot-history: Robot started in April 1999 robot-environment: commercial modified-date: 05-08-2001 modified-by: Kim Gam-Jensen robot-id:fdse robot-name:Fluid Dynamics Search Engine robot robot-cover-url:http://www.xav.com/scripts/search/ robot-details-url:http://www.xav.com/scripts/search/ robot-owner-name:Zoltan Milosevic robot-owner-url:http://www.xav.com/ robot-owner-email:zoltanm@nickname.net robot-status:active robot-purpose:indexing robot-type:standalone robot-platform:unix;windows robot-availability:source;data robot-exclusion:yes robot-exclusion-useragent:FDSE robot-noindex:yes robot-host:yes robot-from:* robot-useragent:Mozilla/4.0 (compatible: FDSE robot) robot-language:perl5 robot-description:Crawls remote sites as part of a shareware search engine program robot-history:Developed in late 1998 over three pots of coffee
robot-environment:commercial modified-date:Fri, 21 Jan 2000 10:15:49 GMT modified-by:Zoltan Milosevic robot-id: felix robot-name: Felix IDE robot-cover-url: http://www.pentone.com robot-details-url: http://www.pentone.com robot-owner-name: The Pentone Group, Inc. robot-owner-url: http://www.pentone.com robot-owner-email: felix@pentone.com robot-status: active robot-purpose: indexing, statistics robot-type: standalone robot-platform: windows95, windowsNT robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: FELIX IDE robot-noindex: yes robot-host: * robot-from: yes robot-useragent: FelixIDE/1.0 robot-language: visual basic robot-description: Felix IDE is a retail personal search spider sold by The Pentone Group, Inc. It supports the proprietary exclusion "Frequency: ??????????" in the robots.txt file. Question marks represent an integer indicating number of milliseconds to delay between document requests. This is called VDRF(tm) or Variable Document Retrieval Frequency. Note that users can re-define the useragent name. robot-history: This robot began as an in-house tool for the lucrative Felix IDS (Information Discovery Service) and has gone retail. robot-environment: service, commercial, research modified-date: Fri, 11 Apr 1997 19:08:02 GMT modified-by: Kerry B. Rogers robot-id: ferret robot-name: Wild Ferret Web Hopper #1, #2, #3 robot-cover-url: http://www.greenearth.com/ robot-details-url: robot-owner-name: Greg Boswell robot-owner-url: http://www.greenearth.com/ robot-owner-email: ghbos@postoffice.worldnet.att.net robot-status: robot-purpose: indexing maintenance statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: Hazel's Ferret Web hopper, robot-language: C++, Visual Basic, Java robot-description: The wild ferret web hoppers are designed as specific agents to retrieve data from all available sources on the internet.
They work in an onion format hopping from spot to spot one level at a time over the internet. The information is gathered into different relational databases, known as "Hazel's Horde". The information is publicly available and will be free for the browsing at www.greenearth.com. Effective date of the data posting is to be announced. robot-history: robot-environment: modified-date: Mon Feb 19 00:28:37 1996. modified-by: robot-id: fetchrover robot-name: FetchRover robot-cover-url: http://www.engsoftware.com/fetch.htm robot-details-url: http://www.engsoftware.com/spiders/ robot-owner-name: Dr. Kenneth R. Wadland robot-owner-url: http://www.engsoftware.com/ robot-owner-email: ken@engsoftware.com robot-status: active robot-purpose: maintenance, statistics robot-type: standalone robot-platform: Windows/NT, Windows/95, Solaris SPARC robot-availability: binary, source robot-exclusion: yes robot-exclusion-useragent: ESI robot-noindex: N/A robot-host: * robot-from: yes robot-useragent: ESIRover v1.0 robot-language: C++ robot-description: FetchRover fetches Web Pages. It is an automated page-fetching engine. FetchRover can be used stand-alone or as the front-end to a full-featured Spider. Its database can use any ODBC compliant database server, including Microsoft Access, Oracle, Sybase SQL Server, FoxPro, etc. robot-history: Used as the front-end to SmartSpider (another Spider product sold by Engineering Software, Inc.)
robot-environment: commercial, service modified-date: Thu, 03 Apr 1997 21:49:50 EST modified-by: Ken Wadland robot-id: fido robot-name: fido robot-cover-url: http://www.planetsearch.com/ robot-details-url: http://www.planetsearch.com/info/fido.html robot-owner-name: Steve DeJarnett robot-owner-url: http://www.planetsearch.com/staff/steved.html robot-owner-email: fido@planetsearch.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: fido robot-noindex: no robot-host: fido.planetsearch.com, *.planetsearch.com, 206.64.113.* robot-from: yes robot-useragent: fido/0.9 Harvest/1.4.pl2 robot-language: c, perl5 robot-description: fido is used to gather documents for the search engine provided in the PlanetSearch service, which is operated by the Philips Multimedia Center. The robot runs on an ongoing basis. robot-history: fido was originally based on the Harvest Gatherer, but has since evolved into a new creature. It still uses some support code from Harvest. robot-environment: service modified-date: Sat, 2 Nov 1996 00:08:18 GMT modified-by: Steve DeJarnett robot-id: finnish robot-name: Hämähäkki robot-cover-url: http://www.fi/search.html robot-details-url: http://www.fi/www/spider.html robot-owner-name: Timo Metsälä robot-owner-url: http://www.fi/~timo/ robot-owner-email: Timo.Metsala@www.fi robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: UNIX robot-availability: no robot-exclusion: yes robot-exclusion-useragent: Hämähäkki robot-noindex: no robot-host: *.www.fi robot-from: yes robot-useragent: Hämähäkki/0.2 robot-language: C robot-description: Its purpose is to generate a Resource Discovery database from the Finnish (top-level domain .fi) www servers. The resulting database is used by the search engine at http://www.fi/search.html. robot-history: (The name Hämähäkki is just Finnish for spider.)
robot-environment: modified-date: 1996-06-25 modified-by: Jaakko.Hyvatti@www.fi robot-id: fireball robot-name: KIT-Fireball robot-cover-url: http://www.fireball.de robot-details-url: http://www.fireball.de/technik.html (in German) robot-owner-name: Gruner + Jahr Electronic Media Service GmbH robot-owner-url: http://www.ems.guj.de robot-owner-email:info@fireball.de robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: KIT-Fireball robot-noindex: yes robot-host: *.fireball.de robot-from: yes robot-useragent: KIT-Fireball/2.0 libwww/5.0a robot-language: c robot-description: The Fireball robots gather web documents in German language for the database of the Fireball search service. robot-history: The robot was developed by Benhui Chen in a research project at the Technical University of Berlin in 1996 and was re-implemented by its developer in 1997 for the present owner. robot-environment: service modified-date: Mon Feb 23 11:26:08 1998 modified-by: Detlev Kalb robot-id: fish robot-name: Fish search robot-cover-url: http://www.win.tue.nl/bin/fish-search robot-details-url: robot-owner-name: Paul De Bra robot-owner-url: http://www.win.tue.nl/win/cs/is/debra/ robot-owner-email: debra@win.tue.nl robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: binary robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: www.win.tue.nl robot-from: no robot-useragent: Fish-Search-Robot robot-language: c robot-description: Its purpose is to discover resources on the fly. A version exists that is integrated into the Tübingen Mosaic 2.4.2 browser (also written in C) robot-history: Originated as an addition to Mosaic for X robot-environment: modified-date: Mon May 8 09:31:19 1995 modified-by: robot-id: fouineur robot-name: Fouineur robot-cover-url: http://fouineur.9bit.qc.ca/ robot-details-url:
http://fouineur.9bit.qc.ca/informations.html robot-owner-name: Joel Vandal robot-owner-url: http://www.9bit.qc.ca/~jvandal/ robot-owner-email: jvandal@9bit.qc.ca robot-status: development robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix, windows robot-availability: none robot-exclusion: yes robot-exclusion-useragent: fouineur robot-noindex: no robot-host: * robot-from: yes robot-useragent: Mozilla/2.0 (compatible fouineur v2.0; fouineur.9bit.qc.ca) robot-language: perl5 robot-description: This robot automatically builds a database that is used by our own search engine. It auto-detects the language (French, English and Spanish) used in the HTML page. Each database record generated by this robot includes: date, url, title, total words, size and de-htmlized text. It also supports server-side and client-side IMAGEMAP. robot-history: No existing robot does everything that we need for our usage. robot-environment: service modified-date: Thu, 9 Jan 1997 22:57:28 EST modified-by: jvandal@9bit.qc.ca robot-id: francoroute robot-name: Robot Francoroute robot-cover-url: robot-details-url: robot-owner-name: Marc-Antoine Parent robot-owner-url: http://www.crim.ca/~maparent robot-owner-email: maparent@crim.ca robot-status: robot-purpose: indexing, mirroring, statistics robot-type: browser robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: zorro.crim.ca robot-from: yes robot-useragent: Robot du CRIM 1.0a robot-language: perl5, sqlplus robot-description: Part of the RISQ's Francoroute project for researching francophone. Uses the Accept-Language tag and reduces demand accordingly robot-history: robot-environment: modified-date: Wed Jan 10 23:56:22 1996. 
modified-by: robot-id: freecrawl robot-name: Freecrawl robot-cover-url: http://euroseek.net/ robot-owner-name: Jesper Ekhall robot-owner-email: ekhall@freeside.net robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Freecrawl robot-noindex: no robot-host: *.freeside.net robot-from: yes robot-useragent: Freecrawl robot-language: c robot-description: The Freecrawl robot is used to build a database for the EuroSeek service. robot-environment: service robot-id: funnelweb robot-name: FunnelWeb robot-cover-url: http://funnelweb.net.au robot-details-url: robot-owner-name: David Eagles robot-owner-url: http://www.pc.com.au robot-owner-email: eaglesd@pc.com.au robot-status: robot-purpose: indexing, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: earth.planets.com.au robot-from: yes robot-useragent: FunnelWeb-1.0 robot-language: c and c++ robot-description: Its purpose is to generate a Resource Discovery database, and generate statistics. Localised South Pacific Discovery and Search Engine, plus distributed operation under development. 
robot-history: robot-environment: modified-date: Mon Nov 27 21:30:11 1995 modified-by: robot-id: gama robot-name: gammaSpider, FocusedCrawler robot-details-url: http://www.gammasite.com, http://www.gammasite.com/gammaSpider.html robot-cover-url: http://www.gammasite.com robot-owner-name: gammasite robot-owner-url: http://www.gammasite.com robot-owner-email: support@gammasite.com robot-status: active robot-purpose: indexing, maintenance robot-type: standalone robot-platform: unix, windows, windows95, windowsNT, linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gammaSpider robot-noindex: no robot-nofollow: no robot-host: * robot-from: no robot-useragent: gammaSpider xxxxxxx ()/ robot-language: c++ robot-description: Information gathering. Focused crawling on a specific topic. Uses gammaFetcherServer Product for selling. RobotUserAgent may be changed by the user. More features are being added. The product is constantly under development. AKA FocusedCrawler robot-history: AKA FocusedCrawler robot-environment: service, commercial, research modified-date: Sun, 25 Mar 2001 18:49:52 GMT robot-id: gazz robot-name: gazz robot-cover-url: http://gazz.nttrd.com/ robot-details-url: http://gazz.nttrd.com/ robot-owner-name: NTT Cyberspace Laboratories robot-owner-url: http://gazz.nttrd.com/ robot-owner-email: gazz@nttrd.com robot-status: development robot-purpose: statistics robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gazz robot-noindex: yes robot-host: *.nttrd.com, *.infobee.ne.jp robot-from: yes robot-useragent: gazz/1.0 robot-language: c robot-description: This robot is used for research purposes. robot-history: Its root is TITAN project in NTT. 
robot-environment: research modified-date: Wed, 09 Jun 1999 10:43:18 GMT modified-by: noto@isl.ntt.co.jp robot-id: gcreep robot-name: GCreep robot-cover-url: http://www.instrumentpolen.se/gcreep/index.html robot-details-url: http://www.instrumentpolen.se/gcreep/index.html robot-owner-name: Instrumentpolen AB robot-owner-url: http://www.instrumentpolen.se/ip-kontor/eng/index.html robot-owner-email: anders@instrumentpolen.se robot-status: development robot-purpose: indexing robot-type: browser+standalone robot-platform: linux+mysql robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gcreep robot-noindex: yes robot-host: mbx.instrumentpolen.se robot-from: yes robot-useragent: gcreep/1.0 robot-language: c robot-description: Indexing robot to learn SQL robot-history: Spare time project begun late '96, maybe early '97 robot-environment: hobby modified-date: Fri, 23 Jan 1998 16:09:00 MET modified-by: Anders Hedstrom robot-id: getbot robot-name: GetBot robot-cover-url: http://www.blacktop.com.zav/bots robot-details-url: robot-owner-name: Alex Zavatone robot-owner-url: http://www.blacktop.com/zav robot-owner-email: zav@macromedia.com robot-status: robot-purpose: maintenance robot-type: standalone robot-platform: robot-availability: robot-exclusion: no. robot-exclusion-useragent: robot-noindex: robot-host: robot-from: no robot-useragent: ??? robot-language: Shockwave/Director. robot-description: GetBot's purpose is to index all the sites it can find that contain Shockwave movies. It is the first bot or spider written in Shockwave. The bot was originally written at Macromedia on a hungover Sunday as a proof of concept. - Alex Zavatone 3/29/96 robot-history: robot-environment: modified-date: Fri Mar 29 20:06:12 1996. 
modified-by: robot-id: geturl robot-name: GetURL robot-cover-url: http://Snark.apana.org.au/James/GetURL/ robot-details-url: robot-owner-name: James Burton robot-owner-url: http://Snark.apana.org.au/James/ robot-owner-email: James@Snark.apana.org.au robot-status: robot-purpose: maintenance, mirroring robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: no robot-useragent: GetURL.rexx v1.05 robot-language: ARexx (Amiga REXX) robot-description: Its purpose is to validate links, perform mirroring, and copy document trees. Designed as a tool for retrieving web pages in batch mode without the encumbrance of a browser. Can be used to describe a set of pages to fetch, and to maintain an archive or mirror. Is not run by a central site and accessed by clients - is run by the end user or archive maintainer robot-history: robot-environment: modified-date: Tue May 9 15:13:12 1995 modified-by: robot-id: golem robot-name: Golem robot-cover-url: http://www.quibble.com/golem/ robot-details-url: http://www.quibble.com/golem/ robot-owner-name: Geoff Duncan robot-owner-url: http://www.quibble.com/geoff/ robot-owner-email: geoff@quibble.com robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: mac robot-availability: none robot-exclusion: yes robot-exclusion-useragent: golem robot-noindex: no robot-host: *.quibble.com robot-from: yes robot-useragent: Golem/1.1 robot-language: HyperTalk/AppleScript/C++ robot-description: Golem generates status reports on collections of URLs supplied by clients. Designed to assist with editorial updates of Web-related sites or products. robot-history: Personal project turned into a contract service for private clients. 
robot-environment: service,research modified-date: Wed, 16 Apr 1997 20:50:00 GMT modified-by: Geoff Duncan robot-id: googlebot robot-name: Googlebot robot-cover-url: http://www.googlebot.com/ robot-details-url: http://www.googlebot.com/bot.html robot-owner-name: Google Inc. robot-owner-url: http://www.google.com/ robot-owner-email: googlebot@google.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: googlebot robot-noindex: yes robot-host: googlebot.com robot-from: yes robot-useragent: Googlebot/2.X (+http://www.googlebot.com/bot.html) robot-language: c++ robot-description: Google's crawler robot-history: Developed by Google Inc robot-environment: commercial modified-date: Thu Mar 29 21:00:07 PST 2001 modified-by: googlebot@google.com robot-id: grapnel robot-name: Grapnel/0.01 Experiment robot-cover-url: varies robot-details-url: mailto:v93_kat@ce.kth.se robot-owner-name: Philip Kallerman robot-owner-url: v93_kat@ce.kth.se robot-owner-email: v93_kat@ce.kth.se robot-status: Experimental robot-purpose: Indexing robot-type: robot-platform: WinNT robot-availability: None, yet robot-exclusion: Yes robot-exclusion-useragent: No robot-noindex: No robot-host: varies robot-from: Varies robot-useragent: robot-language: Perl robot-description: Resource Discovery Experimentation robot-history: None, hoping to make some robot-environment: modified-date: modified-by: 7 Feb 1997 robot-id:griffon robot-name:Griffon robot-cover-url:http://navi.ocn.ne.jp/ robot-details-url:http://navi.ocn.ne.jp/griffon/ robot-owner-name:NTT Communications Corporate Users Business Division robot-owner-url:http://navi.ocn.ne.jp/ robot-owner-email:griffon@super.navi.ocn.ne.jp robot-status:active robot-purpose:indexing robot-type:standalone robot-platform:unix robot-availability:none robot-exclusion:yes robot-exclusion-useragent:griffon robot-noindex:yes robot-nofollow:yes 
robot-host:*.navi.ocn.ne.jp robot-from:yes robot-useragent:griffon/1.0 robot-language:c robot-description:The Griffon robot is used to build the database for the OCN navi search service operated by NTT Communications Corporation. It mainly gathers pages written in Japanese. robot-history:Its root is TITAN project in NTT. robot-environment:service modified-date:Mon,25 Jan 2000 15:25:30 GMT modified-by:toka@navi.ocn.ne.jp robot-id: gromit robot-name: Gromit robot-cover-url: http://www.austlii.edu.au/ robot-details-url: http://www2.austlii.edu.au/~dan/gromit/ robot-owner-name: Daniel Austin robot-owner-url: http://www2.austlii.edu.au/~dan/ robot-owner-email: dan@austlii.edu.au robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Gromit robot-noindex: no robot-host: *.austlii.edu.au robot-from: yes robot-useragent: Gromit/1.0 robot-language: perl5 robot-description: Gromit is a Targeted Web Spider that indexes legal sites contained in the AustLII legal links database. robot-history: This robot is based on the Perl5 LWP::RobotUA module. robot-environment: research modified-date: Wed, 11 Jun 1997 03:58:40 GMT modified-by: Daniel Austin robot-id: gulliver robot-name: Northern Light Gulliver robot-cover-url: robot-details-url: robot-owner-name: Mike Mulligan robot-owner-url: robot-owner-email: crawler@northernlight.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gulliver robot-noindex: yes robot-host: scooby.northernlight.com, taz.northernlight.com, gulliver.northernlight.com robot-from: yes robot-useragent: Gulliver/1.1 robot-language: c robot-description: Gulliver is a robot to be used to collect web pages for indexing and subsequent searching of the index. 
robot-history: Oct 1996: development; Dec 1996-Jan 1997: crawl & debug; Mar 1997: crawl again; robot-environment: service modified-date: Wed, 21 Apr 1999 16:00:00 GMT modified-by: Mike Mulligan robot-id: gulperbot robot-name: Gulper Bot robot-cover-url: http://yuntis.ecsl.cs.sunysb.edu/ robot-details-url: http://yuntis.ecsl.cs.sunysb.edu/help/robot/ robot-owner-name: Maxim Lifantsev robot-owner-url: http://www.cs.sunysb.edu/~maxim/ robot-owner-email: gulperbot@ecsl.cs.sunysb.edu robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gulper robot-noindex: yes robot-nofollow: yes robot-host: yuntis*.ecsl.cs.sunysb.edu robot-from: no robot-useragent: Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot) robot-language: c++ robot-description: The Gulper Bot is used to collect data for the Yuntis research search engine project. robot-history: Developed in a research project at SUNY Stony Brook. robot-environment: research modified-date: Tue, 28 Aug 2001 21:40:47 GMT modified-by: maxim@cs.sunysb.edu robot-id: hambot robot-name: HamBot robot-cover-url: http://www.hamrad.com/search.html robot-details-url: http://www.hamrad.com/ robot-owner-name: John Dykstra robot-owner-url: robot-owner-email: john@futureone.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix, Windows95 robot-availability: none robot-exclusion: yes robot-exclusion-useragent: hambot robot-noindex: yes robot-host: *.hamrad.com robot-from: robot-useragent: robot-language: perl5, C++ robot-description: Two HamBot robots are used (stand alone & browser based) to aid in building the database for HamRad Search - The Search Engine for Search Engines. The robots are run intermittently and perform nearly identical functions. robot-history: A non commercial (hobby?) 
project to aid in building and maintaining the database for the HamRad search engine. robot-environment: service modified-date: Fri, 17 Apr 1998 21:44:00 GMT modified-by: JD robot-id: harvest robot-name: Harvest robot-cover-url: http://harvest.cs.colorado.edu robot-details-url: robot-owner-name: robot-owner-url: robot-owner-email: robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: bruno.cs.colorado.edu robot-from: yes robot-useragent: yes robot-language: robot-description: Harvest's motivation is to index community- or topic-specific collections, rather than to locate and index all HTML objects that can be found. Also, Harvest allows users to control the enumeration in several ways, including stop lists and depth and count limits. Therefore, Harvest provides a much more controlled way of indexing the Web than is typical of robots. Pauses 1 second between requests (by default). robot-history: robot-environment: modified-date: modified-by: robot-id: havindex robot-name: havIndex