Custom Software Development Services

Bot Identification

The most reliable way to identify crawlers is by IP address. This works for the most popular search engines, although their addresses may change occasionally. Unfortunately, it does not apply to distributed robots (site-downloading utilities, personal search systems, pilot bots), which can reach a site from virtually any IP address.
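As a minimal sketch of the IP-based check, the snippet below matches a client address against a table of crawler networks. The crawler names and address ranges are placeholders taken from reserved documentation blocks, not real search engine addresses; an actual list would have to be built from each engine's published ranges and refreshed as they change.

```python
import ipaddress

# Hypothetical table: crawler name -> networks it is assumed to use.
# These ranges are reserved documentation blocks, used here only as placeholders.
KNOWN_CRAWLER_NETWORKS = {
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
    "otherbot": [ipaddress.ip_network("198.51.100.0/25")],
}


def crawler_for_ip(remote_addr):
    """Return the name of the crawler whose network contains remote_addr, or None."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IP address string, so nothing can match.
        return None
    for name, networks in KNOWN_CRAWLER_NETWORKS.items():
        if any(addr in net for net in networks):
            return name
    return None
```

A request handler could call `crawler_for_ip(remote_addr)` first and fall back to other checks only when it returns `None`.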

Some crawlers can be detected from their user agent string if you maintain a list of known values. Other bots disguise themselves as popular browsers (IE, Mozilla) or actually are those browsers (for example, when IE downloads a site to make it available offline). These cases can be handled with adaptive methods that analyze the behavior of the remote client. If many pages are requested from a single IP address in a very short period (for example, 50 pages in one minute), the client is most probably a robot. You can also consider several indirect signs, such as an empty referrer field or a request for the robots.txt file. In practice, such a multi-level crawler identification scheme has proved highly effective.
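The levels described above could be combined roughly as follows. This is an illustrative sketch, not the implementation used in practice: the `BotDetector` class, the token list, and the signal names are all assumptions, and the threshold of 50 pages per 60 seconds simply mirrors the example figure in the text.

```python
import time
from collections import defaultdict, deque

# Illustrative user agent fragments; a real list would be much longer.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "wget")
MAX_PAGES = 50          # pages allowed per window before flagging
WINDOW_SECONDS = 60.0   # length of the sliding window


class BotDetector:
    def __init__(self):
        self._hits = defaultdict(deque)   # ip -> timestamps of recent requests
        self._fetched_robots = set()      # ips that have requested robots.txt

    def classify(self, ip, user_agent, path, referrer, now=None):
        """Record one request and return the list of signals it triggers."""
        now = time.monotonic() if now is None else now
        signals = []

        # Level 1: user agent matches a known crawler token.
        ua = (user_agent or "").lower()
        if any(token in ua for token in KNOWN_BOT_TOKENS):
            signals.append("user-agent")

        # Level 2: behavior, counted over a sliding time window per IP.
        hits = self._hits[ip]
        hits.append(now)
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= MAX_PAGES:
            signals.append("request-rate")

        # Level 3: indirect signs.
        if path == "/robots.txt":
            self._fetched_robots.add(ip)
        if ip in self._fetched_robots:
            signals.append("robots.txt")
        if not referrer:
            signals.append("empty-referrer")

        return signals
```

Returning the individual signals rather than a single verdict leaves the weighting to the caller; an empty referrer alone, for instance, is common among ordinary visitors and is probably best treated as supporting evidence only.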

