jwm.robotstxt.googlebot

This file implements the standard defined by the Robots Exclusion Protocol (REP) internet draft (I-D).

Google doesn’t follow the standard strictly, because there are a lot of non-conforming robots.txt files out there, and we err on the side of disallowing when this seems intended.
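To show what tolerating non-conforming files can look like, here is a minimal sketch of lenient directive-key recognition. The function `classify_key` and the list of accepted misspellings are hypothetical and illustrative. They are not this library's API, and the exact set of typos the real parser accepts should be checked against its source.

```python
# Hypothetical sketch: map raw robots.txt keys to directives, tolerating typos.
# The misspellings below are an illustrative assumption, not the library's exact list.
DISALLOW_KEYS = ("disallow", "dissallow", "dissalow", "disalow", "diasllow", "disallaw")
USER_AGENT_KEYS = ("user-agent", "useragent", "user agent")


def classify_key(key: str) -> str:
    """Return 'user-agent', 'allow', 'disallow', 'sitemap' or 'unknown'."""
    k = key.strip().lower()  # keys are matched case-insensitively
    if any(k.startswith(variant) for variant in USER_AGENT_KEYS):
        return "user-agent"
    if k.startswith("allow"):
        return "allow"
    # Accepting a misspelled "disallow" errs on the side of disallowing.
    if any(k.startswith(variant) for variant in DISALLOW_KEYS):
        return "disallow"
    if k.startswith("sitemap"):
        return "sitemap"
    return "unknown"
```

For example, `classify_key("Disalow")` yields `"disallow"`, so a crawler honors the rule the site owner evidently meant to write.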

A more user-friendly description of how Google handles robots.txt can be found at:

This library provides a low-level parser for robots.txt (ParseRobotsTxt()) and a matcher that checks URLs against a robots.txt file (class RobotsMatcher).

Functions

ParseRobotsTxt(robots_body, parse_callback)

Parses the body of a robots.txt file and emits parse callbacks.
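The callback style of ParseRobotsTxt() can be sketched as follows. This is a simplified stand-in, not the library's implementation: `parse_robots_sketch`, the `ParseHandler` protocol and its method names are hypothetical, and a real parser also classifies directive types and handles more edge cases.

```python
from typing import Protocol


class ParseHandler(Protocol):
    """Hypothetical callback interface; the method names are assumptions."""
    def handle_start(self) -> None: ...
    def handle_directive(self, line_num: int, key: str, value: str) -> None: ...
    def handle_end(self) -> None: ...


def parse_robots_sketch(robots_body: str, handler: ParseHandler) -> None:
    """Split robots.txt into key/value lines and emit one callback per directive."""
    handler.handle_start()
    for line_num, raw in enumerate(robots_body.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # blank or comment-only line: no callback
        if ":" in line:
            key, value = line.split(":", 1)
        else:
            # Tolerate a missing colon by splitting on the first whitespace run.
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue
            key, value = parts
        handler.handle_directive(line_num, key.strip(), value.strip())
    handler.handle_end()


class Recorder:
    """Records every callback so the emitted event stream can be inspected."""
    def __init__(self) -> None:
        self.events: list = []

    def handle_start(self) -> None:
        self.events.append(("start",))

    def handle_directive(self, line_num: int, key: str, value: str) -> None:
        self.events.append((line_num, key, value))

    def handle_end(self) -> None:
        self.events.append(("end",))
```

Feeding `"User-agent: *\n# comment\nDisallow: /private\n"` to `parse_robots_sketch` with a `Recorder` produces a start event, one event each for lines 1 and 3, and an end event. Keeping the parser free of matching logic lets different handlers reuse the same parse.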

Classes

RobotsMatcher

RobotsMatcher - matches robots.txt against URLs.
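The core matching rule can be sketched as follows, assuming Google's documented precedence: the most specific (longest) matching pattern wins, and Allow wins a tie. `pattern_to_regex` and `is_allowed` are hypothetical helpers, not RobotsMatcher methods. The sketch skips user-agent selection and URL normalization (such as percent-encoding), which a real matcher has to handle.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape literal parts and join them with '.*' wherever the pattern had '*'.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list, path: str) -> bool:
    """rules: (directive, pattern) pairs with directive 'allow' or 'disallow'.

    The longest matching pattern decides; Allow wins ties; no match means allowed.
    """
    best_allow = best_disallow = -1  # length of the longest matching pattern so far
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing (empty Disallow allows all)
            continue
        if pattern_to_regex(pattern).match(path):  # match() anchors at the path start
            if directive == "allow":
                best_allow = max(best_allow, len(pattern))
            else:
                best_disallow = max(best_disallow, len(pattern))
    return best_disallow <= best_allow
```

With the rules `Disallow: /private`, `Allow: /private/public` and `Disallow: /*.pdf$`, the path `/private/public/page` is allowed because the longer Allow pattern is more specific, while `/docs/a.pdf` is disallowed by the anchored wildcard rule.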

RobotsParseHandler

Handler for directives found in robots.txt.
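A handler receives one callback per directive and decides what to keep. Below is a minimal sketch of a handler that groups rules by user agent. `GroupingHandler` and its `handle_*` methods are hypothetical and may not match the methods RobotsParseHandler actually defines; the grouping follows the REP convention that consecutive User-agent lines share one group of rules.

```python
from collections import defaultdict


class GroupingHandler:
    """Hypothetical handler that groups Allow/Disallow rules under their User-agent lines.

    Consecutive User-agent lines share one group; once a rule line has been seen,
    the next User-agent line starts a new group.
    """

    def __init__(self) -> None:
        self.rules = defaultdict(list)  # lower-cased agent -> [(directive, pattern), ...]
        self._current_agents: list = []
        self._seen_rule = False

    def handle_user_agent(self, line_num: int, value: str) -> None:
        if self._seen_rule:
            self._current_agents = []  # a rule closed the previous group
            self._seen_rule = False
        self._current_agents.append(value.lower())

    def _add_rule(self, directive: str, value: str) -> None:
        self._seen_rule = True
        for agent in self._current_agents:
            self.rules[agent].append((directive, value))

    def handle_allow(self, line_num: int, value: str) -> None:
        self._add_rule("allow", value)

    def handle_disallow(self, line_num: int, value: str) -> None:
        self._add_rule("disallow", value)
```

Calling `handle_user_agent` for `Googlebot` and `Bingbot`, then `handle_disallow(3, "/tmp")`, then `handle_user_agent(4, "*")` and `handle_allow(5, "/")` leaves `/tmp` disallowed for both named crawlers and a separate `allow /` group for `*`.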