RobotsTxtParser
Parser and cache for robots.txt files.
Supports (see the sample robots.txt after this list):
- User-agent directive
- Disallow directive
- Allow directive
- Crawl-delay directive
- Wildcard patterns (* and $)
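For reference, a small robots.txt exercising each supported directive might look like this (illustrative content, not part of the library):

User-agent: *
Disallow: /admin/
Disallow: /*.pdf$
Allow: /admin/public/
Crawl-delay: 2

Here /*.pdf$ uses * to match any sequence of characters and $ to anchor the match at the end of the URL.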
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: RobotsTxtParser.type
Members list
Type members
Classlikes
Parsed robots.txt rules for a domain.
Value parameters
- allowRules: Paths that are explicitly allowed
- crawlDelay: Suggested delay between requests in seconds
- disallowRules: Paths that are disallowed
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
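The extracted page drops this classlike's name and signature; a minimal sketch of the shape implied by the value parameters and supertypes (the name RobotRules and the field types are assumptions, not taken from the source):

case class RobotRules(
  allowRules: List[String],    // paths that are explicitly allowed
  disallowRules: List[String], // paths that are disallowed
  crawlDelay: Option[Double]   // suggested delay between requests, in seconds
)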
Value members
Concrete methods
Clear the robots.txt cache (for testing or memory management).
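Usage is a single call; the member name is not shown in this extraction, so clearCache below is an assumption:

RobotsTxtParser.clearCache() // drop all cached robots.txt entries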
Get parsed robots.txt rules for a URL.
Value parameters
- timeoutMs: Request timeout in milliseconds
- url: URL to check
- userAgent: User agent string
Attributes
- Returns: Parsed rules for this user agent
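A minimal usage sketch; the extraction omits the member's signature, so the name getRules and the named-parameter style below are assumptions:

val rules = RobotsTxtParser.getRules(
  url = "https://example.com/some/page",
  userAgent = "MyCrawler/1.0",
  timeoutMs = 5000
)
// Field names follow the rules classlike sketched above
println(rules.disallowRules)
rules.crawlDelay.foreach(delay => println(s"Crawl-delay: $delay s"))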
Check if a URL is allowed according to robots.txt.
Fetches and caches robots.txt for the domain if not already cached.
Value parameters
- timeoutMs: Request timeout in milliseconds
- url: URL to check
- userAgent: User agent string
Attributes
- Returns: true if the URL is allowed
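A hedged usage sketch, assuming the member is named isAllowed; per the description above, the first call for a domain fetches and caches its robots.txt, and later calls reuse the cache:

val url = "https://example.com/private/report.pdf"
val allowed = RobotsTxtParser.isAllowed(
  url = url,
  userAgent = "MyCrawler/1.0",
  timeoutMs = 5000
)
if (allowed) println(s"OK to fetch $url")
else println(s"Blocked by robots.txt: $url")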
Parse robots.txt content for a specific user agent.
Follows standard robots.txt parsing rules (see the sketch after this list):
- Look for matching User-agent group or "*"
- Collect Allow/Disallow rules
- Parse Crawl-delay
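A sketch of driving the parser on raw robots.txt content, assuming the member is named parse and returns the rules shape sketched earlier; the group selection shown in the comments follows the rules listed above:

val content =
  """User-agent: MyCrawler
    |Disallow: /search
    |Allow: /search/about
    |Crawl-delay: 1
    |
    |User-agent: *
    |Disallow: /
    |""".stripMargin

val rules = RobotsTxtParser.parse(content, userAgent = "MyCrawler/1.0")
// The MyCrawler group matches this user agent, so the "*" group is ignored:
// disallowRules = List("/search"), allowRules = List("/search/about"), crawlDelay = Some(1)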