Package org.apache.droids.norobots

Using norobots-rfc

Interface Summary
ContentLoader An abstract loader intended for retrieving content identified by a URI.
Rule A robots.txt rule.
 

Class Summary
NoRobotClient A client that can be used to decide which URLs on a website may be visited, according to the norobots specification at http://www.robotstxt.org/wc/norobots-rfc.html
SimpleContentLoader A simple implementation of ContentLoader based on URLConnection.
 

Exception Summary
NoRobotException Application exception for anything that goes wrong while checking a robots.txt file.
 

Package org.apache.droids.norobots Description

Using norobots-rfc

  1. Import the class:
       import org.apache.droids.norobots.NoRobotClient;
  2. Create an instance for your user agent:
       NoRobotClient nrc = new NoRobotClient("googlebot");
  3. Parse the robots.txt file of a site:
       nrc.parse( new URL( "http://www.apache.org/" ) );
  4. Ask whether a URL is allowed:
       boolean test = nrc.isUrlAllowed( new URL( "http://www.apache.org/index.html" ) );
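
To give a rough idea of what the parse and isUrlAllowed steps above involve, here is a minimal, self-contained sketch of a norobots-style check. It is not the Droids implementation and does not use NoRobotClient. The class RobotsSketch and its methods disallowedPrefixes and isAllowed are hypothetical names chosen for illustration. The sketch assumes plain prefix matching on Disallow lines and ignores Allow lines and other extensions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; NOT part of org.apache.droids.norobots.
final class RobotsSketch {

    // Returns the Disallow path prefixes that apply to userAgent.
    // A record that names the agent takes precedence over the "*" record.
    static List<String> disallowedPrefixes(String robotsTxt, String userAgent) {
        String agent = userAgent.toLowerCase(Locale.ROOT);
        List<String> specific = new ArrayList<>();
        List<String> wildcard = new ArrayList<>();
        boolean sawSpecific = false;     // some record named this agent
        boolean appliesSpecific = false; // current record names this agent
        boolean appliesWildcard = false; // current record names "*"
        boolean lastWasAgent = false;    // previous line was a User-agent line

        for (String raw : robotsTxt.split("\\r?\\n")) {
            if (raw.trim().isEmpty()) {  // a blank line ends the current record
                lastWasAgent = false;
                continue;
            }
            int hash = raw.indexOf('#'); // strip trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                // not a "field: value" line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                if (!lastWasAgent) {     // first User-agent line of a new record
                    appliesSpecific = false;
                    appliesWildcard = false;
                }
                if (value.equals("*")) {
                    appliesWildcard = true;
                } else if (!value.isEmpty()
                        && agent.contains(value.toLowerCase(Locale.ROOT))) {
                    appliesSpecific = true;
                    sawSpecific = true;
                }
                lastWasAgent = true;
            } else if (field.equals("disallow")) {
                lastWasAgent = false;
                if (value.isEmpty()) {
                    continue;            // an empty Disallow forbids nothing
                }
                if (appliesSpecific) {
                    specific.add(value);
                }
                if (appliesWildcard) {
                    wildcard.add(value);
                }
            }
        }
        return sawSpecific ? specific : wildcard;
    }

    // A path is allowed unless it starts with one of the disallowed prefixes.
    static boolean isAllowed(List<String> prefixes, String path) {
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robotsTxt = "User-agent: *\nDisallow: /private/\n\n"
                + "User-agent: googlebot\nDisallow: /tmp/\n";
        List<String> rules = disallowedPrefixes(robotsTxt, "googlebot");
        System.out.println(isAllowed(rules, "/index.html")); // true
        System.out.println(isAllowed(rules, "/tmp/cache"));  // false
    }
}
```

In this sketch the rules are extracted once per user agent and then reused for each path. This mirrors the parse-then-query order of the steps above, although the actual client may organise its work differently.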



Copyright © 2007-2009. All Rights Reserved.