
:mod:`robotparser` --- Parser for robots.txt
=============================================

.. module:: robotparser
   :synopsis: Loads a robots.txt file and answers questions about
              fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   single: robots.txt

.. note::
   The :mod:`robotparser` module has been renamed :mod:`urllib.robotparser` in
   Python 3.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to Python 3.

This module provides a single class, :class:`RobotFileParser`, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the :file:`robots.txt` file.  For more details on the
structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.


.. class:: RobotFileParser(url='')

   This class provides methods to read, parse and answer questions about the
   :file:`robots.txt` file at *url*.


   .. method:: set_url(url)

      Sets the URL referring to a :file:`robots.txt` file.


   .. method:: read()

      Reads the :file:`robots.txt` URL and feeds it to the parser.


   .. method:: parse(lines)

      Parses *lines*, a sequence of lines taken from a :file:`robots.txt`
      file.


   .. method:: can_fetch(useragent, url)

      Returns ``True`` if the *useragent* is allowed to fetch the *url*
      according to the rules contained in the parsed :file:`robots.txt`
      file.


   .. method:: mtime()

      Returns the time the :file:`robots.txt` file was last fetched.  This is
      useful for long-running web spiders that need to check for new
      :file:`robots.txt` files periodically.


   .. method:: modified()

      Sets the time the :file:`robots.txt` file was last fetched to the current
      time.

The following example demonstrates basic use of the :class:`RobotFileParser`
class. ::

   >>> import robotparser
   >>> rp = robotparser.RobotFileParser()
   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
   >>> rp.read()
   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
   False
   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
   True

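
When the rules are already held in memory, :meth:`parse` can be used in place
of :meth:`set_url` and :meth:`read`.  The following is a minimal sketch rather
than a recommended recipe: the rule set, the ``example.com`` URLs, the
``MAX_AGE`` threshold and the ``is_stale`` helper are all assumptions made for
illustration and are not part of the module.  The import fallback reflects the
Python 3 rename mentioned in the note at the top of this page. ::

   import time

   try:
       import robotparser                        # Python 2 name
   except ImportError:
       import urllib.robotparser as robotparser  # Python 3 name

   # Hypothetical rules, held in memory instead of being fetched with read().
   ROBOTS_TXT = """\
   User-agent: *
   Disallow: /cgi-bin/
   """

   rp = robotparser.RobotFileParser()
   rp.parse(ROBOTS_TXT.splitlines())
   rp.modified()   # record the current time as the "last fetched" time

   home_allowed = rp.can_fetch("*", "http://www.example.com/")
   search_allowed = rp.can_fetch("*", "http://www.example.com/cgi-bin/search")

   MAX_AGE = 3600  # seconds; an arbitrary threshold chosen for this sketch

   def is_stale(parser, max_age=MAX_AGE, now=None):
       """Return True if the rules were recorded more than max_age seconds ago."""
       if now is None:
           now = time.time()
       return now - parser.mtime() > max_age

With these rules, ``home_allowed`` is true and ``search_allowed`` is false.
A long-running spider could call ``is_stale(rp)`` before each batch of
requests and, when it returns true, call :meth:`read` again followed by
:meth:`modified`.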