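""" robotparser.py

    Copyright (C) 2000  Bastian Kleineidam

    You can choose between two licenses when using this package:
    1) GNU GPLv2
    2) PSF license for Python 2.2

    The robots.txt Exclusion Protocol is implemented as specified in
    http://www.robotstxt.org/norobots-rfc.txt
"""

import collections
import urllib.error
import urllib.parse
import urllib.request

__all__ = ["RobotFileParser"]

RequestRate = collections.namedtuple("RequestRate", "requests seconds")


class RobotFileParser:
    """This class provides a set of methods to read, parse and answer
    questions about a single robots.txt file.

    """

    def __init__(self, url=''):
        self.entries = []
        self.default_entry = None
        self.disallow_all = False
        self.allow_all = False
        self.set_url(url)
        self.last_checked = 0

    def mtime(self):
        """Returns the time the robots.txt file was last fetched.

        This is useful for long-running web spiders that need to
        check for new robots.txt files periodically.

        """
        return self.last_checked

    def modified(self):
        """Sets the time the robots.txt file was last fetched to the
        current time.

        """
        import time
        self.last_checked = time.time()

    def set_url(self, url):
        """Sets the URL referring to a robots.txt file."""
        self.url = url
        self.host, self.path = urllib.parse.urlparse(url)[1:3]

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        try:
            f = urllib.request.urlopen(self.url)
        except urllib.error.HTTPError as err:
            # 401/403: access to robots.txt is restricted, so the whole
            # site is treated as disallowed. Any other 4xx means there is
            # no robots.txt, so everything is allowed. Other HTTP errors
            # (e.g. 5xx) set neither flag.
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())

    def _add_entry(self, entry):
        if "*" in entry.useragents:
            # The catch-all entry is consulted last; the first one wins.
            if self.default_entry is None:
                self.default_entry = entry
        else:
            self.entries.append(entry)

    def parse(self, lines):
        """Parse the input lines from a robots.txt file.

        A User-agent line need not be preceded by a blank line: a
        User-agent line that follows rule lines closes the current
        entry and starts a new one.
        """
        # states:
        #   0: start state
        #   1: saw a user-agent line
        #   2: saw a rule line (allow, disallow, crawl-delay or request-rate)
        state = 0
        entry = Entry()

        self.modified()
        for line in lines:
            if not line:
                if state == 1:
                    entry = Entry()
                    state = 0
                elif state == 2:
                    self._add_entry(entry)
                    entry = Entry()
                    state = 0
            # remove an optional comment and strip the line
            i = line.find('#')
            if i >= 0:
                line = line[:i]
            line = line.strip()
            if not line:
                continue
            line = line.split(':', 1)
            if len(line) == 2:
                line[0] = line[0].strip().lower()
                line[1] = urllib.parse.unquote(line[1].strip())
                if line[0] == "user-agent":
                    if state == 2:
                        self._add_entry(entry)
                        entry = Entry()
                    entry.useragents.append(line[1])
                    state = 1
                elif line[0] == "disallow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], False))
                        state = 2
                elif line[0] == "allow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], True))
                        state = 2
                elif line[0] == "crawl-delay":
                    if state != 0:
                        # validate before converting so that malformed
                        # values are ignored instead of raising ValueError
                        if line[1].strip().isdigit():
                            entry.delay = int(line[1])
                        state = 2
                elif line[0] == "request-rate":
                    if state != 0:
                        numbers = line[1].split('/')
                        # accept only "<requests>/<seconds>" with integers
                        if (len(numbers) == 2 and numbers[0].strip().isdigit()
                                and numbers[1].strip().isdigit()):
                            entry.req_rate = RequestRate(int(numbers[0]),
                                                         int(numbers[1]))
                        state = 2
        if state == 2:
            self._add_entry(entry)

    def can_fetch(self, useragent, url):
        """Using the parsed robots.txt, decide whether useragent can fetch url."""
        if self.disallow_all:
            return False
        if self.allow_all:
            return True
        # Until robots.txt has been read or parsed, assume that no URL is
        # allowed; this avoids false positives when can_fetch() is called
        # before read().
        if not self.last_checked:
            return False
        # Normalize the URL down to its path (plus params, query and
        # fragment) and quote it the same way RuleLine quotes its paths.
        parsed_url = urllib.parse.urlparse(urllib.parse.unquote(url))
        url = urllib.parse.urlunparse(('', '', parsed_url.path,
                                       parsed_url.params, parsed_url.query,
                                       parsed_url.fragment))
        url = urllib.parse.quote(url)
        if not url:
            url = "/"
        # the first entry that matches the user agent decides
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.allowance(url)
        # try the default ("*") entry last
        if self.default_entry:
            return self.default_entry.allowance(url)
        # agent not found: access granted
        return True

    def crawl_delay(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.delay
        # guard against a missing "*" entry instead of raising AttributeError
        if self.default_entry:
            return self.default_entry.delay
        return None

    def request_rate(self, useragent):
        if not self.mtime():
            return None
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.req_rate
        # guard against a missing "*" entry instead of raising AttributeError
        if self.default_entry:
            return self.default_entry.req_rate
        return None

    def __str__(self):
        entries = self.entries
        if self.default_entry is not None:
            entries = entries + [self.default_entry]
        return '\n'.join(map(str, entries)) + '\n'


class RuleLine:
    """A rule line is a single "Allow:" (allowance==True) or "Disallow:"
    (allowance==False) followed by a path."""
    def __init__(self, path, allowance):
        if path == '' and not allowance:
            # an empty Disallow value means allow all
            allowance = True
        path = urllib.parse.urlunparse(urllib.parse.urlparse(path))
        self.path = urllib.parse.quote(path)
        self.allowance = allowance

    def applies_to(self, filename):
        # plain prefix match; "*" matches every path
        return self.path == "*" or filename.startswith(self.path)

    def __str__(self):
        return ("Allow" if self.allowance else "Disallow") + ": " + self.path


class Entry:
    """An entry has one or more user-agents and zero or more rulelines."""
    def __init__(self):
        self.useragents = []
        self.rulelines = []
        self.delay = None
        self.req_rate = None

    def __str__(self):
        ret = []
        for agent in self.useragents:
            ret.append(f"User-agent: {agent}")
        if self.delay is not None:
            ret.append(f"Crawl-delay: {self.delay}")
        if self.req_rate is not None:
            rate = self.req_rate
            ret.append(f"Request-rate: {rate.requests}/{rate.seconds}")
        ret.extend(map(str, self.rulelines))
        ret.append('')  # trailing newline after each entry
        return '\n'.join(ret)

    def applies_to(self, useragent):
        """Check whether this entry applies to the specified agent."""
        # keep only the product token (before the first "/"), lower-cased
        useragent = useragent.split("/")[0].lower()
        for agent in self.useragents:
            if agent == '*':
                # the catch-all agent matches everyone
                return True
            agent = agent.lower()
            if agent in useragent:
                return True
        return False

    def allowance(self, filename):
        """Preconditions:
        - our agent applies to this entry
        - filename is URL decoded"""
        for line in self.rulelines:
            if line.applies_to(filename):
                return line.allowance
        return True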
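

# Example (a minimal sketch, not part of the original module): feeding an
# in-memory robots.txt to parse() instead of fetching it with read(), then
# querying it the way a crawler might. The helper name, the "ExampleBot"
# agent and the example.com URLs are hypothetical placeholders.
def _example_query_rules():
    # Imported by its public path so the sketch also runs as a standalone
    # snippet outside this module.
    from urllib.robotparser import RobotFileParser

    robots_txt = [
        "User-agent: *",
        "Crawl-delay: 5",
        "Request-rate: 3/20",
        "Disallow: /private/",
    ]
    rp = RobotFileParser()
    # parse() calls modified(), so mtime() is non-zero afterwards and
    # can_fetch() no longer refuses every URL.
    rp.parse(robots_txt)
    return (
        rp.can_fetch("ExampleBot", "http://example.com/private/page.html"),
        rp.can_fetch("ExampleBot", "http://example.com/public/page.html"),
        rp.crawl_delay("ExampleBot"),
        rp.request_rate("ExampleBot"),
    )

# With these rules the "*" group becomes default_entry and is consulted
# because no named group matches "ExampleBot", so the helper returns
# (False, True, 5, RequestRate(requests=3, seconds=20)).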
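

# Example (a minimal sketch, not part of the original module): how an entry
# is chosen and how rule paths are matched. Entry.applies_to() keeps only
# the product token before the first "/" of the caller's user agent and does
# a case-insensitive substring test, while RuleLine.applies_to() is a plain
# prefix test, so "Disallow: /tmp" also blocks "/tmpfile.html". The helper
# name, agent names and paths are hypothetical.
def _example_group_matching():
    # Imported by its public path so the sketch also runs standalone.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.parse([
        "User-agent: FigTree",
        "Disallow: /tmp",
        "",
        "User-agent: *",
        "Disallow: /",
    ])
    return {
        # "FigTree/1.0" is reduced to "figtree", which matches the first group.
        "figtree_tmp": rp.can_fetch("FigTree/1.0", "/tmp/cache.html"),
        # Prefix matching: "/tmpfile.html" starts with "/tmp".
        "figtree_tmpfile": rp.can_fetch("FigTree/1.0", "/tmpfile.html"),
        # No FigTree rule applies, so Entry.allowance() falls through to True.
        "figtree_index": rp.can_fetch("figtree", "/index.html"),
        # Unknown agents fall back to the "*" group, which disallows "/".
        "other_index": rp.can_fetch("OtherBot", "/index.html"),
    }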
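

# Example (a minimal sketch, not part of the original module): edge cases
# that follow from the parser's state machine. A parser that has never read
# or parsed anything refuses every URL. Rule lines that appear before any
# User-agent line are ignored (state 0). An empty "Disallow:" value is turned
# into an allow-all rule by RuleLine.__init__(). The helper name and the
# agent/paths are hypothetical.
def _example_edge_cases():
    # Imported by its public path so the sketch also runs standalone.
    from urllib.robotparser import RobotFileParser

    fresh = RobotFileParser()
    # Nothing read yet: last_checked is 0, so can_fetch() returns False.
    unread = fresh.can_fetch("AnyBot", "/index.html")

    orphan = RobotFileParser()
    # No User-agent line precedes the rule, so no entry is created and
    # can_fetch() reaches its final "return True".
    orphan.parse(["Disallow: /"])
    orphan_rule = orphan.can_fetch("AnyBot", "/index.html")

    empty = RobotFileParser()
    # "Disallow:" with no path becomes RuleLine('', True), which matches
    # every path and allows it.
    empty.parse(["User-agent: *", "Disallow:"])
    empty_disallow = empty.can_fetch("AnyBot", "/secret/page.html")

    return unread, orphan_rule, empty_disallow

# Traced through the code above, the helper returns (False, True, True).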