"""An extensible library for opening URLs using a variety of protocols

The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below). It opens the URL and returns the results as a file-like
object; the returned object has some extra methods described below.

The OpenerDirector manages a collection of Handler objects that do
all the actual work. Each Handler implements a particular protocol or
option. The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL. For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns. The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.

urlopen(url, data=None) -- Basic usage is the same as the original
urllib: pass the URL and optionally data to post to an HTTP URL, and
get a file-like object back. One difference is that you can also pass
a Request instance instead of a URL. Raises a URLError (subclass of
IOError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.

build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers. Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate. If one of the arguments is a subclass of a default
handler, the argument will be installed instead of that default.

install_opener -- Installs a new opener as the default opener.

objects of interest:

OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.

Request -- An object that encapsulates the state of a request. The
state can be as simple as the URL. It can also include extra HTTP
headers, e.g. a User-Agent.

BaseHandler --

exceptions:
URLError -- A subclass of IOError; individual protocols have their own
specific subclass.

HTTPError -- Also a valid HTTP response, so you can treat an HTTP error
as an exceptional event or as a valid response.

internals:
BaseHandler and parent
_call_chain conventions

Example usage:

import urllib2

# set up authentication info
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
                      uri='https://mahler:8092/site-updates.py',
                      user='klem',
                      passwd='geheim$parole')

proxy_support = urllib2.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)

# install it
urllib2.install_opener(opener)

f = urllib2.urlopen('http://www.python.org/')


"""

# XXX issues:
# If an authentication error handler tries to perform
# authentication but for some reason fails, how should the error be
# signalled? The client needs to know the HTTP error code. But if
# the handler knows that the problem was, e.g., that it didn't know
# the hash algorithm requested in the challenge, it would be good to
# pass that information along to the client, too.
# ftp errors aren't handled cleanly
# check digest against correct (i.e. non-apache) implementation

# Possible extensions:
# complex proxies XXX not sure what exactly was meant by this
# abstract factory for opener

import base64
import hashlib
import httplib
import mimetools
import os
import posixpath
import random
import re
import socket
import sys
import time
import urlparse
import bisect
import warnings

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

from urllib import (unwrap, unquote, splittype, splithost, quote,
                    addinfourl, splitport, splittag, toBytes,
                    splitattr, ftpwrapper, splituser, splitpasswd, splitvalue)

# support for FileHandler, proxies via environment variables
from urllib import localhost, url2pathname, getproxies, proxy_bypass

# used in User-Agent header sent
__version__ = sys.version[:3]

_opener = None
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    global _opener
    if _opener is None:
        _opener = build_opener()
    return _opener.open(url, data, timeout)

def install_opener(opener):
    global _opener
    _opener = opener

# do these error classes make sense?
# make sure all of the IOError stuff is overridden. we just want to be
# subtypes.

class URLError(IOError):
    # URLError is a sub-type of IOError, but it doesn't share any of
    # the implementation. need to override __init__ and __str__.
    # It sets self.args for compatibility with other EnvironmentError
    # subclasses, but args doesn't have the typical format with errno in
    # slot 0 and strerror in slot 1. This may be better than nothing.
    def __init__(self, reason):
        self.args = reason,
        self.reason = reason

    def __str__(self):
        return '<urlopen error %s>' % self.reason

class HTTPError(URLError, addinfourl):
    """Raised when an HTTP error occurs, but also acts like a non-error return"""
    __super_init = addinfourl.__init__

    def __init__(self, url, code, msg, hdrs, fp):
        self.code = code
        self.msg = msg
        self.hdrs = hdrs
        self.fp = fp
        self.filename = url
        # The addinfourl classes depend on fp being a valid file
        # object. In some cases, the HTTPError may not have a valid
        # file object. If this happens, the simplest workaround is to
        # not initialize the base classes.
        if fp is not None:
            self.__super_init(fp, hdrs, url, code)

    def __str__(self):
        return 'HTTP Error %s: %s' % (self.code, self.msg)

    # since URLError specifies a .reason attribute, HTTPError should also
    # provide this attribute. See issue13211 for discussion.
    @property
    def reason(self):
        return self.msg

    def info(self):
        return self.hdrs

# copied from cookielib.py
_cut_port_re = re.compile(r":\d+$")
def request_host(request):
    """Return request-host, as defined by RFC 2965.

    Variation from RFC: returned value is lowercased, for convenient
    comparison.
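
A minimal standalone sketch of the same normalization, for
illustration only. It assumes the caller already has the URL's netloc
(what urlparse.urlparse(url)[1] returns) and the value of the Host
header; the helper name normalize_request_host and the PORT_SUFFIX
pattern are hypothetical and not part of this module.

```python
import re

# Same idea as _cut_port_re: a trailing ":<digits>" port suffix.
# Written with [0-9] so the pattern needs no backslash escape.
PORT_SUFFIX = re.compile(":[0-9]+$")


def normalize_request_host(netloc, host_header=""):
    # Hypothetical helper mirroring request_host(): fall back to the
    # Host header when the URL carries no host part,
    host = netloc if netloc != "" else host_header
    # strip at most one trailing port,
    host = PORT_SUFFIX.sub("", host, count=1)
    # and lowercase the result (the documented variation from the RFC).
    return host.lower()


print(normalize_request_host("WWW.Example.COM:8080"))  # www.example.com
print(normalize_request_host("", "Intranet:81"))       # intranet
```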

    """
    url = request.get_full_url()
    host = urlparse.urlparse(url)[1]
    if host == "":
        host = request.get_header("Host", "")

    # remove port, if present
    host = _cut_port_re.sub("", host, 1)
    return host.lower()

class Request:

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False):
        # unwrap('<URL:type://host/path>') --> 'type://host/path'
        self.__original = unwrap(url)
        self.__original, self.__fragment = splittag(self.__original)
        self.type = None
        # self.__r_type is what's left after doing the splittype
        self.host = None
        self.port = None
        self._tunnel_host = None
        self.data = data
        self.headers = {}
        for key, value in headers.items():
            self.add_header(key, value)
        self.unredirected_hdrs = {}
        if origin_req_host is None:
            origin_req_host = request_host(self)
        self.origin_req_host = origin_req_host
        self.unverifiable = unverifiable

    def __getattr__(self, attr):
        # XXX this is a fallback mechanism to guard against these
        # methods getting called in a non-standard order. this may be
        # too complicated and/or unnecessary.
        # XXX should the __r_XXX attributes be public?
        if attr[:12] == '_Request__r_':
            name = attr[12:]
            if hasattr(Request, 'get_' + name):
                getattr(self, 'get_' + name)()
                return getattr(self, attr)
        raise AttributeError, attr

    def get_method(self):
        if self.has_data():
            return "POST"
        else:
            return "GET"

    # XXX these helper methods are lame

    def add_data(self, data):
        self.data = data

    def has_data(self):
        return self.data is not None

    def get_data(self):
        return self.data

    def get_full_url(self):
        if self.__fragment:
            return '%s#%s' % (self.__original, self.__fragment)
        else:
            return self.__original

    def get_type(self):
        if self.type is None:
            self.type, self.__r_type = splittype(self.__original)
            if self.type is None:
                raise ValueError, "unknown url type: %s" % self.__original
        return self.type

    def get_host(self):
        if self.host is None:
            self.host, self.__r_host = splithost(self.__r_type)
            if self.host:
                self.host = unquote(self.host)
        return self.host

    def get_selector(self):
        return self.__r_host

    def set_proxy(self, host, type):
        if self.type == 'https' and not self._tunnel_host:
            self._tunnel_host = self.host
        else:
            self.type = type
            self.__r_host = self.__original

        self.host = host

    def has_proxy(self):
        return self.__r_host == self.__original

    def get_origin_req_host(self):
        return self.origin_req_host

    def is_unverifiable(self):
        return self.unverifiable

    def add_header(self, key, val):
        # useful for something like authentication
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        # will not be added to a redirected request
        self.unredirected_hdrs[key.capitalize()] = val

    def has_header(self, header_name):
        return (header_name in self.headers or
                header_name in self.unredirected_hdrs)

    def get_header(self, header_name, default=None):
        return self.headers.get(
            header_name,
            self.unredirected_hdrs.get(header_name, default))

    def header_items(self):
        hdrs = self.unredirected_hdrs.copy()
        hdrs.update(self.headers)
        return hdrs.items()

class OpenerDirector:
    def __init__(self):
        client_version = "Python-urllib/%s" % __version__
        self.addheaders = [('User-agent', client_version)]
        # self.handlers is retained only for backward compatibility
        self.handlers = []
        # manage the individual handlers
        self.handle_open = {}
        self.handle_error = {}
        self.process_response = {}
        self.process_request = {}

    def add_handler(self, handler):
        if not hasattr(handler, "add_parent"):
            raise TypeError("expected BaseHandler instance, got %r" %
                            type(handler))

        added = False
        for meth in dir(handler):
            if meth in ["redirect_request", "do_open", "proxy_open"]:
                # oops, coincidental match
                continue

            i = meth.find("_")
            protocol = meth[:i]
            condition = meth[i+1:]

            if condition.startswith("error"):
                j = condition.find("_") + i + 1
                kind = meth[j+1:]
                try:
                    kind = int(kind)
                except ValueError:
                    pass
                lookup = self.handle_error.get(protocol, {})
                self.handle_error[protocol] = lookup
            elif condition == "open":
                kind = protocol
                lookup = self.handle_open
            elif condition == "response":
                kind = protocol
                lookup = self.process_response
            elif condition == "request":
                kind = protocol
                lookup = self.process_request
            else:
                continue

            handlers = lookup.setdefault(kind, [])
            if handlers:
                bisect.insort(handlers, handler)
            else:
                handlers.append(handler)
            added = True

        if added:
            bisect.insort(self.handlers, handler)
            handler.add_parent(self)

    def close(self):
        # Only exists for backwards compatibility.
        pass

    def _call_chain(self, chain, kind, meth_name, *args):
        # Handlers raise an exception if no one else should try to handle
        # the request, or return None if they can't but another handler
        # could. Otherwise, they return the response.
        handlers = chain.get(kind, ())
        for handler in handlers:
            func = getattr(handler, meth_name)

            result = func(*args)
            if result is not None:
                return result

    def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        # accept a URL or a Request object
        if isinstance(fullurl, basestring):
            req = Request(fullurl, data)
        else:
            req = fullurl
            if data is not None:
                req.add_data(data)

        req.timeout = timeout
        protocol = req.get_type()

        # pre-process request
        meth_name = protocol+"_request"
        for processor in self.process_request.get(protocol, []):
            meth = getattr(processor, meth_name)
            req = meth(req)

        response = self._open(req, data)

        # post-process response
        meth_name = protocol+"_response"
        for processor in self.process_response.get(protocol, []):
            meth = getattr(processor, meth_name)
            response = meth(req, response)

        return response

    def _open(self, req, data=None):
        result = self._call_chain(self.handle_open, 'default',
                                  'default_open', req)
        if result:
            return result

        protocol = req.get_type()
        result = self._call_chain(self.handle_open, protocol, protocol +
                                  '_open', req)
        if result:
            return result

        return self._call_chain(self.handle_open, 'unknown',
                                'unknown_open', req)

    def error(self, proto, *args):
        if proto in ('http', 'https'):
            # XXX http[s] protocols are special-cased
            dict = self.handle_error['http']  # https is not different than http
            proto = args[2]  # YUCK!
            meth_name = 'http_error_%s' % proto
            http_err = 1
            orig_args = args
        else:
            dict = self.handle_error
            meth_name = proto + '_error'
            http_err = 0
        args = (dict, proto, meth_name) + args
        result = self._call_chain(*args)
        if result:
            return result

        if http_err:
            args = (dict, 'default', 'http_error_default') + orig_args
            return self._call_chain(*args)

# XXX probably also want an abstract factory that knows when it makes
# sense to skip a superclass in favor of a subclass and when it might
# make sense to include both

def build_opener(*handlers):
    """Create an opener object from a list of handlers.

    The opener will use several default handlers, including support
    for HTTP, FTP and when applicable, HTTPS.

    If any of the handlers passed as arguments are subclasses of the
    default handlers, the default handlers will not be used.
    """
    import types
    def isclass(obj):
        return isinstance(obj, (types.ClassType, type))

    opener = OpenerDirector()
    default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
                       HTTPDefaultErrorHandler, HTTPRedirectHandler,
                       FTPHandler, FileHandler, HTTPErrorProcessor]
    if hasattr(httplib, 'HTTPS'):
        default_classes.append(HTTPSHandler)
    skip = set()
    for klass in default_classes:
        for check in handlers:
            if isclass(check):
                if issubclass(check, klass):
                    skip.add(klass)
            elif isinstance(check, klass):
                skip.add(klass)
    for klass in skip:
        default_classes.remove(klass)

    for klass in default_classes:
        opener.add_handler(klass())

    for h in handlers:
        if isclass(h):
            h = h()
        opener.add_handler(h)
    return opener

class BaseHandler:
    handler_order = 500

    def add_parent(self, parent):
        self.parent = parent

    def close(self):
        # Only exists for backwards compatibility
        pass

    def __lt__(self, other):
        if not hasattr(other, "handler_order"):
            # Try to preserve the old behavior of having custom classes
            # inserted after default ones (works only for custom user
            # classes which are not aware of handler_order).
            return True
        return self.handler_order < other.handler_order


class HTTPErrorProcessor(BaseHandler):
    """Process HTTP error responses."""
    handler_order = 1000  # after all other processing

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # According to RFC 2616, "2xx" code indicates that the client's
        # request was successfully received, understood, and accepted.
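        # Worked example (illustrative): a 404 reply fails the test below
        # and is passed to self.parent.error('http', ...), which tries any
        # http_error_404 handlers and then http_error_default; with only
        # the default handlers installed, HTTPDefaultErrorHandler raises
        # HTTPError.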
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)

        return response

    https_response = http_response

class HTTPDefaultErrorHandler(BaseHandler):
    def http_error_default(self, req, fp, code, msg, hdrs):
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

class HTTPRedirectHandler(BaseHandler):
    # maximum number of redirections to any single URL
    # this is needed because of the state that cookies introduce
    max_repeats = 4
    # maximum total number of redirections (regardless of URL) before
    # assuming we're in a loop
    max_redirections = 10

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        """Return a Request or None in response to a redirect.

        This is called by the http_error_30x methods when a
        redirection response is received. If a redirection should
        take place, return a new Request to allow http_error_30x to
        perform the redirect. Otherwise, raise HTTPError if no-one
        else should try to handle this url. Return None if you can't
        but another Handler might.
        """
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
            or code in (301, 302, 303) and m == "POST"):
            # Strictly (according to RFC 2616), 301 or 302 in response
            # to a POST MUST NOT cause a redirection without confirmation
            # from the user (of urllib2, in this case). In practice,
            # essentially all clients do redirect in this case, so we
            # do the same.
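            # Worked example (illustrative): a 303 reply to a POST is
            # followed, and because the new Request built below carries
            # no data, it is sent as a GET; a 307 reply to a POST matches
            # neither condition above, so the else branch raises HTTPError.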
            # be lenient with URIs containing a space
            newurl = newurl.replace(' ', '%20')
            newheaders = dict((k,v) for k,v in req.headers.items()
                              if k.lower() not in ("content-length", "content-type")
                             )
            return Request(newurl,
                           headers=newheaders,
                           origin_req_host=req.get_origin_req_host(),
                           unverifiable=True)
        else:
            raise HTTPError(req.get_full_url(), code, msg, headers, fp)

    # Implementation note: To avoid the server sending us into an
    # infinite loop, the request object needs to track what URLs we
    # have already seen. Do this by adding a handler-specific
    # attribute to the Request object.
    def http_error_302(self, req, fp, code, msg, headers):
        # Some servers (incorrectly) return multiple Location headers
        # (so probably same goes for URI). Use first header.
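        # Worked example (illustrative, hypothetical URLs): for a request
        # to http://www.example.com/a/b, "Location: /login" is resolved by
        # the urljoin() call below to http://www.example.com/login, while
        # "Location: http://www.example.com" (no path) first gets "/" as
        # its path and becomes http://www.example.com/.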
        if 'location' in headers:
            newurl = headers.getheaders('location')[0]
        elif 'uri' in headers:
            newurl = headers.getheaders('uri')[0]
        else:
            return

        # fix a possible malformed URL
        urlparts = urlparse.urlparse(newurl)
        if not urlparts.path:
            urlparts = list(urlparts)
            urlparts[2] = "/"
            newurl = urlparse.urlunparse(urlparts)

        newurl = urlparse.urljoin(req.get_full_url(), newurl)

        # For security reasons we do not allow redirects to protocols
        # other than HTTP, HTTPS or FTP.
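        # Worked example (illustrative): "Location: file:///etc/passwd" is
        # rejected by the check below with HTTPError, whereas
        # "Location: https://www.example.com/" is allowed.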
        newurl_lower = newurl.lower()
        if not (newurl_lower.startswith('http://') or
                newurl_lower.startswith('https://') or
                newurl_lower.startswith('ftp://')):
            raise HTTPError(newurl, code,
                            msg + " - Redirection to url '%s' is not allowed" %
                            newurl,
                            headers, fp)

        # XXX Probably want to forget about the state of the current
        # request, although that might interact poorly with other
        # handlers that also use handler-specific request attributes
        new = self.redirect_request(req, fp, code, msg, headers, newurl)
        if new is None:
            return

        # loop detection
        # .redirect_dict has a key url if url was previously visited.
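        # Worked example (illustrative): with max_repeats = 4, a server
        # that keeps redirecting to the same URL gets four redirects
        # followed; the fifth finds visited[newurl] >= 4 and raises
        # HTTPError. Because len(visited) counts distinct URLs, the
        # max_redirections = 10 limit is hit on the eleventh redirect
        # when every target URL is different.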
        if hasattr(req, 'redirect_dict'):
            visited = new.redirect_dict = req.redirect_dict
            if (visited.get(newurl, 0) >= self.max_repeats or
                len(visited) >= self.max_redirections):
                raise HTTPError(req.get_full_url(), code,
                                self.inf_msg + msg, headers, fp)
        else:
            visited = new.redirect_dict = req.redirect_dict = {}
        visited[newurl] = visited.get(newurl, 0) + 1

        # Don't close the fp until we are sure that we won't use it
        # with HTTPError.
        fp.read()
        fp.close()

        return self.parent.open(new, timeout=req.timeout)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

    inf_msg = "The HTTP server returned a redirect error that would " \
              "lead to an infinite loop.\n" \
              "The last 30x error message was:\n"


def _parse_proxy(proxy):
    """Return (scheme, user, password, host/port) given a URL or an authority.

    If a URL is supplied, it must have an authority (host:port) component.
    According to RFC 3986, having an authority component means the URL must
    have two slashes after the scheme:

    >>> _parse_proxy('file:/ftp.example.com/')
    Traceback (most recent call last):
    ValueError: proxy URL with no authority: 'file:/ftp.example.com/'

    The first three items of the returned tuple may be None.

    Examples of authority parsing:

    >>> _parse_proxy('proxy.example.com')
    (None, None, None, 'proxy.example.com')
    >>> _parse_proxy('proxy.example.com:3128')
    (None, None, None, 'proxy.example.com:3128')

    The authority component may optionally include userinfo (assumed to be
    username:password):

    >>> _parse_proxy('joe:password@proxy.example.com')
    (None, 'joe', 'password', 'proxy.example.com')
    >>> _parse_proxy('joe:password@proxy.example.com:3128')
    (None, 'joe', 'password', 'proxy.example.com:3128')

    Same examples, but with URLs instead:

    >>> _parse_proxy('http://proxy.example.com/')
    ('http', None, None, 'proxy.example.com')
    >>> _parse_proxy('http://proxy.example.com:3128/')
    ('http', None, None, 'proxy.example.com:3128')
    >>> _parse_proxy('http://joe:password@proxy.example.com/')
    ('http', 'joe', 'password', 'proxy.example.com')
    >>> _parse_proxy('http://joe:password@proxy.example.com:3128')
    ('http', 'joe', 'password', 'proxy.example.com:3128')

    Everything after the authority is ignored:

    >>> _parse_proxy('ftp://joe:password@proxy.example.com/rubbish:3128')
    ('ftp', 'joe', 'password', 'proxy.example.com')

    Test for no trailing '/' case:

    >>> _parse_proxy('http://joe:password@proxy.example.com')
    ('http', 'joe', 'password', 'proxy.example.com')

    """
    scheme, r_scheme = splittype(proxy)
    if not r_scheme.startswith("/"):
        # authority
        scheme = None
        authority = proxy
    else:
        # URL
        if not r_scheme.startswith("//"):
            raise ValueError("proxy URL with no authority: %r" % proxy)
        # We have an authority, so for RFC 3986-compliant URLs (by ss 3.
        # and 3.3.), path is empty or starts with '/'
        end = r_scheme.find("/", 2)
        if end == -1:
            end = None
        authority = r_scheme[2:end]
    userinfo, hostport = splituser(authority)
    if userinfo is not None:
        user, password = splitpasswd(userinfo)
    else:
        user = password = None
    return scheme, user, password, hostport

class ProxyHandler(BaseHandler):
    # Proxies must be in front
    handler_order = 100

    def __init__(self, proxies=None):
        if proxies is None:
            proxies = getproxies()
        assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
        self.proxies = proxies
        for type, url in proxies.items():
            setattr(self, '%s_open' % type,
                    lambda r, proxy=url, type=type, meth=self.proxy_open: \
                    meth(r, proxy, type))
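
    # Worked example (illustrative, hypothetical proxy URL):
    # ProxyHandler({'http': 'http://proxy.example.com:3128'}) gets an
    # instance attribute http_open that behaves like
    #     lambda r: self.proxy_open(r, 'http://proxy.example.com:3128', 'http')
    # The lambda's default arguments bind url and type when each lambda is
    # created, so every generated attribute keeps its own proxy instead of
    # the last value seen by the loop.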

    def proxy_open(self, req, proxy, type):
        orig_type = req.get_type()
        proxy_type, user, password, hostport = _parse_proxy(proxy)

        if proxy_type is None:
            proxy_type = orig_type

        if req.host and proxy_bypass(req.host):
            return None

        if user and password:
            user_pass = '%s:%s' % (unquote(user), unquote(password))
            creds = base64.b64encode(user_pass).strip()
            req.add_header('Proxy-authorization', 'Basic ' + creds)
        hostport = unquote(hostport)
        req.set_proxy(hostport, proxy_type)

        if orig_type == proxy_type or orig_type == 'https':
            # let other handlers take care of it
            return None
        else:
            # need to start over, because the other handlers don't
            # grok the proxy's URL type
            # e.g. if we have a constructor arg proxies like so:
            # {'http': 'ftp://proxy.example.com'}, we may end up turning
            # a request for http://acme.example.com/a into one for
            # ftp://proxy.example.com/a
            return self.parent.open(req, timeout=req.timeout)

class HTTPPasswordMgr:

    def __init__(self):
        self.passwd = {}

    def add_password(self, realm, uri, user, passwd):
        # uri could be a single URI or a sequence
        if isinstance(uri, basestring):
            uri = [uri]
        if not realm in self.passwd:
            self.passwd[realm] = {}
        for default_port in True, False:
            reduced_uri = tuple(
                [self.reduce_uri(u, default_port) for u in uri])
            self.passwd[realm][reduced_uri] = (user, passwd)

    def find_user_password(self, realm, authuri):
        domains = self.passwd.get(realm, {})
        for default_port in True, False:
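            # Worked example (illustrative, hypothetical URIs): a password
            # added for 'www.example.com' is stored under the reduced URI
            # ('www.example.com', '/'). A lookup for
            # 'http://www.example.com/foo' fails on the first pass, which
            # reduces it to ('www.example.com:80', '/foo'), and succeeds on
            # the second, which reduces it to ('www.example.com', '/foo').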
            reduced_authuri = self.reduce_uri(authuri, default_port)
            for uris, authinfo in domains.iteritems():
                for uri in uris:
                    if self.is_suburi(uri, reduced_authuri):
                        return authinfo
        return None, None

    def reduce_uri(self, uri, default_port=True):
        """Accept authority or URI and extract only the authority and path."""
        # note HTTP URLs do not have a userinfo component
        parts = urlparse.urlsplit(uri)
        if parts[1]:
            # URI
            scheme = parts[0]
            authority = parts[1]
            path = parts[2] or '/'
        else:
            # host or host:port
            scheme = None
            authority = uri
            path = '/'
        host, port = splitport(authority)
        if default_port and port is None and scheme is not None:
            dport = {"http": 80,
                     "https": 443,
                     }.get(scheme)
            if dport is not None:
                authority = "%s:%d" % (host, dport)
        return authority, path

    def is_suburi(self, base, test):
        """Check if test is below base in a URI tree

        Both args must be URIs in reduced form.
        """
        if base == test:
            return True
        if base[0] != test[0]:
            return False
        common = posixpath.commonprefix((base[1], test[1]))
        if len(common) == len(base[1]):
            return True
        return False


class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr):

    def find_user_password(self, realm, authuri):
        user, password = HTTPPasswordMgr.find_user_password(self, realm,
                                                            authuri)
        if user is not None:
            return user, password
        return HTTPPasswordMgr.find_user_password(self, None, authuri)


class AbstractBasicAuthHandler:

    # XXX this allows for multiple auth-schemes, but will stupidly pick
    # the last one with a realm specified.

    # allow for double- and single-quoted realm values
    # (single quotes are a violation of the RFC, but appear in the wild)
    rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
                    'realm=(["\']?)([^"\']*)\\2', re.I)

    # XXX could pre-emptively send auth info already accepted (RFC 2617,
    # end of section 2, and section 1.2 immediately after "credentials"
    # production).
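
    # Worked example (illustrative): for the challenge header value
    #     Basic realm="Example"
    # rx.search() yields the groups ('Basic', '"', 'Example'). For the
    # unquoted form Basic realm=Example the quote group is '', which makes
    # http_error_auth_reqed emit the "Basic Auth Realm was unquoted" warning.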

    def __init__(self, password_mgr=None):
        if password_mgr is None:
            password_mgr = HTTPPasswordMgr()
        self.passwd = password_mgr
        self.add_password = self.passwd.add_password
        self.retried = 0

    def reset_retry_count(self):
        self.retried = 0

    def http_error_auth_reqed(self, authreq, host, req, headers):
        # host may be an authority (without userinfo) or a URL with an
        # authority
        # XXX could be multiple headers
        authreq = headers.get(authreq, None)

        if self.retried > 5:
            # retry sending the username:password 5 times before failing.
            raise HTTPError(req.get_full_url(), 401, "basic auth failed",
                            headers, None)
        else:
            self.retried += 1

        if authreq:
            mo = AbstractBasicAuthHandler.rx.search(authreq)
            if mo:
                scheme, quote, realm = mo.groups()
                if quote not in ['"', "'"]:
                    warnings.warn("Basic Auth Realm was unquoted",
                                  UserWarning, 2)
                if scheme.lower() == 'basic':
                    response = self.retry_http_basic_auth(host, req, realm)
                    if response and response.code != 401:
                        self.retried = 0
                    return response

    def retry_http_basic_auth(self, host, req, realm):
        user, pw = self.passwd.find_user_password(realm, host)
        if pw is not None:
            raw = "%s:%s" % (user, pw)
            auth = 'Basic %s' % base64.b64encode(raw).strip()
            if req.headers.get(self.auth_header, None) == auth:
                return None
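            # Illustration: user "user" with password "pass" gives
            # raw == "user:pass" and auth == 'Basic dXNlcjpwYXNz'.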
            req.add_unredirected_header(self.auth_header, auth)
            return self.parent.open(req, timeout=req.timeout)
        else:
            return None


class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):

    auth_header = 'Authorization'

    def http_error_401(self, req, fp, code, msg, headers):
        url = req.get_full_url()
        response = self.http_error_auth_reqed('www-authenticate',
                                              url, req, headers)
        self.reset_retry_count()
        return response


class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):

    auth_header = 'Proxy-authorization'

    def http_error_407(self, req, fp, code, msg, headers):
        # http_error_auth_reqed requires that there is no userinfo component in
        # authority.  Assume there isn't one, since urllib2 does not (and
        # should not, RFC 3986 s. 3.2.1) support requests for URLs containing
        # userinfo.
        authority = req.get_host()
        response = self.http_error_auth_reqed('proxy-authenticate',
                                              authority, req, headers)
        self.reset_retry_count()
        return response


def randombytes(n):
    """Return n random bytes."""
    # Use /dev/urandom if it is available.  Fall back to random module
    # if not.  It might be worthwhile to extend this function to use
    # other platform-specific mechanisms for getting random bytes.
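    # Possible simplification (an assumption, not what this module does):
    # os.urandom(n), available since Python 2.4, already reads from the
    # platform's cryptographic source (/dev/urandom on Unix,
    # CryptGenRandom on Windows), e.g.
    #     return os.urandom(n)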
    if os.path.exists("/dev/urandom"):
        f = open("/dev/urandom")
        s = f.read(n)
        f.close()
        return s
    else:
        L = [chr(random.randrange(0, 256)) for i in range(n)]
        return "".join(L)

class AbstractDigestAuthHandler:
    # Digest authentication is specified in RFC 2617.

    # XXX The client does not inspect the Authentication-Info header
    # in a successful response.

    # XXX It should be possible to test this implementation against
    # a mock server that just generates a static set of challenges.
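    # For illustration only (values are hypothetical), such a server might
    # always send
    #     WWW-Authenticate: Digest realm="test", nonce="abc123", qop="auth"
    # which retry_http_digest_auth() turns into the dict
    #     {'realm': 'test', 'nonce': 'abc123', 'qop': 'auth'}
    # via parse_keqv_list(parse_http_list(challenge)).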

    # XXX qop="auth-int" support is shaky

    def __init__(self, passwd=None):
        if passwd is None:
            passwd = HTTPPasswordMgr()
        self.passwd = passwd
        self.add_password = self.passwd.add_password
        self.retried = 0
        self.nonce_count = 0
        self.last_nonce = None

    def reset_retry_count(self):
        self.retried = 0

    def http_error_auth_reqed(self, auth_header, host, req, headers):
        authreq = headers.get(auth_header, None)
        if self.retried > 5:
            # Don't fail endlessly - if we failed once, we'll probably
            # fail a second time. Hm. Unless the Password Manager is
            # prompting for the information. Crap. This isn't great
            # but it's better than the current 'repeat until recursion
            # depth exceeded' approach <wink>
            raise HTTPError(req.get_full_url(), 401, "digest auth failed",
                            headers, None)
        else:
            self.retried += 1
        if authreq:
            scheme = authreq.split()[0]
            if scheme.lower() == 'digest':
                return self.retry_http_digest_auth(req, authreq)

    def retry_http_digest_auth(self, req, auth):
        token, challenge = auth.split(' ', 1)
        chal = parse_keqv_list(parse_http_list(challenge))
        auth = self.get_authorization(req, chal)
        if auth:
            auth_val = 'Digest %s' % auth
            if req.headers.get(self.auth_header, None) == auth_val:
                return None
            req.add_unredirected_header(self.auth_header, auth_val)
            resp = self.parent.open(req, timeout=req.timeout)
            return resp

    def get_cnonce(self, nonce):
        # The cnonce-value is an opaque
        # quoted string value provided by the client and used by both client
        # and server to avoid chosen plaintext attacks, to provide mutual
        # authentication, and to provide some message integrity protection.
        # This isn't a fabulous effort, but it's probably Good Enough.
        dig = hashlib.sha1("%s:%s:%s:%s" % (self.nonce_count, nonce, time.ctime(),
                                            randombytes(8))).hexdigest()
        return dig[:16]

    def get_authorization(self, req, chal):
        try:
            realm = chal['realm']
            nonce = chal['nonce']
            qop = chal.get('qop')
            algorithm = chal.get('algorithm', 'MD5')
            # mod_digest doesn't send an opaque, even though it isn't
            # supposed to be optional
            opaque = chal.get('opaque', None)
        except KeyError:
            return None

        H, KD = self.get_algorithm_impls(algorithm)
        if H is None:
            return None

        user, pw = self.passwd.find_user_password(realm, req.get_full_url())
        if user is None:
            return None

        # XXX not implemented yet
        if req.has_data():
            entdig = self.get_entity_digest(req.get_data(), chal)
        else:
            entdig = None

        A1 = "%s:%s:%s" % (user, realm, pw)
        A2 = "%s:%s" % (req.get_method(),
                        # XXX selector: what about proxies and full urls
                        req.get_selector())
        if qop == 'auth':
            if nonce == self.last_nonce:
                self.nonce_count += 1
            else:
                self.nonce_count = 1
                self.last_nonce = nonce

            ncvalue = '%08x' % self.nonce_count
            cnonce = self.get_cnonce(nonce)
            noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
            respdig = KD(H(A1), noncebit)
        elif qop is None:
            respdig = KD(H(A1), "%s:%s" % (nonce, H(A2)))
        else:
            # XXX handle auth-int.
            raise URLError("qop '%s' is not supported." % qop)

        # XXX should the partial digests be encoded too?

        base = 'username="%s", realm="%s", nonce="%s", uri="%s", ' \
               'response="%s"' % (user, realm, nonce, req.get_selector(),
                                  respdig)
        if opaque:
            base += ', opaque="%s"' % opaque
        if entdig:
            base += ', digest="%s"' % entdig
        base += ', algorithm="%s"' % algorithm
        if qop:
            base += ', qop=auth, nc=%s, cnonce="%s"' % (ncvalue, cnonce)
        return base

    def get_algorithm_impls(self, algorithm):
        # algorithm should be case-insensitive according to RFC2617
        algorithm = algorithm.upper()
        # lambdas assume digest modules are imported at the top level
        if algorithm == 'MD5':
            H = lambda x: hashlib.md5(x).hexdigest()
        elif algorithm == 'SHA':
            H = lambda x: hashlib.sha1(x).hexdigest()
        # XXX MD5-sess
        else:
            # Unsupported algorithm: return None so that get_authorization()
            # gives up via its "if H is None" check instead of this method
            # failing with UnboundLocalError.
            H = None
        KD = lambda s, d: H("%s:%s" % (s, d))
        return H, KD

    def get_entity_digest(self, data, chal):
        # XXX not implemented yet
        return None


class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
    """An authentication protocol defined by RFC 2069

    Digest authentication improves on basic authentication because it
    does not transmit passwords in the clear.
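
    For qop="auth", get_authorization() computes the response digest as
    KD(H(A1), nonce:nc:cnonce:qop:H(A2)); without qop it is
    KD(H(A1), nonce:H(A2)).  Here A1 = user:realm:password,
    A2 = method:selector, H is the hex digest of the negotiated algorithm
    and KD(secret, data) = H(secret:data).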
    """

    auth_header = 'Authorization'
    handler_order = 490  # before Basic auth

    def http_error_401(self, req, fp, code, msg, headers):
        host = urlparse.urlparse(req.get_full_url())[1]
        retry = self.http_error_auth_reqed('www-authenticate',
                                           host, req, headers)
        self.reset_retry_count()
        return retry


class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):

    auth_header = 'Proxy-Authorization'
    handler_order = 490  # before Basic auth

    def http_error_407(self, req, fp, code, msg, headers):
        host = req.get_host()
        retry = self.http_error_auth_reqed('proxy-authenticate',
                                           host, req, headers)
        self.reset_retry_count()
        return retry

class AbstractHTTPHandler(BaseHandler):

    def __init__(self, debuglevel=0):
        self._debuglevel = debuglevel

    def set_http_debuglevel(self, level):
        self._debuglevel = level

    def do_request_(self, request):
        host = request.get_host()
        if not host:
            raise URLError('no host given')

        if request.has_data():  # POST
            data = request.get_data()
            if not request.has_header('Content-type'):
                request.add_unredirected_header(
                    'Content-type',
                    'application/x-www-form-urlencoded')
            if not request.has_header('Content-length'):
                request.add_unredirected_header(
                    'Content-length', '%d' % len(data))

        sel_host = host
        if request.has_proxy():
            scheme, sel = splittype(request.get_selector())
            sel_host, sel_path = splithost(sel)

        if not request.has_header('Host'):
            request.add_unredirected_header('Host', sel_host)
        for name, value in self.parent.addheaders:
            name = name.capitalize()
            if not request.has_header(name):
                request.add_unredirected_header(name, value)

        return request

    def do_open(self, http_class, req):
        """Return an addinfourl object for the request, using http_class.

        http_class must implement the HTTPConnection API from httplib.
        The addinfourl return value is a file-like object.  It also
        has methods and attributes including:
            - info(): return a mimetools.Message object for the headers
            - geturl(): return the original request URL
            - code: HTTP status code
        """
        host = req.get_host()
        if not host:
            raise URLError('no host given')

        h = http_class(host, timeout=req.timeout)  # will parse host:port
        h.set_debuglevel(self._debuglevel)

        headers = dict(req.unredirected_hdrs)
        headers.update(dict((k, v) for k, v in req.headers.items()
                            if k not in headers))

        # We want to make an HTTP/1.1 request, but the addinfourl
        # class isn't prepared to deal with a persistent connection.
        # It will try to read all remaining data from the socket,
        # which will block while the server waits for the next request.
        # So make sure the connection gets closed after the (only)
        # request.
        headers["Connection"] = "close"
        headers = dict(
            (name.title(), val) for name, val in headers.items())

        if req._tunnel_host:
            tunnel_headers = {}
            proxy_auth_hdr = "Proxy-Authorization"
            if proxy_auth_hdr in headers:
                tunnel_headers[proxy_auth_hdr] = headers[proxy_auth_hdr]
                # Proxy-Authorization should not be sent to origin
                # server.
                del headers[proxy_auth_hdr]
            h.set_tunnel(req._tunnel_host, headers=tunnel_headers)

        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
        except socket.error, err:  # XXX what error?
            h.close()
            raise URLError(err)
        else:
            try:
                r = h.getresponse(buffering=True)
            except TypeError:  # buffering kw not supported
                r = h.getresponse()

        # Pick apart the HTTPResponse object to get the addinfourl
        # object initialized properly.

        # Wrap the HTTPResponse object in socket's file object adapter
        # for Windows.  That adapter calls recv(), so delegate recv()
        # to read().  This weird wrapping allows the returned object to
        # have readline() and readlines() methods.

        # XXX It might be better to extract the read buffering code
        # out of socket._fileobject() and into a base class.

        r.recv = r.read
        fp = socket._fileobject(r, close=True)

        resp = addinfourl(fp, r.msg, req.get_full_url())
        resp.code = r.status
        resp.msg = r.reason
        return resp


class HTTPHandler(AbstractHTTPHandler):

    def http_open(self, req):
        return self.do_open(httplib.HTTPConnection, req)

    http_request = AbstractHTTPHandler.do_request_

if hasattr(httplib, 'HTTPS'):
    class HTTPSHandler(AbstractHTTPHandler):

        def https_open(self, req):
            return self.do_open(httplib.HTTPSConnection, req)

        https_request = AbstractHTTPHandler.do_request_

class HTTPCookieProcessor(BaseHandler):
    def __init__(self, cookiejar=None):
        import cookielib
        if cookiejar is None:
            cookiejar = cookielib.CookieJar()
        self.cookiejar = cookiejar

    def http_request(self, request):
        self.cookiejar.add_cookie_header(request)
        return request

    def http_response(self, request, response):
        self.cookiejar.extract_cookies(response, request)
        return response

    https_request = http_request
    https_response = http_response

class UnknownHandler(BaseHandler):
    def unknown_open(self, req):
        type = req.get_type()
        raise URLError('unknown url type: %s' % type)

def parse_keqv_list(l):
    """Parse list of key=value strings where keys are not duplicated."""
    parsed = {}
    for elt in l:
        k, v = elt.split('=', 1)
        if v[0] == '"' and v[-1] == '"':
            v = v[1:-1]
        parsed[k] = v
    return parsed

def parse_http_list(s):
    """Parse lists as described by RFC 2068 Section 2.

    In particular, parse comma-separated lists where the elements of
    the list may include quoted-strings. A quoted-string could
    contain a comma. A non-quoted string could have quotes in the
    middle. Neither commas nor quotes count if they are escaped.
    Only double-quotes count, not single-quotes.
    """
    res = []
    part = ''

    escape = quote = False
    for cur in s:
        if escape:
            part += cur
            escape = False
            continue
        if quote:
            if cur == '\\':
                escape = True
                continue
            elif cur == '"':
                quote = False
            part += cur
            continue

        if cur == ',':
            res.append(part)
            part = ''
            continue

        if cur == '"':
            quote = True

        part += cur

    # append last part
    if part:
        res.append(part)

    return [part.strip() for part in res]

def _safe_gethostbyname(host):
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None

class FileHandler(BaseHandler):
    # Use local file or FTP depending on form of URL
    def file_open(self, req):
        url = req.get_selector()
        if url[:2] == '//' and url[2:3] != '/' and (req.host and
                req.host != 'localhost'):
            req.type = 'ftp'
            return self.parent.open(req)
        else:
            return self.open_local_file(req)

    # names for the localhost
    names = None
    def get_names(self):
        if FileHandler.names is None:
            try:
                FileHandler.names = tuple(
                    socket.gethostbyname_ex('localhost')[2] +
                    socket.gethostbyname_ex(socket.gethostname())[2])
            except socket.gaierror:
                FileHandler.names = (socket.gethostbyname('localhost'),)
        return FileHandler.names

    # not entirely sure what the rules are here
    def open_local_file(self, req):
        import email.utils
        import mimetypes
        host = req.get_host()
        filename = req.get_selector()
        localfile = url2pathname(filename)
        try:
            stats = os.stat(localfile)
            size = stats.st_size
            modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
            mtype = mimetypes.guess_type(filename)[0]
            headers = mimetools.Message(StringIO(
                'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
                (mtype or 'text/plain', size, modified)))
            if host:
                host, port = splitport(host)
            if not host or \
                (not port and _safe_gethostbyname(host) in self.get_names()):
                if host:
                    origurl = 'file://' + host + filename
                else:
                    origurl = 'file://' + filename
                return addinfourl(open(localfile, 'rb'), headers, origurl)
        except OSError, msg:
            # urllib2 users shouldn't expect OSErrors coming from urlopen()
            raise URLError(msg)
        raise URLError('file not on local host')

class FTPHandler(BaseHandler):
    def ftp_open(self, req):
        import ftplib
        import mimetypes
        host = req.get_host()
        if not host:
            raise URLError('ftp error: no host given')
        host, port = splitport(host)
        if port is None:
            port = ftplib.FTP_PORT
        else:
            port = int(port)

        # username/password handling
        user, host = splituser(host)
        if user:
            user, passwd = splitpasswd(user)
        else:
            passwd = None
        host = unquote(host)
        user = user or ''
        passwd = passwd or ''

        try:
            host = socket.gethostbyname(host)
        except socket.error, msg:
            raise URLError(msg)
        path, attrs = splitattr(req.get_selector())
        dirs = path.split('/')
        dirs = map(unquote, dirs)
        dirs, file = dirs[:-1], dirs[-1]
        if dirs and not dirs[0]:
            dirs = dirs[1:]
        try:
            fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
            type = file and 'I' or 'D'
            for attr in attrs:
                attr, value = splitvalue(attr)
                if attr.lower() == 'type' and \
                   value in ('a', 'A', 'i', 'I', 'd', 'D'):
                    type = value.upper()
            fp, retrlen = fw.retrfile(file, type)
            headers = ""
            mtype = mimetypes.guess_type(req.get_full_url())[0]
            if mtype:
                headers += "Content-type: %s\n" % mtype
            if retrlen is not None and retrlen >= 0:
                headers += "Content-length: %d\n" % retrlen
            sf = StringIO(headers)
            headers = mimetools.Message(sf)
            return addinfourl(fp, headers, req.get_full_url())
        except ftplib.all_errors, msg:
            raise URLError, ('ftp error: %s' % msg), sys.exc_info()[2]

    def connect_ftp(self, user, passwd, host, port, dirs, timeout):
        fw = ftpwrapper(user, passwd, host, port, dirs, timeout,
                        persistent=False)
##        fw.ftp.set_debuglevel(1)
        return fw

class CacheFTPHandler(FTPHandler):
    # XXX would be nice to have pluggable cache strategies
    # XXX this stuff is definitely not thread safe
    def __init__(self):
        self.cache = {}      # (user, host, port, path, timeout) -> ftpwrapper
        self.timeout = {}    # same key -> absolute expiry time
        self.soonest = 0     # earliest expiry time among cached entries
        self.delay = 60      # seconds a connection is kept after its last use
        self.max_conns = 16  # size limit enforced by check_cache()

    def setTimeout(self, t):
        # t is the keep-alive delay in seconds (self.delay), not a socket timeout
        self.delay = t

    def setMaxConns(self, m):
        self.max_conns = m

    def connect_ftp(self, user, passwd, host, port, dirs, timeout):
        key = user, host, port, '/'.join(dirs), timeout
        if key in self.cache:
            self.timeout[key] = time.time() + self.delay
        else:
            self.cache[key] = ftpwrapper(user, passwd, host, port, dirs, timeout)
            self.timeout[key] = time.time() + self.delay
        self.check_cache()
        return self.cache[key]

    def check_cache(self):
        # first check for old ones
        t = time.time()
        if self.soonest <= t:
            for k, v in self.timeout.items():
                if v < t:
                    self.cache[k].close()
                    del self.cache[k]
                    del self.timeout[k]
        self.soonest = min(self.timeout.values())

        # then check the size
        if len(self.cache) == self.max_conns:
            for k, v in self.timeout.items():
                if v == self.soonest:
                    # close the evicted connection, as the expiry loop above does
                    self.cache[k].close()
                    del self.cache[k]
                    del self.timeout[k]
                    break
            self.soonest = min(self.timeout.values())

    def clear_cache(self):
        for conn in self.cache.values():
            conn.close()
        self.cache.clear()
        self.timeout.clear()
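
# A minimal illustrative sketch, assuming the caller has already removed the
# leading "Digest " scheme token from a WWW-Authenticate challenge: it shows
# one way the two list parsers above can be chained to turn the remaining
# parameter list into a dict.  _example_parse_digest_params is a hypothetical
# helper, not part of the urllib2 API.  It imports the parsers from Python 3's
# urllib.request (which defines functions of the same names) and falls back
# to urllib2 on Python 2.
def _example_parse_digest_params(params):
    try:
        from urllib.request import parse_http_list, parse_keqv_list
    except ImportError:
        from urllib2 import parse_http_list, parse_keqv_list
    # parse_http_list splits on commas outside double quotes (honouring
    # backslash escapes inside quoted strings); parse_keqv_list then strips
    # the enclosing double quotes from each value.
    return parse_keqv_list(parse_http_list(params))

# Usage: the comma inside the quoted realm does not start a new element.
#   _example_parse_digest_params('realm="a, b", nonce="x", qop=auth')
#   returns {'realm': 'a, b', 'nonce': 'x', 'qop': 'auth'}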
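
# A minimal illustrative sketch of the selector handling in
# FTPHandler.ftp_open, pulled out into a pure function so it can be tried
# without an FTP server.  _example_split_ftp_selector is a hypothetical name;
# instead of calling splitattr()/splitvalue() it assumes ";name=value"
# attributes and re-implements the split with str.split()/str.partition().
def _example_split_ftp_selector(selector):
    try:
        from urllib.parse import unquote  # Python 3
    except ImportError:
        from urllib import unquote  # Python 2
    # ";name=value" attributes follow the path.
    pieces = selector.split(';')
    path, attrs = pieces[0], pieces[1:]
    dirs = [unquote(d) for d in path.split('/')]
    dirs, filename = dirs[:-1], dirs[-1]
    if dirs and not dirs[0]:
        dirs = dirs[1:]  # drop the empty segment before the leading '/'
    # Default: image/binary ('I') for a file, directory listing ('D') when the
    # path ends in '/'; a ";type=a|i|d" attribute overrides it.
    ftp_type = filename and 'I' or 'D'
    for attr in attrs:
        name, _, value = attr.partition('=')
        if name.lower() == 'type' and value in ('a', 'A', 'i', 'I', 'd', 'D'):
            ftp_type = value.upper()
    return dirs, filename, ftp_type

# Usage:
#   _example_split_ftp_selector('/pub/my%20dir/notes.txt;type=a')
#   returns (['pub', 'my dir'], 'notes.txt', 'A')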
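
# A minimal illustrative sketch of the expiry and size-limit policy that
# CacheFTPHandler.check_cache applies, assuming a generic "connection" object
# with a close() method so it can run without an FTP server.  The class
# _ExampleConnCache and its factory/clock parameters are hypothetical.  As in
# the handler, a reused entry gets a fresh deadline, expired entries are
# closed and dropped, and the entry with the earliest deadline is evicted when
# the cache is full.  It deliberately differs in a few ways: it scans for
# expired entries on every call, it caps the cache at max_conns entries, and
# it closes evicted connections and never evicts the entry just requested.
class _ExampleConnCache(object):
    def __init__(self, factory, clock, delay=60, max_conns=16):
        self.factory = factory      # called as factory(key) to open a connection
        self.clock = clock          # zero-argument callable returning "now"
        self.delay = delay          # seconds an entry is kept after its last use
        self.max_conns = max_conns  # upper bound on cached entries
        self.cache = {}             # key -> connection
        self.timeout = {}           # key -> absolute expiry time

    def get(self, key):
        # Reuse a cached connection or open a new one; either way push its
        # deadline delay seconds into the future.
        if key not in self.cache:
            self.cache[key] = self.factory(key)
        self.timeout[key] = self.clock() + self.delay
        self._check(keep=key)
        return self.cache[key]

    def _check(self, keep):
        now = self.clock()
        # Close and drop every entry whose deadline has passed.
        for k, deadline in list(self.timeout.items()):
            if deadline < now:
                self.cache.pop(k).close()
                del self.timeout[k]
        # While over the limit, evict (and close) the entry that expires
        # soonest, never the one that was just requested.
        while len(self.cache) > self.max_conns:
            candidates = [k for k in self.timeout if k != keep]
            if not candidates:
                break
            oldest = min(candidates, key=self.timeout.get)
            self.cache.pop(oldest).close()
            del self.timeout[oldest]

# Injecting the clock keeps the policy testable with a fake time source; the
# handler itself reads time.time() directly.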