"""An extensible library for opening URLs using a variety of protocols

The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below).  It opens the URL and returns the results as a file-like
object; the returned object has some extra methods described below.

The OpenerDirector manages a collection of Handler objects that do
all the actual work.  Each Handler implements a particular protocol or
option.  The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL.  For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns.  The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.
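
As a rough illustration of how the OpenerDirector finds the Handlers it
needs: handlers advertise what they can do through method names of the
form PROTOCOL_CONDITION, such as http_open or http_error_302.  The sketch
below is a simplified, standalone restatement of that naming rule.  The
names classify and SKIPPED are hypothetical and are not part of this
module, and names without an underscore are simply ignored here.

```python
# Names that match the PROTOCOL_CONDITION pattern only by coincidence.
SKIPPED = ('redirect_request', 'do_open', 'proxy_open')

def classify(meth):
    # Return (table, protocol, kind), or None if the name is not a hook.
    if meth in SKIPPED:
        return None
    protocol, sep, condition = meth.partition('_')
    if not sep:
        return None
    if condition.startswith('error'):
        # 'error_302' -> 302, 'error_default' -> 'default'
        kind = condition.partition('_')[2]
        if kind.isdigit():
            kind = int(kind)
        return ('handle_error', protocol, kind)
    if condition == 'open':
        return ('handle_open', protocol, protocol)
    if condition == 'response':
        return ('process_response', protocol, protocol)
    if condition == 'request':
        return ('process_request', protocol, protocol)
    return None
```

Under these assumptions, classify('http_error_302') yields an entry for
the error table keyed by the integer 302, while an ordinary method such
as add_parent is not registered at all.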

urlopen(url, data=None) -- Basic usage is the same as the original
urllib.  Pass the URL and optionally data to post to an HTTP URL, and
get a file-like object back.  One difference is that you can also pass
a Request instance instead of a URL.  Raises a URLError (subclass of
IOError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.

build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers.  Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate.  If one of the arguments is a subclass of a default
handler, the argument will be installed instead of the default.

install_opener -- Installs a new opener as the default opener.
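
The replacement rule described for build_opener can be sketched as
follows.  This is only an illustration of the rule as stated above, not
this module's code: the handler classes are empty stand-ins, and
DEFAULTS, LoggingHTTPHandler and choose_handlers are hypothetical names.

```python
class BaseHandler(object):
    pass

class HTTPHandler(BaseHandler):
    pass

class FTPHandler(BaseHandler):
    pass

class LoggingHTTPHandler(HTTPHandler):
    # Hypothetical user-supplied subclass of a default handler.
    pass

# Stand-in for the list of handlers installed by default.
DEFAULTS = [HTTPHandler, FTPHandler]

def choose_handlers(*supplied):
    # A default is skipped when any supplied handler subclasses it.
    skip = set()
    for default in DEFAULTS:
        for item in supplied:
            cls = item if isinstance(item, type) else item.__class__
            if issubclass(cls, default):
                skip.add(default)
    chosen = [default() for default in DEFAULTS if default not in skip]
    # Classes are instantiated; instances are used as given.
    for item in supplied:
        chosen.append(item() if isinstance(item, type) else item)
    return chosen
```

With these stand-ins, choose_handlers(LoggingHTTPHandler) keeps the
default FTPHandler but drops the default HTTPHandler in favour of the
subclass.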

objects of interest:

OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.

Request -- An object that encapsulates the state of a request.  The
state can be as simple as the URL.  It can also include extra HTTP
headers, e.g. a User-Agent.

BaseHandler --

exceptions:
URLError -- A subclass of IOError; individual protocols have their own
specific subclass.

HTTPError -- Also a valid HTTP response, so you can treat an HTTP error
as an exceptional event or as a valid response.
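
The two ways of treating an HTTP error can be sketched with a toy
exception that also carries a readable body.  ToyHTTPError, fetch and
fetch_body are hypothetical names used only for illustration; no network
access is involved, and the real HTTPError has a richer interface.

```python
import io

class ToyHTTPError(IOError):
    # Hypothetical stand-in: an exception that also wraps a response body.
    def __init__(self, code, msg, fp):
        IOError.__init__(self, msg)
        self.code = code
        self.msg = msg
        self.fp = fp

    def read(self):
        return self.fp.read()

def fetch(status):
    # Pretend to open a URL; error statuses raise instead of returning.
    body = io.BytesIO(b'page body')
    if status >= 400:
        raise ToyHTTPError(status, 'Not Found', body)
    return body

def fetch_body(status):
    try:
        response = fetch(status)
    except ToyHTTPError as err:
        # The error object is itself readable, so it can stand in for
        # the response when the caller wants the error page.
        response = err
    return response.read()
```

A caller that only cares about failures can let the exception propagate
and inspect its code attribute instead of reading it.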

internals:
BaseHandler and parent
_call_chain conventions
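
The _call_chain convention can be sketched in isolation: a handler
returns a result to claim the request, returns None to let the next
handler try, or raises to stop the chain altogether.  The sketch below
is a minimal standalone version under those assumptions; call_chain and
the three toy handler classes are hypothetical names.

```python
def call_chain(handlers, meth_name, *args):
    # Try each handler in order; the first non-None result wins.
    for handler in handlers:
        result = getattr(handler, meth_name)(*args)
        if result is not None:
            return result
    return None

class Declines(object):
    def toy_open(self, url):
        # Cannot handle it, but another handler might.
        return None

class Answers(object):
    def toy_open(self, url):
        return 'opened ' + url

class Refuses(object):
    def toy_open(self, url):
        # Nobody else should try: the exception ends the chain.
        raise ValueError('refusing ' + url)
```

Order matters in this scheme: a handler placed after one that answers or
raises is never consulted.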

Example usage:

import urllib2

# set up authentication info
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
                      uri='https://mahler:8092/site-updates.py',
                      user='klem',
                      passwd='geheim$parole')

proxy_support = urllib2.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)

# install it
urllib2.install_opener(opener)

f = urllib2.urlopen('http://www.python.org/')


"""

# XXX issues:
# If an authentication error handler tries to perform authentication
# but fails for some reason, how should the error be signalled?  The
# client needs to know the HTTP error code.  But if the handler knows
# what the problem was, e.g., that it didn't know the hash algorithm
# requested in the challenge, it would be good to pass that
# information along to the client, too.
# ftp errors aren't handled cleanly
# check digest against correct (i.e. non-apache) implementation

# Possible extensions:
# complex proxies  XXX not sure what exactly was meant by this
# abstract factory for opener

import base64
import hashlib
import httplib
import mimetools
import os
import posixpath
import random
import re
import socket
import sys
import time
import urlparse
import bisect
import warnings

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

from urllib import (unwrap, unquote, splittype, splithost, quote,
     addinfourl, splitport, splittag, toBytes,
     splitattr, ftpwrapper, splituser, splitpasswd, splitvalue)

# support for FileHandler, proxies via environment variables
from urllib import localhost, url2pathname, getproxies, proxy_bypass

# used in User-Agent header sent
__version__ = sys.version[:3]

_opener = None
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    global _opener
    if _opener is None:
        _opener = build_opener()
    return _opener.open(url, data, timeout)

def install_opener(opener):
    global _opener
    _opener = opener

# do these error classes make sense?
# make sure all of the IOError stuff is overridden.  we just want to be
# subtypes.

class URLError(IOError):
    # URLError is a sub-type of IOError, but it doesn't share any of
    # the implementation.  need to override __init__ and __str__.
    # It sets self.args for compatibility with other EnvironmentError
    # subclasses, but args doesn't have the typical format with errno in
    # slot 0 and strerror in slot 1.  This may be better than nothing.
    def __init__(self, reason):
        self.args = reason,
        self.reason = reason

    def __str__(self):
        return '<urlopen error %s>' % self.reason

class HTTPError(URLError, addinfourl):
    """Raised when HTTP error occurs, but also acts like non-error return"""
    __super_init = addinfourl.__init__

    def __init__(self, url, code, msg, hdrs, fp):
        self.code = code
        self.msg = msg
        self.hdrs = hdrs
        self.fp = fp
        self.filename = url
        # The addinfourl classes depend on fp being a valid file
        # object.  In some cases, the HTTPError may not have a valid
        # file object.  If this happens, the simplest workaround is to
        # not initialize the base classes.
        if fp is not None:
            self.__super_init(fp, hdrs, url, code)

    def __str__(self):
        return 'HTTP Error %s: %s' % (self.code, self.msg)

    # since URLError specifies a .reason attribute, HTTPError should also
    #  provide this attribute. See issue13211 for discussion.
    @property
    def reason(self):
        return self.msg

    def info(self):
        return self.hdrs

# copied from cookielib.py
_cut_port_re = re.compile(r":\d+$")
def request_host(request):
    """Return request-host, as defined by RFC 2965.

    Variation from RFC: returned value is lowercased, for convenient
    comparison.

    """
    url = request.get_full_url()
    host = urlparse.urlparse(url)[1]
    if host == "":
        host = request.get_header("Host", "")

    # remove port, if present
    host = _cut_port_re.sub("", host, 1)
    return host.lower()

class Request:

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False):
        # unwrap('<URL:type://host/path>') --> 'type://host/path'
        self.__original = unwrap(url)
        self.__original, self.__fragment = splittag(self.__original)
        self.type = None
        # self.__r_type is what's left after doing the splittype
        self.host = None
        self.port = None
        self._tunnel_host = None
        self.data = data
        self.headers = {}
        for key, value in headers.items():
            self.add_header(key, value)
        self.unredirected_hdrs = {}
        if origin_req_host is None:
            origin_req_host = request_host(self)
        self.origin_req_host = origin_req_host
        self.unverifiable = unverifiable

    def __getattr__(self, attr):
        # XXX this is a fallback mechanism to guard against these
        # methods getting called in a non-standard order.  this may be
        # too complicated and/or unnecessary.
        # XXX should the __r_XXX attributes be public?
        if attr[:12] == '_Request__r_':
            name = attr[12:]
            if hasattr(Request, 'get_' + name):
                getattr(self, 'get_' + name)()
                return getattr(self, attr)
        raise AttributeError(attr)

    def get_method(self):
        if self.has_data():
            return "POST"
        else:
            return "GET"

    # XXX these helper methods are lame

    def add_data(self, data):
        self.data = data

    def has_data(self):
        return self.data is not None

    def get_data(self):
        return self.data

    def get_full_url(self):
        if self.__fragment:
            return '%s#%s' % (self.__original, self.__fragment)
        else:
            return self.__original

    def get_type(self):
        if self.type is None:
            self.type, self.__r_type = splittype(self.__original)
            if self.type is None:
                raise ValueError("unknown url type: %s" % self.__original)
        return self.type

    def get_host(self):
        if self.host is None:
            self.host, self.__r_host = splithost(self.__r_type)
            if self.host:
                self.host = unquote(self.host)
        return self.host

    def get_selector(self):
        return self.__r_host

    def set_proxy(self, host, type):
        if self.type == 'https' and not self._tunnel_host:
            self._tunnel_host = self.host
        else:
            self.type = type
            self.__r_host = self.__original

        self.host = host

    def has_proxy(self):
        return self.__r_host == self.__original

    def get_origin_req_host(self):
        return self.origin_req_host

    def is_unverifiable(self):
        return self.unverifiable

    def add_header(self, key, val):
        # useful for something like authentication
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        # will not be added to a redirected request
        self.unredirected_hdrs[key.capitalize()] = val

    def has_header(self, header_name):
        return (header_name in self.headers or
                header_name in self.unredirected_hdrs)

    def get_header(self, header_name, default=None):
        return self.headers.get(
            header_name,
            self.unredirected_hdrs.get(header_name, default))

    def header_items(self):
        hdrs = self.unredirected_hdrs.copy()
        hdrs.update(self.headers)
        return hdrs.items()

class OpenerDirector:
    def __init__(self):
        client_version = "Python-urllib/%s" % __version__
        self.addheaders = [('User-agent', client_version)]
        # self.handlers is retained only for backward compatibility
        self.handlers = []
        # manage the individual handlers
        self.handle_open = {}
        self.handle_error = {}
        self.process_response = {}
        self.process_request = {}

    def add_handler(self, handler):
        if not hasattr(handler, "add_parent"):
            raise TypeError("expected BaseHandler instance, got %r" %
                            type(handler))

        added = False
        for meth in dir(handler):
            if meth in ["redirect_request", "do_open", "proxy_open"]:
                # oops, coincidental match
                continue

            i = meth.find("_")
            protocol = meth[:i]
            condition = meth[i+1:]

            if condition.startswith("error"):
                j = condition.find("_") + i + 1
                kind = meth[j+1:]
                try:
                    kind = int(kind)
                except ValueError:
                    pass
                lookup = self.handle_error.get(protocol, {})
                self.handle_error[protocol] = lookup
            elif condition == "open":
                kind = protocol
                lookup = self.handle_open
            elif condition == "response":
                kind = protocol
                lookup = self.process_response
            elif condition == "request":
                kind = protocol
                lookup = self.process_request
            else:
                continue

            handlers = lookup.setdefault(kind, [])
            if handlers:
                bisect.insort(handlers, handler)
            else:
                handlers.append(handler)
            added = True

        if added:
            bisect.insort(self.handlers, handler)
            handler.add_parent(self)

    def close(self):
        # Only exists for backwards compatibility.
        pass

    def _call_chain(self, chain, kind, meth_name, *args):
        # Handlers raise an exception if no one else should try to handle
        # the request, or return None if they can't but another handler
        # could.  Otherwise, they return the response.
        handlers = chain.get(kind, ())
        for handler in handlers:
            func = getattr(handler, meth_name)

            result = func(*args)
            if result is not None:
                return result

    def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        # accept a URL or a Request object
        if isinstance(fullurl, basestring):
            req = Request(fullurl, data)
        else:
            req = fullurl
            if data is not None:
                req.add_data(data)

        req.timeout = timeout
        protocol = req.get_type()

        # pre-process request
        meth_name = protocol+"_request"
        for processor in self.process_request.get(protocol, []):
            meth = getattr(processor, meth_name)
            req = meth(req)

        response = self._open(req, data)

        # post-process response
        meth_name = protocol+"_response"
        for processor in self.process_response.get(protocol, []):
            meth = getattr(processor, meth_name)
            response = meth(req, response)

        return response
4134adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4144adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def _open(self, req, data=None):
4154adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        result = self._call_chain(self.handle_open, 'default',
4164adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                                  'default_open', req)
4174adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if result:
4184adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return result
4194adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4204adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        protocol = req.get_type()
4214adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        result = self._call_chain(self.handle_open, protocol, protocol +
4224adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                                  '_open', req)
4234adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if result:
4244adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return result
4254adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4264adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        return self._call_chain(self.handle_open, 'unknown',
4274adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                                'unknown_open', req)
4284adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4294adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def error(self, proto, *args):
4304adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if proto in ('http', 'https'):
4314adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # XXX http[s] protocols are special-cased
4324adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            dict = self.handle_error['http'] # https is not different than http
4334adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            proto = args[2]  # YUCK!
4344adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            meth_name = 'http_error_%s' % proto
4354adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            http_err = 1
4364adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            orig_args = args
4374adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        else:
4384adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            dict = self.handle_error
4394adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            meth_name = proto + '_error'
4404adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            http_err = 0
4414adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        args = (dict, proto, meth_name) + args
4424adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        result = self._call_chain(*args)
4434adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if result:
4444adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return result
4454adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4464adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if http_err:
4474adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            args = (dict, 'default', 'http_error_default') + orig_args
4484adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return self._call_chain(*args)
4494adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4504adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao# XXX probably also want an abstract factory that knows when it makes
4514adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao# sense to skip a superclass in favor of a subclass and when it might
4524adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao# make sense to include both
4534adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4544adfde8bc82dd39f59e0445588c3e599ada477dJosh Gaodef build_opener(*handlers):
4554adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    """Create an opener object from a list of handlers.
4564adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4574adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    The opener will use several default handlers, including support
4584adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    for HTTP, FTP and when applicable, HTTPS.
4594adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4604adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    If any of the handlers passed as arguments are subclasses of the
4614adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    default handlers, the default handlers will not be used.
4624adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    """
4634adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    import types
4644adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def isclass(obj):
4654adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        return isinstance(obj, (types.ClassType, type))
4664adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4674adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    opener = OpenerDirector()
4684adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
4694adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                       HTTPDefaultErrorHandler, HTTPRedirectHandler,
4704adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                       FTPHandler, FileHandler, HTTPErrorProcessor]
4714adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    if hasattr(httplib, 'HTTPS'):
4724adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        default_classes.append(HTTPSHandler)
4734adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    skip = set()
4744adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    for klass in default_classes:
4754adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        for check in handlers:
4764adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            if isclass(check):
4774adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                if issubclass(check, klass):
4784adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                    skip.add(klass)
4794adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            elif isinstance(check, klass):
4804adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                skip.add(klass)
4814adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    for klass in skip:
4824adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        default_classes.remove(klass)
4834adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4844adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    for klass in default_classes:
4854adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        opener.add_handler(klass())
4864adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4874adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    for h in handlers:
4884adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if isclass(h):
4894adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            h = h()
4904adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        opener.add_handler(h)
4914adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    return opener
4924adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4934adfde8bc82dd39f59e0445588c3e599ada477dJosh Gaoclass BaseHandler:
4944adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    handler_order = 500
4954adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4964adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def add_parent(self, parent):
4974adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        self.parent = parent
4984adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
4994adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def close(self):
5004adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # Only exists for backwards compatibility
5014adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        pass
5024adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5034adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def __lt__(self, other):
5044adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if not hasattr(other, "handler_order"):
5054adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # Try to preserve the old behavior of having custom classes
5064adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # inserted after default ones (works only for custom user
5074adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # classes which are not aware of handler_order).
5084adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return True
5094adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        return self.handler_order < other.handler_order
5104adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5114adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5124adfde8bc82dd39f59e0445588c3e599ada477dJosh Gaoclass HTTPErrorProcessor(BaseHandler):
5134adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    """Process HTTP error responses."""
5144adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    handler_order = 1000  # after all other processing
5154adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5164adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def http_response(self, request, response):
5174adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        code, msg, hdrs = response.code, response.msg, response.info()
5184adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5194adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # According to RFC 2616, "2xx" code indicates that the client's
5204adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # request was successfully received, understood, and accepted.
5214adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if not (200 <= code < 300):
5224adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            response = self.parent.error(
5234adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                'http', request, response, code, msg, hdrs)
5244adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5254adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        return response
5264adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5274adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    https_response = http_response
5284adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5294adfde8bc82dd39f59e0445588c3e599ada477dJosh Gaoclass HTTPDefaultErrorHandler(BaseHandler):
5304adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def http_error_default(self, req, fp, code, msg, hdrs):
5314adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
5324adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5334adfde8bc82dd39f59e0445588c3e599ada477dJosh Gaoclass HTTPRedirectHandler(BaseHandler):
5344adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    # maximum number of redirections to any single URL
5354adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    # this is needed because of the state that cookies introduce
5364adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    max_repeats = 4
5374adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    # maximum total number of redirections (regardless of URL) before
5384adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    # assuming we're in a loop
5394adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    max_redirections = 10
5404adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5414adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def redirect_request(self, req, fp, code, msg, headers, newurl):
5424adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        """Return a Request or None in response to a redirect.
5434adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5444adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        This is called by the http_error_30x methods when a
5454adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        redirection response is received.  If a redirection should
5464adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        take place, return a new Request to allow http_error_30x to
5474adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        perform the redirect.  Otherwise, raise HTTPError if no-one
5484adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        else should try to handle this url.  Return None if you can't
5494adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        but another Handler might.
5504adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        """
5514adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        m = req.get_method()
5524adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
5534adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            or code in (301, 302, 303) and m == "POST"):
5544adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # Strictly (according to RFC 2616), 301 or 302 in response
5554adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # to a POST MUST NOT cause a redirection without confirmation
5564adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # from the user (of urllib2, in this case).  In practice,
5574adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # essentially all clients do redirect in this case, so we
5584adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # do the same.
5594adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # be conciliant with URIs containing a space
5604adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            newurl = newurl.replace(' ', '%20')
5614adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            newheaders = dict((k,v) for k,v in req.headers.items()
5624adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                              if k.lower() not in ("content-length", "content-type")
5634adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                             )
5644adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return Request(newurl,
5654adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                           headers=newheaders,
5664adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                           origin_req_host=req.get_origin_req_host(),
5674adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                           unverifiable=True)
5684adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        else:
5694adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            raise HTTPError(req.get_full_url(), code, msg, headers, fp)
5704adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5714adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    # Implementation note: To avoid the server sending us into an
5724adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    # infinite loop, the request object needs to track what URLs we
5734adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    # have already seen.  Do this by adding a handler-specific
5744adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    # attribute to the Request object.
5754adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def http_error_302(self, req, fp, code, msg, headers):
5764adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # Some servers (incorrectly) return multiple Location headers
5774adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # (so probably same goes for URI).  Use first header.
5784adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if 'location' in headers:
5794adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            newurl = headers.getheaders('location')[0]
5804adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        elif 'uri' in headers:
5814adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            newurl = headers.getheaders('uri')[0]
5824adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        else:
5834adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return
5844adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5854adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # fix a possible malformed URL
5864adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        urlparts = urlparse.urlparse(newurl)
5874adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if not urlparts.path:
5884adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            urlparts = list(urlparts)
5894adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            urlparts[2] = "/"
5904adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        newurl = urlparse.urlunparse(urlparts)
5914adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5924adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        newurl = urlparse.urljoin(req.get_full_url(), newurl)
5934adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
5944adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # For security reasons we do not allow redirects to protocols
5954adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # other than HTTP, HTTPS or FTP.
5964adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        newurl_lower = newurl.lower()
5974adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if not (newurl_lower.startswith('http://') or
5984adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                newurl_lower.startswith('https://') or
5994adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                newurl_lower.startswith('ftp://')):
6004adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            raise HTTPError(newurl, code,
6014adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                            msg + " - Redirection to url '%s' is not allowed" %
6024adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                            newurl,
6034adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                            headers, fp)
6044adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6054adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # XXX Probably want to forget about the state of the current
6064adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # request, although that might interact poorly with other
6074adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # handlers that also use handler-specific request attributes
6084adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        new = self.redirect_request(req, fp, code, msg, headers, newurl)
6094adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if new is None:
6104adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return
6114adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6124adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # loop detection
6134adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # .redirect_dict has a key url if url was previously visited.
6144adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if hasattr(req, 'redirect_dict'):
6154adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            visited = new.redirect_dict = req.redirect_dict
6164adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            if (visited.get(newurl, 0) >= self.max_repeats or
6174adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                len(visited) >= self.max_redirections):
6184adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                raise HTTPError(req.get_full_url(), code,
6194adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                                self.inf_msg + msg, headers, fp)
6204adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        else:
6214adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            visited = new.redirect_dict = req.redirect_dict = {}
6224adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        visited[newurl] = visited.get(newurl, 0) + 1
6234adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6244adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # Don't close the fp until we are sure that we won't use it
6254adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # with HTTPError.
6264adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        fp.read()
6274adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        fp.close()
6284adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6294adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        return self.parent.open(new, timeout=req.timeout)
6304adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6314adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    http_error_301 = http_error_303 = http_error_307 = http_error_302
6324adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6334adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    inf_msg = "The HTTP server returned a redirect error that would " \
6344adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao              "lead to an infinite loop.\n" \
6354adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao              "The last 30x error message was:\n"
6364adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6374adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6384adfde8bc82dd39f59e0445588c3e599ada477dJosh Gaodef _parse_proxy(proxy):
6394adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    """Return (scheme, user, password, host/port) given a URL or an authority.
6404adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6414adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    If a URL is supplied, it must have an authority (host:port) component.
6424adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    According to RFC 3986, having an authority component means the URL must
6434adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    have two slashes after the scheme:
6444adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6454adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('file:/ftp.example.com/')
6464adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    Traceback (most recent call last):
6474adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    ValueError: proxy URL with no authority: 'file:/ftp.example.com/'
6484adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6494adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    The first three items of the returned tuple may be None.
6504adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6514adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    Examples of authority parsing:
6524adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6534adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('proxy.example.com')
6544adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    (None, None, None, 'proxy.example.com')
6554adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('proxy.example.com:3128')
6564adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    (None, None, None, 'proxy.example.com:3128')
6574adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6584adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    The authority component may optionally include userinfo (assumed to be
6594adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    username:password):
6604adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6614adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('joe:password@proxy.example.com')
6624adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    (None, 'joe', 'password', 'proxy.example.com')
6634adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('joe:password@proxy.example.com:3128')
6644adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    (None, 'joe', 'password', 'proxy.example.com:3128')
6654adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6664adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    Same examples, but with URLs instead:
6674adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6684adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('http://proxy.example.com/')
6694adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    ('http', None, None, 'proxy.example.com')
6704adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('http://proxy.example.com:3128/')
6714adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    ('http', None, None, 'proxy.example.com:3128')
6724adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('http://joe:password@proxy.example.com/')
6734adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    ('http', 'joe', 'password', 'proxy.example.com')
6744adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('http://joe:password@proxy.example.com:3128')
6754adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    ('http', 'joe', 'password', 'proxy.example.com:3128')
6764adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6774adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    Everything after the authority is ignored:
6784adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6794adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('ftp://joe:password@proxy.example.com/rubbish:3128')
6804adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    ('ftp', 'joe', 'password', 'proxy.example.com')
6814adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6824adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    Test for no trailing '/' case:
6834adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6844adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    >>> _parse_proxy('http://joe:password@proxy.example.com')
6854adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    ('http', 'joe', 'password', 'proxy.example.com')
6864adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
6874adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    """
6884adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    scheme, r_scheme = splittype(proxy)
6894adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    if not r_scheme.startswith("/"):
6904adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # authority
6914adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        scheme = None
6924adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        authority = proxy
6934adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    else:
6944adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # URL
6954adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if not r_scheme.startswith("//"):
6964adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            raise ValueError("proxy URL with no authority: %r" % proxy)
6974adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # We have an authority, so for RFC 3986-compliant URLs (by ss 3.
6984adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        # and 3.3.), path is empty or starts with '/'
6994adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        end = r_scheme.find("/", 2)
7004adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if end == -1:
7014adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            end = None
7024adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        authority = r_scheme[2:end]
7034adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    userinfo, hostport = splituser(authority)
7044adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    if userinfo is not None:
7054adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        user, password = splitpasswd(userinfo)
7064adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    else:
7074adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        user = password = None
7084adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    return scheme, user, password, hostport
7094adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
7104adfde8bc82dd39f59e0445588c3e599ada477dJosh Gaoclass ProxyHandler(BaseHandler):
7114adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    # Proxies must be in front
7124adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    handler_order = 100
7134adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
7144adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def __init__(self, proxies=None):
7154adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if proxies is None:
7164adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            proxies = getproxies()
7174adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
7184adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        self.proxies = proxies
7194adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        for type, url in proxies.items():
7204adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            setattr(self, '%s_open' % type,
7214adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                    lambda r, proxy=url, type=type, meth=self.proxy_open: \
7224adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao                    meth(r, proxy, type))
7234adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
7244adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao    def proxy_open(self, req, proxy, type):
7254adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        orig_type = req.get_type()
7264adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        proxy_type, user, password, hostport = _parse_proxy(proxy)
7274adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
7284adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if proxy_type is None:
7294adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            proxy_type = orig_type
7304adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
7314adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if req.host and proxy_bypass(req.host):
7324adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return None
7334adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
7344adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if user and password:
7354adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            user_pass = '%s:%s' % (unquote(user), unquote(password))
7364adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            creds = base64.b64encode(user_pass).strip()
7374adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            req.add_header('Proxy-authorization', 'Basic ' + creds)
7384adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        hostport = unquote(hostport)
7394adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        req.set_proxy(hostport, proxy_type)
7404adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
7414adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        if orig_type == proxy_type or orig_type == 'https':
7424adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # let other handlers take care of it
7434adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return None
7444adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao        else:
7454adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # need to start over, because the other handlers don't
7464adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # grok the proxy's URL type
7474adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # e.g. if we have a constructor arg proxies like so:
7484adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # {'http': 'ftp://proxy.example.com'}, we may end up turning
7494adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # a request for http://acme.example.com/a into one for
7504adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            # ftp://proxy.example.com/a
7514adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao            return self.parent.open(req, timeout=req.timeout)

class HTTPPasswordMgr:

    def __init__(self):
        self.passwd = {}

    def add_password(self, realm, uri, user, passwd):
        # uri could be a single URI or a sequence
        if isinstance(uri, basestring):
            uri = [uri]
        if realm not in self.passwd:
            self.passwd[realm] = {}
        for default_port in True, False:
            reduced_uri = tuple(
                [self.reduce_uri(u, default_port) for u in uri])
            self.passwd[realm][reduced_uri] = (user, passwd)

    def find_user_password(self, realm, authuri):
        domains = self.passwd.get(realm, {})
        for default_port in True, False:
            reduced_authuri = self.reduce_uri(authuri, default_port)
            for uris, authinfo in domains.iteritems():
                for uri in uris:
                    if self.is_suburi(uri, reduced_authuri):
                        return authinfo
        return None, None

    def reduce_uri(self, uri, default_port=True):
        """Accept authority or URI and extract only the authority and path."""
        # note HTTP URLs do not have a userinfo component
        parts = urlparse.urlsplit(uri)
        if parts[1]:
            # URI
            scheme = parts[0]
            authority = parts[1]
            path = parts[2] or '/'
        else:
            # host or host:port
            scheme = None
            authority = uri
            path = '/'
        host, port = splitport(authority)
        if default_port and port is None and scheme is not None:
            dport = {"http": 80,
                     "https": 443,
                     }.get(scheme)
            if dport is not None:
                authority = "%s:%d" % (host, dport)
        return authority, path

    def is_suburi(self, base, test):
        """Check if test is below base in a URI tree.

        Both args must be URIs in reduced form.
        """
        if base == test:
            return True
        if base[0] != test[0]:
            return False
        common = posixpath.commonprefix((base[1], test[1]))
        if len(common) == len(base[1]):
            return True
        return False
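

# Illustrative sketch, not part of the module's API: how reduced URIs of
# the form (authority, path) are compared by is_suburi() above.  The
# helper name _example_is_suburi is hypothetical.  Because
# posixpath.commonprefix() compares character by character rather than
# path segment by path segment, a base path of '/docs' also matches
# '/docs-private'; ending the base path with '/' avoids that.
import posixpath  # repeated here so the sketch stands alone


def _example_is_suburi(base, test):
    """Mirror HTTPPasswordMgr.is_suburi for (authority, path) tuples."""
    if base == test:
        return True
    if base[0] != test[0]:
        # different authority (host:port), so never a sub-URI
        return False
    common = posixpath.commonprefix((base[1], test[1]))
    return len(common) == len(base[1])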


class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr):

    def find_user_password(self, realm, authuri):
        user, password = HTTPPasswordMgr.find_user_password(self, realm,
                                                            authuri)
        if user is not None:
            return user, password
        return HTTPPasswordMgr.find_user_password(self, None, authuri)


class AbstractBasicAuthHandler:

    # XXX this allows for multiple auth-schemes, but will stupidly pick
    # the last one with a realm specified.

    # allow for double- and single-quoted realm values
    # (single quotes are a violation of the RFC, but appear in the wild)
    rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
                    'realm=(["\']?)([^"\']*)\\2', re.I)

    # XXX could pre-emptively send auth info already accepted (RFC 2617,
    # end of section 2, and section 1.2 immediately after "credentials"
    # production).

    def __init__(self, password_mgr=None):
        if password_mgr is None:
            password_mgr = HTTPPasswordMgr()
        self.passwd = password_mgr
        self.add_password = self.passwd.add_password
        self.retried = 0

    def reset_retry_count(self):
        self.retried = 0

    def http_error_auth_reqed(self, authreq, host, req, headers):
        # host may be an authority (without userinfo) or a URL with an
        # authority
        # XXX could be multiple headers
        authreq = headers.get(authreq, None)

        if self.retried > 5:
            # retry sending the username:password 5 times before failing.
            raise HTTPError(req.get_full_url(), 401, "basic auth failed",
                            headers, None)
        else:
            self.retried += 1

        if authreq:
            mo = AbstractBasicAuthHandler.rx.search(authreq)
            if mo:
                scheme, quote, realm = mo.groups()
                if quote not in ['"', "'"]:
                    warnings.warn("Basic Auth Realm was unquoted",
                                  UserWarning, 2)
                if scheme.lower() == 'basic':
                    response = self.retry_http_basic_auth(host, req, realm)
                    if response and response.code != 401:
                        self.retried = 0
                    return response

    def retry_http_basic_auth(self, host, req, realm):
        user, pw = self.passwd.find_user_password(realm, host)
        if pw is not None:
            raw = "%s:%s" % (user, pw)
            auth = 'Basic %s' % base64.b64encode(raw).strip()
            if req.headers.get(self.auth_header, None) == auth:
                return None
            req.add_unredirected_header(self.auth_header, auth)
            return self.parent.open(req, timeout=req.timeout)
        else:
            return None
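

# Illustrative sketch, not part of the module's API: the header value
# that retry_http_basic_auth() above builds from a user name and
# password.  The helper name _example_basic_auth_value is hypothetical.
# The encode/decode calls are an assumption added so the sketch also
# runs on interpreters where b64encode() requires bytes.
import base64  # repeated here so the sketch stands alone


def _example_basic_auth_value(user, pw):
    """Return 'Basic <base64 of user:pw>' as a text string."""
    raw = "%s:%s" % (user, pw)
    return 'Basic ' + base64.b64encode(raw.encode('utf-8')).decode('ascii')

# With the sample credentials from RFC 2617, section 2:
# _example_basic_auth_value('Aladdin', 'open sesame')
# gives the text  Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==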


class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):

    auth_header = 'Authorization'

    def http_error_401(self, req, fp, code, msg, headers):
        url = req.get_full_url()
        response = self.http_error_auth_reqed('www-authenticate',
                                              url, req, headers)
        self.reset_retry_count()
        return response


class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):

    auth_header = 'Proxy-authorization'

    def http_error_407(self, req, fp, code, msg, headers):
        # http_error_auth_reqed requires that there is no userinfo component in
        # authority.  Assume there isn't one, since urllib2 does not (and
        # should not, RFC 3986 s. 3.2.1) support requests for URLs containing
        # userinfo.
        authority = req.get_host()
        response = self.http_error_auth_reqed('proxy-authenticate',
                                              authority, req, headers)
        self.reset_retry_count()
        return response


def randombytes(n):
    """Return n random bytes."""
    # Use /dev/urandom if it is available.  Fall back to random module
    # if not.  It might be worthwhile to extend this function to use
    # other platform-specific mechanisms for getting random bytes.
    if os.path.exists("/dev/urandom"):
        # the with block closes the file even if read() raises
        with open("/dev/urandom", "rb") as f:
            return f.read(n)
    else:
        L = [chr(random.randrange(0, 256)) for i in range(n)]
        return "".join(L)
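

# Illustrative sketch, not part of the module's API: os.urandom() is one
# platform-independent source of random bytes of the kind the comment in
# randombytes() above asks about.  The helper name _example_randombytes
# is hypothetical.
import os  # repeated here so the sketch stands alone


def _example_randombytes(n):
    """Return n random bytes from the operating system's generator."""
    return os.urandom(n)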

class AbstractDigestAuthHandler:
    # Digest authentication is specified in RFC 2617.

    # XXX The client does not inspect the Authentication-Info header
    # in a successful response.

    # XXX It should be possible to test this implementation against
    # a mock server that just generates a static set of challenges.

    # XXX qop="auth-int" support is shaky

    def __init__(self, passwd=None):
        if passwd is None:
            passwd = HTTPPasswordMgr()
        self.passwd = passwd
        self.add_password = self.passwd.add_password
        self.retried = 0
        self.nonce_count = 0
        self.last_nonce = None

    def reset_retry_count(self):
        self.retried = 0

    def http_error_auth_reqed(self, auth_header, host, req, headers):
        authreq = headers.get(auth_header, None)
        if self.retried > 5:
            # Don't fail endlessly - if we failed once, we'll probably
            # fail a second time. Hm. Unless the Password Manager is
            # prompting for the information. Crap. This isn't great
            # but it's better than the current 'repeat until recursion
            # depth exceeded' approach <wink>
            raise HTTPError(req.get_full_url(), 401, "digest auth failed",
                            headers, None)
        else:
            self.retried += 1
        if authreq:
            scheme = authreq.split()[0]
            if scheme.lower() == 'digest':
                return self.retry_http_digest_auth(req, authreq)

    def retry_http_digest_auth(self, req, auth):
        token, challenge = auth.split(' ', 1)
        chal = parse_keqv_list(parse_http_list(challenge))
        auth = self.get_authorization(req, chal)
        if auth:
            auth_val = 'Digest %s' % auth
            if req.headers.get(self.auth_header, None) == auth_val:
                return None
            req.add_unredirected_header(self.auth_header, auth_val)
            resp = self.parent.open(req, timeout=req.timeout)
            return resp

    def get_cnonce(self, nonce):
        # The cnonce-value is an opaque
        # quoted string value provided by the client and used by both client
        # and server to avoid chosen plaintext attacks, to provide mutual
        # authentication, and to provide some message integrity protection.
        # This isn't a fabulous effort, but it's probably Good Enough.
        dig = hashlib.sha1("%s:%s:%s:%s" % (self.nonce_count, nonce, time.ctime(),
                                            randombytes(8))).hexdigest()
        return dig[:16]

    def get_authorization(self, req, chal):
        try:
            realm = chal['realm']
            nonce = chal['nonce']
            qop = chal.get('qop')
            algorithm = chal.get('algorithm', 'MD5')
            # mod_digest doesn't send an opaque, even though it isn't
            # supposed to be optional
            opaque = chal.get('opaque', None)
        except KeyError:
            return None

        H, KD = self.get_algorithm_impls(algorithm)
        if H is None:
            return None

        user, pw = self.passwd.find_user_password(realm, req.get_full_url())
        if user is None:
            return None

        # XXX not implemented yet
        if req.has_data():
            entdig = self.get_entity_digest(req.get_data(), chal)
        else:
            entdig = None

        A1 = "%s:%s:%s" % (user, realm, pw)
        A2 = "%s:%s" % (req.get_method(),
                        # XXX selector: what about proxies and full urls
                        req.get_selector())
        if qop == 'auth':
            if nonce == self.last_nonce:
                self.nonce_count += 1
            else:
                self.nonce_count = 1
                self.last_nonce = nonce

            ncvalue = '%08x' % self.nonce_count
            cnonce = self.get_cnonce(nonce)
            noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
            respdig = KD(H(A1), noncebit)
        elif qop is None:
            respdig = KD(H(A1), "%s:%s" % (nonce, H(A2)))
        else:
            # XXX handle auth-int.
            raise URLError("qop '%s' is not supported." % qop)

        # XXX should the partial digests be encoded too?

        base = 'username="%s", realm="%s", nonce="%s", uri="%s", ' \
               'response="%s"' % (user, realm, nonce, req.get_selector(),
                                  respdig)
        if opaque:
            base += ', opaque="%s"' % opaque
        if entdig:
            base += ', digest="%s"' % entdig
        base += ', algorithm="%s"' % algorithm
        if qop:
            base += ', qop=auth, nc=%s, cnonce="%s"' % (ncvalue, cnonce)
        return base

    def get_algorithm_impls(self, algorithm):
        # algorithm should be case-insensitive according to RFC2617
        algorithm = algorithm.upper()
        # lambdas assume digest modules are imported at the top level
        if algorithm == 'MD5':
            H = lambda x: hashlib.md5(x).hexdigest()
        elif algorithm == 'SHA':
            H = lambda x: hashlib.sha1(x).hexdigest()
        else:
            # XXX MD5-sess
            # Unsupported algorithm: return None for H so that
            # get_authorization() gives up, instead of hitting an
            # unbound local H on the return below.
            return None, None
        KD = lambda s, d: H("%s:%s" % (s, d))
        return H, KD

    def get_entity_digest(self, data, chal):
        # XXX not implemented yet
        return None
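

# Illustrative sketch, not part of the module's API: the "response" value
# that get_authorization() above computes for algorithm=MD5 and qop=auth,
# written as a standalone function.  The helper name
# _example_digest_response is hypothetical, and the encode() call is an
# assumption added so the sketch also runs on interpreters where
# hashlib requires bytes.
import hashlib  # repeated here so the sketch stands alone


def _example_digest_response(user, realm, pw, method, uri,
                             nonce, ncvalue, cnonce, qop='auth'):
    """Return the hex request-digest for an MD5 / qop=auth challenge."""
    def H(x):
        return hashlib.md5(x.encode('utf-8')).hexdigest()

    def KD(secret, data):
        return H("%s:%s" % (secret, data))

    A1 = "%s:%s:%s" % (user, realm, pw)
    A2 = "%s:%s" % (method, uri)
    noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
    return KD(H(A1), noncebit)

# With the sample exchange from RFC 2617, section 3.5:
# _example_digest_response('Mufasa', 'testrealm@host.com', 'Circle Of Life',
#                          'GET', '/dir/index.html',
#                          'dcd98b7102dd2f0e8b11d0f600bfb0c093',
#                          '00000001', '0a4f113b')
# gives 6629fae49393a05397450978507c4ef1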


class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
    """An authentication protocol defined by RFC 2069.

    RFC 2617 obsoletes RFC 2069 and is the version implemented here.
    Digest authentication improves on basic authentication because it
    does not transmit passwords in the clear.
    """

    auth_header = 'Authorization'
    handler_order = 490  # before Basic auth

    def http_error_401(self, req, fp, code, msg, headers):
        host = urlparse.urlparse(req.get_full_url())[1]
        retry = self.http_error_auth_reqed('www-authenticate',
                                           host, req, headers)
        self.reset_retry_count()
        return retry


class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):

    auth_header = 'Proxy-Authorization'
    handler_order = 490  # before Basic auth

    def http_error_407(self, req, fp, code, msg, headers):
        host = req.get_host()
        retry = self.http_error_auth_reqed('proxy-authenticate',
                                           host, req, headers)
        self.reset_retry_count()
        return retry
11014adfde8bc82dd39f59e0445588c3e599ada477dJosh Gao
class AbstractHTTPHandler(BaseHandler):
    """Request preparation and transport logic shared by HTTP and HTTPS."""

    def __init__(self, debuglevel=0):
        self._debuglevel = debuglevel

    def set_http_debuglevel(self, level):
        self._debuglevel = level

    def do_request_(self, request):
        """Add default headers to the request before it is sent.

        For requests with data, Content-type and Content-length are
        filled in.  Host and the opener's addheaders are filled in for
        every request.  All of them are added as unredirected headers,
        and only when the request does not already carry them.  Raises
        URLError if the request has no host.
        """
        host = request.get_host()
        if not host:
            raise URLError('no host given')

        if request.has_data():  # POST
            data = request.get_data()
            if not request.has_header('Content-type'):
                request.add_unredirected_header(
                    'Content-type',
                    'application/x-www-form-urlencoded')
            if not request.has_header('Content-length'):
                request.add_unredirected_header(
                    'Content-length', '%d' % len(data))

        # When a proxy is in use, take the Host value from the selector
        # instead of from get_host().
        sel_host = host
        if request.has_proxy():
            scheme, sel = splittype(request.get_selector())
            sel_host, sel_path = splithost(sel)

        if not request.has_header('Host'):
            request.add_unredirected_header('Host', sel_host)
        for name, value in self.parent.addheaders:
            name = name.capitalize()
            if not request.has_header(name):
                request.add_unredirected_header(name, value)

        return request

    def do_open(self, http_class, req):
        """Return an addinfourl object for the request, using http_class.

        http_class must implement the HTTPConnection API from httplib.
        The addinfourl return value is a file-like object.  It also
        has methods and attributes including:
            - info(): return a mimetools.Message object for the headers
            - geturl(): return the original request URL
            - code: HTTP status code
        """
        host = req.get_host()
        if not host:
            raise URLError('no host given')

        h = http_class(host, timeout=req.timeout)  # will parse host:port
        h.set_debuglevel(self._debuglevel)

        headers = dict(req.unredirected_hdrs)
        headers.update(dict((k, v) for k, v in req.headers.items()
                            if k not in headers))

        # We want to make an HTTP/1.1 request, but the addinfourl
        # class isn't prepared to deal with a persistent connection.
        # It will try to read all remaining data from the socket,
        # which will block while the server waits for the next request.
        # So make sure the connection gets closed after the (only)
        # request.
        headers["Connection"] = "close"
        headers = dict(
            (name.title(), val) for name, val in headers.items())

        if req._tunnel_host:
            tunnel_headers = {}
            proxy_auth_hdr = "Proxy-Authorization"
            if proxy_auth_hdr in headers:
                tunnel_headers[proxy_auth_hdr] = headers[proxy_auth_hdr]
                # Proxy-Authorization should not be sent to origin
                # server.
                del headers[proxy_auth_hdr]
            h.set_tunnel(req._tunnel_host, headers=tunnel_headers)

        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
        except socket.error as err:  # XXX what error?
            h.close()
            raise URLError(err)
        else:
            try:
                r = h.getresponse(buffering=True)
            except TypeError:  # buffering kw not supported
                r = h.getresponse()

        # Pick apart the HTTPResponse object to get the addinfourl
        # object initialized properly.

        # Wrap the HTTPResponse object in socket's file object adapter
        # for Windows.  That adapter calls recv(), so delegate recv()
        # to read().  This weird wrapping allows the returned object to
        # have readline() and readlines() methods.

        # XXX It might be better to extract the read buffering code
        # out of socket._fileobject() and into a base class.

        r.recv = r.read
        fp = socket._fileobject(r, close=True)

        resp = addinfourl(fp, r.msg, req.get_full_url())
        resp.code = r.status
        resp.msg = r.reason
        return resp


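# Illustrative sketch, not part of the handler API: the header merge that
# do_open() performs, restated as a standalone function.  The name
# _sketch_merge_headers and its plain-dict arguments are hypothetical; they
# stand in for req.unredirected_hdrs and req.headers.
def _sketch_merge_headers(unredirected_hdrs, regular_hdrs):
    # Unredirected headers win over regular headers that use the same key.
    headers = dict(unredirected_hdrs)
    headers.update(dict((k, v) for k, v in regular_hdrs.items()
                        if k not in headers))
    # Ask for a non-persistent connection, as do_open() does.
    headers["Connection"] = "close"
    # Normalise header names to Title-Case.
    return dict((name.title(), val) for name, val in headers.items())
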
class HTTPHandler(AbstractHTTPHandler):

    def http_open(self, req):
        return self.do_open(httplib.HTTPConnection, req)

    http_request = AbstractHTTPHandler.do_request_

# httplib only provides HTTPS when Python was built with SSL support.
if hasattr(httplib, 'HTTPS'):
    class HTTPSHandler(AbstractHTTPHandler):

        def https_open(self, req):
            return self.do_open(httplib.HTTPSConnection, req)

        https_request = AbstractHTTPHandler.do_request_

class HTTPCookieProcessor(BaseHandler):
    """Add Cookie headers to requests and store cookies from responses."""

    def __init__(self, cookiejar=None):
        import cookielib
        if cookiejar is None:
            cookiejar = cookielib.CookieJar()
        self.cookiejar = cookiejar

    def http_request(self, request):
        self.cookiejar.add_cookie_header(request)
        return request

    def http_response(self, request, response):
        self.cookiejar.extract_cookies(response, request)
        return response

    https_request = http_request
    https_response = http_response

class UnknownHandler(BaseHandler):
    def unknown_open(self, req):
        type = req.get_type()
        raise URLError('unknown url type: %s' % type)

def parse_keqv_list(l):
    """Parse list of key=value strings where keys are not duplicated.

    Double quotes surrounding a value are stripped.
    """
    parsed = {}
    for elt in l:
        k, v = elt.split('=', 1)
        # Slices rather than indexes, so an empty value ("key=") does not
        # raise IndexError.
        if v[:1] == '"' and v[-1:] == '"':
            v = v[1:-1]
        parsed[k] = v
    return parsed

def parse_http_list(s):
    """Parse lists as described by RFC 2068 Section 2.

    In particular, parse comma-separated lists where the elements of
    the list may include quoted-strings.  A quoted-string could
    contain a comma.  A non-quoted string could have quotes in the
    middle.  Neither commas nor quotes count if they are escaped.
    Only double-quotes count, not single-quotes.
    """
    res = []
    part = ''

    escape = quote = False
    for cur in s:
        if escape:
            part += cur
            escape = False
            continue
        if quote:
            if cur == '\\':
                escape = True
                continue
            elif cur == '"':
                quote = False
            part += cur
            continue

        if cur == ',':
            res.append(part)
            part = ''
            continue

        if cur == '"':
            quote = True

        part += cur

    # append last part
    if part:
        res.append(part)

    return [part.strip() for part in res]

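# Illustrative sketch: parse_http_list() and parse_keqv_list() can be chained
# to turn a comma-separated parameter string into a dict.  The name
# _sketch_parse_params is hypothetical.  The import below is only reached
# when the sketch is run outside this module; it assumes Python 3, where the
# same two helpers are available from urllib.request.
try:
    parse_http_list, parse_keqv_list
except NameError:
    from urllib.request import parse_http_list, parse_keqv_list


def _sketch_parse_params(value):
    # Split on unquoted commas first, then split each item on its first '='.
    return parse_keqv_list(parse_http_list(value))
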
def _safe_gethostbyname(host):
    """Return the address for host, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None

class FileHandler(BaseHandler):
    # Use local file or FTP depending on form of URL
    def file_open(self, req):
        url = req.get_selector()
        if url[:2] == '//' and url[2:3] != '/' and (req.host and
                req.host != 'localhost'):
            req.type = 'ftp'
            return self.parent.open(req)
        else:
            return self.open_local_file(req)

    # names for the localhost
    names = None
    def get_names(self):
        # The result is cached on the class, so the lookup runs only once.
        if FileHandler.names is None:
            try:
                FileHandler.names = tuple(
                    socket.gethostbyname_ex('localhost')[2] +
                    socket.gethostbyname_ex(socket.gethostname())[2])
            except socket.gaierror:
                FileHandler.names = (socket.gethostbyname('localhost'),)
        return FileHandler.names

    # not entirely sure what the rules are here
    def open_local_file(self, req):
        import email.utils
        import mimetypes
        host = req.get_host()
        filename = req.get_selector()
        localfile = url2pathname(filename)
        try:
            stats = os.stat(localfile)
            size = stats.st_size
            modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
            mtype = mimetypes.guess_type(filename)[0]
            headers = mimetools.Message(StringIO(
                'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
                (mtype or 'text/plain', size, modified)))
            if host:
                host, port = splitport(host)
            if not host or \
                (not port and _safe_gethostbyname(host) in self.get_names()):
                if host:
                    origurl = 'file://' + host + filename
                else:
                    origurl = 'file://' + filename
                return addinfourl(open(localfile, 'rb'), headers, origurl)
        except OSError as msg:
            # urllib2 users shouldn't expect OSErrors coming from urlopen()
            raise URLError(msg)
        raise URLError('file not on local host')

class FTPHandler(BaseHandler):
    def ftp_open(self, req):
        import ftplib
        import mimetypes
        host = req.get_host()
        if not host:
            raise URLError('ftp error: no host given')
        host, port = splitport(host)
        if port is None:
            port = ftplib.FTP_PORT
        else:
            port = int(port)

        # username/password handling
        user, host = splituser(host)
        if user:
            user, passwd = splitpasswd(user)
        else:
            passwd = None
        host = unquote(host)
        user = user or ''
        passwd = passwd or ''

        try:
            host = socket.gethostbyname(host)
        except socket.error as msg:
            raise URLError(msg)
        path, attrs = splitattr(req.get_selector())
        dirs = path.split('/')
        dirs = map(unquote, dirs)
        dirs, file = dirs[:-1], dirs[-1]
        if dirs and not dirs[0]:
            dirs = dirs[1:]
        try:
            fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
            # Default to 'I' when a file name is given and 'D' otherwise;
            # a ";type=" attribute in the URL overrides the default.
            type = 'I' if file else 'D'
            for attr in attrs:
                attr, value = splitvalue(attr)
                if attr.lower() == 'type' and \
                   value in ('a', 'A', 'i', 'I', 'd', 'D'):
                    type = value.upper()
            fp, retrlen = fw.retrfile(file, type)
            headers = ""
            mtype = mimetypes.guess_type(req.get_full_url())[0]
            if mtype:
                headers += "Content-type: %s\n" % mtype
            if retrlen is not None and retrlen >= 0:
                headers += "Content-length: %d\n" % retrlen
            sf = StringIO(headers)
            headers = mimetools.Message(sf)
            return addinfourl(fp, headers, req.get_full_url())
        except ftplib.all_errors as msg:
            # Three-argument raise keeps the original traceback.
            raise URLError, ('ftp error: %s' % msg), sys.exc_info()[2]

    def connect_ftp(self, user, passwd, host, port, dirs, timeout):
        fw = ftpwrapper(user, passwd, host, port, dirs, timeout,
                        persistent=False)
##        fw.ftp.set_debuglevel(1)
        return fw

class CacheFTPHandler(FTPHandler):
    """FTPHandler that keeps connections in a cache and reuses them."""
    # XXX would be nice to have pluggable cache strategies
    # XXX this stuff is definitely not thread safe
    def __init__(self):
        self.cache = {}      # key -> ftpwrapper
        self.timeout = {}    # key -> time at which the entry expires
        self.soonest = 0     # earliest expiry time among the entries
        self.delay = 60      # seconds an entry stays valid after use
        self.max_conns = 16

    def setTimeout(self, t):
        self.delay = t

    def setMaxConns(self, m):
        self.max_conns = m

    def connect_ftp(self, user, passwd, host, port, dirs, timeout):
        key = user, host, port, '/'.join(dirs), timeout
        if key in self.cache:
            self.timeout[key] = time.time() + self.delay
        else:
            self.cache[key] = ftpwrapper(user, passwd, host, port, dirs, timeout)
            self.timeout[key] = time.time() + self.delay
        self.check_cache()
        return self.cache[key]

    def check_cache(self):
        # first check for old ones
        t = time.time()
        if self.soonest <= t:
            # Iterate over a copy because entries are deleted in the loop.
            for k, v in list(self.timeout.items()):
                if v < t:
                    self.cache[k].close()
                    del self.cache[k]
                    del self.timeout[k]
        self.soonest = min(self.timeout.values())

        # then check the size
        if len(self.cache) == self.max_conns:
            # Drop the entry that expires soonest.  Note that it is
            # removed from the cache without being closed.
            for k, v in list(self.timeout.items()):
                if v == self.soonest:
                    del self.cache[k]
                    del self.timeout[k]
                    break
            self.soonest = min(self.timeout.values())

    def clear_cache(self):
        for conn in self.cache.values():
            conn.close()
        self.cache.clear()
        self.timeout.clear()
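
# Illustrative sketch, not part of urllib2: a small expiring cache in the
# spirit of CacheFTPHandler, with an injectable clock so that it can be
# exercised without waiting.  The class and all of its names are
# hypothetical.  It is a variant rather than a copy of check_cache(): it
# makes room before inserting a new entry, and it never closes the values
# it drops.
import time


class _SketchExpiringCache(object):
    def __init__(self, delay=60, max_entries=16, clock=time.time):
        self.delay = delay
        self.max_entries = max_entries
        self.clock = clock
        self.entries = {}   # key -> cached value
        self.expires = {}   # key -> deadline reported by clock()

    def _discard(self, key):
        del self.entries[key]
        del self.expires[key]

    def get(self, key, factory):
        now = self.clock()
        # Drop every entry whose deadline has passed.
        for stale in [k for k, deadline in self.expires.items()
                      if deadline < now]:
            self._discard(stale)
        if key not in self.entries:
            # Make room first, so the new entry is never the one evicted.
            if self.expires and len(self.entries) >= self.max_entries:
                self._discard(min(self.expires, key=self.expires.get))
            self.entries[key] = factory()
        # Every use pushes the deadline back by `delay` seconds.
        self.expires[key] = now + self.delay
        return self.entries[key]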