www.wsgi.org Documentation Release 0.9

Feb 25, 2020

Contents

1 Contents

2 Contributing

3 Indices and tables

Bibliography

Index

CHAPTER 1

Contents

1.1 What is WSGI?

WSGI is the Web Server Gateway Interface. It is a specification that describes how a web server communicates with web applications, and how web applications can be chained together to process one request. WSGI is a Python standard described in detail in PEP 3333. For more, see Learn about WSGI.
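As a rough illustration of both roles, here is a minimal sketch assuming Python 3 and PEP 3333. hello_app is an application (a callable that takes environ and start_response and returns an iterable of bytestrings), and UppercaseMiddleware is a piece of middleware that wraps it, appearing as an application to the server and as a server to the wrapped application. Both names are hypothetical and were chosen only for this example.

```python
def hello_app(environ, start_response):
    """A minimal WSGI application."""
    body = b"Hello, WSGI!"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # any iterable of bytestrings is allowed


class UppercaseMiddleware:
    """Hypothetical middleware: an application to the server, a server to `app`."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Pass the request through unchanged, then transform the response body.
        result = self.app(environ, start_response)
        try:
            # bytes.upper() only changes ASCII letters, so Content-Length stays valid.
            return [chunk.upper() for chunk in result]
        finally:
            # PEP 3333: call close() on the wrapped iterable if it has one.
            if hasattr(result, "close"):
                result.close()
```

To try it in a browser, pass UppercaseMiddleware(hello_app) as the application argument of the standard library's wsgiref.simple_server.make_server(host, port, app) and call serve_forever() on the returned server.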

1.2 Learn about WSGI

• WSGI Tutorial by Clodoaldo Neto
• WSGI Explorations in Python by Mike Orr
• An Introduction to the Python Web Server Gateway Interface (WSGI) by Titus Brown
• A Do-It-Yourself Framework by Ian Bicking
• URL Parsing with WSGI by Ian Bicking
• WSGI and WSGI Middleware is Easy by Ben Bangert
• WSGI - Gateway or Glue by Mark Rees (particularly good as a starting point)
• Mix and match Web components with Python WSGI by Uche Ogbuji
• ‘Hello World with WSGI’ and WSGI Middleware by Rufus Pollock
• Getting started with WSGI by Armin Ronacher
• Why so many Python web frameworks? by Joe Gregorio (outlines the creation of a web framework using several WSGI-based tools)
• Introducing WSGI: Python’s Secret Web Weapon by James Gardner [xml2006-09]
• Introducing WSGI: Python’s Secret Web Weapon, Part Two by James Gardner [xml2006-10]


• test.wsgi: a WSGI test app that shows whether your WSGI environment is working (it also outputs some interesting information, such as the Python version, sys.path and the WSGI environment). It can be used directly with mod_wsgi, and easily with all other WSGI servers. When started directly from the command line, it tries to use wsgiref’s simple server to serve the application.
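The following is a rough sketch of what a test.wsgi-style diagnostic app could look like; it is not the actual test.wsgi script. It reports the Python version, sys.path and the WSGI environ as plain text. The callable is named application because that is the name mod_wsgi looks for by default; the main() helper and its default port of 8000 are assumptions for illustration.

```python
import pprint
import sys


def application(environ, start_response):
    """Report the Python version, sys.path and the WSGI environ as plain text."""
    lines = [
        "Python version: " + sys.version,
        "",
        "sys.path:",
        pprint.pformat(sys.path),
        "",
        "WSGI environ:",
        pprint.pformat(environ),
    ]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def main(port=8000):
    """Fallback for command-line use: serve with wsgiref's simple server."""
    from wsgiref.simple_server import make_server
    with make_server("", port, application) as httpd:
        print("Serving on port %d ..." % port)
        httpd.serve_forever()
```

Appending `if __name__ == "__main__": main()` to the file would make it start the simple server when run directly, while mod_wsgi and other servers would simply import application.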

1.3 Frameworks that run on WSGI

This is an alphabetic list of frameworks known to support WSGI. The level and nature of their support sometimes varies, as do the APIs they provide. The descriptions here focus on that, and not on the flavor of the frameworks themselves. If you want to know more, follow the links!

Note: Some frameworks really only support using pluggable WSGI servers, which gives you a number of deployment options: HTTP, FastCGI, SCGI, threaded, forking, etc. However, not all such frameworks live well alongside other frameworks in the same process, or they may require extra configuration to do so. This is the distinction intended when a framework is noted as supporting WSGI servers, versus a framework that supports a greater number of WSGI compositions, especially the kinds of things listed in Middleware and libraries for WSGI.

Please feel free to expand on the list or the descriptions, or to make corrections.

appier
    Appier is an object-oriented Python web framework built for super fast app development. It’s as lightweight as possible, but not too lightweight. It gives you the power of bigger frameworks, without their complexity.
bobo
    Bobo is a lightweight framework. Its goal is to be easy to use and remember.
Bottle
    Bottle is a fast and simple micro-framework for small web applications. It offers request dispatching (routes) with URL parameter support, templates, key/value databases, a built-in HTTP server and adapters for many third-party WSGI/HTTP servers and template engines, all in a single file and with no dependencies other than the Python Standard Library.
CherryPy
    CherryPy is a pythonic, object-oriented web development framework. Includes support for WSGI servers. CherryPy 3 includes better support for living alongside other WSGI frameworks, applications, and middleware. Includes support for WSGI servers.
Falcon
    Falcon is a high-performance Python framework for building cloud APIs. It encourages the REST architectural style, and tries to do as little as possible while remaining highly effective.
Flask
    Flask is a microframework for Python based on Werkzeug, Jinja 2 and good intentions. It inherits its high WSGI usage and compliance from Werkzeug.
notmm
    The notmm toolkit is a fork of Django that doesn’t get in your way. Features include improved WSGI support (Paste), SQLAlchemy, and very few developers! ;-)
PoorWSGI
    Poor WSGI for Python is a light WSGI connector with URI routing between the WSGI server and your application. It has a mod_python-compatible request object, which is passed to all URI or HTTP state handlers.
Pycnic
    Pycnic is a minimalist, JSON-API-oriented framework for Python 2.7 and 3.x. It provides routing, cookies, and JSON error handling, while maintaining a small codebase.
Pyramid
    A merger of the Pylons and repoze.bfg projects, Pyramid is a minimalist web framework aiming at composability and at making developers pay only for what they use.
QWeb
    Another WSGI framework (not sure what the distinguishing features are).
repoze.zope2
    A module that implements an analogue of the Zope 2 ZPublisher, with some major simplifications and cleanups. Its core mission is to allow publishing existing Zope 2 applications in a WSGI environment that externalizes some of the features of “classic” Zope 2 into middleware.


TurboGears
    Database-driven app in minutes; inherits its WSGI support from CherryPy.
web.py
    Makes web apps. A small RESTful library.
web2py
    A full-stack framework that includes its own Database Abstraction Layer (with support for SQLite, MySQL, PostgreSQL, MSSQL, DB2, Informix, Oracle, FireBase, Ingres and ), its own template language, and a web-based IDE. web2py itself is a WSGI app. Not related to web.py.
WebCore
    A nanoframework (only a few hundred lines of code) offering an entry_points-based dependency-graphing extension system, MVC separation, reusable namespaces, and a universal URL dispatch protocol with tight WebOb integration and natural Python semantics.
weblayer
    weblayer is a lightweight, componentised package for writing WSGI applications.
Zope 3
    The venerable Python web framework, recreated anew in Zope 3, and now a WSGI application. It seems to have some WSGI bits deep inside the publisher, but they aren’t really documented at this time.

1.3.1 Deprecated Systems

These systems still exist, but they have been replaced by others or are unmaintained.

Clever Harold
    Clever Harold is an ambitious web framework. It has many features for rapid, reusable, and reliable construction. Clever Harold is a complete WSGI framework: to build an application, you pick and choose the servers and components that fit your needs.
Colubrid
    Colubrid is a WSGI publisher which simplifies Python web development. Colubrid is not a framework :-), although some people like the idea of having found a framework in Colubrid. All Colubrid does for you is parse form data, URL parameters and cookies, and provide a URL dispatcher. Colubrid was replaced by Werkzeug.
Nettri
    Nettri is a newcomer to the Python world. It is under heavy development. Features include a CMS, its own template engine, modules, and more to come.
Paste WebKit
    An implementation of the Webware servlet API using Paste infrastructure and WSGI.
pycoon
    Pythonic web development framework based on XML pipelines and WSGI.
Pylons
    Full-stack Python web development framework combining the very best from the worlds of Ruby, Python and Perl. Pylons has been superseded by Pyramid.
repoze.bfg
    A Python WSGI-compliant web framework inspired by Zope, Pylons, and Django, with built-in security and templating. repoze.bfg was renamed Pyramid and moved under the Pylons project.
RhubarbTart
    A pure-WSGI dispatcher and simple framework, inspired by CherryPy.
simpleweb
    A simple Python WSGI-compliant web framework inspired by Django, TurboGears, and web.py.
skunk.web
    A totally WSGI-ified version of SkunkWeb.
Wareweb
    A rethinking of the Webware/WebKit servlet model, in a pure-WSGI framework. Not used widely.
WebStack
    WebStack is a package which provides a simple, common API for Python Web applications, allowing such applications to run within many different environments with virtually no changes to application code.

1.4 Servers which support WSGI

This is an alphabetic list of WSGI servers. In some cases these are WSGI-only systems; in other cases, a larger package includes a server.


Please feel free to expand the list or descriptions. Direct links to documentation on how to use the server are especially appreciated.

ajp-wsgi
    A threaded/forking WSGI server implemented in C (it embeds a Python interpreter to run the actual application). It communicates with the web server via AJP, and is known to work with mod_jk and mod_proxy_ajp. Also available in an SCGI flavor.
Aspen
    A pure-Python web server (using the CherryPy module mentioned next) with three hooks to hang your WSGI application on.
cherrypy.wsgiserver
    CherryPy’s “high-speed, production ready, thread pooled, generic WSGI server.” Includes SSL support. Supports Transfer-Encoding: chunked. For details on running foreign (non-CherryPy) applications under the CherryPy WSGI server, see WSGI Support. See also the CherryPy wiki ModWSGI page.
chiral.web.httpd
    A fast HTTP server supporting WSGI, with extensions for coroutine-based pages with deeply integrated COMET support.
cogen.web.wsgi
    WSGI server with extensions for coroutine-oriented programming.
FAPWS
    Fapws is a WSGI binding between Python and libev. See also: author’s blog, GoogleGroup.
fcgiapp
    fcgiapp is a Python wrapper for the C FastCGI SDK. It’s used by PEAK’s FastCGI servers to provide WSGI-over-FastCGI.
flup
    Includes threaded and forking versions of servers that support the FastCGI, SCGI, and AJP protocols.
gevent-fastcgi
    WSGI-over-FastCGI server implemented using the gevent coroutine-based networking library. Supports FastCGI connection multiplexing. Includes adapters for Django and other frameworks that use PasteDeploy.
gunicorn
    WSGI HTTP server for UNIX, fast clients and nothing else. This is a port of Unicorn to Python and WSGI.
ISAPI-WSGI
    An implementation of WSGI for running as an ISAPI extension under IIS.
James
    James provides a very simple multi-threaded WSGI server implementation based on the HTTPServer from Python’s standard library. (unmaintained)
Julep
    A WSGI server inspired by Unicorn, written in pure Python.

m2twisted
    WSGI server built with M2Crypto and Twisted.web2, with some SSL-related tricks. Used with client-side smart cards; it is also possible to run the HTTPS server with a key in an HSM (such as a crypto token).
modjy
    Modjy is a servlet-to-WSGI gateway that enables Jython WSGI applications to run inside Java servlet containers.
mod_wsgi
    Python WSGI adapter module for Apache.
NWSGI
    NWSGI is a .NET implementation of the Python WSGI specification for IronPython and IIS. This makes it easy to run Python web applications on Windows Server. This is a potential alternative to the ISAPI + ISAPI_WSGI modules.
netius
    Netius is a Python network library that can be used for the rapid creation of asynchronous, non-blocking servers and clients. It has no dependencies, it’s cross-platform, and it ships with some sample netius-powered servers out of the box, notably a production-ready WSGI server.
paste.httpserver
    Minimalistic threaded WSGI server built on BaseHTTPServer. Doesn’t support Transfer-Encoding: chunked.
Phusion Passenger
    “Proof of concept” WSGI support since 2008 (1.x), upgraded to “beta” in version 3 (with limitations, e.g. it requires Ruby even when unused), and first-class support in Passenger 4.
python-fastcgi
    python-fastcgi is a lightweight wrapper around the Open Market FastCGI C Library/SDK. It includes threaded and forking WSGI server implementations.
Spawning
twisted.web
    A WSGI server based on Twisted Web’s HTTP server (requires Twisted 8.2 or later).
uWSGI
    Fast, self-healing, developer-friendly WSGI server, meant for professional deployment and development of Python web applications.
werkzeug.serving
    Werkzeug’s multithreaded and multiprocess development server. Wraps wsgiref to add a reloader, multiprocessing, static file handling and SSL.
wsgid
    Wsgid is a generic WSGI handler for the mongrel2 web server. Wsgid offers a complete daemon environment (start/stop/restart) for your app workers, including automatic re-spawning of processes.
WSGIserver
    WSGIserver is a high-speed, production-ready, thread-pooled, generic WSGI server with SSL support for both Python 2 (2.6 and above) and Python 3 (3.1 and above). WSGIserver is a single-file project with no dependencies.


WSGIUtils
    Includes a threaded HTTP server.
wsgiref (Python 3)
    Included as part of the standard library since Python 2.5; it includes a simple (single-threaded) HTTP server, a CGI server (for running any WSGI application as a CGI script), and a framework for building other servers. For versions prior to Python 2.5, see wsgiref’s original home.
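As a sketch of wsgiref’s CGI support: in a real CGI script, wsgiref.handlers.CGIHandler runs an application using the process’s own environment and standard streams. To keep the example self-contained, it uses the more general BaseCGIHandler with in-memory streams standing in for stdin and stdout. The CGI variables, hello_cgi_app and run_as_cgi are illustrative assumptions.

```python
import io
from wsgiref.handlers import BaseCGIHandler


def hello_cgi_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from CGI"]


# Illustrative CGI variables; a web server would normally supply these.
EXAMPLE_CGI_ENVIRON = {
    "REQUEST_METHOD": "GET",
    "SCRIPT_NAME": "/cgi-bin/hello.py",
    "PATH_INFO": "",
    "SERVER_NAME": "localhost",
    "SERVER_PORT": "80",
    "SERVER_PROTOCOL": "HTTP/1.0",
}


def run_as_cgi(application, cgi_environ):
    """Run a WSGI app as a CGI gateway would, returning the raw CGI output."""
    stdin = io.BytesIO(b"")   # request body (empty for this GET)
    stdout = io.BytesIO()     # CGI response: Status line, headers, body
    stderr = io.StringIO()    # becomes wsgi.errors
    handler = BaseCGIHandler(stdin, stdout, stderr, cgi_environ,
                             multithread=False, multiprocess=True)
    handler.run(application)
    return stdout.getvalue()
```

Because BaseCGIHandler is not an origin server, the output starts with a CGI "Status: 200 OK" line rather than an HTTP status line. In an actual CGI script, the equivalent is the single call wsgiref.handlers.CGIHandler().run(hello_cgi_app).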

1.5 Applications that run on WSGI

Appwsgi
    Illustration of applications running on an Apache server with mod_wsgi.
FSCSI search
    A syntax-aware web search interface for searching large source code file system trees (the example uses the Python 2.6.1 distribution, but it can be configured for any source tree). It is distributed as part of WHIFF and uses external functionality from Nucular and Pygments.
MoinMoin
    MoinMoin is a wiki engine written in Python.
PyAMF
    PyAMF provides Action Message Format (AMF) support for Python that is compatible with the Flash Player.
pydap
    pydap is a modular and extensible OPeNDAP server, used by the IPCC to serve model output.
Roundup
    Roundup is a popular issue tracker which includes WSGI support.
RUM
    Rum is a framework for developing CRUD web applications, usually used for the “admin” back-end of a website.
soaplib
    A simple, easily extensible SOAP library that provides several useful tools for creating and publishing SOAP web services in Python, such as on-demand WSDL generation for published services, a WSGI-compliant web application, support for complex class structures, binary attachments, a simple framework for creating additional serialization mechanisms, and a client library.
Trac
    Trac is a popular issue tracker. It includes WSGI support in trac.web.wsgi.
Zine
    A blog application written in Python.


1.5.1 Deprecated

BrightContent
    Python weblog software built from reusable components. It offers many of the usual features of weblog engines, but its basic operation and plug-in model are based on WSGI. Many existing WSGI components can be plugged directly into Bright Content in order to enhance its functionality. Bright Content also has a set of specialized components for common weblog needs.
Webskine
    Webskine is a simple weblog with an AJAX interface.

1.6 Middleware and libraries for WSGI

Barrel
    Flexible WSGI authentication and authorization tools.
Beaker
    Lightweight WSGI sessions middleware. Beaker’s origins lie in the Perl Cache::Cache module, which was ported for use in Myghty. Beaker was then extracted from this code, and has been substantially rewritten and modernized since.
Deliverance
    Deliverance is a tool to theme HTML, applying a consistent style to applications and static files regardless of how they are implemented, and separating site-wide styling from application-level templating.
hatom2atom
    hatom2atom provides Python tools for use with hAtom2Atom.xsl. Includes a test runner that uses HTML/Atom file pairs to test for expected output, and a WSGI app that acts as a proxy to transform hAtom documents into Atom (that you are looking at now).
lib537.httpy
    Smooths over WSGI’s worst warts. In addition to calling start_response and returning an iterable, httpy lets you return a string, or return or raise a Response object.
Oort
    A WSGI-enabled toolkit for creating RDF-driven web apps.
Paste
    Roughly a framework, though more of a set of tools for frameworks. Provides integration layers with other frameworks, such as CherryPaste, DjangoPaste and zope.paste.
Paste Deploy
    Configuration system for WSGI applications, servers, and middleware, used both to configure individual components and to compose those components into a single running system.
raptorizemw
    A layer of WSGI middleware that adds a velociraptor to every page served. Fact: every WSGI app is better with a raptor.
Repoze
    Repoze is an effort to bring Zope technologies to the larger Python web development community by breaking Zope up into pieces that fit into a WSGI deployment model. This effort also allows existing Zope users to make use of WSGI technologies for development and deployment purposes, notably including the ability to run Zope 2 and applications under WSGI servers.
SchevoWsgi
    Provides integration between Schevo and WSGI apps.
selector
    This distribution provides WSGI middleware for “RESTful” mapping of URL paths to WSGI applications. Selector now also comes with components for environ-based dispatch and on-the-fly middleware composition.
static
    This distribution provides an easy way to include static content in your WSGI applications. There is a convenience method for serving files located via pkg_resources. There are also facilities for serving mixed (static and dynamic) content using “magic” file handlers. Python 2.4 string substitution and Kid template support are provided, and it is easy to roll your own handlers. Note that this distribution does not require Python 2.4 or Kid unless you want to use those types of templates.


ToscaWidgets
    A web widget toolkit for Python to aid in the creation, packaging, and distribution of common view elements normally used on the Web. ToscaWidgets is an almost complete rewrite of TurboGears 1.0’s widgets in the spirit of the TurboGears 2.0 philosophy of repackaging its services as independent WSGI components for easier maintenance and reuse in other Python web applications or frameworks.
urlrelay
    Simple RESTful URL dispatcher that passes HTTP requests to a WSGI application based on matching a URL path regex pattern and, optionally, the HTTP request method.
Werkzeug
    Werkzeug started as a simple collection of various utilities for WSGI applications and has become one of the most advanced WSGI utility modules. It includes a powerful debugger, full-featured request and response objects, HTTP utilities to handle entity tags, cache control headers, HTTP dates, cookie handling, file uploads, a powerful URL routing system and a bunch of community-contributed add-on modules.
WFront
    Front-door dispatcher that directs HTTP requests based on “virtual host”. Includes tools to isolate WSGI apps from server deployment details.
WHIFF
    WSGI HTTP Integrated File System Frames. WHIFF reduces application complexity by providing an infrastructure for managing web application name spaces, a configuration template language for wiring named components into an application, and an applications interface for accessing named components from Python and modules.
wsgiakismet
    Validates form submissions against the Akismet service to verify that they are not comment spam.
wsgiauth
    WSGI authentication middleware. Supports HTTP basic, digest, IP, HTML form, and OpenID-based authentication.
WSGIFilter
    A simple framework for doing output filtering of WSGI content. Works well with WSGIRemote.
wsgiform
    WSGI middleware for validating and parsing HTML form submissions. Supports automatic escaping of HTML and data sanitization.
WSGI Intercept
    Redirects Python HTTP calls to an in-process WSGI application. This allows HTTP API calls (e.g., REST, XML-RPC, etc.) without actually touching the network.
wsgilog
    WSGI logging and event reporting middleware. Supports logging events in WSGI applications to STDOUT, time-rotated log files, email, syslog, and web servers. Also supports catching and sending HTML-formatted exception tracebacks to a web browser for debugging.
WSGIRemote
    Client library for doing RPC-style internal subrequests in a WSGI stack. Also works for doing HTTP RPC requests.
WSGIRewrite
    Middleware for URL rewriting; uses the same syntax as Apache’s mod_rewrite.
wsgiserialize
    Object serialization middleware for WSGI. Supported object serialization formats include XML-RPC, JSON, YAML, marshal, and pickle.
wsgistate
    Session, HTTP cache control, and caching middleware for WSGI. Sessions are flup-compatible. Supports memory, filesystem, database, and memcached-based backends.
wsgi-statsd
    WSGI middleware that provides an easy way to time all requests and report to statsd. Measurement key names are automatically generated.
WSGIUtils
    Includes a simple WSGI application (wsgiAdaptor) that provides basic authentication, signed cookies and persistent sessions.
wsgiview
    Turns any TurboGears/Buffet template plug-ins into WSGI middleware.
wsgize
    WSGI without the WSGI. Provides middleware for WSGI-enabling Python callables, including:
    • Middleware that makes non-WSGI Python functions, callable classes, or methods into WSGI applications
    • Middleware that automatically handles generating WSGI-compliant HTTP response codes, headers, and compliant iterators
    • An HTTP response generator
    • A secondary WSGI dispatcher
yaro
    This distribution provides Yet Another Request Object (for WSGI) in a way that is intended to be simple and useful for web developers who don’t want to have to know a lot about WSGI to get the job done. It’s also a handy convenience for those who do like to get under the hood but would be happy to eliminate some boilerplate without the encumbrance of some all-singing, all-dancing framework.
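Several entries above (lib537.httpy, wsgize) boil down to wrapping a plain Python callable so that it speaks WSGI. Below is a minimal sketch of that idea, assuming Python 3; plain_to_wsgi and greet are hypothetical names and are not the API of either package. The wrapped function just returns a string, and the adapter generates the status, headers and a compliant iterable.

```python
import functools


def plain_to_wsgi(func):
    """Hypothetical adapter: wrap func(environ) -> str as a WSGI application."""

    @functools.wraps(func)
    def application(environ, start_response):
        body = func(environ).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]  # a compliant iterable of bytestrings

    return application


@plain_to_wsgi
def greet(environ):
    # An ordinary function; it never touches start_response.
    # (Treating the raw query string as a name is a simplification.)
    name = environ.get("QUERY_STRING") or "world"
    return "Hello, %s!" % name
```

Real packages add error handling and richer response types on top of this; the sketch only shows the basic inversion of control.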

1.6.1 Deprecated

AuthKit
    AuthKit is an authentication and authorization toolkit for WSGI applications and frameworks. The authentication middleware part is essentially an extension of paste.auth, and there is an adaptor module providing support for Pylons, although it works with all WSGI apps.
memento
    This distribution provides code-reloading middleware for use with your WSGI applications. Upon receiving each request, it forgets everything it has imported since the last request, so that it is imported all over again. The concept was inspired by the RollBackImporter used by Steve Purcell in PyUnit.
webstring
    webstring is a template engine for programmers whose favorite template language is Python. webstring can be used to generate any text format from a template, with the additional advantage of advanced XML and HTML templating using the lxml and cElementTree libraries.
WSGIOverlay
    Application-neutral macro templating language. Seems to be superseded by Deliverance.
wsgixml
    WSGI middleware modules for XML processing.

1.7 Testing tools for WSGI

Any HTTP-based testing system can be used with WSGI applications, since any HTTP testing system can test any HTTP application. However, some testing frameworks work more intimately with WSGI, and provide the ability to call WSGI applications in a controlled environment, with tracebacks and full use of debugging tools.

WSGI Intercept
    Intercepts normal Python calls to httplib, and redirects them to a WSGI application running in-process. Any testing tools written in Python can be made to test WSGI applications in-process.
Twill
    See Testing WSGI Apps with twill for a description of the specifics of plugging these together. WSGI Intercept was originally written for Twill.
WebTest
    Extraction of paste.fixture.TestApp, rewriting portions to use WebOb. Allows for testing WSGI applications without having to start a WSGI server.
cherrypy.test.webtest
    Extensions to unittest for web frameworks.
webunit
    Unit test your websites with code that acts like a web browser.
zope.testbrowser
    An easy-to-use programmatic web browser with special focus on testing. Used in Zope 3, but not Zope-specific.
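The in-process approach these tools take can be sketched with the standard library alone: build an environ with wsgiref.util.setup_testing_defaults, call the application directly, and capture what it passes to start_response. call_app and echo_path_app are hypothetical helpers for illustration, not the API of WebTest or WSGI Intercept. Exceptions raised by the application propagate normally, so tracebacks and debuggers work as usual.

```python
from wsgiref.util import setup_testing_defaults


def call_app(app, path="/", method="GET"):
    """Call a WSGI app in-process; return (status, headers, body bytes)."""
    environ = {"REQUEST_METHOD": method, "SCRIPT_NAME": "", "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required CGI/WSGI keys
    captured = {}
    body = []

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return body.append  # the legacy write() callable

    result = app(environ, start_response)
    try:
        for chunk in result:
            body.append(chunk)
    finally:
        # PEP 3333 requires calling close() on the iterable if it has one.
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], b"".join(body)


def echo_path_app(environ, start_response):
    """A tiny application to exercise call_app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode("utf-8")]
```

For example, call_app(echo_path_app, "/hello") returns the status "200 OK", the header list, and the body b"/hello", with no server or socket involved.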

1.8 Presentations about WSGI

1.8.1 Videos

ReUsable Web Components with Python and Future Python Web Development (Google TechTalk, 2006, Ben Bangert)
WSGI: Working together to solve the web’s problems (PyCon 2011, panel)

1.8.2 Slide decks

Developing Applications with the Web Server Gateway Interface (EuroPython 2006, James Gardner)
Introduction to Web Programming with WSGI (EuroPython 2007, Michele Simionato)

1.9 Specifications related to WSGI

This page holds specifications (proposed, accepted, and withdrawn) that build on WSGI.

1.9.1 About these specifications

These specifications are written up here and discussed on the Web-SIG. Once accepted, they can all use the wsgiorg. prefix for their keys. Until then, please use x-wsgiorg.; this is primarily so that people who implement a specification before it is accepted will not leave out-of-spec implementations around (except the obvious ones, marked by the x-).

To be “accepted”, a proposal should have certain qualities:

1. The spec won’t change without good reason, so you can start implementing against it (once it is “approved”).
2. It’s useful in multiple contexts; if one implementation is all anyone will ever need, then just make your implementation. Feel free to discuss it, but you don’t need anyone’s approval.
3. Some eyes have been on it, and it’s been reviewed by multiple people.

There are certain advantages:

1. Having implemented either side, you can expect that someone may care (either producing or consuming what you are looking for).
2. Someone won’t implement something they think is the same, but isn’t, because the document specifies the requirements sufficiently.
3. New proposals will take old proposals into account, so they shouldn’t overlap or repeat their purposes.

There’s no particular process for a proposal to become accepted. Someone else should like your proposal (+1, not just +0), and probably no one should be opposed (no -1’s).

Unless noted otherwise, everything here can be assumed to be public domain (in keeping with the purpose of posting material here).


1.9.2 Accepted

Where to put information parsed out of the request path

Title: wsgiorg.routing_args
Author: Ian Bicking
Discussions-To: Python Web-SIG
Status: Accepted
Created: 21-Oct-2006

Contents

• Where to put information parsed out of the request path
  – Abstract
  – Rationale
  – Specification
  – Types
  – Example

Abstract

This proposes a new standard environment key environ['wsgiorg.routing_args'] to represent the results of more complicated URL parsing strategies.

Rationale

WSGI currently specifies the meaning of SCRIPT_NAME and PATH_INFO, which allows generic prefix-based dispatchers to be created. These dispatchers can work with any WSGI application that respects the meaning of these two variables. The basic meaning of SCRIPT_NAME is the portion of the path that has been consumed, and PATH_INFO is the portion of the path left to the application. With only these two variables, however, more complex dispatchers cannot represent the information they pull out of the request path. This specification simply defines a place where such dispatchers can put that information: wsgiorg.routing_args.
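A generic prefix-based dispatcher of the kind described above needs nothing beyond SCRIPT_NAME and PATH_INFO. In this sketch, the standard library’s wsgiref.util.shift_path_info moves one path segment from PATH_INFO to SCRIPT_NAME; PrefixDispatch and show_paths are hypothetical names. Note the limitation the rationale points at: if the mounted application treated the remaining "/2005/10" as a year and month, WSGI alone would give it no standard place to record those values.

```python
from wsgiref.util import shift_path_info


class PrefixDispatch:
    """Hypothetical prefix-based dispatcher keyed on the first path segment."""

    def __init__(self, apps):
        self.apps = apps  # maps a segment such as "blog" to a WSGI application

    def __call__(self, environ, start_response):
        # Moves one segment from PATH_INFO to SCRIPT_NAME and returns it.
        segment = shift_path_info(environ)
        app = self.apps.get(segment)
        if app is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found"]
        return app(environ, start_response)


def show_paths(environ, start_response):
    """Report what the mounted application sees."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    text = "SCRIPT_NAME=%s PATH_INFO=%s" % (environ["SCRIPT_NAME"], environ["PATH_INFO"])
    return [text.encode("utf-8")]


dispatch = PrefixDispatch({"blog": show_paths})
```

A request for /blog/2005/10 reaches show_paths with SCRIPT_NAME set to /blog and PATH_INFO set to /2005/10.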

Specification

This specification defines a new key that can go in the WSGI environment, wsgiorg.routing_args. This key is optional. If a dispatcher (like routes or selector) pulls named information out of the portion of the request path it parses, it can put that information into environ['wsgiorg.routing_args']. routing_args must be a two-tuple of (positional_args, named_args), where positional_args is a sequence of arguments that were captured positionally, and named_args is a dictionary of the arguments that were given names.


Not all kinds of dispatchers will produce both positional and named arguments; some may only be capable of producing one or the other. Similarly, not all consumers will know what to do with both positional and named arguments. Implementors putting together producers and consumers of wsgiorg.routing_args will have to choose pieces that work together. Dispatchers that do not produce one of these items must put an empty tuple/list or an empty dictionary in place of the missing item.

The values in wsgiorg.routing_args need not be strings (except for the keys of named_args). For instance, a dispatcher is allowed to parse /archive/2005/10/01 into ((), {'date': datetime.date(2005, 10, 1)}).

Portions of the path that have been parsed should still be moved to SCRIPT_NAME (and removed from PATH_INFO).
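To make the /archive/2005/10/01 example concrete, here is a sketch of a producer that parses such a path into a datetime.date, stores ((), {'date': ...}) in wsgiorg.routing_args (an empty tuple because it captures nothing positionally) and moves the parsed portion to SCRIPT_NAME. ArchiveDateDispatch, show_date and the URL layout are illustrative assumptions, not part of the specification; for brevity it overwrites rather than merges any earlier routing_args.

```python
import datetime
import re

# Hypothetical URL layout: /archive/YYYY/MM/DD, followed by "/" or the end.
ARCHIVE_RE = re.compile(r"/archive/(\d{4})/(\d{2})/(\d{2})(?=/|$)")


class ArchiveDateDispatch:
    """Hypothetical producer of wsgiorg.routing_args with a non-string value."""

    def __init__(self, app):
        self.app = app

    def not_found(self, start_response):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]

    def __call__(self, environ, start_response):
        path_info = environ.get("PATH_INFO", "")
        match = ARCHIVE_RE.match(path_info)
        if match is None:
            return self.not_found(start_response)
        try:
            date = datetime.date(*(int(part) for part in match.groups()))
        except ValueError:  # e.g. month 13
            return self.not_found(start_response)
        # No positional arguments, so an empty tuple; values need not be strings.
        environ["wsgiorg.routing_args"] = ((), {"date": date})
        # The parsed portion still moves from PATH_INFO to SCRIPT_NAME.
        environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + match.group(0)
        environ["PATH_INFO"] = path_info[match.end():]
        return self.app(environ, start_response)


def show_date(environ, start_response):
    # Hypothetical consumer that reads the parsed date back out.
    _, named_args = environ["wsgiorg.routing_args"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [named_args["date"].isoformat().encode("ascii")]


archive_app = ArchiveDateDispatch(show_date)
```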

Types

The objects in (positional_args, named_args) are intended to be usable as func(*positional_args, **named_args). Therefore positional_args must be coercible to a tuple, and named_args must be a dictionary with string keys (str, or ASCII-only unicode). named_args should be an actual dict because the Python versions this specification was written for did not accept dictionary-like objects other than real dict instances for **named_args.
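On the consuming side, the func(*positional_args, **named_args) convention means an ordinary function can be called with whatever a dispatcher recorded. The sketch below is a hypothetical adapter (routing_args_app and show_archive are illustrative names) that does exactly that, assuming the function returns text.

```python
def routing_args_app(func):
    """Hypothetical consumer: call func(*positional_args, **named_args)."""

    def application(environ, start_response):
        positional_args, named_args = environ.get("wsgiorg.routing_args", ((), {}))
        # positional_args is coerced to a tuple; named_args is an actual dict.
        text = func(*tuple(positional_args), **named_args)
        body = text.encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return application


def show_archive(year, month=None):
    # An ordinary function that knows nothing about WSGI.
    return "archive for %s/%s" % (year, month or "all")


archive = routing_args_app(show_archive)
```

A list of positional values works as well as a tuple here, which is why the specification only asks for something coercible to a tuple.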

Example

This example is a dispatcher that is given regular expressions and matching applications. It checks each regular expression in turn, and when one matches it moves the named groups into wsgiorg.routing_args and dispatches to the associated application.

import re


class RegexDispatch(object):

    def __init__(self, patterns):
        # patterns: a list of (compiled regex, WSGI application) pairs
        self.patterns = patterns

    def __call__(self, environ, start_response):
        script_name = environ.get('SCRIPT_NAME', '')
        path_info = environ.get('PATH_INFO', '')
        for regex, application in self.patterns:
            match = regex.match(path_info)
            if not match:
                continue
            extra_path_info = path_info[match.end():]
            if extra_path_info and not extra_path_info.startswith('/'):
                # Not a very good match
                continue
            # Only unnamed groups are positional; named groups go in named_args
            named_numbers = set(regex.groupindex.values())
            pos_args = [value for number, value in enumerate(match.groups(), 1)
                        if number not in named_numbers]
            named_args = match.groupdict()
            # Merge with anything an earlier dispatcher already recorded
            cur_pos, cur_named = environ.get('wsgiorg.routing_args', ((), {}))
            new_pos = list(cur_pos) + pos_args
            new_named = dict(cur_named)
            new_named.update(named_args)
            environ['wsgiorg.routing_args'] = (new_pos, new_named)
            # Move the matched portion of the path from PATH_INFO to SCRIPT_NAME
            environ['SCRIPT_NAME'] = script_name + path_info[:match.end()]
            environ['PATH_INFO'] = extra_path_info
            return application(environ, start_response)
        return self.not_found(environ, start_response)

    def not_found(self, environ, start_response):
        start_response('404 Not Found', [('Content-type', 'text/plain')])
        return [b'Not found']


def archive_app(environ, start_response):
    # Placeholder standing in for real views; not part of the specification
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [repr(environ['wsgiorg.routing_args']).encode('ascii')]


view_article = archive_app

dispatch_app = RegexDispatch([
    (re.compile(r'/archive/(?P<year>\d{4})/$'), archive_app),
    (re.compile(r'/archive/(?P<year>\d{4})/(?P<month>\d{2})/$'), archive_app),
    (re.compile(r'/archive/(?P<year>\d{4})/(?P<month>\d{2})/(?P<article_id>\d+)$'),
     view_article),
])

1.9.3 Proposed

Waiting for File Descriptor Events

Title: Waiting for File Descriptor Events
Author: Christopher Stawarz
Discussions-To: Python Web-SIG
Status: Proposed
Created: 11-May-2008

Contents

• Waiting for File Descriptor Events
  – Abstract
  – Rationale
  – Specification
    * Handling of the Input Stream
  – Examples
  – Problems
  – Other Possibilities
  – Open Issues

Abstract

This specification defines a set of extensions that allow a WSGI application to suspend its execution until an event occurs on a specified file descriptor.

Rationale

The architecture of asynchronous (a.k.a. event-driven) servers requires all I/O operations, including both interprocess and network communication, to be non-blocking. For a WSGI-compliant server, this requirement extends to all applications run on the server. However, the WSGI specification does not provide sufficient facilities for an application to ensure that its I/O is non-blocking. Specifically, it lacks a mechanism by which an application can suspend its execution until an arbitrary file descriptor (such as one belonging to a socket or pipe opened by the application) is ready for reading or writing. This specification defines a standard interface by which servers can provide such a mechanism to applications.

Specification

This specification introduces three new variables to the WSGI environment: x-wsgiorg.fdevent.readable, x-wsgiorg.fdevent.writable, and x-wsgiorg.fdevent.timeout.

The variables x-wsgiorg.fdevent.readable and x-wsgiorg.fdevent.writable are callable objects that accept two positional arguments, one required and one optional. In the following description, these arguments are given the names fd and timeout, but they are not required to have these names, and the application must invoke the callables using positional arguments.

The first argument, fd, is either an integer representing a file descriptor or an object with a fileno method that returns such an integer. The set of acceptable file descriptors is defined to be those accepted by select.select. (Note that this set is platform dependent: only sockets are allowed on Windows, whereas sockets, pipes, and files are acceptable on Unix-like systems.) The second, optional argument, timeout, is either None or a floating-point value in seconds. If omitted, it defaults to None.

When called, x-wsgiorg.fdevent.readable and x-wsgiorg.fdevent.writable return the empty string (''), which must be yielded by the application iterable to the server. (The result of calling x-wsgiorg.fdevent.readable or x-wsgiorg.fdevent.writable and yielding a non-empty string, or of making multiple calls to x-wsgiorg.fdevent.readable and/or x-wsgiorg.fdevent.writable before yielding the empty string, is undefined.) The server then suspends execution of the application until one of the following conditions is met:

• The specified file descriptor is ready for reading (if the application called x-wsgiorg.fdevent.readable) or writing (if the application called x-wsgiorg.fdevent.writable).
• timeout seconds have elapsed without the desired file descriptor becoming readable (if the application called x-wsgiorg.fdevent.readable) or writable (if the application called x-wsgiorg.fdevent.writable), unless the value of timeout is None, in which case the wait never times out.
• The server detects an error or “exceptional” condition (such as out-of-band data) on the file descriptor.

Put another way, if the application calls x-wsgiorg.fdevent.readable and yields the empty string, it will be suspended until select.select([fd], [], [fd], timeout) would return. If the application calls x-wsgiorg.fdevent.writable and yields the empty string, it will be suspended until select.select([], [fd], [fd], timeout) would return.

The variable x-wsgiorg.fdevent.timeout is an object whose truth value can be changed by the server. (For example, it could be a list instance, whose truth value is false when empty and true otherwise.) If timeout seconds elapse without the desired file descriptor event occurring, x-wsgiorg.fdevent.timeout will be true when the application resumes; otherwise, it will be false. The truth value of x-wsgiorg.fdevent.timeout when the application is first started, or after it yields each response-body string, is undefined.

The server may use any technique it desires to detect events on an application’s file descriptors. (Most likely, it will add them to the same event loop that it uses for accepting new client connections, receiving requests, and sending responses.)
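A minimal sketch of the server side of these semantics, assuming a server that simply blocks in select.select (a real asynchronous server would register the descriptor with its event loop instead). The names TimeoutFlag and server_wait are hypothetical; TimeoutFlag plays the role of x-wsgiorg.fdevent.timeout by defining __bool__ (Python 3), and server_wait performs exactly the select.select call described above:

```python
import select
import socket


class TimeoutFlag:
    """Hypothetical stand-in for x-wsgiorg.fdevent.timeout: an object whose
    truth value the server changes after each wait."""

    def __init__(self):
        self.timed_out = False

    def __bool__(self):
        return self.timed_out


def server_wait(fd, timeout, want_read, flag):
    """Block until select.select() would return, as the spec describes.

    want_read=True models a call to x-wsgiorg.fdevent.readable and
    want_read=False a call to x-wsgiorg.fdevent.writable. In both cases
    the descriptor is also watched for exceptional conditions."""
    if want_read:
        ready = select.select([fd], [], [fd], timeout)
    else:
        ready = select.select([], [fd], [fd], timeout)
    # All three returned lists are empty only if the timeout expired.
    flag.timed_out = not any(ready)
    return flag


if __name__ == '__main__':
    a, b = socket.socketpair()
    flag = TimeoutFlag()
    print(bool(server_wait(a, 0.05, True, flag)))  # True: nothing to read yet
    b.sendall(b'x')
    print(bool(server_wait(a, 0.05, True, flag)))  # False: a is now readable
    a.close()
    b.close()
```

Because select.select returns three empty lists only when the timeout expires, checking whether any list is non-empty is enough to set the flag the application inspects when it resumes.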

Handling of the Input Stream

While technically outside the scope of this specification, the application’s input stream (wsgi.input) is another source of potentially blocking I/O that deserves mention.


The methods provided by the input stream follow the semantics of the corresponding methods of the file class. In particular, each of these methods can invoke the underlying I/O function (in this case, recv on the socket connected to the client) more than once, without giving the application the opportunity to check whether each invocation will block. Although authors of asynchronous servers may be tempted to provide a non-standard input stream that supports on-demand, non-blocking reads, such an input stream would be incompatible with WSGI middleware. In order to avoid these problems, it is strongly recommended that asynchronous servers pre-read the entire request body (to an in-memory buffer or temporary file) before invoking the application, either by default or as a configurable option. Doing so will ensure that the input stream is compatible with middleware and that reads from it will not block waiting for data from the client.
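The pre-reading recommended above could look roughly like the following sketch, assuming the server is free to replace wsgi.input before calling the application. The helper name preread_body and its spool_limit and chunk_size parameters are hypothetical. In a real asynchronous server the bytes would be collected by the event loop as they arrive; here the original stream is read directly, and tempfile.SpooledTemporaryFile keeps small bodies in memory while moving larger ones to a temporary file:

```python
import io
import tempfile


def preread_body(environ, spool_limit=1024 * 1024, chunk_size=64 * 1024):
    """Buffer the whole request body and install a seekable wsgi.input.

    Bodies larger than spool_limit bytes are spilled to a temporary file."""
    try:
        remaining = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        remaining = 0
    source = environ['wsgi.input']
    buffer = tempfile.SpooledTemporaryFile(max_size=spool_limit)
    while remaining > 0:
        chunk = source.read(min(remaining, chunk_size))
        if not chunk:  # the client sent less than CONTENT_LENGTH promised
            break
        buffer.write(chunk)
        remaining -= len(chunk)
    buffer.seek(0)
    # Reads by the application (and any middleware) can no longer block.
    environ['wsgi.input'] = buffer
    return environ


if __name__ == '__main__':
    env = {'CONTENT_LENGTH': '5', 'wsgi.input': io.BytesIO(b'hello, world')}
    preread_body(env)
    print(env['wsgi.input'].read())  # b'hello'
```

Reading never goes past CONTENT_LENGTH, so the buffered stream behaves like the input stream a middleware component would expect from any synchronous server.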

Examples

The following application acts as a proxy to python.org. It uses a pycurl.CurlMulti instance to perform the outgoing HTTP request in a non-blocking fashion. When the CurlMulti.perform() method detects that its next I/O operation would block, it returns control to the application, which then yields until the file descriptor of interest becomes readable or writable as required. If the descriptor is not ready after one second, the application sends a 504 Gateway Timeout response to the client and terminates:

```python
import pycurl
from StringIO import StringIO  # Python 2; StringIO.StringIO provides .len


def pyorg_proxy(environ, start_response):
    result = StringIO()

    c = pycurl.Curl()
    c.setopt(pycurl.URL, 'http://python.org' + environ['PATH_INFO'])
    c.setopt(pycurl.WRITEFUNCTION, result.write)

    m = pycurl.CurlMulti()
    m.add_handle(c)

    while True:
        # Let curl do as much work as it can without blocking.
        while True:
            ret, num_handles = m.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM:
                break
        if not num_handles:
            break  # the transfer has finished

        # Suspend until curl's descriptor is ready (at most one second).
        read, write, exc = m.fdset()
        if read:
            yield environ['x-wsgiorg.fdevent.readable'](read[0], 1.0)
        else:
            yield environ['x-wsgiorg.fdevent.writable'](write[0], 1.0)

        if environ['x-wsgiorg.fdevent.timeout']:
            msg = 'The request to python.org timed out.'
            start_response('504 Gateway Timeout',
                           [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(msg)))])
            yield msg
            return

    start_response('200 OK',
                   [('Content-Type', 'application/octet-stream'),
                    ('Content-Length', str(result.len))])
    yield result.getvalue()
```

The following adapter allows an application that uses the x-wsgiorg.fdevent extensions to run on a server that does not support them, without any modification to the application’s code:


```python
import select


def with_fdevent(application):
    def wrapper(environ, start_response):
        # Pending select() arguments, or None when no wait is pending.
        select_args = [None]

        def readable(fd, timeout=None):
            assert not select_args[0]
            select_args[0] = ([fd], [], [fd], timeout)
            return ''

        def writable(fd, timeout=None):
            assert not select_args[0]
            select_args[0] = ([], [fd], [fd], timeout)
            return ''

        environ['x-wsgiorg.fdevent.readable'] = readable
        environ['x-wsgiorg.fdevent.writable'] = writable

        timeout = False

        class TimeoutWrapper(object):
            # Reads the enclosing 'timeout' variable on every truth test.
            def __nonzero__(self):
                return timeout

        environ['x-wsgiorg.fdevent.timeout'] = TimeoutWrapper()

        for result in application(environ, start_response):
            assert not (result and select_args[0])
            if result or not select_args[0]:
                yield result
            else:
                # Block this (synchronous) server until select() would return.
                ready = select.select(*select_args[0])
                timeout = (ready == ([], [], []))
                select_args[0] = None

    return wrapper
```

Problems

• The empty string yielded by an application after calling x-wsgiorg.fdevent.readable or x-wsgiorg.fdevent.writable must pass through any intervening middleware and be detected by the server. Although WSGI explicitly requires middleware to relay such strings to the server (see Middleware Handling of Block Boundaries), some components may not, making them incompatible with this specification.

Other Possibilities

• To prevent an application that does blocking I/O from blocking the entire server, an asynchronous server could run each instance of the application in a separate thread. However, since asynchronous servers achieve high levels of concurrency by expressly avoiding multithreading, this technique will almost always be unacceptable.
• The greenlet package enables the use of cooperatively-scheduled micro-threads in Python programs, and a WSGI server could potentially use it to pause and resume applications around blocking I/O operations. However, such micro-threading is not part of the Python language or standard library, and some server authors may be unwilling or unable to make use of it.


Open Issues

• Some third-party libraries (such as PycURL) provide non-blocking interfaces that may need to monitor multiple file descriptors for events simultaneously. Since this specification allows an application to wait on only one file descriptor at a time, application authors may find it difficult or impossible to use such libraries, or they may be limited to a subset of the libraries’ capabilities. Although this specification could be extended to include an interface for waiting on multiple file descriptors, it is unclear whether it would be easy (or even possible) for all servers to implement it. Also, the appropriate behavior for a multi-descriptor wait is not obvious. (Should the application be resumed when a single descriptor is ready? All of them? Some minimum number?)

Authentication for developer-oriented tools

Title: Developer Auth
Author: Ian Bicking
Discussions-To: Python Web-SIG
Status: Proposed
Created: 31-Mar-2008

Contents

• Authentication for developer-oriented tools
  – Abstract
  – Rationale
  – Specification
  – Example
  – Problems
  – Other Possibilities
  – Open Issues
  – Implementations

Abstract

Many tools can be written for a WSGI stack which should only be accessible to developers: for example, an interactive debugger in response to sessions, a template system that displays the underlying filenames that created a page, or profiling data. In some cases there are security implications to exposing this data; in other cases it is harmless but undesirable to show this information to normal users. This specification offers a single, simple way to detect whether a user should be presented with this information.

Rationale

So far these tools have been controlled by configuration, e.g., debug = True, or --debug on the command line. This works but can be dangerous, as a deployer or developer can forget to turn the tools off. Or, if it is controlled through Python code, it can be difficult to enable on a site that wasn’t intended to have the tool on, e.g., if you want to debug a live site because you can’t reproduce a problem in development. Also, configuration doesn’t allow some people to see these development tools while hiding them from other people. A per-request and secure authentication method is more desirable.

This could be implemented using application-specific authentication methods and permission levels. This is undesirable because debugging is often orthogonal to users: you may want to debug a problem only present when a low-permission or anonymous user is visiting the site. Also, it is difficult to keep application and debugging permissions coherent, which is probably why this technique is not used by any tools.

Specification

Debugging tools should look for a key x-wsgiorg.developer_user. This will contain some kind of user name. If it is empty or not present, then debugging tools should not activate themselves, or should not expose any information in the browser. The user name can be used in logging, but all users are considered to have the same permission level (total access). The username must be a str, but its contents are not constrained (an IP address, for example, would be acceptable, or a name and email, with an embedded space). If a URL is protected except for developers, applications should simply return 403 Forbidden. Seamless login is not part of this specification or its goals. Some systems may be IP-controlled, for example, and no login is possible.
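For the rule that developer-only URLs should simply return 403 Forbidden, a minimal sketch might look like this (Python 3, where response bodies are bytes). DeveloperOnly is a hypothetical name, not part of this specification or of any existing package; it only checks that the key is present and non-empty, since all developers share the same permission level:

```python
class DeveloperOnly:
    """Hypothetical middleware: serves `app` only to requests carrying a
    non-empty x-wsgiorg.developer_user key; everyone else gets 403."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if environ.get('x-wsgiorg.developer_user'):
            return self.app(environ, start_response)
        # No login is offered; the request is simply refused.
        body = b'Forbidden'
        start_response('403 Forbidden', [
            ('Content-Type', 'text/plain'),
            ('Content-Length', str(len(body))),
        ])
        return [body]
```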

Example

This is a simple exception catcher that uses the key:

```python
import sys, traceback


class CatchExceptions(object):

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Only developers get to see tracebacks.
        if not environ.get('x-wsgiorg.developer_user'):
            return self.app(environ, start_response)
        try:
            return self.app(environ, start_response)
        except:
            start_response('500 Server Error',
                           [('content-type', 'text/plain')],
                           sys.exc_info())
            return [traceback.format_exc()]
```

Here is an IP-restricted middleware that sets the key:

```python
class IPDeveloper(object):

    def __init__(self, app, ips=('127.0.0.1',)):
        self.app = app
        self.ips = ips

    def __call__(self, environ, start_response):
        # Requests from the listed addresses are treated as developers.
        if environ.get('REMOTE_ADDR') in self.ips:
            environ['x-wsgiorg.developer_user'] = environ['REMOTE_ADDR']
        return self.app(environ, start_response)
```


Problems

• With security by obscurity in mind, it might be best if login methods weren’t clear. With ease of use in mind, easy logins are best.
• There are no levels of access. Everyone is assumed to have complete access. (You could add another custom key if you want to share extra information between the authentication and application layers.)
• This encourages people to do production deployments with debugging tools enabled.

Other Possibilities

• Configuration
• Conditional middleware composition
• Application login systems
• Some other generalized authentication system (AuthKit, etc.)

Open Issues

• Should 401 Authorization Required be returned? Potentially with WWW-Authenticate: x-wsgiorg.developer_user. This would signal to the middleware that a login should occur, which it may or may not ignore (it could translate that to 403 Forbidden). This would make, for example, HTTP Basic authentication doable (since that authentication is per-request, and so you can’t detect if a user has already logged in). But HTTP Basic would probably be inappropriate for many systems, where a page is filtered by authentication rather than blocked.

Implementations

DevAuth implements the authentication portion of this system. Deliverance and Cabochon both use DevAuth for access to backend logging and controls. DevAuth implements a login form (which uses a cookie) and IP restrictions. This allows developers from selected IP addresses to log in. No links are provided to the login form; instead, developers must know its location, or it should be documented in applications using DevAuth. Similarly, there’s no way for applications to reject a request and suggest a login; when a user accesses something they are not allowed to access, the applications simply generate 403 Forbidden. This is unlike user-oriented login forms, which are helpful; this is distinctly unhelpful.

Techniques to avoid serializing the input or output when stacking middleware

Title: Avoiding Serialization When Stacking Middleware
Author: Ian Bicking
Discussions-To: Python Web-SIG
Status: Proposed
Created: 06-03-2007


Contents

• Techniques to avoid serializing the input or output when stacking middleware
  – Abstract
  – Rationale
  – Specification
  – Example
  – Problems
  – Other Possibilities
  – Open Issues

Abstract

This proposal gives a strategy for avoiding unnecessary serialization and deserialization of request and response bodies. It does so by attaching attributes to wsgi.input and the app_iter, as well as a new environment key x-wsgiorg.want_parsed_response.

Rationale

Output-transforming middleware often has to parse the upstream content, transform it, then serialize it back to a string for output. The original output may have already been in the parsed form that the middleware wanted. Or there may be more middleware that does similar transformations on the same kind of objects. The same things apply to the parsing of wsgi.input, specifically parsing form data. A similar strategy is presented to avoid unnecessarily reparsing that data.

Specification

WSGI applications (or middleware) can return an app_iter that not only serializes the output, but also has extra attributes. One attribute is defined here: app_iter.x_wsgiorg_parsed_response, a function or method that takes one argument, the “type” of object that you want to receive. It may return that type of object, or None (meaning it cannot produce that type of object). Consumers should fall back on normal parsing of the response if the method does not exist or returns None. Similarly, the wsgi.input object may have the same method, with the same meaning.

WSGI applications that want to lazily serialize their output have a problem: they probably cannot calculate Content-Length without doing the actual serialization. Browsers typically want to know the Content-Length, but WSGI middleware seldom cares, since it can get the content from app_iter regardless of its length. WSGI middleware that will transform the output can set environ['x-wsgiorg.want_parsed_response'] = True to give this hint to the application. Applications are thus encouraged to only lazily serialize their output when that key is present and true. (There is no equivalent concept for wsgi.input.)

The object returned by x_wsgiorg_parsed_response() may be modified in place by the WSGI middleware using that object. Producers should make a copy if they do not want consumers modifying the object.
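A minimal sketch of both sides of this protocol, in Python 3. To stay free of third-party dependencies it uses a dict as the “parsed” type and JSON as the serialized form; ParsedIter and get_parsed are hypothetical names. The consumer tries x_wsgiorg_parsed_response first, falls back on joining and parsing the body, and calls close() either way, since WSGI requires close() to be called on an app_iter that has it:

```python
import json


class ParsedIter:
    """Hypothetical producer: serializes lazily, but can also hand back
    the already-parsed object (here a dict, serialized as JSON)."""

    def __init__(self, obj):
        self.obj = obj

    def __iter__(self):
        yield json.dumps(self.obj).encode('utf-8')

    def x_wsgiorg_parsed_response(self, type):
        if type is dict:
            return self.obj
        return None  # cannot produce the requested type


def get_parsed(app_iter, type, parse):
    """Consumer: prefer the parsed object, else parse the serialized body."""
    method = getattr(app_iter, 'x_wsgiorg_parsed_response', None)
    try:
        parsed = method(type) if method is not None else None
        if parsed is None:
            parsed = parse(b''.join(app_iter))
    finally:
        close = getattr(app_iter, 'close', None)
        if close is not None:
            close()
    return parsed
```

When the producer can supply the object, no serialization happens at all; a plain list of byte strings (or a producer asked for an unsupported type) takes the ordinary parsing path.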


Example

Two examples are provided: one for output, and one for input. The output transformation parses the page with lxml.etree.HTML (from the lxml library) and replaces all <i> tags with <em> tags. First we show the middleware:

```python
import lxml.etree


class EmTagMiddleware(object):

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        parent_wants_parsed = environ.get('x-wsgiorg.want_parsed_response')
        # Hint to the wrapped application that we will parse its output.
        environ['x-wsgiorg.want_parsed_response'] = True
        written_output = []
        captured_headers = []

        def repl_start_response(status, headers, exc_info=None):
            if exc_info:
                raise exc_info[0], exc_info[1], exc_info[2]
            captured_headers[:] = [status, headers]
            return written_output.append

        app_iter = self.app(environ, repl_start_response)
        parsed = None
        if captured_headers and not written_output:
            method = getattr(app_iter, 'x_wsgiorg_parsed_response', None)
            if method:
                parsed = method(lxml.etree._Element)
        if parsed is None:
            # Have to manually parse, because:
            # a) start_response was called lazily
            # b) the start_response writer was used
            # c) app_iter.x_wsgiorg_parsed_response didn't exist
            # d) that method returned None
            try:
                for item in app_iter:
                    written_output.append(item)
            finally:
                if hasattr(app_iter, 'close'):
                    app_iter.close()
            parsed = self.parse_body(''.join(written_output))
        status, headers = captured_headers
        new_body = self.transform_body(parsed)
        for i in range(len(headers)):
            if headers[i][0].lower() == 'content-length':
                del headers[i]
                break
        if parent_wants_parsed:
            new_app_iter = self.make_app_iter(new_body)
        else:
            serialized_body = serialize(new_body)
            headers.append(('Content-Length', str(len(serialized_body))))
            new_app_iter = [serialized_body]
        # Pass the (adjusted) status and headers on to our own caller.
        start_response(status, headers)
        return new_app_iter

    def parse_body(self, body):
        return lxml.etree.HTML(body)

    def transform_body(self, root):
        for el in root.xpath('//i'):
            el.tag = 'em'
        return root

    def make_app_iter(self, body):
        return LazyLXML(body)


def serialize(element):
    return lxml.etree.tostring(element)


class LazyLXML(object):

    def __init__(self, body):
        self.body = body
        self.have_yielded = False

    def __iter__(self):
        return self

    def next(self):
        # Serialize only when (and if) a consumer actually iterates.
        if self.have_yielded:
            raise StopIteration
        self.have_yielded = True
        return serialize(self.body)

    def x_wsgiorg_parsed_response(self, type):
        if type is lxml.etree._Element:
            return self.body
        return None
```

Here’s a simpler example for parsing normal form inputs in wsgi.input:

```python
import cgi
import urllib
from cStringIO import StringIO


def parse_form(environ):
    # Ignore parameters such as "; boundary=..." or "; charset=...".
    content_type = environ.get('CONTENT_TYPE', '').split(';', 1)[0].strip()
    assert content_type in ['application/x-www-form-urlencoded',
                            'multipart/form-data']
    wsgi_input = environ['wsgi.input']
    method = getattr(wsgi_input, 'x_wsgiorg_parsed_response', None)
    if method:
        parsed = method(cgi.FieldStorage)
        if parsed is not None:
            return parsed
    form = cgi.FieldStorage(fp=wsgi_input, environ=environ,
                            keep_blank_values=True)
    environ['wsgi.input'] = FakeFormInput(form)
    return form


class FakeFormInput(object):

    def __init__(self, form):
        self.form = form
        self.serialized = None

    def x_wsgiorg_parsed_response(self, type):
        if type is cgi.FieldStorage:
            return self.form
        return None

    def read(self, *args):
        if self.serialized is None:
            self._serialize()
        return self.serialized.read(*args)

    def readline(self, *args):
        if self.serialized is None:
            self._serialize()
        return self.serialized.readline(*args)

    def readlines(self, *args):
        if self.serialized is None:
            self._serialize()
        return self.serialized.readlines(*args)

    def __iter__(self):
        if self.serialized is None:
            self._serialize()
        return iter(self.serialized)

    def _serialize(self):
        # XXX: Doesn't deal with file uploads, and multipart/form-data generally
        # urlencode() needs (name, value) pairs, not FieldStorage objects.
        pairs = [(field.name, field.value) for field in self.form.list]
        data = urllib.urlencode(pairs, True)
        self.serialized = StringIO(data)
```

Problems

Obviously the code is not simple, but this is the nature of WSGI output-transforming middleware. Ideally a framework of some sort would be used to construct this kind of middleware.

Something that replaces wsgi.input (like the example) may change the CONTENT_LENGTH of the request; normalization alone may change the length, even if the data is the same (e.g., there are multiple ways to urlencode a string). However, there’s no way to determine the proper length without actually serializing. Ideally requests like this should allow simply reading to the end of the object, without needing a CONTENT_LENGTH restriction (this is not true for socket objects). Ideally something like CONTENT_LENGTH="-1" would indicate this situation (simply a missing CONTENT_LENGTH generally means 0). Another option is to set it to 1 and simply return the entire serialized body all at once; cgi.FieldStorage actually protects against this. Or set it to a very large value, and allow reading past the end (returning ""). This is likely to work with most consumers. I’m not sure what effect -1 will have on different code.

Other Possibilities

• You could simply parse everything every time.
• You could pass data through callbacks in the environment (but this can break non-aware middleware).
• You can make custom methods and keys for each case.
• You can use something other than WSGI.

I think this specification offers advantages over all these options.

Open Issues

Should “type” be the class object? A string describing the type? Things like lxml.etree._Element are a little unclean, since the actual class isn’t a public object (only the factory function lxml.etree.Element() is). Also, there are occasionally times when multiple classes implement the same interface.

The boolean x-wsgiorg.want_parsed_response doesn’t really give any idea of what kind of object you want. This is actually something of a problem, because sometimes it’s impossible to give that kind of object. For instance, if you want to transform images you might want the PIL object for the image, but if the response is HTML there’s no way to give this type. Similarly, if you are transforming HTML then images don’t mean anything to you, and you probably do want them to come out as normal. And potentially both an image transformer and an HTML transformer are in the stack. Should that key actually hold a list of types that are of interest?

x_wsgiorg_parsed_response() isn’t a very good name for the method on wsgi.input, as it’s not a response.

A very basic description of authentication opportunities in WSGI

Title: Simple Authentication
Author: Ian Bicking
Discussions-To: Python Web-SIG
Status: Proposed
Created: 13-Nov-2006

Contents

• A very basic description of authentication opportunities in WSGI
  – Abstract
  – Rationale
  – Specification
  – Example
  – Problems
  – Other Possibilities
  – Open Issues

Abstract

This describes a simple pattern for implementing authentication in WSGI middleware. This does not propose any new features or environment keys; it only describes a baseline recommended practice.

Rationale

Authentication is probably the most common detail that should be abstracted away from an application, as it is a concern most often bound to a deployment.

Specification

There are two components to authentication:

1. Indicating when a request is authenticated, and by whom
2. Responding that authentication is necessary


There are already two conventions for this:

1. Put the username in REMOTE_USER
2. Respond with 401 Unauthorized

Note: Please do not confuse 401 Unauthorized with “permission denied”. Permission denied should be indicated with 403 Forbidden.

REMOTE_USER: This should be the string username of the user, nothing more.

401 Unauthorized: Because middleware is handling the authentication, additional information is not required. You do not need to (and should not) include a WWW-Authenticate header. The middleware may include that header, or may change the response in some other way to handle the login.
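A sketch of an application that follows these two conventions, assuming some authentication middleware (such as the HTTP Basic example in this section) sits in front of it. The function name members_only is hypothetical, and the code is Python 3, so bodies are bytes. Note that the 401 response deliberately carries no WWW-Authenticate header:

```python
def members_only(environ, start_response):
    """Hypothetical application: trusts REMOTE_USER and leaves the login
    challenge to whatever authentication middleware wraps it."""
    user = environ.get('REMOTE_USER')
    if not user:
        body = b'Authentication required'
        # No WWW-Authenticate header: the middleware decides how to log in.
        start_response('401 Unauthorized', [
            ('Content-Type', 'text/plain'),
            ('Content-Length', str(len(body))),
        ])
        return [body]
    body = ('Hello, %s' % user).encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```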

Example

The following example implements simple HTTP Basic authentication:

```python
class HTTPBasic(object):

    def __init__(self, app, user_database, realm='Website'):
        self.app = app
        self.user_database = user_database
        self.realm = realm

    def __call__(self, environ, start_response):
        def repl_start_response(status, headers, exc_info=None):
            # Turn the application's bare 401 into a Basic challenge.
            if status.startswith('401'):
                remove_header(headers, 'WWW-Authenticate')
                headers.append(('WWW-Authenticate',
                                'Basic realm="%s"' % self.realm))
            return start_response(status, headers, exc_info)
        auth = environ.get('HTTP_AUTHORIZATION')
        if auth:
            scheme, data = auth.split(None, 1)
            if scheme.lower() != 'basic':
                return self.bad_auth(environ, start_response)
            # Python 2 only: str.decode('base64')
            username, password = data.decode('base64').split(':', 1)
            if self.user_database.get(username) != password:
                return self.bad_auth(environ, start_response)
            environ['REMOTE_USER'] = username
            del environ['HTTP_AUTHORIZATION']
        return self.app(environ, repl_start_response)

    def bad_auth(self, environ, start_response):
        body = 'Please authenticate'
        headers = [
            ('content-type', 'text/plain'),
            ('content-length', str(len(body))),
            ('WWW-Authenticate', 'Basic realm="%s"' % self.realm)]
        start_response('401 Unauthorized', headers)
        return [body]


def remove_header(headers, name):
    for header in headers:
        if header[0].lower() == name.lower():
            headers.remove(header)
            break
```
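The HTTPBasic example relies on str.decode('base64'), which exists only in Python 2. In Python 3 the same decoding can be done with the base64 module; the following sketch (parse_basic_auth is a hypothetical helper) returns None instead of raising on malformed input. It assumes the credentials are UTF-8 encoded; RFC 7617 leaves the encoding unspecified unless the server advertises a charset:

```python
import base64


def parse_basic_auth(header_value):
    """Return (username, password) from an HTTP_AUTHORIZATION value,
    or None if it is not well-formed Basic credentials."""
    parts = header_value.split(None, 1)
    if len(parts) != 2 or parts[0].lower() != 'basic':
        return None
    try:
        # binascii.Error and UnicodeDecodeError both subclass ValueError.
        decoded = base64.b64decode(parts[1], validate=True).decode('utf-8')
    except ValueError:
        return None
    if ':' not in decoded:
        return None
    # Split on the first colon only: passwords may contain colons.
    username, password = decoded.split(':', 1)
    return username, password
```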

Problems

• Strictly speaking, it is illegal to send a 401 Unauthorized response without the WWW-Authenticate header. If no middleware is installed, most browsers will treat it like a 200 OK. There is also no way to detect if an appropriate middleware is installed.
• This doesn’t give any other information about the user. That information can go in other keys, but that is not addressed in this specification currently.
• Some login methods will redirect the user, and any POST request data will possibly be lost. (Note that a specification like A specification for how to process POST form requests helps address this problem.)

Other Possibilities

• While you can add to this specification, I think it’s the most logical and useful way to do authentication and better efforts can build on this base.

Open Issues

See Problems.

How to disable error catching through the environment

Title: x-wsgiorg.throw_errors
Author: Ian Bicking
Discussions-To: Python Web-SIG
Status: Proposed
Created: 13 Nov 2006

Contents

• How to disable error catching through the environment
  – Abstract
  – Rationale
  – Specification
  – Example
  – Problems
  – Other Possibilities
  – Open Issues
  – Implementations


Abstract

WSGI applications are generally not supposed to raise exceptions; instead, they handle their own errors (possibly returning a 500 Server Error response). But in some contexts it is desirable for unexpected exceptions to be allowed to bubble up. This specification defines a key to set in this circumstance.

Rationale

In a testing context it is undesirable for an application to handle its own errors. Typically the test framework is better at handling the errors, either through error formatting or by dropping into a debugger like pdb. Additionally, when an exception catcher is installed in a stack, ideally it will be used for all exceptions. This allows for centralized configuration (for example, of whether emails are sent when errors occur). Dynamically disabling any other exception catchers is often ideal in this situation.

Specification

An exception catcher should check for x-wsgiorg.throw_errors. If it is true, it should not try to catch exceptions. This need only be checked as the application is being entered; it should not be checked later. Applications should not set this key to affect middleware that wraps them, only to affect applications they may call.

Example

A simple exception catcher:

```python
class ExceptionCatch(object):

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if environ.get('x-wsgiorg.throw_errors'):
            return self.app(environ, start_response)
        try:
            return self.app(environ, start_response)
        except:
            import sys, traceback, StringIO
            exc_info = sys.exc_info()
            start_response('500 Server Error',
                           [('content-type', 'text/plain')],
                           exc_info=exc_info)
            out = StringIO.StringIO()
            traceback.print_exc(file=out)
            return [out.getvalue()]
```
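The following Python 3 sketch shows how a test harness might use the key. catch_errors is a compact function-based variant of the ExceptionCatch example, and run_in_test is a hypothetical helper that builds a minimal environ; with throw_errors=True the application’s exception reaches the test instead of becoming a 500 response:

```python
import sys
import traceback


def catch_errors(app):
    """Exception catcher honouring x-wsgiorg.throw_errors (Python 3)."""
    def middleware(environ, start_response):
        # Checked once, as the application is entered.
        if environ.get('x-wsgiorg.throw_errors'):
            return app(environ, start_response)
        try:
            return app(environ, start_response)
        except Exception:
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain')],
                           sys.exc_info())
            return [traceback.format_exc().encode('utf-8')]
    return middleware


def run_in_test(app, throw_errors=True):
    """Hypothetical test helper: call app and return (status, body)."""
    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}
    if throw_errors:
        environ['x-wsgiorg.throw_errors'] = True
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    body = b''.join(app(environ, start_response))
    return statuses[-1], body
```

With throw_errors=False the same stack produces the usual 500 page, so one wrapped application can serve both the test suite and a normal deployment.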

Problems

• In theory an application may know better how to format an error response than the middleware exception catcher. Of course, an application can ignore x-wsgiorg.throw_errors if it thinks it is best (or if it has been explicitly configured to do so).


Other Possibilities

• You can just get the unwrapped application object and test it.

Open Issues

• None I know of

Implementations

WebTest sets a key (paste.throw_errors) during debugging, which allows it to do functional testing of applications that have the paste.exceptions middleware applied to them (that middleware looks for the key and disables itself per-request when it sees it). Zope 2 has its own flag on the (non-WSGI) request to do this, showing substantial history for this technique. Zope 3 uses something like wsgi.handleErrors in the WSGI environ to the same effect (it shouldn’t be using the wsgi. prefix, but it does).

1.9.4 Withdrawn

Unicode Support for WSGI

Title: WSGI Unicode Handling
Author: Armin Ronacher
Status: Rejected
Created: 1-Nov-2006

Contents

• Unicode Support for WSGI
  – Rejected
  – Abstract
  – Motivation
  – Specification
  – Problem
  – Implementation

Rejected

This proposal is rejected mainly for these reasons:

• It’s easy enough for applications to do this on their own.
• Many applications don’t use unicode objects.
• There should be an easier and more flexible way to address the issue.


From Ian Bicking:

I’ll add some commentary here, since I was the primary critic (of the limited audience before Armin withdrew this specification). Leaving this proposal here will hopefully be useful to later people considering this problem.

Changing the response app_iter is pretty heavy, and isn’t really an extension to WSGI; it’s a change to the core specification. Current WSGI implementors really expect str responses. When str goes away in Python 3000, they will have to expect bytes responses too, but that’s a relatively straightforward (though not trivial) change. Dealing with backward compatibility is quite difficult.

The use case I personally see in this is avoiding the confusion and overhead of encoding and decoding responses when there are intermediaries which handle the response in its unicode form. This is not uncommon: for instance, XML processing happens on unicode data, and ideally all text responses should be handled as unicode. Deciding the encoding, and then doing the proper decoding, is not completely trivial (though not terribly hard). It is hard enough that people will avoid it, and have avoided it, potentially working with str data when that was not correct. Similarly, it is important either to send properly encoded data or to change the encoding in the headers. Since encoding information can (unfortunately) show up in multiple places, this can also be error-prone.

Despite these problems, sending unencoded data opens up a whole bunch of other problems, and realistically we get the union of all problems, because we definitely cannot remove the sending of encoded text data. So everyone has to deal with both cases now, instead of just one case.

Anyway, that’s my take on this. – Ian

Abstract

This specification proposes a possible implementation of unicode support in WSGI. Currently, all WSGI applications have to output str objects.

Motivation

Python ships two types of strings subclassing the abstract base class basestring: str and unicode. In Python 3, unicode will replace str and a new class bytes will be introduced (PEP 3100#atomic-types, PEP 3137). Also, today many developers use unicode objects because they support a wider range of characters, and functions like len() still return the correct output, even when using multibyte encodings like utf-8. But at the moment all WSGI applications have to yield str objects, which requires users to encode their data into a specific encoding by hand. WSGI middleware also doesn’t know which charset the application is using.

Specification

A possible solution would be a new key in the environ called wsgi.charset. The WSGI gateway would set this to None by default, which means that yielding unicode objects results in an exception. But if the charset is defined, all returned unicode objects get encoded in that encoding by the WSGI gateway. Middleware could use this value to convert incoming form data to unicode automatically, so that the application developer doesn’t have to take care of this issue.
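A minimal sketch of the proposed gateway-side rule, written for Python 3 (where str is the text type and bytes takes the place of the old str). The helper names encode_chunk and iter_encoded are hypothetical, and TypeError is an arbitrary choice for the exception the proposal requires. The charset is looked up for every chunk, because the proposal lets the application change wsgi.charset between iterations:

```python
def encode_chunk(chunk, environ):
    """Pass bytes through; encode text using the current wsgi.charset."""
    if isinstance(chunk, bytes):
        return chunk
    charset = environ.get('wsgi.charset')
    if charset is None:
        raise TypeError('application yielded text without a wsgi.charset')
    return chunk.encode(charset)


def iter_encoded(app_iter, environ):
    # The charset is read lazily, after the application has produced each
    # chunk, so a change made just before a yield takes effect.
    for chunk in app_iter:
        yield encode_chunk(chunk, environ)
```

For example, an application that yields 'ä' under utf-8 and then again under iso-8859-15 produces b'\xc3\xa4' followed by b'\xe4'.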


Problem

If the application updates this environment key, middleware would still see None as the charset, because the key is only updated on the first iteration. So an application developer would need to wrap the whole application, including its middleware, in yet another middleware that updates this key. Another possibility would be for the WSGI gateway to provide a configuration value for the charset. When encoding the output of the WSGI application, the gateway must also look up the wsgi.charset key each time it finds a unicode object. Caching won’t work, because the application must be able to change the charset before each iteration:

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        environ['wsgi.charset'] = 'utf-8'
        yield u'Hällo Wörld'
        environ['wsgi.charset'] = 'iso-8859-15'
        yield u'Hällo Wörld'

Implementation

Here is a very simple (Python 2) CGI gateway that implements this functionality:

    import os
    import sys

    def run_with_cgi(app, charset=None):
        environ = dict(os.environ.items())
        environ['wsgi.charset'] = charset
        environ['wsgi.input'] = sys.stdin
        environ['wsgi.errors'] = sys.stderr
        environ['wsgi.version'] = (1, 0)
        environ['wsgi.multithread'] = False
        environ['wsgi.multiprocess'] = True
        environ['wsgi.run_once'] = True

        if environ.get('HTTPS', 'off').lower() in ('on', '1'):
            environ['wsgi.url_scheme'] = 'https'
        else:
            environ['wsgi.url_scheme'] = 'http'

        headers_set = []
        headers_sent = []

        def write(data):
            if not headers_set:
                raise AssertionError('write() before start_response()')
            elif not headers_sent:
                # Before the first output, send the stored headers
                status, response_headers = headers_sent[:] = headers_set
                sys.stdout.write('Status: %s\r\n' % status)
                for header in response_headers:
                    sys.stdout.write('%s: %s\r\n' % header)
                sys.stdout.write('\r\n')
            if isinstance(data, unicode):
                # Encode with whatever charset the environ holds right now
                charset = environ['wsgi.charset']
                if charset is None:
                    raise AssertionError('application returned unicode without '
                                         'defined charset')
                data = data.encode(charset)
            sys.stdout.write(data)
            sys.stdout.flush()

        def start_response(status, response_headers, exc_info=None):
            if exc_info:
                try:
                    if headers_sent:
                        # Re-raise the original exception if headers were sent
                        raise exc_info[0], exc_info[1], exc_info[2]
                finally:
                    exc_info = None     # avoid a dangling circular reference
            elif headers_set:
                raise AssertionError('Headers already set!')
            headers_set[:] = [status, response_headers]
            return write

        result = app(environ, start_response)
        try:
            for data in result:
                if data:                # don't send headers until a body appears
                    write(data)
            if not headers_sent:
                write('')               # send headers now if the body was empty
        finally:
            if hasattr(result, 'close'):
                result.close()

A specification for how to process POST form requests

Title: Handling POST forms in WSGI
Author: Ian Bicking
Discussions-To: Python Web-SIG
Status: Withdrawn
Created: 21-Oct-2006

Contents

• A specification for how to process POST form requests
  – Abstract
  – Reason for Withdrawal
  – Rationale
  – Specification
  – Query String data
  – Middleware
  – Problems
  – Other Possibilities
  – Open Issues

Abstract

This suggests a way that WSGI middleware, applications, and frameworks can access POST form bodies so that there is less contention for the wsgi.input stream.

Reason for Withdrawal

I decided that there were opportunities to decorate the wsgi.input stream itself, and have been pursuing them in WSGIRemote. I may describe that strategy in a specification later.

Rationale

Currently environ['wsgi.input'] points to a stream that represents the body of the HTTP request. Once this stream has been read, it cannot necessarily be read again. It may not have a seek method (none is required by the WSGI specification, and frequently none is provided by WSGI servers). As a result any piece of a system that looks at the request body essentially takes ownership of that body, and no one else is able to access it. This is particularly problematic for POST form requests, as many framework pieces expect to have access to this. One notable case is when a request “enters” a traditional web framework which parses the POST form, then “exits” back to WSGI through some framework-specific WSGI gateway. The specification covers library code that multiple frameworks can implement. This is not functionality that is intended to be added to a WSGI “stack”.

Specification

This applies when certain requirements of the WSGI environment are met:

    def is_post_request(environ):
        if environ['REQUEST_METHOD'].upper() != 'POST':
            return False
        content_type = environ.get('CONTENT_TYPE',
                                   'application/x-www-form-urlencoded')
        return (content_type.startswith('application/x-www-form-urlencoded')
                or content_type.startswith('multipart/form-data'))

That is, it must be a POST request, and it must be a form request (generally application/x-www-form-urlencoded or, when there are file uploads, multipart/form-data). When this happens, the form can be parsed by cgi.FieldStorage. The results of this parsing are put in wsgi.post_form as (new_wsgi_input, old_wsgi_input, FieldStorage_object). The new_wsgi_input can be used to check if an intermediary has replaced the input since wsgi.post_form was calculated. If the input has been changed, the wsgi.post_form data should be discarded. The old_wsgi_input can be used if you want to get access to the original input stream (which may be seekable, and so still useful). The replacement wsgi.input guards against routines that access the data but don’t conform to this specification. Ideally the replacement will act like the original wsgi.input (producing the same data), but if not it should raise an exception. The input should not block or produce inaccurate data.


    import cgi

    def get_post_form(environ):
        assert is_post_request(environ)
        input = environ['wsgi.input']
        post_form = environ.get('wsgi.post_form')
        if (post_form is not None
            and post_form[0] is input):
            # Already parsed, and wsgi.input has not been replaced since
            return post_form[2]
        # This must be done to avoid a bug in cgi.FieldStorage
        environ.setdefault('QUERY_STRING', '')
        fs = cgi.FieldStorage(fp=input,
                              environ=environ,
                              keep_blank_values=1)
        new_input = InputProcessed()
        post_form = (new_input, input, fs)
        environ['wsgi.post_form'] = post_form
        environ['wsgi.input'] = new_input
        return fs

    class InputProcessed(object):
        def read(self, *args):
            raise EOFError('The wsgi.input stream has already been consumed')
        readline = readlines = __iter__ = read

By using this routine, multiple consumers can parse a POST form, accessing the form data in any order (later consumers will get the already-parsed data).
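The cgi module used above was deprecated in Python 3.11 and removed in 3.13, so here is a Python 3 sketch of the same parse-once-and-cache pattern. It is a simplification under stated assumptions: it only handles application/x-www-form-urlencoded bodies (via urllib.parse.parse_qs, not cgi.FieldStorage), it reads exactly CONTENT_LENGTH bytes, and the function name get_post_form_urlencoded is hypothetical. The (new_input, old_input, parsed_form) tuple layout follows the specification.

```python
from urllib.parse import parse_qs


class InputProcessed:
    """Stand-in for wsgi.input once the form body has been consumed."""
    def read(self, *args):
        raise EOFError('The wsgi.input stream has already been consumed')
    readline = readlines = __iter__ = read


def get_post_form_urlencoded(environ):
    """Parse an urlencoded POST body once and cache it in environ['wsgi.post_form'].

    Sketch only: multipart/form-data is not handled.
    """
    stream = environ['wsgi.input']
    post_form = environ.get('wsgi.post_form')
    if post_form is not None and post_form[0] is stream:
        # wsgi.input has not been replaced since parsing: reuse the result
        return post_form[2]
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = stream.read(length)
    form = parse_qs(body.decode('latin-1'), keep_blank_values=True)
    new_input = InputProcessed()
    environ['wsgi.post_form'] = (new_input, stream, form)
    environ['wsgi.input'] = new_input
    return form
```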

Query String data

Note that nothing in this specification touches or applies to the query string (in environ['QUERY_STRING']). This is not parsed as part of the process, and nothing in this specification applies to GET requests, or to the query string which may be present in a POST request.

Middleware

While this proposal makes it more feasible for middleware to access POST form data, it should not be read as encouraging middleware to do so. In particular, no consumer should ever expect that wsgi.post_form is in the request environment. Also, no intermediary should parse the POST form data unless it actually is interested in that data – access should be deferred until there is a real need for the POST data.

Problems

• This specification only works for parsing with cgi.FieldStorage. This is not the only parser possible, though it is the only parser in common usage.
• The API for cgi.FieldStorage is not particularly well defined, so creating compatible parsers is difficult.
• cgi.FieldStorage doesn’t have any unicode handling (it has to be done higher up).
• Ideally middleware should just not access wsgi.input; people can (and have) read this specification as encouraging middleware to do this parsing.
• In an ideal world wsgi.input would stick around, either as a temporary file or as a file that was a lazy serialization of the parsed data.


Other Possibilities

• One of the simplest possibilities is to add this information to environ['wsgi.input'] itself as a separate attribute. E.g.:

    fs = getattr(environ['wsgi.input'], 'cgi_FieldStorage', None)
    if fs is None:
        # parse and replace wsgi.input...
        ...

There’s a certain elegance to keeping wsgi.input self-describing and movable.
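A Python 3 sketch of this attribute-based alternative. The class name ParsedInput and the attribute name parsed_form are hypothetical (the snippet above uses cgi_FieldStorage), and only urlencoded bodies are parsed. Unlike InputProcessed, this replacement stream re-serves the original body bytes, in the spirit of the "ideal world" item under Problems.

```python
import io
from urllib.parse import parse_qs


class ParsedInput(io.BytesIO):
    """Replacement wsgi.input that re-serves the body and carries the parsed form."""
    def __init__(self, body, form):
        super().__init__(body)
        self.parsed_form = form


def get_form(environ):
    stream = environ['wsgi.input']
    form = getattr(stream, 'parsed_form', None)
    if form is None:
        # Not parsed yet: read the body, parse it, and install a self-describing stream
        length = int(environ.get('CONTENT_LENGTH') or 0)
        body = stream.read(length)
        form = parse_qs(body.decode('latin-1'), keep_blank_values=True)
        environ['wsgi.input'] = ParsedInput(body, form)
    return form
```

Because the parsed data travels with the stream, any intermediary that replaces wsgi.input automatically invalidates it, with no separate environ key to keep in sync.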

Open Issues

1. This doesn’t address non-form-submission POST requests. Most of the same issues apply to such requests, except that frameworks tend not to touch the request body in that case. The body may be large, so the actual contents of the request body shouldn’t go in the environment. Perhaps they could go in a temporary file, but this too might be an unnecessary indirection in many cases. Also other kinds of request (like PUT) that have a request body are not covered, for largely the same reason. In both these cases, it is much easier to construct a new wsgi.input that accesses whatever your internal representation of the request body is.
2. Is the tuple of information necessary in wsgi.post_form, or could it just be the FieldStorage instance? Should all the information go in wsgi.input directly?
3. Should wsgi.input be replaced by InputProcessed, or just left as is? Or should we look for code that serializes FieldStorage objects back to parseable strings?
4. Does QUERY_STRING actually have to be set for cgi not to mess up, or is that just an issue with GET requests?

1.9.5 Wanted

These don’t exist yet, but they could. Write one?
• A standard place to put HTTP proxy scheme and host information (e.g., when a server acts as an HTTP proxy the request looks like GET http://hostname.org/path ..., and we don’t have a place to keep http://hostname.org).
• Ben Bangert suggested a simple session standard, focused solely on the session ID (persistence handled elsewhere). This is fairly modest but still useful. This was in an email: http://mail.python.org/pipermail/web-sig/2006-January/001858.html
• Maybe a full session interface built on the session ID standard. This is an API proposed earlier: http://svn.colorstudy.com/home/ianb/proposed_session_interface.py
• Often debugging tools open security holes (for example, paste.evalexception gives you a Python prompt on every exception). Authentication isn’t really the right way to handle it, because debugging might involve logging in as various users. A specification could just define a key that indicates when these debugging tools should be allowed. This might get set by configuration, IP address, a cookie, etc.
• Debugging mode is something that can be used in all sorts of places: to increase verbosity, annotate output pages, display errors in the browser, etc. Having a single key for turning on debugging mode would allow its consumption in lots of places. Not as strict as authenticating.
• Some systems, like test frameworks, prefer that unexpected exceptions bubble up. A key could define this case (modelled on paste.throw_errors) and thus disable exception catchers.
• Logging is a tricky situation. The logging module allows for statically setting up logging systems, then configuring them at startup. This often isn’t the best way to set up logging. Putting a logging.Logger instance right in the environment might be better. This requires some design and usage before settling on one spec.
• Request object wrapping the environment.
• Thread-local values are a common technique in web frameworks, allowing global objects or functions to return request-specific information. This pattern could be codified into one core system, using some feedback from existing systems (which have their advantages and flaws).
• Configuration takes fairly common forms, usually a dict of some sort. It could be put somewhere standard.
• Maybe Paste Deploy’s entry points could be standardized. (Paste Deploy itself only consumes those entry points; other consumers are possible, and packages implementing those entry points don’t introduce any dependency on Paste Deploy.)
• A way to extend wsgiref.validate to add more validation, for all these new specs. (Probably this is an implementation, not a spec.)
• A way to describe custom keys, maybe associated with the validation.
• Anchors for doing recursive calls, similar to paste.recursive. (It’s kind of an old module that is more complicated than it needs to be.)
• A place to put a database transaction manager.
• More user-based information than just REMOTE_USER; like wsgiorg.user_info? The basics of this are described in A very basic description of authentication opportunities in WSGI, but it doesn’t cover anything advanced.
These can be written based on specifications/specification_template.
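A sketch of how an exception-catching middleware might honour such a key. It uses the key name paste.throw_errors only because the text names it as the model; the semantics here (any true value disables catching) are an assumption, and the function name catch_errors is hypothetical. The body is buffered so that errors raised while iterating are caught too.

```python
import sys
import traceback


def catch_errors(app):
    """Turn unexpected exceptions into a 500 response unless the environ
    asks for them to propagate (assumed key: 'paste.throw_errors')."""
    def wrapper(environ, start_response):
        if environ.get('paste.throw_errors'):
            # e.g. a test framework wants to see the real exception
            return app(environ, start_response)
        try:
            result = app(environ, start_response)
            try:
                body = list(result)      # buffer so iteration errors surface here
            finally:
                if hasattr(result, 'close'):
                    result.close()
            return body
        except Exception:
            traceback.print_exc(file=environ['wsgi.errors'])
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain')],
                           sys.exc_info())
            return [b'Internal Server Error']
    return wrapper
```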

1.10 Amendments to WSGI 1.0

This page is intended to collect any ideas related to amendments to the original WSGI 1.0 so that it can be marked as ‘Final’. The purpose of the amendments is to address any mistakes or ambiguities in the 1.0 specification, or to change any requirements that in practice could not be implemented for one reason or another. The amendments would also address any differences in how the 1.0 specification should be interpreted for Python 3. See Python 3 for details. Note that this isn’t about changing the 1.0 specification drastically in any way; that is what Proposals related to WSGI 2.0 specification will be about. You should not, though, construe anything in here as an indication that said change will be made. This is especially the case with Python 3 support, as there is a measure of disagreement as to how WSGI should work for Python 3. In other words, you would be unwise to implement any WSGI application or WSGI adapter using information in here as a basis, as it could change or simply never be adopted. The page has been created in response to a discussion on the Python WEB-SIG. In addition, Graham Dumpleton gives details and clarifications on WSGI 1.0 amendments on his blog.

1.10.1 readline(size)

Currently the specification does not require servers to provide environ['wsgi.input'].readline(size) (the size argument in particular). But cgi.FieldStorage calls readline this way, so in effect it is required.
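A sketch of a wsgi.input wrapper that accepts the size argument and never reads past CONTENT_LENGTH. The class name LimitedInput is hypothetical; it assumes the underlying stream's own readline() accepts a size argument, as io.BytesIO does.

```python
class LimitedInput:
    """Wrap a raw request stream so reads stop at CONTENT_LENGTH and
    readline() accepts the optional size argument cgi.FieldStorage passes."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def _clamp(self, size):
        # Never hand out more than the declared body length
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        data = self._stream.read(self._clamp(size))
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        line = self._stream.readline(self._clamp(size))
        self._remaining -= len(line)
        return line

    def readlines(self, hint=-1):
        # hint is accepted for compatibility but ignored in this sketch
        return list(iter(self.readline, b''))

    def __iter__(self):
        return iter(self.readline, b'')
```

A server could then install something like LimitedInput(raw_stream, int(environ.get('CONTENT_LENGTH') or 0)) as environ['wsgi.input'].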


1.10.2 Python 3

Python 3’s default string type is now unicode, and existing Python 2 strings correspond to bytes. This changes how terms need to be interpreted. From WSGI, Python 3 and Unicode, the following amendments were proposed for Python 3:
• When running under Python 3, applications SHOULD produce bytes output, status line and headers
• When running under Python 3, servers and gateways MUST accept strings as application output, status line or headers, under the existing rules (i.e., s.encode('latin-1') must convert the string to bytes without an exception)
• When running under Python 3, servers MUST provide CGI HTTP variables as strings, decoded from the headers using HTTP standard encodings (i.e. latin-1 + RFC 2047) (Open question: are there any CGI or WSGI variables that should NOT be strings?)
• When running under Python 3, servers MUST make wsgi.input a binary (byte) stream
• When running under Python 3, servers MUST provide a text stream for wsgi.errors
See the mailing list archive for the full discussion of issues. Note that this doesn’t address any clarifications that may be required around the wsgi.file_wrapper optional extension. Note that current thinking is that the WSGI adaptor should not worry about RFC 2047.
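A sketch of the server-side conversion implied by the second bullet: strings in the status line and headers must survive s.encode('latin-1'), and bytes pass through unchanged. Both function names are hypothetical.

```python
def to_wire_bytes(value):
    """Convert a status line or header string to bytes under the latin-1 rule.

    Bytes pass through; a string outside latin-1 raises UnicodeEncodeError.
    """
    if isinstance(value, bytes):
        return value
    return value.encode('latin-1')


def status_and_headers_to_bytes(status, headers):
    """Normalise what an application passed to start_response into bytes."""
    return (to_wire_bytes(status),
            [(to_wire_bytes(name), to_wire_bytes(val)) for name, val in headers])
```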

1.10.3 Errata 1

In the “Specification Details” chapter there is this note:

Note: the application must invoke the start_response() callable before the iterable yields its first body string, so that the server can send the headers before any body content. However, this invocation may be performed by the iterable’s first iteration, so servers must not assume that start_response() has been called before they begin iterating over the iterable.

What’s wrong is that the invocation of start_response may be performed at any iteration of the iterable, as long as the application yields only empty strings until then.

See http://mail.python.org/pipermail/web-sig/2007-December/003064.html for more info.
• I don’t really think that this is a good assumption to make. I could see how some implementations could allow for this, but strictly speaking, I wouldn’t assume that most implementations would do that. Besides that, what purpose does yielding an empty string serve? For those reasons, I think this is better off left as undefined behavior. –JasonBaker July 1, 2008
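A small sketch of the behaviour the erratum describes: lazy_app yields an empty string before calling start_response. The in-memory run() loop is hypothetical, not a real server; it only requires headers once a non-empty chunk appears, so it accepts this app. Whether real servers must do the same is exactly what the comment above questions.

```python
def lazy_app(environ, start_response):
    yield b''                 # nothing to send yet, so no headers either
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'Hello world'


def run(app, environ):
    """Headers are only required once the first non-empty chunk appears."""
    state = {}
    out = []

    def start_response(status, headers, exc_info=None):
        state['status'], state['headers'] = status, headers

    for chunk in app(environ, start_response):
        if not chunk:
            continue          # empty chunks carry no body and need no headers
        if 'status' not in state:
            raise AssertionError('body produced before start_response()')
        out.append(chunk)
    return state.get('status'), b''.join(out)
```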

1.10.4 When HTTP response headers can be sent

The WSGI spec explicitly states that HTTP response headers must be sent when the application yields the first non-empty string. However, if a WSGI implementation is allowed to send headers early (not when start_response is called, but when the first string is yielded by the WSGI application, even if empty), then in the case of a HEAD request no content generation is required (assuming, of course, that the WSGI application returns a generator). See http://mail.python.org/pipermail/web-sig/2007-October/002881.html, http://mail.python.org/pipermail/web-sig/2007-October/002799.html, http://mail.python.org/pipermail/web-sig/2007-October/002803.html and http://mail.python.org/pipermail/web-sig/2007-October/002879.html


That thread is a bit confused.
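A sketch of the early-headers idea for HEAD requests, assuming a server is allowed to treat the first yielded item (even an empty one) as the point where headers are known. The generated list in the environ is hypothetical instrumentation standing in for expensive content generation; for HEAD it stays empty because the server closes the generator before requesting any body chunks.

```python
def generator_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b''                                # headers are now known
    for i in range(3):
        environ['generated'].append(i)       # stands in for expensive work
        yield b'chunk %d\n' % i


def handle(app, environ):
    """Hypothetical server loop: for HEAD, stop as soon as headers are known."""
    state = {}

    def start_response(status, headers, exc_info=None):
        state['status'], state['headers'] = status, headers

    result = app(environ, start_response)
    body = []
    try:
        for chunk in result:
            if environ['REQUEST_METHOD'] == 'HEAD' and 'status' in state:
                break                        # no need to generate the body
            body.append(chunk)
    finally:
        if hasattr(result, 'close'):
            result.close()                   # closes the generator early
    return state['status'], b''.join(body)
```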

1.10.5 start_response and error checks

The WSGI spec says that the start_response callable must not actually transmit the response headers. Instead, it must store them. The problem is that it says nothing about error checking. See http://mail.python.org/pipermail/web-sig/2007-September/002771.html
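One possible shape for error checking, sketched in Python 3: validate the arguments first and store them only if they pass, so a rejected call leaves nothing behind. The regular expressions are illustrative assumptions, not rules taken from the WSGI specification, the function name make_start_response is hypothetical, and the sketch omits the reference gateway's "headers already set" check.

```python
import re

# Illustrative checks only: "NNN reason" status lines and RFC 7230-style header names.
STATUS_RE = re.compile(r"[1-9][0-9]{2} [^\r\n]*")
TOKEN_RE = re.compile(r"[!#$%&'*+.^_`|~0-9A-Za-z-]+")


def make_start_response(store):
    """Return a start_response that validates its arguments before storing them."""
    def start_response(status, response_headers, exc_info=None):
        if not isinstance(status, str) or not STATUS_RE.fullmatch(status):
            raise ValueError('invalid status line: %r' % (status,))
        for name, value in response_headers:
            if not TOKEN_RE.fullmatch(name):
                raise ValueError('invalid header name: %r' % (name,))
            if '\r' in value or '\n' in value:
                raise ValueError('CR or LF in value of header %r' % (name,))
        # Only now store the headers; nothing is transmitted here.
        store['status'] = status
        store['headers'] = list(response_headers)
        store.setdefault('body', [])
        return store['body'].append          # stand-in write() callable
    return start_response
```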

1.10.6 Clarification about start_response

What happens if an application calls start_response with an incorrect status line or headers? Should an implementation consider the function called, so that an application can call it a second time, without the exc_info parameter? See http://mail.python.org/pipermail/web-sig/2007-October/002887.html

1.10.7 Specify the type of SERVER_PORT

Some implementations currently expect it to be an integer, some a string. Can we please specify one or the other or either? The “URL reconstruction” code snippet in PEP 333 presumes it’s a string, the reference to the (defunct) CGI spec would seem to imply it should be a string, but it should be explicit.
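A Python 3 sketch of the URL reconstruction algorithm given in PEP 333, which compares SERVER_PORT against the strings '80' and '443'. The str() call is an assumption added here so that the same code tolerates gateways that supply an integer; the function name reconstruct_url is hypothetical.

```python
from urllib.parse import quote


def reconstruct_url(environ):
    scheme = environ['wsgi.url_scheme']
    url = scheme + '://'
    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']          # preferred over SERVER_NAME
    else:
        url += environ['SERVER_NAME']
        port = str(environ['SERVER_PORT'])   # tolerate int or str
        if (scheme, port) not in (('https', '443'), ('http', '80')):
            url += ':' + port
    url += quote(environ.get('SCRIPT_NAME', ''))
    url += quote(environ.get('PATH_INFO', ''))
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url
```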

1.11 Proposals related to WSGI 2.0

This page is intended to collect any ideas related to WSGI 2.0. In particular, any proposed changes to the specification.

Note: What is described here should not be considered a DRAFT for WSGI 2.0. It is only a list of ideas or issues that need to be considered if there ever is enough momentum towards producing an updated WSGI specification. It is quite possible that there may never be an updated specification which embodies the ideas described here. Thus, if you implement any web application interfaces based on the API described here, call it something else, do not call it WSGI 2.0 as no such thing exists.

1.11.1 start_response and write

We could remove start_response and the writer that it implies. This would lead to a signature like:

    def app(environ):
        return '200 OK', [('Content-type', 'text/plain')], ['Hello world']

That is, return a three-tuple of (status, headers, app_iter). It’s relatively simple to provide adapters to and from this signature to the WSGI 1.0 signature.
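A sketch of the two adapters, assuming the three-tuple convention above; both function names are hypothetical. Going from WSGI 1.0 to the three-tuple form has to buffer the whole body, because output passed to write() and items from the app_iter must be merged in order before the tuple can be returned.

```python
def wsgi1_from_wsgi2(app2):
    """Adapt app(environ) -> (status, headers, app_iter) to WSGI 1.0."""
    def app1(environ, start_response):
        status, headers, app_iter = app2(environ)
        start_response(status, headers)
        return app_iter
    return app1


def wsgi2_from_wsgi1(app1):
    """Adapt a WSGI 1.0 app to the three-tuple convention (buffers the body)."""
    def app2(environ):
        captured = {}
        body = []

        def start_response(status, headers, exc_info=None):
            captured['status'], captured['headers'] = status, headers
            return body.append          # write() output is buffered in order

        result = app1(environ, start_response)
        try:
            for chunk in result:
                body.append(chunk)
        finally:
            if hasattr(result, 'close'):
                result.close()
        return captured['status'], captured['headers'], body
    return app2
```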


1.11.2 Making some keys required

Several keys are optional in WSGI, but required in CGI, in particular SCRIPT_NAME, PATH_INFO and QUERY_STRING. Also REMOTE_ADDR and SERVER_SOFTWARE are supposed to exist, even if empty. All these keys could become required in WSGI.

1.11.3 Unknown-length wsgi.input

There’s no documented way to indicate that there is content in wsgi.input but the content length is unknown. A value of -1 may work in many situations. A missing CONTENT_LENGTH doesn’t generally work currently (much code assumes it means 0). This is an issue because chunked transfer encoding on request content can’t be supported properly unless there is a way to indicate that there is data of unknown length. It is also an issue for a web server or WSGI middleware component that mutates the input stream (e.g. decompression), since it will not know the new content length before mutating the data stream. Any change in this area also needs to take into consideration the current link between the CGI and WSGI specifications, and whether the CGI rules (do not read more input data than CONTENT_LENGTH defines; returning an EOF indicator is optional) are really appropriate for WSGI. For more information see thread: http://mail.python.org/pipermail/web-sig/2007-March/002630.html

1.11.4 readline(size)

Currently the specification does not require servers to provide environ['wsgi.input'].readline(size) (the size argument in particular). But cgi.FieldStorage calls readline this way, so in effect it is required.

1.11.5 app_iter and threads

It’s not clear if the app_iter must be used in the same thread as the application. Since the application is blocking, presumably it must be run all in one thread. This should be more explicitly documented.

1.11.6 long response headers

Noted here: http://mail.python.org/pipermail/web-sig/2006-September/002244.html

1.11.7 request trailers and chunked transfer encoding

When using chunked transfer encoding on request content, the RFCs allow there to be request trailers. These are like request headers but come after the final null data chunk. These trailers are only available when the chunked data stream is of finite length and has all been read in, so they are not available at the time the application is called.

1.11.8 Decoding SCRIPT_NAME/PATH_INFO

Because SCRIPT_NAME and PATH_INFO are decoded in WSGI, there’s no way to distinguish %2F from /.
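A two-line illustration with Python 3's urllib.parse.unquote: the two paths differ on the wire but decode to the same string, so an application that only sees the decoded value cannot tell them apart.

```python
from urllib.parse import unquote

raw_paths = ['/files/a%2Fb', '/files/a/b']     # differ on the wire
decoded = [unquote(p) for p in raw_paths]       # ['/files/a/b', '/files/a/b']
```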


1.11.9 No encoding horrors any more

For an analysis, see: http://www.mail-archive.com/[email protected]/msg02483.html Can we have that horror removed for wsgi2 apps, please? A fairly easy approach would be to have a set of RAW_* env vars (e.g. RAW_PATH_INFO) that hold /Foo%XXBar%YY content (not decoded, plain ASCII as in the HTTP protocol). That would also solve issues with ? and / (see section above) that are encoded as %XX (and NOT meant as query / path component separators). Any wsgi1 app can continue to use the wsgi1 env vars; any wsgi2 app can check whether the wsgi2 RAW_* env vars are there and use them (or fall back to using the wsgi1 env vars).
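A sketch of the fallback described above. RAW_PATH_INFO is the proposed key, not a standard one, and the helper name path_segments is hypothetical. Splitting on / before percent-decoding keeps an encoded %2F inside its segment; empty segments are dropped for brevity.

```python
from urllib.parse import unquote


def path_segments(environ):
    raw = environ.get('RAW_PATH_INFO')
    if raw is not None:
        # Split first, then decode each segment: %2F stays inside its segment
        return [unquote(seg) for seg in raw.split('/') if seg]
    # WSGI 1 fallback: PATH_INFO is already decoded, so %2F looks like /
    return [seg for seg in environ.get('PATH_INFO', '').split('/') if seg]
```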

1.12 Python 3

PEP 3333 aims to resolve issues with Python 3 and WSGI. This page is intended to collect ideas and proposals about WSGI amendments for Python 3. See also Amendments to WSGI 1.0

1.12.1 Presentation, at DjangoCon 2010 (by Armin Ronacher)

• Slides: slideshare / scribd
• Video on blip.tv
• Blog post commentary by Armin
• Reinout van Rees’s reaction and commentary

1.12.2 Latest discussions

• Main discussions occur on the WEB-SIG mailing list
• ‘WSGI on Python 3’ thread
• Graham Dumpleton’s 2009 A roadmap for the Python WSGI specification, which describes all the proposals extensively

1.12.3 Proposals

There has been a lot of discussion about the type of data (bytes versus unicode) in various places of the specification. The current competitors are:
• mod_wsgi [Ochtman2010]
• all unicode [Ronacher2009]
• web3 [McDonough2010]
• flat: optimized for ease of validation and low cognitive overhead (inputs are native except for the byte stream, all outputs are bytes)


Here is a summary table which outlines the bytes/unicode differences between these proposals.

                        WSGI 1.0  mod_wsgi           Unicode             web3          flat
environ keys            bytes     native
CGI values              bytes     native             unicode             bytes         native (PEP 383)
SCRIPT_NAME, PATH_INFO  bytes     native             unicode (utf-8)     bytes         native (PEP 383)
QUERY_STRING            bytes     native             unicode             bytes         native
wsgi.url_scheme         bytes
wsgi.input
status line             bytes     bytes (or native)  unicode (or bytes)  bytes         bytes
headers                 bytes     bytes (or native)  unicode or bytes    bytes         bytes
response iterable       bytes     bytes (or native)  bytes               bytes         bytes
write() callback        bytes     bytes (or native)  (deprecated)        (removed)     (removed)

Notes:
• a native string is the primary string type for a particular Python implementation:
  – for Python 2.x this is a byte string,
  – for Python 3.x this is a Unicode string
• unless otherwise stated, all unicode strings are decoded using ISO-8859-1
• when SCRIPT_NAME and PATH_INFO are ‘native’ or ‘unicode’, the environment should contain 2 additional values wsgi.script_name and wsgi.path_info which contain raw-bytes values. (Except in the flat proposal, which assumes CGI variables are decoded as utf-8 using PEP 383 surrogateescape encoding, and that the raw bytes can thus be retrieved by re-encoding.)
• details about the mod_wsgi proposal:
  – it is already implemented in mod_wsgi 3.0
  – almost entirely compatible with current WSGI 1.0 for Python 2
  – it runs the WSGI 1.0 ‘Hello World!’ unchanged
• details about the all unicode proposal:
  – the SCRIPT_NAME and PATH_INFO will be decoded as UTF-8. If it fails, they are decoded as ISO-8859-1. The name of the successful codec is stored in wsgi.uri_encoding.
  – the REQUEST_URI variable is optional and stores the full URI as requested by the client.
• details about the web3 proposal:
  – this proposal does not try to be compatible with WSGI 1.0. It targets Python 2.6+ and Python 3.1+.
  – all wsgi.* variables are intentionally renamed web3.* in the document.
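A Python 3 sketch of the two decoding strategies described in these notes; the function names are hypothetical. The flat variant relies on PEP 383's surrogateescape error handler, so the raw bytes come back by re-encoding; the all unicode variant returns the codec name that would go into wsgi.uri_encoding.

```python
def decode_flat(raw):
    """'flat' proposal: UTF-8 with PEP 383 surrogateescape (lossless)."""
    return raw.decode('utf-8', 'surrogateescape')


def decode_all_unicode(raw):
    """'all unicode' proposal: try UTF-8, fall back to ISO-8859-1.

    Returns (text, codec name); the name is what wsgi.uri_encoding would hold.
    """
    try:
        return raw.decode('utf-8'), 'utf-8'
    except UnicodeDecodeError:
        return raw.decode('iso-8859-1'), 'iso-8859-1'
```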


1.12.4 Draft implementations

• mod_wsgi 3.0+: see the page about Python 3 support
• CherryPy 3.2: see details about CherryPy’s Python 3 WSGI implementation
• Experimental WSGI servers for Python 3

1.13 Definitions of keys and classes

1.13.1 Standard environ keys

REQUEST_METHOD
    The HTTP request method, such as GET or POST. This cannot ever be an empty string, and so is always required.
SCRIPT_NAME
    The initial portion of the request URL’s “path” that corresponds to the application object, so that the application knows its virtual “location”. This may be an empty string, if the application corresponds to the “root” of the server.
PATH_INFO
    The remainder of the request URL’s “path”, designating the virtual “location” of the request’s target within the application. This may be an empty string, if the request URL targets the application root and does not have a trailing slash.
QUERY_STRING
    The portion of the request URL that follows the “?”, if any. May be empty or absent.
CONTENT_TYPE
    The contents of any Content-Type fields in the HTTP request. May be empty or absent.
CONTENT_LENGTH
    The contents of any Content-Length fields in the HTTP request. May be empty or absent.
SERVER_NAME, SERVER_PORT
    When combined with SCRIPT_NAME and PATH_INFO, these variables can be used to complete the URL. Note, however, that HTTP_HOST, if present, should be used in preference to SERVER_NAME for reconstructing the request URL. See the URL Reconstruction section below for more detail. SERVER_NAME and SERVER_PORT can never be empty strings, and so are always required.
SERVER_PROTOCOL
    The version of the protocol the client used to send the request. Typically this will be something like “HTTP/1.0” or “HTTP/1.1” and may be used by the application to determine how to treat any HTTP request headers. (This variable should probably be called REQUEST_PROTOCOL, since it denotes the protocol used in the request, and is not necessarily the protocol that will be used in the server’s response. However, for compatibility with CGI we have to keep the existing name.)
HTTP_ Variables
    Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with HTTP_). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request.


1.13.2 WSGI environ keys

wsgi.version
    The tuple (1, 0), representing WSGI version 1.0.
wsgi.url_scheme
    A string representing the “scheme” portion of the URL at which the application is being invoked. Normally, this will have the value “http” or “https”, as appropriate.
wsgi.input
    An input stream (file-like object) from which the HTTP request body can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre-read the client’s request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.)
wsgi.errors
    An output stream (file-like object) to which error output can be written, for the purpose of recording program or other errors in a standardized and possibly centralized location. This should be a “text mode” stream; i.e., applications should use “\n” as a line ending, and assume that it will be converted to the correct line ending by the server/gateway. For many servers, wsgi.errors will be the server’s main error log. Alternatively, this may be sys.stderr, or a log file of some sort. The server’s documentation should include an explanation of how to configure this or where to find the recorded output. A server or gateway may supply different error streams to different applications, if this is desired.
wsgi.multithread
    This value should evaluate true if the application object may be simultaneously invoked by another thread in the same process, and should evaluate false otherwise.
wsgi.multiprocess
    This value should evaluate true if an equivalent application object may be simultaneously invoked by another process, and should evaluate false otherwise.
wsgi.run_once
    This value should evaluate true if the server or gateway expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a gateway based on CGI (or something similar).
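For experimenting with these keys, Python's wsgiref.util.setup_testing_defaults() fills in any missing CGI and wsgi.* keys with test values using setdefault(), so keys that are already present are kept. The request values below are made up for illustration.

```python
import io
from wsgiref.util import setup_testing_defaults

environ = {'REQUEST_METHOD': 'POST', 'SCRIPT_NAME': '', 'PATH_INFO': '/submit',
           'CONTENT_LENGTH': '3', 'wsgi.input': io.BytesIO(b'a=1')}
setup_testing_defaults(environ)   # adds wsgi.version, wsgi.errors, SERVER_NAME, ...

# Read exactly CONTENT_LENGTH bytes from the request body stream
body = environ['wsgi.input'].read(int(environ['CONTENT_LENGTH']))
# wsgi.errors is a text-mode stream, so '\n' is the line ending to use
environ['wsgi.errors'].write('read %d bytes\n' % len(body))
```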

CHAPTER 2

Contributing

Found a typo? Or some awkward wording? Want to add a link to a presentation, a tutorial or a new (or old and missing) WSGI-related tool? Fixing a dead link? WSGI.org is open-source and hosted on , contributions are encouraged and appreciated.


CHAPTER 3

Indices and tables

• genindex • search


Bibliography

[xml2006-09] xml.com, Sept 2006. Part 1: getting started
[xml2006-10] xml.com, Oct 2006. Part 2: Making Use of a Middleware
[Ochtman2010] Dirkjan Ochtman, (lost link), 2010
[Ronacher2009] Armin Ronacher, http://bitbucket.org/ianb/wsgi-peps/src/tip/pep-XXXX.txt, 2009
[McDonough2010] Chris McDonough, http://github.com/mcdonc/web3/blob/master/web3.rst, 2009
