LAMP and the REST Architecture Step by step analysis of best practice

Santiago Gala
High Sierra Technology S.L.U.
Minimalistic design using a Resource Oriented Architecture

What is a Software Platform

● (Ray Ozzie) ...a relevant and ubiquitous common service abstraction
● Creates value by leveraging participants (ecosystem)
  – Hardware developers (for OS level platforms)
  – Software developers
  – Content developers
  – Purchasers
  – Administrators
  – Users

Platform Evolution

● Early stage: not “good enough” solution
  – differentiation, innovation, value flows
● Later: modular architecture, commoditization, cloning
  – no premium, just speed to market and cost driven
● The platform effect - ossification, followed by cloning - is how Christensen-style modularity comes to exist in the software industry. What begins as a value-laden proprietary platform becomes a replaceable component over time, and the most successful of these components finally define the units of exchange that power commodity networks. (David Stutz)

Platform Evolution (II)

● Example: PostScript
  – Adobe
  – Apple LaserWriter
  – Aldus PageMaker
  – Desktop Publishing
  – Linotype
  – imagesetters
  – NeWS (Display PostScript)
  – OS X
  – standards (XSL-FO -> PDF, Scribus, OOo)

Software Architecture

● ...an abstraction of the runtime elements of a software system during some phase of its operation. A system may be composed of many levels of abstraction and many phases of operation, each with its own software architecture.
  – Roy Fielding (REST)

What is Architecture?

● Way to partition a system in components
● Way for components to locate and co-operate with other components
● Way to make information flow through the system
● Way to evolve components independently
● Way to describe the above

Elements of Architecture

● Components
  – Abstract units of instructions and internal state that provide data transformation through a given interface
● Connectors
  – Abstract mechanism that mediates communication, coordination or cooperation between components
● Data
  – Element of information that is transferred from a component, or received by a component, via a connector

Architecture characteristics

● Elements
  – i.e. Components, Connectors, and Data
● Properties
  – what can be achieved by combining the above in a certain way. Quality attributes
● Styles
  – An architectural style is a coordinated set of architectural constraints that restricts the roles/features of architectural elements and the allowed relationships among those elements within any architecture that conforms to the style

Architecture characteristics (II)

● Configuration
  – Structure of architectural relationships amongst components, connectors and data during a period of the system runtime
● Patterns
  – Software patterns, etc.
● Views
  – Global Architecture aspects. Examples:
    – temporal issues
    – state and control approaches
    – data representation
    – transaction life cycle
    – security safeguards
    – peak demand and graceful degradation

Network App. Styles

● Process Oriented
  – Information is “processed” by components
● Data Flow Oriented
  – information flows in a component pipeline
● Resource (Document) Oriented
  – components serve and receive documents
● Message Oriented
  – components send and receive messages

Data-flow styles

● Process is analyzed as information that moves between components
● Pipe and Filter (PF)
● Uniform P&F (UPF)

Replication styles

● Process is achieved by copying data to the component that is going to use it
● Replicated Repository (RR)
● Cache ($)

Hierarchical

● Process is achieved by a configuration of components with diverse roles
● Client-Server (CS)
● Layered System, Layered CS (LS, LCS)
● Client-Stateless-Server (CSS)
● Client-Cache-Stateless-Server (C$SS)
● Remote Session (RS)
● Remote Data Access (RDA)

Mobile Code

● Not just data, but code moves for the processing
● Virtual Machine (VM)
● Remote Evaluation (REV)
● Code on demand (COD)
● LCODC$SS
● Mobile Agent (MA)

Peer-to-Peer

● No hierarchy of clients and servers between the processing nodes
● Event Based Integration (EBI)
● C2 (combines EBI with LCS)
● Distributed Objects (DO)
● Brokered Distributed Objects (BDO)

Style Comparison

Rated properties, in column order: Net Performance, UP Performance, Efficiency, Scalability, Simplicity, Evolvability, Extensibility, Customizability, Configurability, Reusability, Visibility, Portability, Reliability. Ratings are listed left to right with empty cells omitted.

Style      Derivation   Ratings
PF                      +- + + + + +
UPF        PF           - +- ++ + + ++ ++ +
RR                      ++ + +
$          RR           + + + +
CS                      + + +
LS                      - + + + +
LCS        CS+LS        - ++ + ++ + +
CSS        CS           - ++ + + + +
C$SS       CSS+$        - + + ++ + + + +
LC$SS      LCS+C$SS     - +- + +++ ++ ++ + + + +
RS         CS           + - + + -
RDA        CS           + - - + -
VM                      +- + - +
REV        CS+VM        + - +- + + - + -
COD        CS+VM        + + + +- + + -
LCODC$SS   LC$SS+COD    - ++ ++ +4+ + +-+ ++ + + + +- + +
MA         REV+COD      + ++ +- ++ + + - +
EBI                     + -- +- + + + + - -
C2                      - + + ++ + + ++ +- + +-
DO                      - + + + + + - -
BDO                     - - ++ + + ++ - +

REST

● Representational State Transfer
● The Web Architectural Style
  – client-server
    – browser, server
  – stateless
    – requests are self-contained
  – cache
    – Protocol has provisions for caching
  – uniform interface
    – HTTP/verbs (GET, POST ...)
  – layered system
    – proxies, balancers, gateways
  – code-on-demand
    – applets,

Elements of REST

● Data
  – Resource (Abstract Document)
  – Resource Identifier (URL, URN)
  – Representation (HTML, RSS, )
  – Representation metadata (media type, ... )
  – Resource metadata
  – Control data
● Connectors
  – client (libwww)
  – server (libapr)
  – cache
  – resolver
  – tunnel
● Components
  – origin server (Apache, Tomcat, AOLServer,...)
  – gateway (mod_proxy, mod_jk)
  – proxy (Squid)
  – user agent (Firefox, Safari, MSIE)

Ye olde CRUD...

● Resource Oriented State Transfer
● Uniform Interface...
  – HTTP verbs: PUT, GET, POST, DELETE
  – CRUD verbs: CREATE, READ, UPDATE, DELETE
● ...but Document Oriented
  – The concept of “site” is strong
  – collection of resources
  – headlines and logs are typical of publishing
● CRUD is too shallow for small grain objects, but right for big grain documents

Different things, but related

● A platform usually brings an Architectural Style “natural” to it
● There are reinforcement loops:
  – The Platform makes certain Architectures easier
  – Style becomes ecosystem “culture”
● In a house construction example, wooden American houses have a distinct architectural style from brick European ones, or adobe

Web Platforms

● LAMP/WAMP
  – and LAP/WAP :-)
  – driven by power developers and users
● J2EE
  – driven by traditional EIS providers
  – more oriented to intranet or complex services
● .NET (a second iteration clone of J2EE)
  – tries to preserve the Client Server Windows ecosystem (VB.NET, ASP)
  – driven by Microsoft constituents

LAMP

● Linux, Apache, MySQL, Perl/PHP/Python
  – WAMP variant, using Windows
  – Variant: LAP or WAP (no MySQL)
● Dynamic Languages...
  – High Level (abstract types, code is first order)
  – Open Source (“grass roots”)
  – Dynamically typed (“type goes with object”)
● ...with rich libraries
  – scrape a web site in 200 LOC
  – CPAN, PHP-lib or Python
  – could become even stronger with Parrot

LAMP works

● Quality: “fitness for a given purpose”
● A quality Architectural style/Platform is fit for the requirements and constraints of the system
  – deliver in changing spec environments
  – statelessness leads to big scalability
● Loose Coupling
  – Minimizing state
  – Reactive Programming
● for Web Apps, J2EE/.NET follows LAMP

Blosxom (and PyBlosxom)

● Rael Dornfest design and implementation
● The Zen of blogging
● Minimalistic software
● Microcontent rendering engine
● Extensible and simple to install and use
● Could it be the Zen of Content Management?

Blosxom (and Pyblosxom) (II)

● Blosxom 2: The Zen of blogging
  – 614 LOC, 17k core
  – 413 LOC, 11k plugins
  – A bit more in docs and samples
● Pyblosxom: a clone in Python
  – 1780 LOC, 58K core
  – more comments and plugins (150K total)

Mombo

● Sam Ruby's blogging code
● blosxom derivative
● Started as a mix between perl, python, java and mod_rewrite
● Now it is python+procmail+mod_rewrite
● More sophisticated, enforces xhtml
● Less “platform-like” design
● still very small and powerful

Size does matter (Code Size)

● The fewer lines of code
  – fewer bugs
  – less maintenance
  – smaller learning curve

Size does matter (Content Granularity)

● It is no longer the Web Document
  – Content Management Unit -> Entry (Infolet)
  – Rendering Unit -> Portlet
● PIM (Personal Information Manager)
  – 90's: Contacts, Calendar, Email, Documents
  – Lotus Organizer, Outlook, Notes
● PKM (Personal Knowledge Manager)
  – 00's: Calendar, Blogroll, Entries, Email, Bookmarks
  – Blosxom, Groovy?, Chandler?, Dashboard?
  – Services: del.icio.us, Technorati, Bloglines

Size does matter (Platformwise)

● The fewer lines of code
  – the more visible the design and architecture trade-offs are
  – the easier it is to deploy and reuse
● Value shifts from platform vendor to platform user (trainer, deployer, customizer)
● Value shifts to services around

*Blosxom Architecture

● CGI (gateway) or Static Generator (cache)
● Store (File System)
● Micro Pipeline:
  – Parser
  – Aggregator
  – Renderer

*Blosxom Components

● File system: hierarchical database
  – resources
  – flavours
  – templates
● Title and meta info inside the file
  – Could be substituted for real metainfo (Reiser, HFS+, ...)
● URLs can be mapped
  – to make a dynamic site look static
  – to make a static site act as a cache of dynamically generated code

*Blosxom Data

● URL maps to hierarchy of entries...
  – symlinks and tags will work in 3.0
● ...or temporal period...
● ...plus representation...
  – Decouple “flavours” from resources (html, , atom)
● ... via templates
  – with var interpolation

*Blosxom design

● Pipeline pattern (with plugins)
● Extension points for
  – template selection
  – head
  – date
  – story
  – foot
  – entries (selection)
  – filter (entries)
  – end (called after )
  – skip (shortcut generation)
  – interpolate (variables)
  – sort (entry list)
  – last (final call)

Are *Blosxom RESTy?

● Representational
  – i.e. flavoured
  – document (Entries) or lists (sequences of Entries)
● State
  – dynamic/static
  – static generation
  – cached parsed entries and entry collection
● Transfer
  – CGI or static GET
  – POST for comments (plugin)
  – XML/Atom for trackback (plugin)

Does Architecture matter?

● Should it?
● The answer is yes
  – To Platform developers
  – To application developers
  – To users

Demoed and listed Code

● Blosxom 2.0
  – http://www.blosxom.com
● PyBlosxom (CVS)
  – http://roughingit.subtlehints.net/pyblosxom/
● Mombo
  – http://intertwingly.net/code/mombo/

References

● Roy T. Fielding's dissertation
  – http://www.ics.uci.edu/~fielding/pubs/dissertation/software_arch.htm
● RESTwiki
  – http://rest.blueoxen.net/cgi-bin/wiki.pl
● Ray Ozzie on Platform dynamics
  – http://www.ozzie.net/blog/stories/2002/09/24/softwarePlatformDynamics.html
● David Stutz on Software Platforms
  – http://www.synthesist.net/writing/software_platforms.html
● Sam Ruby's REST+SOAP
  – http://www.intertwingly.net/stories/2002/07/20/restSoap.html
● The Atom Wiki
  – http://intertwingly.net/wiki/pie/FrontPage

LAMP and the REST Architecture

Step by step analysis of best practice

Santiago Gala
High Sierra Technology S.L.U.

Introduction

When the web took the software and content worlds by surprise, the whole industry was shaken. A new architectural style (a way of doing things) was emerging. At the same time, developers started writing and combining new tools and languages to create a software platform that would enable development in this style. The architectural style was later called ReST (Representational State Transfer) by Roy T. Fielding, a co-founder and member of the Apache Software Foundation and its former Chairman. His doctoral dissertation and other papers are an essential part of this session. With a new architectural style, software platforms had to adapt or die. In the process, a spontaneous Open Source platform was created. It was later called LAMP and given an identity by O'Reilly, one of the platform players, together with the Apache Software Foundation, MySQL, the Perl, Python and PHP communities, ... The Open Source community as a whole has been a player here, including the complex Linux ecosystem. This talk will discuss those concepts using Blosxom, PyBlosxom and Sam Ruby's Mombo as illustrations of the platform concepts.

What is a Software Platform

We'll use a definition picked from Ray Ozzie: A platform is defined to be a relevant and ubiquitous common service abstraction. A platform's raison d'être is to create value by generating significant leverage for a variety of constituencies. Software platforms, for example, create leverage for many parties such as hardware developers, software developers, content developers, purchasers, administrators, and users. These parties are collectively referred to as the platform's "ecosystem". The corollary is that platforms do not sustain growth if they do not sustainably create significant unique, relevant and leveragable value for the ecosystem.

Example of mixed Platform

We are going to use PostScript and Desktop Publishing as an example of a mixed platform and of the dynamics and evolution it showed over time. People older than 40 in our industry will surely have seen it. Other good examples are the IBM PC hardware platform or Microsoft DOS and Windows. I like PostScript as an example because it shows the ecosystem very distinctly, and I found a lot of handy documents and web pages that succinctly describe PostScript's history:

The concepts of the PostScript language were seeded in 1976 when Dr John Warnock was at Evans and Sutherland Computer Corporation. At that time John Gaffney, of Evans and Sutherland, was developing an interpreter for a large three-dimensional graphics database of New York harbour. Gaffney conceived the "Design System" language (very similar to the language FORTH).

John Warnock then joined the Xerox Corporation's Palo Alto Research Centre (Xerox P.A.R.C) to work with Martin Newell. They reshaped the Design System into JaM (John and Martin) which was used for VLSI design and the investigation of type and graphics printing - culminating in InterPress, Xerox's printing protocol.

In 1982, John Warnock left Xerox, together with Chuck Geschke, and founded Adobe Systems Inc. The name Adobe was taken from an Indian creek near to where John Warnock lived. Their aim was to build a dedicated publishing workstation and the final two-dimensional graphics handling product was named PostScript.

About the same time Steve Jobs, who had earlier founded Apple Computers, was looking for a solution for a high quality office printing system problem. Steve Jobs urged Adobe to develop a system to drive a laser printer. With the drop in price of memory, the first low cost laser printer engine from Canon, and a bit-mapped computer from Apple, the first PostScript printer hit the market in 1985. This was the Apple Laserwriter printer and it sold for $7000.

Later on the same page, the first newsgroup mention of PostScript, by Wang Zeep, already shows platform dynamics at work:

From: Wang Zeep 1) Rumor from a friend in the business of designing laser printers has Apple using the Post Script protocol from Adobe systems to specify their page layouts. This is important because

a) Microsoft and QMS both will support the same protocol.
b) Mergenthaler Linotype signed with Adobe to provide high-quality digitized fonts for the PostScript protocol.

2) He also says that it will be massively intelligent and will include a 12 Mhz 68000 and 2 megabytes of 256Kbit RAMs. He cannot see how it will come out for less than $8000. (QMS's version of the Canon is said to be similar and costs $10000).

3) One possibility if this is all true is that Apple may be able to sell IBM compatible laser printers, as Microsoft will surely bring out a PostScript compatible version of Word.

The History of PostScript page gives us further clues:

A computer linked to a powerful laser printer would not have made much of an impact but Apple and Adobe were fortunate enough to stumble upon a third partner, a small startup company that had created an application to utilize the Mac and LaserWriter to their full extent. The company was called Aldus and their software product was called PageMaker. Desktop publishing was born and within a year, the combination of the LaserWriter, PostScript and PageMaker saved Apple and turned Aldus and Adobe into rich companies. Linotype was the first graphic arts supplier to recognize the value of PostScript and offer an imagesetter with its own PostScript RIP. Other manufacturers soon followed and PostScript quickly became the lingua franca of the prepress world.

The ecosystem at stake was formed by:
– Adobe with PostScript: an innovative language allowing high quality use of the first commoditized laser printer engine
– Apple with the Macintosh: (relatively) affordable bitmapped graphics
– Canon with their Laser Printer engines
– Aldus with software to publish
– Linotype, selling innovative digital imagesetters for smaller presses

Later, Adobe bought Aldus, and Sun tried to clone a PostScript engine for display to use it in NeWS (which was a big failure, but made its way through NeXT Computer to the Carbon OS X engine). Curiously enough, James Gosling, creator of Java, and Patrick Naughton, who told the story in an appendix of his Java programming book, were among the people disappointed with how Sun was playing the platform game with NeWS. The point in the email message about Microsoft adding PostScript support to Word reminds us that these were DOS times, before OS-standard printer drivers, so each application or device supplier had to bring its own printer drivers. When Windows started to become a dominant platform, this idyllic Desktop Publishing landscape eroded somewhat, but even so, today, about twenty years later, Apple and Adobe PostScript/PDF are essential players in the publishing industry. In fact, this document is being delivered as PDF on the CD. PDF is an evolution of PostScript, which is being further commoditized as the XSL-FO standard.

Software Architecture

Roy T. Fielding defines software architecture as:

...an abstraction of the runtime elements of a software system during some phase of its operation. A system may be composed of many levels of abstraction and many phases of operation, each with its own software architecture.

Again from Roy T. Fielding's doctoral dissertation, we can think about Software Architecture as a:
– Way to partition a system in components
– Way for components to locate and co-operate with other components
– Way to make information flow through the system
– Way to evolve components independently
– Way to describe the above

He studies network-based application architectures in particular. Since programming is a social undertaking for all but initial design efforts or smaller projects, much of the purpose of Software Architecture lies in being a “way to describe the above”. This is especially true in network-based architectures, where typically nobody controls both sides of the network and there needs to be a strong contract between producers and consumers of data.

According to him, any architecture can be characterized by its:
– Elements: components, connectors and data
– Properties: quality attributes, and other properties that result from combining the architectural elements according to a style
– Styles: coordinated sets of constraints that restrict the roles or features of the elements and the allowed relationships among them in a coherent way
– Configuration
– Patterns
– Views

Also, this time taken from a talk on the Web Architecture by Roy T. Fielding:

Architectural Styles
– Common patterns within system architecture
– One system may be composed of multiple styles
– Some styles are hybrids of other styles
– An architecture is an instantiation of a style
– We could equally talk about
  – computer architecture
  – network architecture
  – software architecture [Shaw/Garlan, 1993]
  – network-based application architecture

In his thesis, he describes several families of Architectural styles:
– Data Flow Oriented
  – Pipe and Filter (PF)
  – Uniform Pipe and Filter (UPF)
– Replication
  – Replicated Repository (RR)
  – Cache ($)
– Hierarchical
  – Client-Server (CS)
  – Layered System, Layered Client-Server (LS, LCS)
  – Client-Stateless-Server (CSS)
  – Client-Cache-Stateless-Server (C$SS)
  – Remote Session (RS)
  – Remote Data Access (RDA)
– Mobile Code
  – Virtual Machine (VM)
  – Remote Evaluation (REV)
  – Code on demand (COD)
  – ReST (LCODC$SS)
  – Mobile Agent (MA)
– Peer to peer
  – Event Based Integration (EBI)
  – C2 (EBI + LCS)
  – Distributed Objects (DO)
  – Brokered Distributed Objects (BDO)

and compares all of them according to a set of properties in a table like the following. The rated properties, in column order, are: Net Performance, UP Performance, Efficiency, Scalability, Simplicity, Evolvability, Extensibility, Customizability, Configurability, Reusability, Visibility, Portability, Reliability. Ratings are listed left to right with empty cells omitted.

Style      Derivation   Ratings
PF                      +- + + + + +
UPF        PF           - +- ++ + + ++ ++ +
RR                      ++ + +
$          RR           + + + +
CS                      + + +
LS                      - + + + +
LCS        CS+LS        - ++ + ++ + +
CSS        CS           - ++ + + + +
C$SS       CSS+$        - + + ++ + + + +
LC$SS      LCS+C$SS     - +- + +++ ++ ++ + + + +
RS         CS           + - + + -
RDA        CS           + - - + -
VM                      +- + - +
REV        CS+VM        + - +- + + - + -
COD        CS+VM        + + + +- + + -
LCODC$SS   LC$SS+COD    - ++ ++ +4+ + +-+ ++ + + + +- + +
MA         REV+COD      + ++ +- ++ + + - +
EBI                     + -- +- + + + + - -
C2                      - + + ++ + + ++ +- + +-
DO                      - + + + + + - -
BDO                     - - ++ + + ++ - +

ReST, the Web Architecture

ReST is, as the table above shows, a Layered Code-on-Demand Client-Cache-Stateless-Server style:
– Client-Server: there are Web servers and browsers or other user agents exchanging data
– Stateless: requests are self-contained (nit: server-side sessions and cookies)
– Cached: HTTP has provisions for caching, and transparent and explicit caches are often found in the network. Clients also have a local cache.
– Uniform Interface: HTTP, with a few verbs: GET, POST, PUT, DELETE, HEAD. WebDAV extends the protocol slightly.
– Layered System: gateways (CGI, Servlet API, Web Services), proxies (direct and reverse) and balancers are part of the architecture
– Code-on-demand: ECMAScript is an important part of the architecture; Java applets, Flash plugins and other media types are also used

I steal again from Fielding's Infrastructure and Evolution:
– The Web architectural style inherits from
  – client/server: separation of concerns, scalability
  – pipe-and-filter: streams, intermediaries, encapsulation
  – distributed objects: methods, message structure
– Advantages of representational state transfer:
  – application state controlled by the user agent
  – composed of representations from multiple servers
  – representations can be cached, shared
  – matches hypermedia interaction model of combining information and control
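To make the Stateless and Cached constraints a little more concrete, here is a minimal sketch of how a server might answer a GET using HTTP's conditional request mechanism (the `ETag` and `If-None-Match` headers and the `304 Not Modified` status are standard HTTP). Everything else is an assumption made for the example: the `RESOURCES` store, the path and the `handle_get` function are hypothetical, and real `If-None-Match` handling (lists of tags, `*`) is more involved than the single-tag comparison used here.

```python
import hashlib

# Hypothetical in-memory store standing in for an origin server's resources.
RESOURCES = {"/entries/hello": b"<h1>Hello</h1>"}


def etag_for(body: bytes) -> str:
    """Derive a validator from the representation itself (quoted, as ETags are)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def handle_get(path: str, headers: dict) -> tuple:
    """Answer a GET using only what the request carries: no session is consulted."""
    body = RESOURCES.get(path)
    if body is None:
        return 404, {}, b""
    tag = etag_for(body)
    if headers.get("If-None-Match") == tag:
        # The client or an intermediary cache already holds this representation.
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag, "Content-Type": "text/html"}, body


# First request: full representation. Second request: revalidation, empty body.
status, response_headers, body = handle_get("/entries/hello", {})
revalidated = handle_get("/entries/hello", {"If-None-Match": response_headers["ETag"]})
print(status, revalidated[0])
```

Since each request is self-contained, any proxy or balancer in the layered system can answer or forward it without knowing what came before.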

Elements of ReST

More from World Wide Web Infrastructure and Evolution:

The Web architectural style revolves around five fundamental notions:
– resource
– representation of a resource
– communication to obtain/modify representations
– web “page” as an instance of application state
– engines to move from one state to the next
  – browser
  – spider
  – any media type handler

A resource, Fielding tells us, is characterized by its identity (URI). It corresponds to a Document Oriented view of the world, as it is relatively coarse grained. It can be a document, an image, a collection of those, a service (Weather, TicketBooth) or a person. A resource can have multiple representations, which the system returns as media types (HTML, RSS, Atom, PNG, SVG, XSL-FO/PDF). Some resources are suitable for state transitions, and POST is a way to act on the state of a resource. In the TicketBooth example, a GET would return the price and availability of tickets for the different shows or, if augmented with a transaction number, the state of a reservation; a POST would reserve or cancel a number of tickets.
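The TicketBooth example can be sketched as a small in-memory resource. This is an illustration only, not an implementation taken from Fielding or from any framework: the class, its method names, the form fields (`show`, `count`, `txn`) and the show name used below are all assumptions. It tries to show the split described above: `get` is safe and only returns a representation, while `post` acts on the state of the resource.

```python
class TicketBooth:
    """A coarse-grained resource: ticket availability for a set of shows."""

    def __init__(self, seats: dict, prices: dict):
        self.seats = dict(seats)      # show -> tickets still available
        self.prices = dict(prices)    # show -> ticket price
        self.reservations = {}        # transaction number -> (show, count)
        self._next_txn = 1

    def get(self, txn=None) -> dict:
        """GET: return a representation; never changes the resource."""
        if txn is not None:
            show, count = self.reservations[txn]  # KeyError if txn is unknown
            return {"transaction": txn, "show": show, "tickets": count}
        return {
            show: {"price": self.prices[show], "available": left}
            for show, left in self.seats.items()
        }

    def post(self, action: str, **form) -> dict:
        """POST: act on the state of the resource (reserve or cancel)."""
        if action == "reserve":
            show, count = form["show"], form["count"]
            if count > self.seats[show]:
                raise ValueError("not enough tickets")
            self.seats[show] -= count
            txn = self._next_txn
            self._next_txn += 1
            self.reservations[txn] = (show, count)
            return self.get(txn)
        if action == "cancel":
            show, count = self.reservations.pop(form["txn"])
            self.seats[show] += count
            return self.get()
        raise ValueError(f"unknown action: {action}")


booth = TicketBooth({"Hamlet": 10}, {"Hamlet": 25})
reservation = booth.post("reserve", show="Hamlet", count=3)
print(reservation, booth.get())
```

Note that the client needs no method vocabulary beyond GET and POST; what it must understand is the document that comes back.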

We can see that ReST is Document Oriented, very suitable for coarse-grained transfers. In a sense, it is the old CRUD (Create, Read, Update, Delete) method for resource manipulation. Used in a transactional, fine-grained environment, CRUD underspecifies the system and amounts to a trivial method; used in a coarse-grained, “universal”, networked, document oriented environment like the web, it is the minimum common ground. It shifts the complexity of architecting the system to the places where the actors can actually do something: the definition of the documents and their transitions on the server side, and the handling of them on the client side.

Platform and Architecture

If we compare this with the platform definition, we see that there are strong relationships between a platform and the Architectural styles it enforces or suggests. The relationship is a feedback loop: the better known an Architectural Style is, the easier it is to choose the platform, and the more common the platform, the better known the Architectural concepts and constraints it brings. The relationships are:
– Cultural, as the platform practitioners have natural ways of doing things
– Technological, as some architectural styles are not viable or practical on certain platforms

As the platform evolves, the Architectural styles characteristic of it get engraved in the ecosystem, and they often come to look like the way of doing things inside this platform. This is part of the ossification process that the platform undergoes. When changes in the substratum make the Architectural style unsuitable for application development, it contributes to the displacement effect, as other platforms with different components, Architectural constraints and styles replace it.

Comparison between platform and architecture: the PostScript Desktop

The publishing Architecture as described above has certain distinctive architectural style elements, which were actually very novel at the time. The salient ones are:
– Device independence of the data model
– Vector oriented formats as a bridge
  – for bitmapped displays
  – for bitmapped printers
– Scalable fonts with metrics
– Mobile code

Adobe's PostScript, a compact language designed for imagesetters that made extensive use of device independent Bezier curves, played a central role in the platform. It did so because the PostScript imaging model was the bridging element between the revolutionary bitmapped Mac computers (inheriting from PARC's blitter engine) and the revolutionary bitmapped Canon laser drum, which HP and other makers drove mostly by transmitting characters through a serial port. At the time, printers were seen as character devices, and extensions for handling bitmaps were mostly seen as arbitrary character shapes or bit strips, coming from dot matrix printer technology. At the time, too, the IBM PC had very poor graphic features, with just a CGA (Color Graphics Adapter) by default. Applications worked in character mode.

The Internet Software Platforms

When the internet appeared, disrupting the whole content publishing, IT and software industries, the development platforms were not ready for it. The norm of the moment was:
– componentware using VB or VC++ with OLE objects
– Ditto with other competing 4th generation languages, like Powersoft's PowerBuilder
– Lotus Notes, again in big companies, as it is a non-trivial deployment
– The CORBA world, mostly deeply hidden in big companies (Telcos, etc.)
– Mainframe applications, using terminal emulation, again only visible in big companies

The internet had a very important effect: initially, it was not understood by the IT people or software vendors. This meant that quite often it was the marketing department that was in charge of the web site, and the strict rules and constraints of the corporate IT ecosystem did not apply to web apps.

LAMP

A new internet web platform was developed because of a number of factors:
– When the web took the world by surprise, there were no good tools for the development of dynamic internet sites and applications
– The leading was not suited for exposure to the wild internet, due to security flaws
– The leading OS was not suited to the REST Architectural style
– The early available “platform components” (the NCSA httpd, CGI kits, etc.) were mostly Open Source and came from a research background
– Once people get used to the mix-and-match of Open Source, it is difficult for them to go back to a vendor-driven platform

This Web platform has been called, a posteriori, LAMP. LAMP was first acknowledged and publicized as a platform during 2001 by O'Reilly Network, with the creation of the OnLAMP site. O'Reilly had a strong need to brand their Open Source book series, and thus labelling it was a natural thing for them to do:

Dale Dougherty, Director of Research and publisher of the O'Reilly Network, puts it as follows in the first reference to the term I could find:

Several months ago, David Axmark and Monty Widenius of the MySQL team visited us in Sebastopol and they dropped a new term in our laps: LAMP. This term was popular in Germany, they said, to define how MySQL was used in conjunction with Linux, Apache, and either Perl, Python, or PHP. Their explanation of LAMP made a lightbulb go off in my head. (...) We have felt that the market has ignored the tools that make Linux a great applications development platform, especially for robust web applications that run on Linux servers. The lightbulb that went off in my head was that LAMP represents the open source web platform. Most importantly, LAMP is the platform of choice for the development and deployment of high performance web applications. It is solid and reliable, and if Apache is any indicator, then LAMP sites predominate. (...) Of course, there are plenty of excellent open source variants for any of the pieces of LAMP. Let the L stand for Linux, FreeBSD, NetBSD, OpenBSD, and Darwin/Mac OS X, all of which are open source operating systems and all but the latter have open source GUI layers. Let the M stand for MySQL and PostGreSQL. Let the P stand for PHP, Perl, Python, and Ruby.

Any platform must create value for its constituents. In the case of LAMP, during the initial stages of adoption, being able to just develop a working solution was a key element. Writing CGI programs in C or C++ using slow corporate development cycles was simply not an option. Waiting for vendors to develop solutions such as Microsoft IDC, and later ASP/JSP, meant getting tied to their slow evolution and becoming dependent on their OS. Additionally, the good tools were outrageously expensive, as they were arriving in virgin territory.
Microsoft played hard to keep things happening in the client (VBScript, and later ActiveX, etc.), thus showing their deep roots as riders of the IBM PC hardware platform. They managed to sink Netscape and get a mostly MSIE client side. But the monopoly threats, the Sun/Netscape/AOL efforts to avoid a bottleneck between servers and content delivery, and the emergent cellular and embedded markets made them fail in their effort to force the whole world to use their proprietary client technologies. Their server side solutions were sketchy, with a plague of problems, mostly due to their tight integration approach to software development. Tight integration, with modules at the level of libraries and routines, had served them well, but was beginning to show flaws. Also, their solutions were not able to scale beyond a certain level of traffic, which proved a big problem on the Internet, where traffic differences of orders of magnitude are the norm.

Purchasers and Administrators, quite often the gatekeepers of platform ecosystems, got caught in a trap. In the early stages, web sites were often developed by the Marketing Department and outsourced, rather than going through the IT Department's approval and resources. When the systems had to be made dynamic, usually as an afterthought after the rush to market, the Architectural blocks were already laid, and the IT people were forced to follow suit or risk spectacular failure (I have seen plenty of both).

LAMP is sometimes called WAMP (when run on top of MS Windows). The applications we will be showing next, the *Blosxom family, actually use LAP or WAP, as they don't rely on a Relational Database for state, using the filesystem as a hierarchical database.

J2EE

The only vendor that managed to move forward was Sun Microsystems, which purchased most of Netscape's server products when the company sank, and built a platform around them: the Java language, the Servlet API and J2EE. Java was originally conceived as a systems language for networked devices like set-top boxes. But it made an excellent platform for server side development for a variety of reasons:

– Tight and well integrated security
– Exception management, essential to avoid showing nasty error screens to the users
– A very good network stack integrated with the platform
– Hardware independence, and thus scalability to accommodate traffic growth. Even now, most developers develop on Windows desktops and deploy on Linux, Solaris or Windows machines.
– A system level language with a simple API (servlet) on which to build very fast solutions
– A community process (JCP/J2EE) able to build an ecosystem of vendors, including Open Source ones (java.apache.org was a very early player, and the set of J2EE vendors using ASF code has been growing steadily)

IBM was re-inventing itself at the time, so they were able to join the party without significant stakes other than their will to commoditize Microsoft's desktop and server platforms. They have managed to keep their Notes ecosystem alive in quite a number of big companies.

A hardware manufacturer commoditizing server hardware platforms looks like they are trying hard to shoot themselves in the foot, and Joel Spolsky has made this case, speaking about Sun, in his Strategy Letter V:

Headline: Sun Develops Java; New "Bytecode" System Means Write Once, Run Anywhere.

The bytecode idea is not new -- programmers have always tried to make their code run on as many machines as possible. (That's how you commoditize your complement). For years Microsoft had its own p-code and portable windowing layer which let Excel run on Mac, Windows, and OS/2, and on Motorola, Intel, Alpha, MIPS and PowerPC chips. Quark has a layer which runs Macintosh code on Windows. The C programming language is best described as a hardware-independent assembler language. It's not a new idea to software developers.

If you can run your software anywhere, that makes hardware more of a commodity. As hardware prices go down, the market expands, driving more demand for software (and leaving customers with extra money to spend on software which can now be more expensive.)

Sun's enthusiasm for WORA is, um, strange, because Sun is a hardware company. Making hardware a commodity is the last thing they want to do.

Oooooooooooooooooooooops!

Sun is the loose cannon of the computer industry. Unable to see past their raging fear and loathing of Microsoft, they adopt strategies based on anger rather than self-interest. Sun's two strategies are (a) make software a commodity by promoting and developing free software (Star Office, Linux, Apache, Gnome, etc), and (b) make hardware a commodity by promoting Java, with its bytecode architecture and WORA. OK, Sun, pop quiz: when the music stops, where are you going to sit down? Without proprietary advantages in hardware or software, you're going to have to take the commodity price, which barely covers the cost of cheap factories in Guadalajara, not your cushy offices in Silicon Valley.

He exploits, in this 2002 paper, a microeconomics law, the law of complements:

Smart companies try to commoditize their products' complements.

But there is a second part to it (mine, if no one claims earlier invention):

Smart companies forced by monopolies to sell at reduced margin and shrinking market share try to commoditize the monopoly products holding them off the market.

That is, if a company finds itself with shrinking margins and market share, a commodity market is better than a marginalized “take the leftovers” niche market. This corollary explains why Sun apparently shoots itself in the foot. Sun is being cornered into an ever shrinking ivory tower of high yield servers by the Intel/Microsoft platform. So they try to blow it up with java and start selling boxes that compete on price and performance, not on proprietary markets: blow up the monopoly and get ready to compete in a commodity market. Basically, answering Joel, I think that Sun and IBM expect to be faster than Microsoft at sitting down when the music stops, even if they are contributing to make the game more difficult by removing chairs.

Sun should know about this, as they started their growth following the trend of machines getting smaller as Moore's Law allows, killing in the process most of the players of the previous wave: mainframes, then minicomputers, workstations, microcomputers, and cellular phones. Sun grew in the niche left by DEC VAX and PDP machines.

.NET

After wandering, trying the embrace and extend strategy with java, losing a monopoly trial in the US and two in the European Union, all this while protecting their customers and denying that there was any problem with their strategy, Microsoft decided to do a second version of J2EE, and called it .NET.

.NET is based on C# and the CLI (Common Language Infrastructure), which were standardized as ECMA-334 and ECMA-335 respectively. Microsoft published C# as an open specification, significantly similar to Java both in the virtual machine architecture and in syntax. There is an Open Source implementation of the C# compiler and of the runtime with its JIT compiler: mono, which provides multiplatform delivery. Both J2EE and .NET seem to be plagued with patents, and Ximian/Novell, which manages mono, is taking the greatest care to have it implement clones of the Windows proprietary APIs together with GTK+ and Neck bindings, which makes it tempting as a platform for Linux application development.

Dynamic Languages

One of the essential features of LAMP is its use of dynamic languages. David Ascher (ActiveState) has an interesting essay on dynamic languages, in which he classifies languages, more on ecosystem grounds than on technical ones, as:

Legacy languages

Legacy languages, such as Cobol, Fortran, and PL/I, are important because no matter how much one would like to at times, the past can't be wished away, especially in corporate IT systems. Few IT strategies can effectively accept a "closed world" hypothesis; hence, it is important when considering a new language to evaluate its ability to be bridged to preexisting systems.

System languages

System languages include C, C++, and, more recently, Java and C#. These languages are characterized by strong typing (as explained in Ousterhout (1998)), the ability to build tightly-coupled efficient systems, and, especially for Java and C#, a tight binding between the language and the underlying platforms (the Java Runtime Environment and .NET respectively). One consequence of the tight integration between the language and the platform is that situations which require breaking the "closed world" assumption can be problematic.

Proprietary languages

We use the term "proprietary languages" to refer to languages which share many technical features with dynamic languages, but which are owned, controlled, and evolved by corporations. The prototypical example is Visual Basic, which is high-level and adaptable for both scripting tasks and building applications, but whose evolution is driven directly by Microsoft's platform plans. For example, the evolution of Visual Basic from version 6 to Visual Basic .NET caused considerable frustration among its users, but makes sense from the Microsoft point of view because Microsoft believes that all of its users should move to using the .NET framework, something that required deep changes in VB6.

Dynamic languages

Described in detail in the next section, dynamic languages are defined as high-level, dynamically typed, and open source, developed by a grassroots community rather than a corporation or consortium.
The classification struck me, as I've always considered java a system language, and this was one of the first times I saw it acknowledged. It is more related to social and usage aspects than to precise technical features. Modern developers are increasingly using java (or C#) and less C++ and C. My personal experience is that C++ is the worst language I've had to deal with. Having programmed in Fortran, IBM 360 Assembly, Basic, Lisp, Smalltalk, C++, perl, java and python, in this order over 15 years, I found C++ lacking both dynamism and flexibility, i.e. I found it “the worst of both worlds”. At least java has some reflection capabilities and automated memory management to boost productivity.

He goes on to discuss the criteria. High level is characterized as:

– more abstract built-in data types
– syntactic choices emphasizing readability (perl readable? :-) ), concision or other “soft” aspects of software design
– loose/dynamic/weak typing, i.e. not static typing
– automated memory management, and
– interpretation over compilation, i.e. favouring immediacy of response and the ability to generate code dynamically over machine efficiency

They are typically slower than system languages, but they offer faster startup and more immediate response.

The second criterion, grassroots Open Source, is described by David Ascher in an interesting way: it is not so much about the code as about the community and the libraries, i.e. CPAN, or the python or PHP modules. All those dynamic languages blur the frontiers between the core interpreter and the libraries, much in the same way that C did with stdio, becoming the first machine-independent assembly language in the world. He goes on to discuss the licensing aspect:

While each of the successful dynamic languages have chosen different specific licenses, it is far from accidental that none selected the more extreme GPL license used by the Linux kernel. All of the successful language communities have deliberately picked licenses that fit equally well with corporate requirements for non-viral licenses and the Free Software Foundation's goals (although clearly not the tactics, given the license differences). In general, the language communities view themselves as on the "liberal" side of the open source debate (inasmuch as any large group can be described as having a consistent opinion), and aren't compelled to pick sides on the morality of proprietary licenses.

The third criterion, dynamically typed, is where they actually differ from modern system languages like java or C#.
As he points out, dynamic typing makes sense when the problem being solved is likely to change fast, or when our programs interact with other programs that do, so that speed of writing or refactoring takes priority over the efficiency and runtime safety of static typing. He goes on to discuss the most popular dynamic languages:

– perl
– python
– PHP
– Tcl
– Javascript/ECMAScript
– Ruby, Groovy, Prothon, ...

Language libraries

One of the important success factors of dynamic languages over others is their libraries. Even in languages such as java, for which you can find hundreds of thousands of lines of server side code in *.apache.org alone, it takes some time to understand APIs and interfaces and write the glue to use them. In python, one page of code, typically learned by copying from a different one, goes a long way. A good example: my first python program, apachecon.py, scrapes http://apachecon.com to get an iCal file of the data there. I copied most of the code from delicious.py, a module of interesting hacks, in a couple of days, including tests and learning enough python syntax to do it.

As an example of library use, we can look at Sam Ruby's code to parse blog entries via email:

#!/usr/bin/python
import email, os, re, sys, time
from xml.sax.saxutils import escape
from post import sanitize, writeComment

# change directory to the data dir, ensuring that this code remains in the path
from config import directory
if directory.codebase not in sys.path: sys.path.insert(0, directory.codebase)
os.chdir(directory.data)

# parse the e-mail message
# (double quotes are assumed to be escaped as &quot; here: the entity was
# decoded in the extracted text, leaving a no-op replace)
msg = email.message_from_file(sys.stdin)
title = escape(msg['Subject'])
name = escape(msg['From'].split(' <')[0]).replace('"','')
addr = escape(msg.get_unixfrom().split(' ')[1]).replace('"','&quot;')
path = escape(re.sub(r'\s+', ' ', msg['Received'])).replace('"','&quot;')
if 'Sender' in msg:
  name = escape(msg['Sender'].split(' <')[0]).replace('"','')

# support X-URL Mime header (known to be supported by emacs)
if 'X-URL' in msg:
  href=msg['X-URL']
else:
  href='mailto:'+addr

# determine the target blog entry.  If this fails, python will throw
# an exception, and procmail will resume
target = re.findall(r"(^|\W)blog\W(\d+)(\W|$)",msg['To'])[0][1]

# bail on SpamAssassin marked e-mail
if title.find('*****SPAM*****')==0: sys.exit(99)
if msg['X-Spam-Flag']=='YES': sys.exit(99)

# bail on open relays that have proven problematic
if path.find('62.118.249.10')>=0: sys.exit(99)
if path.find('194.226.128.51')>=0: sys.exit(99)
if path.find('209.61.183.90')>=0: sys.exit(99) # unknown prankster

# only allow email for 30 days
entry=directory.data+target+'.txt'
age=(time.mktime(time.localtime())-os.stat(entry).st_mtime)/86400
if age>=31: sys.exit(99)

# extract and escape the text portion of the payload
body = msg.get_payload()
if isinstance(body,list): body=body[0].get_payload()
body = re.compile("^-- $",re.M).split(body)[0].strip()
body = sanitize(body.strip())
# the tag in the next literal was lost in the extracted text;
# "<br />" is an assumption
body = body.replace("<br />\n","\n")

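# write the comment out where blosxom will find it
# (the HTML inside this template was lost in the extracted text; the link
# markup below is an assumption chosen to match the four format arguments)
writeComment(target, title, """%s

<p>Emailed by <a href="%s" title="%s">%s</a></p>""" % (body, href, path, name))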

Note that the language is very expressive, and parsing an email is trivial. Note too that any email with a 20 MB attachment may make the email.py module crash. This is typical of dynamic languages: they use “worse is better” as a guiding principle.

Similarly, when I had the idea that a web gateway to email would be a useful app (my boss dismissed it with “nobody will ever give their password to a central server”, shortly before Microsoft bought Hotmail), I was able to get a proof of concept working in less than a week, using the perl CPAN modules to parse email, including MIME attachments.

One interesting factor in the mix is that some of the languages being discussed here are within the scope of Virtual Machine providers (though not all the libraries run in those “managed” environments):

– Sun, with the JVM (Jython, stagnating)
– Microsoft, which recently hired Jython's main developer to develop IronPython
– Parrot, the next generation perl runtime, which is going to be able to run python and possibly PHP too, in addition to perl

If Parrot succeeds and turns into the common runtime for LAMP, and provided bridges are also built between the different calling conventions, mixing and leveraging the CPAN, python and PHP libraries could be a winning combination.

LAMP works

Systems people often dismiss LAMP, because of the very reasons that make it a success:

– It does not try to be all things for all people, but quality is defined as “fitness for a given purpose”, and it is good for its purpose

– deliver quickly in changing spec environments

– statelessness, which leads to adaptability and big scalability gains

– loose coupling between components

– minimizing state maintenance

– reactive programming

One of the key reasons why dynamic languages are increasingly used in changing environments is that Open Source is changing the rules of the game. It is not so much about programming; it is about integrating, copying and pasting, and learning by reading. Thus, languages with big libraries available in source form, and which encourage quick execution of written code, are suitable for those environments. For the development of dynamic webapps, J2EE and .NET follow LAMP.

A recent example is how Friendster, originally developed using JSP and J2EE techniques, was re-engineered using PHP. While the re-engineering could possibly have used velocity/turbine instead of PHP, for example, or even JSP with a properly decoupled design, people trained to use EJB don't think in terms of decoupling and minimizing state, which leads to big overhead.

A recent reference about the relations between Open Source, vendors and software platforms can be found in Werner Vogels' Amazon move FAQ, written as he was moving to Seattle to work as Director of Systems Research at Amazon.com:

FAQ #4 - Why not Microsoft? (that is, why would he not go to work at Microsoft?)

Now that I have professed my love again for the evil empire, I still haven't answered the question at hand. When I started to take Amazon's offer seriously I also asked myself the question: If I was now willing to move to Seattle, why not move to Redmond? In the end there are a number of different reasons, but two fundamental technology ones stick out:

● The challenges that Amazon faces are orders of magnitude different from the solutions that Microsoft is working on. Microsoft and others are developing their software targeting the largest market. Internet giants such as Amazon are not a big market, and their challenges need to be addressed by significantly different architectures, which would take a huge investment by MS and colleagues to deliver on. And it will not sell them millions of additional OS licenses. For someone like me who is interested in scalability and robustness Amazon clearly has the more challenging problem set.

● Related to this is that at Microsoft, and at many other middleware and system software vendors, you develop software that will be used by other developers who will build products for their customers to use. This always puts you two or sometime three steps away from the ultimate requirements. To support this you need to be as generic as possible, as you have to cater to a very large, diverse customer base (developers). At Amazon one has a view over the whole pipeline. The end-to-end requirements are clear from the start, and every step in the software construction process allows you to look for solutions that are specific for the problem, for the service or for the data used. There is no illusion that one single approach or platform will solve all the problems. You need a rather diverse toolbox to be able to build composable, highly scalable internet services at the scope and scale Amazon wants to offer. For a long time I have worked on generic approaches and the criticism you always get makes it feel that it is never good enough. I am very excited to for a change be able work on very specific solutions for which it is clear when it is good enough, or what exactly is needed to make it better.

FAQ #5 - Will Amazon now use .NET & Windows?

Somehow I don't think Amazon.com hired me for my involvement with CLR based technologies or my knowledge of the Windows Server Platform. Amazon has invested a lot of effort in their current Linux based platform, and it works well for them. In developing their services they use quite a few different tools, also at the language level. Scripting, virtual machine based language, as well as old-fashion languages. There seems to be a culture within Amazon to use the best tools for the problem, so developers have a freedom to choose the language they feel is most appropriate. In that culture I certain see a role for CLR/Mono in the future, but it will be something that it will have to earn itself, not force from higher up.
Amazon.com, like many other large sites, increasingly relies on open source software. The reason for this is not just financial, but mainly that the availability of source code and shared knowledge allows them to debug problems and confutations more effectively. The black box approach forced upon these large sites by vendors does not work very well, given that these sites push the software to the limits and beyond, and internal views are necessary to monitor and debug the situation.

Blosxom

Rael Dornfest wrote Blosxom, “the Zen of Blogging”. Version 2.0, the current version, is a small perl CGI: 614 lines of code, which come with 413 lines of plugin code, and slightly more in documents and samples. All in all, less than 30K uncompressed.

PyBlosxom

PyBlosxom is a Blosxom clone written in python. At 58K for the core, it is about double the size of Blosxom. This is to be expected for several reasons: python is more verbose than perl, and the code has many more comment lines. Also, pyblosxom is developed by a community, while blosxom is a one-man job. This makes the blosxom code more amenable to pruning. Rael told me proudly, when asked about Blosxom 3.0, that “it will have one less line of code”. It is currently in beta.

Mombo

Sam Ruby was inspired by Blosxom when he wrote his blogging system, which you can see at http://intertwingly.net/blog/. The code is at http://intertwingly.net/code/mombo/. It is written in python+procmail+mod_rewrite:

– mod_rewrite for URL management and the caching decisions (pages not found are generated on the fly)

– procmail for the SMTP gateway. Posts and Comments can be done using email

– python for the rest of the code. Cheetah is the templating system, TBD

Size matters

Why am I emphasizing the size of blosxom so much? Because code size matters, and matters a lot. The fewer lines of code, the easier it will be to maintain, the fewer bugs it will contain, and the gentler its learning curve will be. This would be true if, as the saying goes, perl was not a write-only language.

Size, and this is a side effect, also matters for this kind of example. The minimalistic, bare bones size of the application readily exposes all its architecture decisions, which turns it into a good example for us to use.

Size also matters at a different level. Blosxom takes an approach to microcontent which I find interesting. The content management unit is no longer the document or the web page. The entry is typically a micro-unit, which we could call an infolet. The rendering unit is also getting smaller in granularity, with portlets substituting servlets or static pages as rendering units.

In the 90's, there was a concept called PIM (Personal Information Management). A salient example was Lotus Organizer, which was later cloned by Outlook and Evolution, and also Lotus Notes. The concept was about bringing together all the information a person uses:

– Contacts
– Meetings and events (Calendar)
– Email
– PostIt-like notes

Currently the concept seems to be morphing into a PKM (Personal Knowledge Manager), a place where we will hold (and publish):

– Contacts, Meetings and events (Calendar)
– Blogroll
– Email
– Entries (notes)
– Bookmarks

This seems to be the area where Chandler, Groove, and possibly Dashboard are playing, with a tight integration with services like Technorati, del.icio.us, Bloglines, Qmail, etc. I feel the *Blosxom design is particularly fit for this kind of networked application.

*Blosxom Architecture

The main architectural features blosxom shows are:

*Blosxom is a CGI (dynamic gateway) or a static page generator (cache).

It uses the file system as a store, making it act as a hierarchical database.

The execution path in the CGI is similar to what other CMS or portal engines show:

– entry parsing
– aggregation
– rendering

Components

Store

The store is a subtree of the file system. Typically it is handled as a hierarchy of categories, as in computers/software/java. The store holds entries (content elements), which are plain files with a one line title and (optionally) meta information, separated by a blank line from the content. Entries are located by extension, which allows holding information (drafts) back from publishing. Also, in blosxom3, symlinks work and do the right thing (entries are never duplicated). In blosxom3, Rael introduces a technique for representing tags, both in the URI and in the entry name or meta-info: a comma separated tag list. For URIs, tags can optionally be prefixed by “+” or “-”, again with the right semantics.
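The entry layout described above can be sketched in a few lines of python. This is only an illustration, not code from any *Blosxom: the function names `parse_entry` and `category_of` are hypothetical, and the `key: value` syntax for meta lines is an assumption, since the text only says that meta information is optional.

```python
import posixpath


def parse_entry(text):
    """Split a raw entry into (title, meta, body).

    Assumed layout: first line is the title, following non-blank lines are
    'key: value' meta lines (hypothetical syntax), and the first blank line
    separates this header from the content.
    """
    head, _, body = text.partition("\n\n")
    lines = head.splitlines()
    title = lines[0].strip() if lines else ""
    meta = {}
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if sep:  # ignore header lines that are not 'key: value'
            meta[key.strip().lower()] = value.strip()
    return title, meta, body


def category_of(entry_path, datadir):
    """Return the category of an entry: its directory relative to the store."""
    rel = posixpath.relpath(posixpath.dirname(entry_path), datadir)
    return "" if rel == "." else rel
```

The category is never stored anywhere: it is simply the position of the file in the tree, which is what makes the file system behave as a hierarchical database.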

Flavours and Templates

The “Representational” in ReST is mapped in blosxom through flavours. The system can serve entries in a series of different flavours, typically HTML, RSS or Atom, but it could also be plain text or other possibilities. The Store holds a set of templates for each flavour (the CGI has minimalistic default ones inside), which can be overridden at each level in the hierarchy.
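The per-level override can be pictured as a lookup that starts at the requested category and climbs towards the root of the store, falling back to built-in defaults. The following python sketch assumes templates are files named chunk.flavour (as in the blosxom 2.0 code of Appendix A); `find_template`, `DEFAULTS` and `demo` are hypothetical names used only for this illustration.

```python
import os
import tempfile

# Hypothetical built-in defaults, keyed by (flavour, chunk).
DEFAULTS = {("html", "story"): "<p>$title</p>\n"}


def find_template(datadir, path, chunk, flavour, defaults=DEFAULTS):
    """Return the text of the nearest chunk.flavour template above `path`."""
    parts = [p for p in path.split("/") if p]
    while True:
        candidate = os.path.join(datadir, *parts, chunk + "." + flavour)
        if os.path.isfile(candidate):
            with open(candidate, encoding="utf-8") as fh:
                return fh.read()
        if not parts:
            # Nothing in the store: use the built-in default, if any.
            return defaults.get((flavour, chunk), "")
        parts.pop()  # climb one level towards the store root


def demo():
    """Build a throwaway store and resolve three templates against it."""
    with tempfile.TemporaryDirectory() as store:
        os.makedirs(os.path.join(store, "computers", "software"))
        override = os.path.join(store, "computers", "story.rss")
        with open(override, "w", encoding="utf-8") as fh:
            fh.write("computers-level rss story")
        return (
            find_template(store, "computers/software", "story", "rss"),
            find_template(store, "music", "story", "html"),
            find_template(store, "music", "foot", "atom"),
        )
```

Here `demo()` resolves the first request from the override placed one level up, the second from the built-in default, and the third to an empty string because no template exists at all.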

Plugins

*Blosxom uses extension points for plugins. Plugins are given the possibility of changing most features of the operation:

– template selection
– head
– date
– story
– foot
– Entry selection
– Entry filtering
– Skip (for shortcutting generation)
– interpolate (template variables)
– sort (Entry list)
– last (final call for plugin bookkeeping)
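A minimal python sketch of how such extension points can be dispatched. It assumes two styles of hook: override hooks, where the first plugin returning a value wins (this is what the blosxom 2.0 code in Appendix A does for template selection), and notification hooks, where every plugin defining the hook is called in turn (an assumption made for this illustration). All class and function names are hypothetical.

```python
class NoHooks:
    """A plugin that defines no extension points at all."""


class RssTemplates:
    def template(self):
        return "template chosen by RssTemplates"


class Shouter:
    def template(self):
        return "template chosen by Shouter"

    def story(self, entry):
        # Notification hook: modify the entry in place before rendering.
        entry["title"] = entry["title"].upper()


def first_override(plugins, hook, default):
    """Return the value from the first plugin whose hook yields one."""
    for plugin in plugins:
        fn = getattr(plugin, hook, None)
        if callable(fn):
            result = fn()
            if result is not None:
                return result
    return default


def run_all(plugins, hook, *args):
    """Call the hook on every plugin that defines it, in plugin order."""
    for plugin in plugins:
        fn = getattr(plugin, hook, None)
        if callable(fn):
            fn(*args)
```

Because plugins are discovered by looking for a method name, a plugin only pays for the extension points it actually implements, which keeps both the core and the plugins small.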

*Blosxom data

The way it works is:

– Each URL maps to a point in the hierarchy, to a concrete entry...

– or to a point in time... (//2004/11/ works, for instance)

– plus representation... (flavour as resource extension or ?flav=)

– via templates (with variable interpolation)

It can even be used to generate some parts of a site statically, while keeping other parts dynamic. It can also be used, as in Mombo, via mod_rewrite to generate the cache on demand.
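The mapping above can be sketched as a small python function that splits a request path into category, entry, date and flavour. It loosely follows the path handling of blosxom 2.0 shown in Appendix A, but it is a simplified illustration: `parse_path` and the keys of the returned dictionary are hypothetical names, and month names are not converted to numbers.

```python
import re


def parse_path(path_info, query_flavour=None, default_flavour="html"):
    """Split a request path into category, entry, date parts and flavour."""
    parts = [p for p in path_info.split("/") if p]
    flavour = query_flavour or default_flavour
    entry = None

    # A trailing name.ext selects the flavour; "index" is not an entry name.
    if parts:
        match = re.fullmatch(r"(.+)\.(.+)", parts[-1])
        if match:
            parts.pop()
            flavour = match.group(2)
            if match.group(1) != "index":
                entry = match.group(1)

    # Leading components starting with a letter form the category path.
    category = []
    while parts and re.match(r"[A-Za-z]", parts[0]):
        category.append(parts.pop(0))

    # Whatever is left is read as year/month/day.
    year, month, day = (parts + [None, None, None])[:3]
    return {
        "category": "/".join(category),
        "entry": entry,
        "year": year,
        "month": month,
        "day": day,
        "flavour": flavour,
    }
```

For example, /computers/software/java/first.rss selects the entry "first" in the category computers/software/java rendered with the rss flavour, while /2004/11/ selects everything published in November 2004 in the default flavour.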

Are *blosxom resty?

– Representational

– flavours

– resources are entries or collections of entries

– State (document oriented)

– dynamic/static

– can cache parsed entries to speed the process in the dynamic case

– Transfer

– CGI or static GET

– POST in the comment (plugins)

– XML/Atom/Trackback ... (plugins)

Architecture matters

blosxom (and pyblosxom) are interesting hacks. They are ReSTful applications, and their architecture shows a few useful concepts:

– file system as a hierarchical database (now that there have been talks about Microsoft using a DB engine for a relational file system, here we are using the file system as a data base). Leveraging the filesystem as DB via webdav or subversion looks easier than doing the same with a relational DB. And a file system is really easy to administer.

– Instead of generating pages “on demand”, generate them “on supply”. This is a good style whenever the reads are far more common than writes. Having static generation after publishing, comments, or in a crontab can save a lot of processing power.

– Leveraging the power of Apache mod_rewrite for a number of tasks. mod_rewrite can be effectively used as a rule­based engine for URL management.

– It is so stateless that everything is done again on each request (loading plugins, searching for templates, etc.). This greatly simplifies the logic, at the cost of more cycles burned. But premature optimization is the root of all evil, and caching plugins or static generation are there to fight those problems when they come.

References

Demoed and listed Code

Blosxom 2.0 http://www.blosxom.com

PyBlosxom (CVS) http://roughingit.subtlehints.net/pyblosxom/

Mombo http://intertwingly.net/code/mombo/

Documents

Roy T. Fielding's dissertation http://www.ics.uci.edu/~fielding/pubs/dissertation/software_arch.htm

RESTwiki http://rest.blueoxen.net/cgi-bin/wiki.pl

Ray Ozzie on Platform dynamics http://www.ozzie.net/blog/stories/2002/09/24/softwarePlatformDynamics.html

David Stutz on Software Platforms http://www.synthesist.net/writing/software_platforms.html

Sam Ruby's REST+SOAP http://www.intertwingly.net/stories/2002/07/20/restSoap.html

The Atom Wiki http://intertwingly.net/wiki/pie/FrontPage

Appendix A: Commenting blosxom 2.0

It starts by displaying a few configuration variables

#!/usr/bin/perl

# Blosxom
# Author: Rael Dornfest
# Version: 2.0
# Home/Docs/Licensing: http://www.raelity.org/apps/blosxom/

package blosxom;

# --- Configurable variables -----

# What's this blog's title?
$blog_title = "My Weblog";

# What's this blog's description (for outgoing RSS feed)?
$blog_description = "Yet another Blosxom weblog.";

# What's this blog's primary language (for outgoing RSS feed)?
$blog_language = "en";

# Where are this blog's entries kept?
$datadir = "/Library/WebServer/Documents/blosxom";

# What's my preferred base URL for this blog (leave blank for automatic)?
$url = "";

# Should I stick only to the datadir for items or travel down the
# directory hierarchy looking for items?  If so, to what depth?
# 0 = infinite depth (aka grab everything), 1 = datadir only, n = n levels down
$depth = 0;

# How many entries should I show on the home page?
$num_entries = 40;

# What file extension signifies a blosxom entry?
$file_extension = "txt";

# What is the default flavour?
$default_flavour = "html";

# Should I show entries from the future (i.e. dated after now)?
$show_future_entries = 0;

Plugin configuration

# --- Plugins (Optional) -----

# Where are my plugins kept?
$plugin_dir = "";

# Where should my modules keep their state information?
$plugin_state_dir = "$plugin_dir/state";

# --- Static Rendering -----

# Where are this blog's static files to be created?
$static_dir = "/Library/WebServer/Documents/blog";

# What's my administrative password (you must set this for static rendering)?
$static_password = "";

# What flavours should I generate statically?
@static_flavours = qw/html rss/;

# Should I statically generate individual entries?
# 0 = no, 1 = yes
$static_entries = 0;

# ------

Initialization

use vars qw! $version $blog_title $blog_description $blog_language $datadir $url %template $template $depth $num_entries $file_extension $default_flavour $static_or_dynamic $plugin_dir $plugin_state_dir @plugins %plugins $static_dir $static_password @static_flavours $static_entries $path_info $path_info_yr $path_info_mo $path_info_da $path_info_mo_num $flavour $static_or_dynamic %month2num @num2month $interpolate $entries $output $header $show_future_entries %files %indexes %others !;

use strict;
use FileHandle;
use File::Find;
use File::stat;
use Time::localtime;
use CGI qw/:standard :netscape/;

$version = "2.0";

my $fh = new FileHandle;

%month2num = (nil=>'00', Jan=>'01', Feb=>'02', Mar=>'03', Apr=>'04', May=>'05', Jun=>'06', Jul=>'07', Aug=>'08', Sep=>'09', Oct=>'10', Nov=>'11', Dec=>'12');
@num2month = sort { $month2num{$a} <=> $month2num{$b} } keys %month2num;

# Use the stated preferred URL or figure it out automatically
$url ||= url();
$url =~ s/^included:/http:/; # Fix for Server Side Includes (SSI)
$url =~ s!/$!!;

# Drop ending any / from dir settings
$datadir =~ s!/$!!; $plugin_dir =~ s!/$!!; $static_dir =~ s!/$!!;

# Fix depth to take into account datadir's path
$depth and $depth += ($datadir =~ tr[/][]) - 1;

# Global variable to be used in head/foot.{flavour} templates
$path_info = '';

$static_or_dynamic = (!$ENV{GATEWAY_INTERFACE} and param('-password') and $static_password and param('-password') eq $static_password) ? 'static' : 'dynamic';
$static_or_dynamic eq 'dynamic' and param(-name=>'-quiet', -value=>1);

# Path Info Magic
# Take a gander at HTTP's PATH_INFO for optional blog name, archive yr/mo/day
my @path_info = split m{/}, path_info() || param('path');
shift @path_info;

while ($path_info[0] and $path_info[0] =~ /^[a-zA-Z].*$/ and $path_info[0] !~ /(.*)\.(.*)/) { $path_info .= '/' . shift @path_info; }

# Flavour specified by ?flav={flav} or index.{flav}
$flavour = '';

if ( $path_info[$#path_info] =~ /(.+)\.(.+)$/ ) {
  $flavour = $2;
  $1 ne 'index' and $path_info .= "/$1.$2";
  pop @path_info;
} else {
  $flavour = param('flav') || $default_flavour;
}

# Strip spurious slashes
$path_info =~ s!(^/*)|(/*$)!!g;

# Date fiddling
($path_info_yr,$path_info_mo,$path_info_da) = @path_info;
$path_info_mo_num = $path_info_mo ? ( $path_info_mo =~ /\d{2}/ ? $path_info_mo : ($month2num{ucfirst(lc $path_info_mo)} || undef) ) : undef;

Standard template selection: walk up from the requested path, looking for a file named after the chunk, with the flavour as its extension. If none is found, fall back to the default templates at the end of the file:

# Define standard template subroutine, plugin-overridable at Plugins: Template
$template =
  sub {
    my ($path, $chunk, $flavour) = @_;

    do {
      return join '', <$fh> if $fh->open("< $datadir/$path/$chunk.$flavour");
    } while ($path =~ s/(\/*[^\/]*)$// and $1);

    return join '', ($template{$flavour}{$chunk} || $template{error}{$chunk} || '');
  };

# Bring in the templates
%template = ();
while (<DATA>) {
  last if /^(__END__)?$/;
  my($ct, $comp, $txt) = /^(\S+)\s(\S+)\s(.*)$/;
  $txt =~ s/\\n/\n/mg;
  $template{$ct}{$comp} = $txt;
}
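The loop that brings in the built-in templates reads one definition per line: a flavour, a chunk name and the template text, with line breaks written as a literal \n, stopping at a blank line or at __END__. A rough python equivalent, for readers less used to perl, could look as follows; `load_templates` and the sample lines are hypothetical and are not the templates shipped with blosxom.

```python
import re


def load_templates(lines):
    """Parse 'flavour chunk text' lines into a nested dictionary."""
    templates = {}
    for line in lines:
        line = line.rstrip("\n")
        if line in ("", "__END__"):
            break  # same stop condition as /^(__END__)?$/ in the perl code
        match = re.match(r"(\S+)\s(\S+)\s(.*)$", line)
        if not match:
            continue  # skip lines that do not look like a definition
        flavour, chunk, text = match.groups()
        # Turn the two-character sequence backslash-n into a real newline.
        templates.setdefault(flavour, {})[chunk] = text.replace("\\n", "\n")
    return templates


# Hypothetical sample data, in the same line format.
SAMPLE = [
    "html content_type text/html\n",
    "html story <p>$title</p>\\n\n",
    "rss content_type text/xml\n",
    "__END__\n",
    "ignored after_end never parsed\n",
]
```

Keeping the defaults in the script itself is what allows the CGI to work with an empty store: templates found in the file system only override these entries.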

Select active plugins:

# Plugins: Start
if ( $plugin_dir and opendir PLUGINS, $plugin_dir ) {
  foreach my $plugin ( grep { /^\w+$/ && -f "$plugin_dir/$_" } sort readdir(PLUGINS) ) {
    my($plugin_name, $off) = $plugin =~ /^\d*(\w+?)(_?)$/;
    my $on_off = $off eq '_' ? -1 : 1;
    require "$plugin_dir/$plugin";
    $plugin_name->start() and ( $plugins{$plugin_name} = $on_off ) and push @plugins, $plugin_name;
  }
  closedir PLUGINS;
}

Plugins can override the template selection:

# Plugins: Template
# Allow for the first encountered plugin::template subroutine to override the
# default built-in template subroutine
my $tmp;
foreach my $plugin ( @plugins ) {
  $plugins{$plugin} > 0 and $plugin->can('template') and defined($tmp = $plugin->template()) and $template = $tmp and last;
}
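Two conventions are at work in the plugin code above: the file name determines the plugin name and whether it is switched on, and the first enabled plugin that offers a hook replaces the built-in routine. The Python sketch below illustrates both. The names `plugin_state` and `first_override` are hypothetical, and the plugin objects are stand-ins for Perl packages.

```python
import re

def plugin_state(filename):
    """Return (plugin_name, enabled) for a plugin file name, or None.

    Leading digits only control load order; a trailing underscore
    switches the plugin off.
    """
    if not re.fullmatch(r"\w+", filename):
        return None  # names with dots or other punctuation are ignored
    m = re.fullmatch(r"\d*(\w+?)(_?)", filename)
    return m.group(1), m.group(2) != "_"

def first_override(plugins, hook, default):
    """Return the hook result of the first enabled plugin that provides one.

    `plugins` is a list of (plugin_object, enabled) pairs in load order.
    """
    for plugin, enabled in plugins:
        candidate = getattr(plugin, hook, None)
        if enabled and callable(candidate):
            result = candidate()
            if result:  # the Perl chain also requires a true value
                return result
    return default
```

Because the directory listing is sorted before loading, the numeric prefix is what decides which plugin is "first encountered".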

# Provide backward compatibility for Blosxom < 2.0rc1 plug-ins
sub load_template {
  return &$template(@_);
}

The default find routine is defined, and plugins get a chance to replace the routine before it is called:

# Define default find subroutine
$entries = sub {
  my(%files, %indexes, %others);
  find(
    sub {
      my $d;
      my $curr_depth = $File::Find::dir =~ tr[/][];
      return if $depth and $curr_depth > $depth;

      if (
        # a match
        $File::Find::name =~ m!^$datadir/(?:(.*)/)?(.+)\.$file_extension$!
        # not an index, .file, and is readable
        and $2 ne 'index' and $2 !~ /^\./ and (-r $File::Find::name)
      ) {

        # to show or not to show future entries
        (
          $show_future_entries
          or stat($File::Find::name)->mtime < time
        )

          # add the file and its associated mtime to the list of files
          and $files{$File::Find::name} = stat($File::Find::name)->mtime

          # static rendering bits
          and (
            param('-all')
            or !-f "$static_dir/$1/index." . $static_flavours[0]
            or stat("$static_dir/$1/index." . $static_flavours[0])->mtime < stat($File::Find::name)->mtime
          )
          and $indexes{$1} = 1
          and $d = join('/', (nice_date($files{$File::Find::name}))[5,2,3])

          and $indexes{$d} = $d
          and $static_entries and $indexes{ ($1 ? "$1/" : '') . "$2.$file_extension" } = 1

      }
      else {
        !-d $File::Find::name and -r $File::Find::name and $others{$File::Find::name} = stat($File::Find::name)->mtime
      }
    }, $datadir
  );

  return (\%files, \%indexes, \%others);
};
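As a rough illustration of what the default find routine collects, the Python sketch below walks a data directory and sorts files into entries and "others". It leaves out the static-index bookkeeping, and the function name `find_entries` and its parameters are assumptions for this example.

```python
import os
import time

def find_entries(datadir, file_extension="txt", depth=0, show_future=False):
    """Return ({entry_path: mtime}, {other_path: mtime}) below datadir.

    depth=0 means unlimited; depth=1 looks only at datadir itself.
    """
    files, others = {}, {}
    now = time.time()
    for dirpath, dirnames, filenames in os.walk(datadir):
        rel = os.path.relpath(dirpath, datadir)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        if depth and level + 1 >= depth:
            dirnames[:] = []  # deeper directories would exceed the limit
        for name in filenames:
            full = os.path.join(dirpath, name)
            mtime = os.stat(full).st_mtime
            stem, ext = os.path.splitext(name)
            is_entry = (
                ext == "." + file_extension
                and stem != "index"
                and not name.startswith(".")
            )
            if is_entry:
                # Entries dated in the future are hidden unless requested.
                if show_future or mtime < now:
                    files[full] = mtime
            else:
                others[full] = mtime
    return files, others
```

The file modification time is the only metadata involved: it serves as the publication date of the entry.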

# Plugins: Entries
# Allow for the first encountered plugin::entries subroutine to override the
# default built-in entries subroutine
my $tmp;
foreach my $plugin ( @plugins ) {
  $plugins{$plugin} > 0 and $plugin->can('entries') and defined($tmp = $plugin->entries()) and $entries = $tmp and last;
}

my ($files, $indexes, $others) = &$entries();
%files = %$files;
%indexes = %$indexes;
%others = ref $others ? %$others : ();

Plugins are given the option to filter. There is no default filter.

# Plugins: Filter
foreach my $plugin ( @plugins ) {
  $plugins{$plugin} > 0 and $plugin->can('filter') and $entries = $plugin->filter(\%files, \%others)
}

For static content, generation is done here, by looping and calling generate on the proper subhierarchy/date/flavour:

# Static
if (!$ENV{GATEWAY_INTERFACE} and param('-password') and $static_password and param('-password') eq $static_password) {

  param('-quiet') or print "Blosxom is generating static index pages...\n";

  # Home Page and Directory Indexes
  my %done;
  foreach my $path ( sort keys %indexes) {
    my $p = '';
    foreach ( ('', split /\//, $path) ) {
      $p .= "/$_";
      $p =~ s!^/!!;
      $path_info = $p;
      $done{$p}++ and next;
      (-d "$static_dir/$p" or $p =~ /\.$file_extension$/) or mkdir "$static_dir/$p", 0755;
      foreach $flavour ( @static_flavours ) {
        my $content_type = (&$template($p,'content_type',$flavour));
        $content_type =~ s!\n.*!!s;
        my $fn = $p =~ m!^(.+)\.$file_extension$! ? $1 : "$p/index";
        param('-quiet') or print "$fn.$flavour\n";
        my $fh_w = new FileHandle "> $static_dir/$fn.$flavour" or die "Couldn't open $static_dir/$p for writing: $!";
        $output = '';
        print $fh_w
          $indexes{$path} == 1
            ? &generate('static', $p, '', $flavour, $content_type)
            : &generate('static', '', $p, $flavour, $content_type);
        $fh_w->close;
      }
    }
  }
}
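The static loop above visits every prefix of every index path once and writes one file per flavour. The Python sketch below shows only that enumeration. The name `static_targets` is hypothetical, and the returned paths are relative to the static directory, which is a simplification for this example.

```python
def static_targets(index_paths, flavours, file_extension="txt"):
    """List the static files to generate, one per path prefix and flavour."""
    done, targets = set(), []
    suffix = "." + file_extension
    for path in sorted(index_paths):
        p = ""
        for part in [""] + path.split("/"):
            p = f"{p}/{part}"
            if p.startswith("/"):
                p = p[1:]
            if p in done:
                continue  # each prefix is rendered only once
            done.add(p)
            # Individual entries keep their name; directories get an index.
            if p.endswith(suffix) and len(p) > len(suffix):
                stem = p[: -len(suffix)]
            else:
                stem = f"{p}/index".lstrip("/")
            targets.extend(f"{stem}.{flavour}" for flavour in flavours)
    return targets
```

With the index paths `tech/perl` and `tech/post.txt` and the single flavour `html`, this yields `index.html`, `tech/index.html`, `tech/perl/index.html` and `tech/post.html`.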

For dynamic content, the CGI prints the output of the generate function:

# Dynamic
else {
  my $content_type = (&$template($path_info,'content_type',$flavour));
  $content_type =~ s!\n.*!!s;

  $header = {-type=>$content_type};

  print generate('dynamic', $path_info, "$path_info_yr/$path_info_mo_num/$path_info_da", $flavour, $content_type);
}

If a plugin defines an end sub, it gets called now:

# Plugins: End
foreach my $plugin ( @plugins ) {
  $plugins{$plugin} > 0 and $plugin->can('end') and $entries = $plugin->end()
}

Generation function

When called, it goes through a generation step, unless a plugin returns true in a skip() function. It gets called with:

– a mode flag, either 'static' or 'dynamic'

– a category path (the directory for which we are generating)

– the date path

– the flavour

– the content type (two flavours can have the same content type, for instance different templates for aural or print media)

# Generate
sub generate {
  my($static_or_dynamic, $currentdir, $date, $flavour, $content_type) = @_;

  my %f = %files;

  # Plugins: Skip
  # Allow plugins to decide if we can cut short story generation
  my $skip;
  foreach my $plugin ( @plugins ) {
    $plugins{$plugin} > 0 and $plugin->can('skip') and defined($tmp = $plugin->skip()) and $skip = $tmp and last;
  }

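Variable interpolation: by default, a regular expression substitutes every $word or $word::word in the template with the value of the variable of that name, or with the empty string if the variable is not defined. Plugins can override this to obtain different behaviour (special cases for zero, plurals, etc.)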

  # Define default interpolation subroutine
  $interpolate = sub {
    package blosxom;
    my $template = shift;
    $template =~ s/(\$\w+(?:::)?\w*)/"defined $1 ? $1 : ''"/gee;
    return $template;
  };

  unless (defined($skip) and $skip) {
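The default interpolation can be pictured with the following Python sketch. It is an illustration only: where Blosxom evaluates the matched name as a Perl variable, this version looks it up in a dictionary, and the name `interpolate` and the `variables` argument are assumptions for the example.

```python
import re

def interpolate(template, variables):
    """Replace $name and $pkg::name with values from `variables`.

    Names that are missing (or map to None) become the empty string.
    """
    def substitute(match):
        value = variables.get(match.group(1))
        return "" if value is None else str(value)

    # Same shape as the Perl pattern: a word, optionally followed by
    # one '::' and a second word.
    return re.sub(r"\$(\w+(?:::)?\w*)", substitute, template)
```

A plugin that replaces this routine could, for example, treat zero or plural values specially before substituting them.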

    # Plugins: Interpolate
    # Allow for the first encountered plugin::interpolate subroutine to
    # override the default built-in interpolate subroutine
    my $tmp;
    foreach my $plugin ( @plugins ) {
      $plugins{$plugin} > 0 and $plugin->can('interpolate') and defined($tmp = $plugin->interpolate()) and $interpolate = $tmp and last;
    }

    # Head
    my $head = (&$template($currentdir,'head',$flavour));

    # Plugins: Head
    foreach my $plugin ( @plugins ) {
      $plugins{$plugin} > 0 and $plugin->can('head') and $entries = $plugin->head($currentdir, \$head)
    }

$head = &$interpolate($head);

$output .= $head;

    # Stories
    my $curdate = '';
    my $ne = $num_entries;

    if ( $currentdir =~ /(.*?)([^\/]+)\.(.+)$/ and $2 ne 'index' ) {
      $currentdir = "$1$2.$file_extension";
      $files{"$datadir/$1$2.$file_extension"} and %f = ( "$datadir/$1$2.$file_extension" => $files{"$datadir/$1$2.$file_extension"} );
    }
    else {
      $currentdir =~ s!/index\..+$!!;
    }

    # Define a default sort subroutine
    my $sort = sub {
      my($files_ref) = @_;
      return sort { $files_ref->{$b} <=> $files_ref->{$a} } keys %$files_ref;
    };

    # Plugins: Sort
    # Allow for the first encountered plugin::sort subroutine to override the
    # default built-in sort subroutine
    my $tmp;
    foreach my $plugin ( @plugins ) {
      $plugins{$plugin} > 0 and $plugin->can('sort') and defined($tmp = $plugin->sort()) and $sort = $tmp and last;
    }

$ne counts down the number of entries still to be shown (a crude form of pagination):

    foreach my $path_file ( &$sort(\%f, \%others) ) {
      last if $ne <= 0 && $date !~ /\d/;
      use vars qw/ $path $fn /;
      ($path,$fn) = $path_file =~ m!^$datadir/(?:(.*)/)?(.*)\.$file_extension!;

      # Only stories in the right hierarchy
      $path =~ /^$currentdir/ or $path_file eq "$datadir/$currentdir" or next;

      # Prepend a slash for use in templates only if a path exists
      $path &&= "/$path";

      # Date fiddling for by-{year,month,day} archive views
      use vars qw/ $dw $mo $mo_num $da $ti $yr $hr $min $hr12 $ampm /;
      ($dw,$mo,$mo_num,$da,$ti,$yr) = nice_date($files{"$path_file"});
      ($hr,$min) = split /:/, $ti;
      ($hr12, $ampm) = $hr >= 12 ? ($hr - 12,'pm') : ($hr, 'am');
      $hr12 =~ s/^0//;
      $hr12 == 0 and $hr12 = 12;

      # Only stories from the right date
      my($path_info_yr,$path_info_mo_num, $path_info_da) = split /\//, $date;
      next if $path_info_yr && $yr != $path_info_yr;
      last if $path_info_yr && $yr < $path_info_yr;
      next if $path_info_mo_num && $mo ne $num2month[$path_info_mo_num];
      next if $path_info_da && $da != $path_info_da;
      last if $path_info_da && $da < $path_info_da;

      # Date
      my $date = (&$template($path,'date',$flavour));

      # Plugins: Date
      foreach my $plugin ( @plugins ) {
        $plugins{$plugin} > 0 and $plugin->can('date') and $entries = $plugin->date($currentdir, \$date, $files{$path_file}, $dw,$mo,$mo_num,$da,$ti,$yr)
      }

      $date = &$interpolate($date);

      $curdate ne $date and $curdate = $date and $output .= $date;

      use vars qw/ $title $body $raw /;
      if (-f "$path_file" && $fh->open("< $path_file")) {
        chomp($title = <$fh>);
        chomp($body = join '', <$fh>);
        $fh->close;
        $raw = "$title\n$body";
      }
      my $story = (&$template($path,'story',$flavour));

      # Plugins: Story
      foreach my $plugin ( @plugins ) {
        $plugins{$plugin} > 0 and $plugin->can('story') and $entries = $plugin->story($path, $fn, \$story, \$title, \$body)
      }

      if ($content_type =~ m{\Wxml$}) {
        # Escape <, >, and &, and to produce valid RSS
        my %escape = ('<'=>'&lt;', '>'=>'&gt;', '&'=>'&amp;', '"'=>'&quot;');
        my $escape_re = join '|' => keys %escape;
        $title =~ s/($escape_re)/$escape{$1}/g;
        $body =~ s/($escape_re)/$escape{$1}/g;
      }

      $story = &$interpolate($story);

      $output .= $story;
      $fh->close;

      $ne--;
    }

    # Foot
    my $foot = (&$template($currentdir,'foot',$flavour));

    # Plugins: Foot
    foreach my $plugin ( @plugins ) {
      $plugins{$plugin} > 0 and $plugin->can('foot') and $entries = $plugin->foot($currentdir, \$foot)
    }

    $foot = &$interpolate($foot);
    $output .= $foot;

    # Plugins: Last
    foreach my $plugin ( @plugins ) {
      $plugins{$plugin} > 0 and $plugin->can('last') and $entries = $plugin->last()
    }

  } # End skip

  # Finally, add the header, if any and running dynamically
  $static_or_dynamic eq 'dynamic' and $header and $output = header($header) . $output;

  $output;
}

Auxiliary function for turning dates into components:

sub nice_date {
  my($unixtime) = @_;

  my $c_time = ctime($unixtime);
  my($dw,$mo,$da,$ti,$yr) = ( $c_time =~ /(\w{3}) +(\w{3}) +(\d{1,2}) +(\d{2}:\d{2}):\d{2} +(\d{4})$/ );
  $da = sprintf("%02d", $da);
  my $mo_num = $month2num{$mo};

  return ($dw,$mo,$mo_num,$da,$ti,$yr);
}
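The same decomposition can be sketched in Python. This is an illustration, not a port. The `to_struct` parameter is an addition that makes the time zone explicit, and the two-digit month numbers in `MONTH2NUM` are an assumption about the contents of `%month2num`, which is defined outside the listing shown here.

```python
import re
import time

# Assumed mapping from month abbreviation to a two-digit month number.
MONTH2NUM = {
    name: f"{i:02d}"
    for i, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), 1
    )
}

def nice_date(unixtime, to_struct=time.localtime):
    """Return (weekday, month, month_number, day, hh:mm, year) as strings."""
    c_time = time.asctime(to_struct(unixtime))  # e.g. 'Thu Jan  1 00:00:00 1970'
    m = re.search(
        r"(\w{3}) +(\w{3}) +(\d{1,2}) +(\d{2}:\d{2}):\d{2} +(\d{4})$", c_time
    )
    dw, mo, da, ti, yr = m.groups()
    return dw, mo, MONTH2NUM[mo], f"{int(da):02d}", ti, yr
```

Calling `nice_date(0, time.gmtime)` gives the components of the Unix epoch in UTC; with the default `time.localtime` the result depends on the server's time zone, as it does in the original.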

Default Templates

They are minimalistic, for the sake of having a functional, self-contained program, and include html and rss as default flavours. For each flavour there is a content_type, a head, a story, a date and a foot template. This corresponds to the classic (non-paginated) report scheme. Pagination (item count) is a concept dealt with inside the generate routine.

# Default HTML and RSS template bits
__DATA__
html content_type text/html
html head $blog_title $path_info_da $path_info_mo $path_info_yr $blog_title $path_info_da $path_info_mo $path_info_yr
html story $title $body posted at: $ti | path: $path | permanent link to this entry\n
html date $dw, $da $mo $yr\n
html foot
rss content_type text/xml
rss head \n\n\n\n\n \n $blog_title $path_info_da $path_info_mo $path_info_yr\n $url\n $blog_description\n $blog_language\n
rss story \n $title\n $url/$yr/$mo_num/$da#$fn\n $body\n \n
rss date \n
rss foot \n
error content_type text/html
error head Error: I'm afraid this is the first I've heard of a "$flavour" flavoured Blosxom. Try dropping the "/+$flavour" bit from the end of the URL.\n\n
error story $title $body #\n
error date $dw, $da $mo $yr\n
error foot
__END__

Appendix B: Sam Ruby's htaccess

RewriteEngine on

Deny from 66.6.223.190
Deny from 24.69.156.45
Deny from 205.252.49.146
Deny from 69.50.191.130
Deny from 66.230.165.42
Deny from 210.82.106.156
Deny from 80.132.73.105
Deny from 195.53.31.35
Deny from 194.158.202.4
Deny from 66.192.31.98
Deny from 24.169.156.4
Deny from 65.41.249.27
Deny from 217.159.201.131
Deny from 66.154.38.18

# RewriteCond %{REMOTE_ADDR} ^66\.57\.27\.65$
RewriteCond %{HTTP_USER_AGENT} 6\.00\.8169$
RewriteRule ^.* - [F]

RewriteCond %{HTTP_USER_AGENT} AdultGods
RewriteRule ^.* - [F]

# RewriteRule ^.* %{HTTP_REFERER} [R,L]
# RewriteRule ^.* http://www.xxx-database.com/zz/ [R,L]
# RewriteRule ^.* busted/goaway [R,L]

#
# Ensure that POST requests and requests with query strings are not
# served from the cache.
#
RewriteCond %{QUERY_STRING} "!^$" [OR]
RewriteCond %{REQUEST_METHOD} "POST" [NC]
RewriteCond %{REQUEST_URI} "!^/mombo/\w+.cgi"
RewriteRule (.*) /mombo/gateway.cgi/$1 [PT]

#
# Redirect all missing files to the CGI script
#
RewriteCond %{REQUEST_FILENAME} !-s
RewriteCond %{REQUEST_FILENAME}/index.html !-s
RewriteRule (.*) /mombo/gateway.cgi/$1 [PT]

#
# The following needs to be maintained in synch with the templates:
# Add in the necessary headers when responses are served from the cache.
#

Header set X-Pingback http://intertwingly.net/blog/pingback

AddType text/xml rss ffkar opml soap sn2 tb tbrss wsdl xss
AddType text/plain txt esf
AddType application/xml atom
AddType application/xhtml+xml xhtml
AddType application/rss+xml rss2 rss21
AddType application/rdf+xml rdf
AddType application/x-netcdf cdf

AddDefaultCharset utf-8
AddCharset utf-8 atom rss ffkar soap tb tbrss wsdl xss html
AddCharset utf-8 txt esf xhtml rss2 rss21 rdf cdf

#
# Serve up XHTML with the proper mime type to browsers that will accept it
#
RewriteBase /
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
RewriteCond %{REQUEST_URI} \.html$
RewriteCond %{THE_REQUEST} HTTP/1\.1
RewriteRule .* - [T=application/xhtml+xml;charset=utf-8]

#
# Serve up XHTML with the proper mime type to browsers that will accept it
#
# RewriteEngine on
# RewriteBase /
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
RewriteCond %{REQUEST_URI} !\.
RewriteCond %{THE_REQUEST} HTTP/1\.1
RewriteRule .* - [T=application/xhtml+xml;charset=utf-8]
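The two XHTML rule sets above amount to a small content negotiation test. The Python sketch below restates the four conditions as a function. It is an illustration of the logic only: the name `wants_xhtml` and its parameters are hypothetical, and the function combines both rule sets (URIs ending in .html and URIs without a dot) into one.

```python
import re

def wants_xhtml(accept, request_uri, request_line):
    """Return True when the response would get application/xhtml+xml."""
    # The client must list the XHTML media type in its Accept header...
    if not re.search(r"application/xhtml\+xml", accept):
        return False
    # ...and must not attach a quality value that starts with 0 to it.
    if re.search(r"application/xhtml\+xml\s*;\s*q=0", accept):
        return False
    # Only .html resources, or URIs without any dot, are affected.
    if not (re.search(r"\.html$", request_uri) or "." not in request_uri):
        return False
    # The request line must announce HTTP/1.1.
    return re.search(r"HTTP/1\.1", request_line) is not None
```

Note that the `q=0` pattern also matches values such as `q=0.9`, so any client that gives XHTML a quality below 1 receives the default type instead.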

#
# RFC 3229 support
#
RewriteCond %{REQUEST_URI} /blog/index.atom$
RewriteCond %{HTTP:A-IM} \bfeed\b
RewriteCond /home/rubys/blog/history/index.%{HTTP:If-None-Match}.asis -s
RewriteRule index.atom$ /blog/history/index.%{HTTP:If-None-Match}.asis
#
RewriteCond %{REQUEST_URI} /blog/comments.atom$
RewriteCond %{HTTP:A-IM} \bfeed\b
RewriteCond /home/rubys/blog/history/comments.%{HTTP:If-None-Match}.asis -s
RewriteRule comments.atom$ /blog/history/comments.%{HTTP:If-None-Match}.asis