| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html> |
| <head> |
| <META http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
| <title>Introducing Cocoon</title> |
| <link href="http://purl.org/DC/elements/1.0/" rel="schema.DC"> |
| <meta content="Stefano Mazzocchi" name="DC.Creator"> |
| </head> |
| <body> |
| |
| |
| <h1>The XML Hype</h1> |
| |
| |
| <p> |
| Everybody talks about XML. XML here, XML there. All application servers |
| support XML, everybody wants to do B2B using XML, web services using |
| XML, even databases using XML. |
| </p> |
| |
| |
| <p> |
| Should you care about it? Given the amount of hype, you can't afford to |
| go around ignoring XML, for that would be like ignoring the World Wide |
| Web 10 years ago: a clear mistake. But why is this so for XML? What is |
| this "magic" that XML seems to have in solving your problems? Isn't this |
| just another round of hype meant to change, once again, the IT |
| infrastructure that you spent so much time implementing and fixing over the |
| last few years? Isn't it just another way to drain money out of your pockets? |
| </p> |
| |
| |
| <p> |
| If you ever asked yourself one of the above questions, this paper is for |
| you. You won't find singing-and-dancing marketing hype, you won't find |
| boring and useless feature lists, you won't find the usual acronym |
| bombing or those good looking vaporware schemas that connect your |
| databases to your coffee machines via CORBA or stuff like that. |
| </p> |
| |
| |
| <p> |
| This document explains what the Cocoon project is about and what we are |
| doing to solve the problems that we encountered in our web engineering |
| experience. It does so from an executive perspective, because we have all faced |
| the problems of managing a web site: dealing with our colleagues, rushing |
| to the graphics guru to get the little GIF with the new title, or |
| calling the web administrator at night because the database is returning |
| errors for no apparent reason. |
| </p> |
| |
| |
| <p> |
| It was frustrating to see the best and most clever information |
| technology ever invented--the Web--ruined by the lack of engineering |
| practices, tortured by those "let's-reinvent-the-wheel-once-again" |
| craftsmen who were great at doing their jobs as individuals but |
| could not scale within teams, which capped the growth of their projects. |
| </p> |
| |
| |
| <p> |
| There had to be a better way of doing things. |
| </p> |
| |
| |
| |
| |
| |
| <h1>Personal Experiences</h1> |
| |
| |
| <p> |
| In 1998, Stefano Mazzocchi volunteered to create the documentation infrastructure for |
| the java.apache.org project, which is composed of a bunch of different |
| codebases, maintained by a bunch of different people, with different |
| skills, different geographical locations and different degrees of willingness |
| and time to dedicate to the documentation effort. |
| </p> |
| |
| |
| <p> |
| But pretty soon he realized that no matter how great and well designed the |
| system was, HTML was a problem: it was <em>not</em> designed for those kinds of |
| things. Looking at the main page (<a class="external" href="http://java.apache.org/">http://java.apache.org/</a>) in the |
| browser, you could clearly identify the areas of the screen: sidebar, |
| topbar, news, status. But if you viewed the underlying HTML, boom: a nightmare of |
| table tags, nesting and little tricks to make the HTML appear |
| the same on every browser. |
| </p> |
| |
| |
| <p> |
| So he looked around for alternative technologies, but <em>all</em> of them were |
| trying to add more complexity at the GUI level (Microsoft FrontPage, |
| Macromedia Dreamweaver, Adobe GoLive, etc.), hoping to "hide" the |
| design problems of HTML under a thick layer of WYSIWYG looks. |
| </p> |
| |
| |
| <p> |
| What you see is what you get. |
| </p> |
| |
| |
| <p> |
| But what you see is all you've got. |
| </p> |
| |
| |
| <p> |
| How can you tell your web server to extract the information contained within the |
| sidebar? How can you tell it to find the news articles within a complex HTML page? |
| </p> |
| |
| |
| <p> |
| It's certainly easy for a human reader: just look at the page and you should have |
| no problem distinguishing between a sidebar, a banner, a news item and a stock |
| quote. Why is it so hard for a machine? |
| </p> |
| |
| |
| |
| |
| <h1>The HTML Model</h1> |
| |
| |
| <p> |
| HTML is a language that tells your browser how to "draw" things on its |
| window. An image here, a letter there, a color down here. Nothing more. |
| The browser doesn't have the "higher level" notion of "sidebar": it |
| lacks the ability to perform "semantic analysis" of the HTML content. |
| </p> |
| |
| |
| <p> |
| Semantic analysis? Yeah, it's the kind of thing the human brain is |
| simply great at doing, while computer programs fail at it big time. |
| </p> |
| |
| |
| <p> |
| So, with HTML, we went a step up and created a highly visual and |
| appealing web of HTML content, but we went two steps back by removing |
| all the higher level semantic information from the content itself. |
| </p> |
| |
| |
| <p> |
| Let's look at an example. Most of you have seen an HTML |
| page; if not, here is one: |
| </p> |
| |
| |
| <pre class="code"> |
| <html> |
| <body> |
| <p>Hi, I'm an HTML page</p> |
| <p align="center">Written by Stefano</p> |
| </body> |
| </html> |
| </pre> |
| |
| |
| <p> |
| which says to the browser: |
| </p> |
| |
| |
| <ul> |
| |
| <li>I'm an HTML page</li> |
| |
| <li>I have a body</li> |
| |
| <li>I have two paragraphs</li> |
| |
| <li>The first contains the sentence "Hi, I'm an HTML page"</li> |
| |
| <li>The second contains the sentence "Written by Stefano" and is centered</li> |
| |
| </ul> |
| |
| |
| <p> |
| Suppose you are a Chinese reader who doesn't know our alphabet, and try |
| to answer the following question: |
| </p> |
| |
| |
| <p> |
| Who wrote the page? |
| </p> |
| |
| |
| <p> |
| You can't perform semantic analysis: you are as blind as a web browser. |
| The only thing you can do is draw it on the screen since this is what |
| you were programmed to do. In other words, your semantic capacity is |
| fixed to the drawing capabilities and a few other things (like linking), |
| thus limited. |
| </p> |
| |
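| <p> |
| As a rough illustration of this blindness, here is a minimal Python sketch |
| that reads the HTML page above the way a program would. The names |
| TextCollector and visible_structure are made up for this example; only the |
| standard html.parser module is assumed. |
| </p> |
| |

```python
from html.parser import HTMLParser

PAGE = """<html>
 <body>
  <p>Hi, I'm an HTML page</p>
  <p align="center">Written by Stefano</p>
 </body>
</html>"""


class TextCollector(HTMLParser):
    """Record each piece of text together with the tag that encloses it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.found = []   # (enclosing tag, text) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.found.append((self.stack[-1], text))


def visible_structure(page):
    """Return everything a program can learn about the page's text."""
    collector = TextCollector()
    collector.feed(page)
    collector.close()
    return collector.found


for tag, text in visible_structure(PAGE):
    print(tag, "->", text)
```

| |
| <p> |
| Both sentences come back labelled only as "p". Nothing in the markup says |
| which of them, if any, names the author. |
| </p> |
| |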
| |
| |
| |
| <h1>Semantic Markup</h1> |
| |
| |
| <p> |
| Suppose you receive this page: |
| </p> |
| |
| |
| <pre class="code"> |
| <page> |
| <author>sflkjoiuer</author> |
| <content> |
| <para>sofikdjflksj</para> |
| </content> |
| </page> |
| </pre> |
| |
| |
| <p> |
| Can you now tell me who wrote the page? Easy, you say, "sflkjoiuer" did. Good, but later |
| you receive: |
| </p> |
| |
| |
| <pre class="code"> |
| <dlkj> |
| <ruijfl>sofikdjflksj</ruijfl> |
| <wijlkjf> |
| <oamkfkj>sflkjoiuer</oamkfkj> |
| </wijlkjf> |
| </dlkj> |
| </pre> |
| |
| |
| <p> |
| Now, who wrote the page? You could guess by comparing the structure, |
| but how do you know the two structures reflect the same semantic |
| information? |
| </p> |
| |
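| <p> |
| The same point can be sketched in a few lines of Python using the standard |
| xml.etree.ElementTree module. The helper find_author is hypothetical: it |
| simply assumes that the program was told in advance which element name |
| carries the author. |
| </p> |
| |

```python
import xml.etree.ElementTree as ET

FIRST = """<page>
 <author>sflkjoiuer</author>
 <content>
  <para>sofikdjflksj</para>
 </content>
</page>"""

SECOND = """<dlkj>
 <ruijfl>sofikdjflksj</ruijfl>
 <wijlkjf>
  <oamkfkj>sflkjoiuer</oamkfkj>
 </wijlkjf>
</dlkj>"""


def find_author(document, author_tag="author"):
    """Return the text of the first element named author_tag, or None."""
    element = ET.fromstring(document).find(f".//{author_tag}")
    return element.text if element is not None else None


print(find_author(FIRST))              # the vocabulary is known
print(find_author(SECOND))             # same data, unknown vocabulary
print(find_author(SECOND, "oamkfkj"))  # works only once someone tells us the name
```

| |
| <p> |
| The parser handles both documents equally well. What is missing in the |
| second case is the shared agreement on what the element names mean. |
| </p> |
| |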
| |
| <p> |
| The above two pages are both XML documents. |
| </p> |
| |
| |
| <p> |
| Are they going to help you? Are they going to simplify your work? Are |
| they going to simplify your problems? |
| </p> |
| |
| |
| <p> |
| At this point, clearly not; quite the opposite. |
| </p> |
| |
| |
| <p> |
| So, you could be wondering, why did we spend so much effort to |
| write an XML publishing framework? This document was written exactly |
| to clear your doubts on this, so let's keep going. |
| </p> |
| |
| |
| |
| |
| |
| <h1>The XML Language</h1> |
| |
| |
| <p> |
| XML is usually spelled out as the "eXtensible Markup Language" |
| specification: a fairly small yet complex specification that indicates |
| how to write languages. It's a syntax. To tell you the truth, nothing fancy at all. So |
| </p> |
| |
| |
| <pre class="code"> |
| <hello></hello> |
| </pre> |
| |
| |
| <p> |
| is correct, while |
| </p> |
| |
| |
| <pre class="code"> |
| <hello></hi> |
| </pre> |
| |
| |
| <p> |
| is not, but |
| </p> |
| |
| |
| <pre class="code"> |
| <hello><hi/></hello> |
| </pre> |
| |
| |
| <p> |
| is correct. There's more to it than this, but I'll skip the technical details here. |
| </p> |
| |
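| <p> |
| These three cases can be checked mechanically. The sketch below is only an |
| illustration: is_well_formed is a made-up helper name, and it relies on the |
| XML parser shipped with Python to accept or reject each snippet. |
| </p> |
| |

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


for snippet in ("<hello></hello>", "<hello></hi>", "<hello><hi/></hello>"):
    verdict = "correct" if is_well_formed(snippet) else "not correct"
    print(snippet, "is", verdict)
```

| |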
| |
| <p> |
| XML is the ASCII of the new millennium: it's a step forward from ASCII |
| or Unicode (the international extension to ASCII that includes all |
| characters from all modern languages). It defines a "lingua franca" for |
| textual languages. |
| </p> |
| |
| |
| <p> |
| Ok, great, so now instead of having one uniform language with visual |
| semantics (HTML) we have a babel of languages each with its own |
| semantics. How can this possibly help you? |
| </p> |
| |
| |
| |
| |
| <h1>XML Transformations</h1> |
| |
| |
| <p> |
| This is where Stefano stood, more or less two years ago, with |
| java.apache.org: he could use XML and define his own semantics with |
| <sidebar>, <news>, <status> |
| and all that, and he was sure people would have |
| found those XML documents much easier to write (since the XML syntax is |
| very similar to HTML's and very user friendly)... but he would have |
| moved from "all browsers" to "no browser". |
| </p> |
| |
| |
| <p> |
| And having documentation that nobody can browse is totally useless. |
| </p> |
| |
| |
| <p> |
| The turning point was the creation of the XSL specification, which |
| included a way to "transform" an XML page into something else. (It's |
| more complex than this but, again, I'll skip the technical details.) |
| </p> |
| |
| |
| <p> |
| So now you have: |
| </p> |
| |
| |
| <pre class="code"> |
| XML page ---(transformation)--> HTML page |
| ^ |
| | |
| transformation rules |
| </pre> |
| |
| |
| <p> |
| that allows you to write your pages in XML, create your "graphics" as |
| transformation rules and generate HTML pages on the fly directly from your |
| web server. |
| </p> |
| |
| |
| <p> |
| Apache Cocoon 1.0 did exactly this. |
| </p> |
| |
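| <p> |
| The transformation step in the diagram can be imitated in a few lines of |
| Python. This is a deliberately simplified stand-in, not XSL and not |
| Cocoon's implementation: the "rules" are just a hypothetical table |
| (HTML_RULES) mapping each semantic element name to an HTML tag, and |
| transform and render are names invented for the sketch. |
| </p> |
| |

```python
import xml.etree.ElementTree as ET

PAGE = "<page><content><para>Hi, I'm an XML page</para></content></page>"

# Hypothetical transformation rules: semantic element name -> HTML tag.
HTML_RULES = {"page": "html", "content": "body", "para": "p"}


def transform(source, rules):
    """Rebuild the tree, renaming every element according to the rules.

    An element with no rule raises KeyError, which keeps the sketch honest
    about the vocabulary it understands.
    """
    target = ET.Element(rules[source.tag])
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(transform(child, rules))
    return target


def render(document, rules):
    """Parse the XML page, apply the rules, and serialize the result."""
    return ET.tostring(transform(ET.fromstring(document), rules), encoding="unicode")


print(render(PAGE, HTML_RULES))
```

| |
| <p> |
| The page itself never mentions HTML. Everything about the final look lives |
| in the rules, which is the property the rest of this document builds on. |
| </p> |
| |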
| |
| |
| |
| <h1>The Model Evolves</h1> |
| |
| |
| <p> |
| If XML is a lingua franca, it means that XML software can work on almost |
| anything without caring about what it is. So, if a cell phone requests |
| the page, Cocoon just has to change transformation rules and send the |
| WAP page to the phone. Or, if you want a nice PDF to print out your |
| monthly report, you change the transformation rules and Cocoon creates |
| the PDF for you, or the VRML, or the VoiceXML, or your own proprietary |
| B2B markup. |
| </p> |
| |
| |
| <p> |
| All of this without changing the basic architecture, which is built on |
| the simple "angle bracket" XML syntax. |
| </p> |
| |
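| <p> |
| To make the idea of swapping rules concrete, here is a small Python sketch |
| in which the same page is served to two kinds of client. It is an |
| illustration under assumptions, not Cocoon code: the client names |
| "browser" and "phone", the table RULES_BY_CLIENT and the function |
| render_for are invented, and the WML mapping (the markup used for WAP |
| pages) is reduced to the bare minimum. |
| </p> |
| |

```python
import xml.etree.ElementTree as ET

PAGE = "<page><content><para>Monthly report</para></content></page>"

# Hypothetical rule sets, one per kind of client.
RULES_BY_CLIENT = {
    "browser": {"page": "html", "content": "body", "para": "p"},
    "phone": {"page": "wml", "content": "card", "para": "p"},
}


def transform(source, rules):
    """Rebuild the tree, renaming every element according to the rules."""
    target = ET.Element(rules[source.tag])
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(transform(child, rules))
    return target


def render_for(client, document):
    """Pick the rule set for the client and apply it to the unchanged page."""
    rules = RULES_BY_CLIENT[client]
    return ET.tostring(transform(ET.fromstring(document), rules), encoding="unicode")


print(render_for("browser", PAGE))
print(render_for("phone", PAGE))
```

| |
| <p> |
| Supporting a new output format means adding one more entry to the rule |
| table. The page and the transformation code stay untouched. |
| </p> |
| |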
| |
| |
| |
| <h1>Separation of Concerns (SoC)</h1> |
| |
| |
| <p> |
| Cocoon was not the first product to perform server side XML |
| transformations, nor will it be the last (in a few years, these |
| solutions will be the rule rather than the exception). So, what is the |
| "plus" that the Cocoon project adds? |
| </p> |
| |
| |
| <p> |
| We believe the single most important Cocoon innovation is SoC-based design. |
| </p> |
| |
| |
| <p> |
| SoC is something that you've always been aware of: not everybody is |
| equal, not everybody performs the same job with the same ability. |
| </p> |
| |
| |
| <p> |
| It can be observed that grouping people with common skills into |
| separate working groups increases productivity and reduces management |
| costs, but only if the groups do not overlap and have clear "contracts" |
| that define their operability and their concerns. |
| </p> |
| |
| |
| <p> |
| For a web publishing system, the Cocoon project uses what we call the |
| <em>pyramid of contracts</em> which outlines four major concern areas and five |
| contracts between them. Here is the picture: |
| </p> |
| |
| |
| <div align="center"> |
| <img class="figure" alt="The Cocoon Pyramid Model of Contracts" src="images/pyramid-model.gif" height="159" width="313"></div> |
| |
| |
| <p> |
| Cocoon is <em>engineered</em> to provide you with a way to isolate these four |
| concern areas using just those five contracts, removing the contract |
| between style and logic that has been bugging web site development since |
| the beginning of the Web. |
| </p> |
| |
| |
| <p> |
| Why? Because programmers and graphics people have very different skills |
| and work habits. So, instead of creating GUIs to hide the things that |
| can be harmful (like graphics to programmers or logic to designers), |
| Cocoon allows you to separate those things into different files, letting |
| you "seal" your working groups into separate virtual rooms connected |
| with the other rooms only by the "pipes" (the contracts) that you |
| give them from the management area. |
| </p> |
| |
| |
| <p> |
| Let's look at an example: |
| </p> |
| |
| |
| <pre class="code"> |
| <page> |
| <content> |
| <para>Today is <dynamic:today/></para> |
| </content> |
| </page> |
| </pre> |
| |
| |
| <p> |
| This page is written by the content writers, and you give them the |
| "contract" stating that the tag |
| <dynamic:today/> prints out the time of day |
| when included in the page. Content writers don't care (nor |
| should they) about what language has been used for that, nor can they |
| mess up the programming logic that generates the |
| content, since it's stored in another part of the system they |
| don't have access to. |
| </p> |
| |
| |
| <p> |
| So <dynamic:today/> is the "logic - content" contract. |
| </p> |
| |
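| <p> |
| The logic side of this contract can be pictured with the following Python |
| sketch. It is not how Cocoon implements the tag. The namespace URI |
| (DYNAMIC_NS) is a placeholder added so that the page parses, and |
| expand_today and its date format are assumptions made for the example. |
| </p> |
| |

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Placeholder namespace URI; the original page does not declare one.
DYNAMIC_NS = "http://example.org/dynamic"

PAGE = f"""<page xmlns:dynamic="{DYNAMIC_NS}">
 <content>
  <para>Today is <dynamic:today/></para>
 </content>
</page>"""


def expand_today(root, now):
    """Replace every <dynamic:today/> element with the formatted time."""
    tag = f"{{{DYNAMIC_NS}}}today"
    stamp = now.strftime("%Y-%m-%d %H:%M")
    for parent in list(root.iter()):
        previous = None  # last child kept so far
        for child in list(parent):
            if child.tag != tag:
                previous = child
                continue
            # Splice the time, plus any text that followed the tag,
            # into the surrounding text.
            text = stamp + (child.tail or "")
            if previous is None:
                parent.text = (parent.text or "") + text
            else:
                previous.tail = (previous.tail or "") + text
            parent.remove(child)
    return root


page = expand_today(ET.fromstring(PAGE), datetime.now())
print(page.find("content/para").text)
```

| |
| <p> |
| The content writers only ever see the tag. The function behind it can be |
| rewritten, or moved to another language, without touching a single page. |
| </p> |
| |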
| |
| <p> |
| At the same time, the structure of the page is given as a contract to |
| the graphic designers who have to come up with the transformation rules |
| that transform this structure into a language that the browser can |
| understand (HTML, for example). |
| </p> |
| |
| |
| <p> |
| So, the page structure is the "content - style" contract. |
| </p> |
| |
| |
| <p> |
| As long as these contracts don't change, the three areas can work |
| completely in parallel without overwhelming the human resources used to |
| manage them: costs decrease because time to market is reduced, and |
| maintenance costs decrease because errors do not propagate out of |
| the concern areas. |
| </p> |
| |
| |
| <p> |
| For example, you can tell your designers to come up with a "Xmas look" |
| for your web site without even telling the other people: just switch to |
| the Xmas transformation rules on Xmas morning and you're done. Just |
| imagine how painful it would be to do this on your web site today. |
| </p> |
| |
| |
| <p> |
| With the Cocoon architecture all this is a couple of line changes away. |
| </p> |
| |
| |
| |
| |
| <h1>Here we go</h1> |
| |
| |
| <p> |
| If you've read this far, you should be able to grasp the |
| value of the Cocoon Project as well as distinguish most of the marketing |
| hype that surrounds XML and friends. |
| </p> |
| |
| |
| <p> |
| Just like you shouldn't care if somebody offers you software that is |
| "ASCII compliant" or "ASCII based", you shouldn't care about "XML |
| compliant" or "XML based": it doesn't mean anything. |
| </p> |
| |
| |
| <p> |
| Cocoon uses XML as a core piece of its framework, but improves the model |
| to give you the tools you need and is designed to be flexible enough to |
| follow your current needs as well as paradigm shifts that may happen in the |
| future. |
| </p> |
| |
| |
| |
| |
| </body> |
| </html> |