<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<!--
====================================================================
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
====================================================================
-->
<chapter id="fundamentals">
<title>Fundamentals</title>
<section>
<title>HTTP messages</title>
<section>
<title>Structure</title>
<para>
An HTTP message consists of a header and an optional body. The message header of an HTTP
request consists of a request line and a collection of header fields. The message header
of an HTTP response consists of a status line and a collection of header fields. All
HTTP messages must include the protocol version. Some HTTP messages can optionally
enclose a content body.
</para>
<para>
HttpCore defines the HTTP message object model to follow this definition closely, and
provides extensive support for serialization (formatting) and deserialization
(parsing) of HTTP message elements.
</para>
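<para>
As a rough illustration of what formatting and parsing a message element involve, the
following plain-JDK sketch splits a request line into its three parts and joins them
back together. The class <classname>RequestLineSketch</classname> and its methods are
hypothetical and are not part of HttpCore, whose own parsers and formatters are
considerably more complete.
</para>
<programlisting><![CDATA[
// Hypothetical stand-in for a request line; not an HttpCore class.
final class RequestLineSketch {

    final String method;
    final String uri;
    final String version;

    RequestLineSketch(String method, String uri, String version) {
        this.method = method;
        this.uri = uri;
        this.version = version;
    }

    // Deserialization: expects exactly "METHOD SP URI SP VERSION".
    static RequestLineSketch parse(String line) {
        String[] parts = line.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLineSketch(parts[0], parts[1], parts[2]);
    }

    // Serialization: joins the parts back into the wire representation.
    String format() {
        return method + " " + uri + " " + version;
    }

    public static void main(String[] args) {
        RequestLineSketch line = parse("GET / HTTP/1.1");
        System.out.println(line.method);   // GET
        System.out.println(line.uri);      // /
        System.out.println(line.version);  // HTTP/1.1
        System.out.println(line.format()); // GET / HTTP/1.1
    }
}
]]></programlisting>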
</section>
<section>
<title>Basic operations</title>
<section>
<title>HTTP request message</title>
<para>
An HTTP request is a message sent from the client to the server. The first line of
that message includes the method to apply to the resource, the identifier of
the resource, and the protocol version in use.
</para>
<programlisting><![CDATA[
HttpRequest request = new BasicHttpRequest("GET", "/",
HttpVersion.HTTP_1_1);
System.out.println(request.getRequestLine().getMethod());
System.out.println(request.getRequestLine().getUri());
System.out.println(request.getProtocolVersion());
System.out.println(request.getRequestLine().toString());
]]></programlisting>
<para>stdout &gt;</para>
<programlisting><![CDATA[
GET
/
HTTP/1.1
GET / HTTP/1.1
]]></programlisting>
</section>
<section>
<title>HTTP response message</title>
<para>
An HTTP response is a message sent by the server back to the client after having
received and interpreted a request message. The first line of that message
consists of the protocol version followed by a numeric status code and its
associated textual phrase.
</para>
<programlisting><![CDATA[
HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
HttpStatus.SC_OK, "OK");
System.out.println(response.getProtocolVersion());
System.out.println(response.getStatusLine().getStatusCode());
System.out.println(response.getStatusLine().getReasonPhrase());
System.out.println(response.getStatusLine().toString());
]]></programlisting>
<para>stdout &gt;</para>
<programlisting><![CDATA[
HTTP/1.1
200
OK
HTTP/1.1 200 OK
]]></programlisting>
</section>
<section>
<title>HTTP message common properties and methods</title>
<para>
An HTTP message can contain a number of headers describing properties of the
message such as the content length, content type, and so on. HttpCore provides
methods to retrieve, add, remove, and enumerate such headers.
</para>
<programlisting><![CDATA[
HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
HttpStatus.SC_OK, "OK");
response.addHeader("Set-Cookie",
"c1=a; path=/; domain=localhost");
response.addHeader("Set-Cookie",
"c2=b; path=\"/\", c3=c; domain=\"localhost\"");
Header h1 = response.getFirstHeader("Set-Cookie");
System.out.println(h1);
Header h2 = response.getLastHeader("Set-Cookie");
System.out.println(h2);
Header[] hs = response.getHeaders("Set-Cookie");
System.out.println(hs.length);
]]></programlisting>
<para>stdout &gt;</para>
<programlisting><![CDATA[
Set-Cookie: c1=a; path=/; domain=localhost
Set-Cookie: c2=b; path="/", c3=c; domain="localhost"
2
]]></programlisting>
<para>
There is an efficient way to obtain all headers of a given type using the
<interfacename>HeaderIterator</interfacename> interface.
</para>
<programlisting><![CDATA[
HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
HttpStatus.SC_OK, "OK");
response.addHeader("Set-Cookie",
"c1=a; path=/; domain=localhost");
response.addHeader("Set-Cookie",
"c2=b; path=\"/\", c3=c; domain=\"localhost\"");
HeaderIterator it = response.headerIterator("Set-Cookie");
while (it.hasNext()) {
    System.out.println(it.next());
}
]]></programlisting>
<para>stdout &gt;</para>
<programlisting><![CDATA[
Set-Cookie: c1=a; path=/; domain=localhost
Set-Cookie: c2=b; path="/", c3=c; domain="localhost"
]]></programlisting>
<para>
HttpCore also provides convenience methods to parse HTTP headers into individual
header elements.
</para>
<programlisting><![CDATA[
HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
HttpStatus.SC_OK, "OK");
response.addHeader("Set-Cookie",
"c1=a; path=/; domain=localhost");
response.addHeader("Set-Cookie",
"c2=b; path=\"/\", c3=c; domain=\"localhost\"");
HeaderElementIterator it = new BasicHeaderElementIterator(
    response.headerIterator("Set-Cookie"));
while (it.hasNext()) {
    HeaderElement elem = it.nextElement();
    System.out.println(elem.getName() + " = " + elem.getValue());
    NameValuePair[] params = elem.getParameters();
    for (int i = 0; i < params.length; i++) {
        System.out.println(" " + params[i]);
    }
}
]]></programlisting>
<para>stdout &gt;</para>
<programlisting><![CDATA[
c1 = a
 path=/
 domain=localhost
c2 = b
 path=/
c3 = c
 domain=localhost
]]></programlisting>
<para>
HTTP headers are tokenized into individual header elements only on demand. HTTP
headers received over an HTTP connection are stored internally as an array of
characters and parsed lazily only when you access their properties.
</para>
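<para>
The idea of parsing on demand can be sketched with the JDK alone. The class
<classname>LazyHeaderSketch</classname> below is a hypothetical stand-in and does not
reflect how HttpCore implements its headers. It only shows the general technique:
keep the raw characters, parse them on first access, and cache the result.
</para>
<programlisting><![CDATA[
// Hypothetical stand-in for a lazily parsed header; not an HttpCore class.
final class LazyHeaderSketch {

    private final char[] raw;   // header line exactly as received
    private String name;        // filled in on first access
    private String value;       // filled in on first access
    private int parseCount;     // how many times parsing actually ran

    LazyHeaderSketch(String line) {
        this.raw = line.toCharArray();
    }

    private void parseIfNeeded() {
        if (name != null) {
            return; // already parsed, reuse the cached result
        }
        String line = new String(raw);
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a header line: " + line);
        }
        name = line.substring(0, colon).trim();
        value = line.substring(colon + 1).trim();
        parseCount++;
    }

    String getName() {
        parseIfNeeded();
        return name;
    }

    String getValue() {
        parseIfNeeded();
        return value;
    }

    int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        LazyHeaderSketch h = new LazyHeaderSketch("Content-Type: text/plain");
        System.out.println(h.getParseCount());                   // 0
        System.out.println(h.getName() + " -> " + h.getValue()); // Content-Type -> text/plain
        System.out.println(h.getParseCount());                   // 1
    }
}
]]></programlisting>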
</section>
</section>
<section>
<title>HTTP entity</title>
<para>
HTTP messages can carry a content entity associated with the request or response.
Entities can be found in some requests and in some responses, as they are optional.
Requests that use entities are referred to as entity-enclosing requests. The HTTP
specification defines two entity-enclosing methods: POST and PUT. Responses are
usually expected to enclose a content entity. There are exceptions to this rule, such
as responses to the HEAD method and 204 No Content, 304 Not Modified, and 205 Reset
Content responses.
</para>
<para>
HttpCore distinguishes three kinds of entities, depending on where their content
originates:
</para>
<itemizedlist>
<listitem>
<formalpara>
<title>streamed:</title>
<para>
The content is received from a stream, or generated on the fly. In particular,
this category includes entities being received from a connection. Streamed
entities are generally not repeatable.
</para>
</formalpara>
</listitem>
<listitem>
<formalpara>
<title>self-contained:</title>
<para>
The content is in memory or obtained by means that are independent from
a connection or other entity. Self-contained entities are generally repeatable.
</para>
</formalpara>
</listitem>
<listitem>
<formalpara>
<title>wrapping:</title>
<para>
The content is obtained from another entity.
</para>
</formalpara>
</listitem>
</itemizedlist>
<section>
<title>Repeatable entities</title>
<para>
An entity can be repeatable, meaning its content can be read more than once. This
is only possible with self-contained entities (like
<classname>ByteArrayEntity</classname> or <classname>StringEntity</classname>).
</para>
</section>
<section>
<title>Using HTTP entities</title>
<para>
Since an entity can represent both binary and character content, it supports
character encodings for the latter.
</para>
<para>
The entity is created when executing a request with enclosed content or when the
request was successful and the response body is used to send the result back to
the client.
</para>
<para>
To read the content from the entity, one can either retrieve the input stream via
the <methodname>HttpEntity#getContent()</methodname> method, which returns an
<classname>java.io.InputStream</classname>, or one can supply an output stream to
the <methodname>HttpEntity#writeTo(OutputStream)</methodname> method, which will
return once all content has been written to the given stream. Please note that
some non-streaming (self-contained) entities may be unable to represent their
content as a <classname>java.io.InputStream</classname> efficiently. It is legal
for such entities to implement only the
<methodname>HttpEntity#writeTo(OutputStream)</methodname> method and to throw
<classname>UnsupportedOperationException</classname> from the
<methodname>HttpEntity#getContent()</methodname> method.
</para>
<para>
The <classname>EntityUtils</classname> class exposes several static methods to simplify
extracting the content or information from an entity. Instead of reading
the <classname>java.io.InputStream</classname> directly, one can retrieve the complete
content body in a string or byte array by using the methods from this class.
</para>
<para>
When the entity has been received with an incoming message, the
<methodname>HttpEntity#getContentType()</methodname> and
<methodname>HttpEntity#getContentLength()</methodname> methods can be used for
reading common metadata such as the <literal>Content-Type</literal> and
<literal>Content-Length</literal> headers (if they are available). For text
MIME types like <literal>text/plain</literal> or <literal>text/html</literal>, the
<literal>Content-Type</literal> header can also carry a character encoding as its
<literal>charset</literal> parameter. The
<methodname>HttpEntity#getContentEncoding()</methodname> method returns the
<literal>Content-Encoding</literal> header, which describes content codings such
as <literal>gzip</literal> applied to the body rather than the character encoding.
If the headers aren't available, a length of -1 is returned for the content length
and <literal>null</literal> for the content type. If the
<literal>Content-Type</literal> header is available, a
<interfacename>Header</interfacename> object is returned.
</para>
<para>
When creating an entity for an outgoing message, this metadata has to be supplied
by the creator of the entity.
</para>
<programlisting><![CDATA[
StringEntity myEntity = new StringEntity("important message",
Consts.UTF_8);
System.out.println(myEntity.getContentType());
System.out.println(myEntity.getContentLength());
System.out.println(EntityUtils.toString(myEntity));
System.out.println(EntityUtils.toByteArray(myEntity).length);
]]></programlisting>
<para>stdout &gt;</para>
<programlisting><![CDATA[
Content-Type: text/plain; charset=UTF-8
17
important message
17
]]></programlisting>
</section>
<section>
<title>Ensuring release of system resources</title>
<para>
In order to ensure proper release of system resources one must close the content
stream associated with the entity.
</para>
<programlisting><![CDATA[
HttpResponse response = <...>
HttpEntity entity = response.getEntity();
if (entity != null) {
    InputStream instream = entity.getContent();
    try {
        // do something useful
    } finally {
        instream.close();
    }
}
]]></programlisting>
<para>
When working with streaming entities, one can use the
<methodname>EntityUtils#consume(HttpEntity)</methodname> method to ensure that
the entity content has been fully consumed and the underlying stream has been
closed.
</para>
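<para>
What it means to fully consume a stream before closing it can be sketched with plain
JDK streams. The helper <methodname>drainAndClose</methodname> below is hypothetical
and is not the implementation of <classname>EntityUtils</classname>. It merely reads
and discards whatever content is left and closes the stream in all cases.
</para>
<programlisting><![CDATA[
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class DrainSketch {

    // Hypothetical helper: discards the remaining content, then closes the stream.
    // Returns the number of bytes that were discarded.
    static long drainAndClose(InputStream in) {
        long discarded = 0;
        // try-with-resources closes the stream even if read() fails
        try (InputStream stream = in) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = stream.read(buf)) != -1) {
                discarded += n;
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return discarded;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10000]);
        in.read(); // the application consumed only one byte
        System.out.println(drainAndClose(in)); // 9999
    }
}
]]></programlisting>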
</section>
</section>
<section>
<title>Creating entities</title>
<para>
There are a few ways to create entities. HttpCore provides the following implementations:
</para>
<itemizedlist>
<listitem>
<para>
<link linkend="basic-entity">
<classname>BasicHttpEntity</classname>
</link>
</para>
</listitem>
<listitem>
<para>
<link linkend="byte-array-entity">
<classname>ByteArrayEntity</classname>
</link>
</para>
</listitem>
<listitem>
<para>
<link linkend="string-entity">
<classname>StringEntity</classname>
</link>
</para>
</listitem>
<listitem>
<para>
<link linkend="input-stream-entity">
<classname>InputStreamEntity</classname>
</link>
</para>
</listitem>
<listitem>
<para>
<link linkend="file-entity">
<classname>FileEntity</classname>
</link>
</para>
</listitem>
<listitem>
<para>
<link linkend="entity-template">
<classname>EntityTemplate</classname>
</link>
</para>
</listitem>
<listitem>
<para>
<link linkend="entity-wrapper">
<classname>HttpEntityWrapper</classname>
</link>
</para>
</listitem>
<listitem>
<para>
<link linkend="buffered-entity">
<classname>BufferedHttpEntity</classname>
</link>
</para>
</listitem>
</itemizedlist>
<section id="basic-entity">
<title><classname>BasicHttpEntity</classname></title>
<para>
Exactly as the name implies, this basic entity represents an underlying stream.
In general, use this class for entities received from HTTP messages.
</para>
<para>
This entity has an empty constructor. After construction, it represents no content,
and has a negative content length.
</para>
<para>
One needs to set the content stream, and optionally the length. This can be done
with the <methodname>BasicHttpEntity#setContent(InputStream)</methodname> and
<methodname>BasicHttpEntity#setContentLength(long)</methodname> methods
respectively.
</para>
<programlisting><![CDATA[
BasicHttpEntity myEntity = new BasicHttpEntity();
myEntity.setContent(someInputStream);
myEntity.setContentLength(340); // sets the length to 340
]]></programlisting>
</section>
<section id="byte-array-entity">
<title><classname>ByteArrayEntity</classname></title>
<para>
<classname>ByteArrayEntity</classname> is a self-contained, repeatable entity
that obtains its content from a given byte array. Supply the byte array to the
constructor.
</para>
<programlisting><![CDATA[
ByteArrayEntity myEntity = new ByteArrayEntity(new byte[] {1,2,3},
ContentType.APPLICATION_OCTET_STREAM);
]]></programlisting>
</section>
<section id="string-entity">
<title><classname>StringEntity</classname></title>
<para>
<classname>StringEntity</classname> is a self-contained, repeatable entity that
obtains its content from a <classname>java.lang.String</classname> object. It has
three constructors: the first simply takes a given
<classname>java.lang.String</classname> object; the second also takes a character
encoding for the data in the string; the third allows the MIME type to be specified.
</para>
<programlisting><![CDATA[
StringBuilder sb = new StringBuilder();
Map<String, String> env = System.getenv();
for (Map.Entry<String, String> envEntry : env.entrySet()) {
    sb.append(envEntry.getKey())
        .append(": ").append(envEntry.getValue())
        .append("\r\n");
}
// construct without a character encoding (defaults to ISO-8859-1)
HttpEntity myEntity1 = new StringEntity(sb.toString());
// alternatively construct with an encoding (mime type defaults to "text/plain")
HttpEntity myEntity2 = new StringEntity(sb.toString(), Consts.UTF_8);
// alternatively construct with an encoding and a mime type
HttpEntity myEntity3 = new StringEntity(sb.toString(),
ContentType.create("text/plain", Consts.UTF_8));
]]></programlisting>
</section>
<section id="input-stream-entity">
<title><classname>InputStreamEntity</classname></title>
<para>
<classname>InputStreamEntity</classname> is a streamed, non-repeatable entity that
obtains its content from an input stream. Construct it by supplying the input
stream and the content length. The content length limits the amount of data read
from the <classname>java.io.InputStream</classname>. If the length matches the
amount of data available on the input stream, all data will be sent. A negative
content length also causes all data to be read from the input stream, which has the
same effect as supplying the exact content length. Supply a smaller length only
when you want to limit the amount of data read.
</para>
<programlisting><![CDATA[
InputStream instream = getSomeInputStream();
InputStreamEntity myEntity = new InputStreamEntity(instream, 16);
]]></programlisting>
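<para>
The effect of the length argument can be illustrated with a plain-JDK copy loop. The
class <classname>LimitedCopySketch</classname> is hypothetical and only shows one
plausible way to honor such a limit: a non-negative limit caps the number of bytes
copied, while a negative limit copies until the end of the stream.
</para>
<programlisting><![CDATA[
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class LimitedCopySketch {

    // Copies at most 'limit' bytes from in to out; a negative limit copies everything.
    // Returns the number of bytes actually copied.
    static long copy(InputStream in, OutputStream out, long limit) {
        try {
            byte[] buf = new byte[4096];
            long total = 0;
            while (limit < 0 || total < limit) {
                int want = buf.length;
                if (limit >= 0) {
                    // never ask for more than what is left of the limit
                    want = (int) Math.min(buf.length, limit - total);
                }
                int n = in.read(buf, 0, want);
                if (n == -1) {
                    break; // stream ended before the limit was reached
                }
                out.write(buf, 0, n);
                total += n;
            }
            return total;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        System.out.println(copy(new ByteArrayInputStream(data),
                new ByteArrayOutputStream(), 16));  // 16
        System.out.println(copy(new ByteArrayInputStream(data),
                new ByteArrayOutputStream(), -1));  // 100
        System.out.println(copy(new ByteArrayInputStream(data),
                new ByteArrayOutputStream(), 100)); // 100
    }
}
]]></programlisting>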
</section>
<section id="file-entity">
<title><classname>FileEntity</classname></title>
<para>
<classname>FileEntity</classname> is a self-contained, repeatable entity that
obtains its content from a file. Use this mostly to stream large files of different
types, where you need to supply the content type of the file. For instance, sending
a zip file would require the content type <literal>application/zip</literal>, and
an XML file <literal>application/xml</literal>.
</para>
<programlisting><![CDATA[
HttpEntity entity = new FileEntity(staticFile,
ContentType.create("application/java-archive"));
]]></programlisting>
</section>
<section id="entity-wrapper">
<title><classname>HttpEntityWrapper</classname></title>
<para>
This is the base class for creating wrapped entities. The wrapping entity holds
a reference to a wrapped entity and delegates all calls to it. Implementations
of wrapping entities can derive from this class and need to override only those
methods that should not be delegated to the wrapped entity.
</para>
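<para>
The delegation pattern described above can be sketched without HttpCore. All types in
the following example (<interfacename>Entity</interfacename>,
<classname>TextEntity</classname>, <classname>EntityWrapper</classname>,
<classname>GzipEntity</classname>) are hypothetical, heavily reduced stand-ins. The
wrapper base class delegates every call, and the decorator overrides only the two
methods whose behavior must change, here to compress the content on the way out.
</para>
<programlisting><![CDATA[
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class WrapperSketch {

    // Hypothetical, heavily reduced stand-in for an entity interface.
    interface Entity {
        String contentEncoding();
        void writeTo(OutputStream out) throws IOException;
    }

    // Self-contained entity backed by a string.
    static final class TextEntity implements Entity {
        private final String text;
        TextEntity(String text) { this.text = text; }
        public String contentEncoding() { return null; }
        public void writeTo(OutputStream out) throws IOException {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Base wrapper: holds the wrapped entity and delegates every call to it.
    static class EntityWrapper implements Entity {
        protected final Entity wrapped;
        EntityWrapper(Entity wrapped) { this.wrapped = wrapped; }
        public String contentEncoding() { return wrapped.contentEncoding(); }
        public void writeTo(OutputStream out) throws IOException { wrapped.writeTo(out); }
    }

    // Decorator: overrides only the calls that must not be delegated as-is.
    static final class GzipEntity extends EntityWrapper {
        GzipEntity(Entity wrapped) { super(wrapped); }
        @Override public String contentEncoding() { return "gzip"; }
        @Override public void writeTo(OutputStream out) throws IOException {
            GZIPOutputStream gzip = new GZIPOutputStream(out);
            wrapped.writeTo(gzip);
            gzip.finish(); // write the gzip trailer without closing 'out'
        }
    }

    static byte[] toBytes(Entity entity) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            entity.writeTo(out);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    static String gunzip(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Entity gzipped = new GzipEntity(new TextEntity("important message"));
        System.out.println(gzipped.contentEncoding());  // gzip
        System.out.println(gunzip(toBytes(gzipped)));   // important message
    }
}
]]></programlisting>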
</section>
<section id="buffered-entity">
<title><classname>BufferedHttpEntity</classname></title>
<para>
<classname>BufferedHttpEntity</classname> is a subclass of
<classname>HttpEntityWrapper</classname>. Construct it by supplying another entity. It
reads the content from the supplied entity, and buffers it in memory.
</para>
<para>
This makes it possible to create a repeatable entity from a non-repeatable entity.
If the supplied entity is already repeatable, it simply passes calls through to
the underlying entity.
</para>
<programlisting><![CDATA[
BasicHttpEntity myNonRepeatableEntity = new BasicHttpEntity();
myNonRepeatableEntity.setContent(someInputStream);
BufferedHttpEntity myBufferedEntity = new BufferedHttpEntity(
myNonRepeatableEntity);
]]></programlisting>
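<para>
Why buffering makes content repeatable can be shown with the JDK alone. The class
<classname>BufferingSketch</classname> below is a hypothetical illustration, not the
implementation of <classname>BufferedHttpEntity</classname>. It reads a one-shot
stream to its end once and then hands out a fresh stream over the buffered bytes on
every call.
</para>
<programlisting><![CDATA[
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BufferingSketch {

    private final byte[] buffer;

    // Reads the one-shot stream to its end, once, and keeps the bytes in memory.
    BufferingSketch(InputStream oneShot) {
        this.buffer = readAll(oneShot);
    }

    // Every call returns a new stream over the same buffered bytes.
    InputStream getContent() {
        return new ByteArrayInputStream(buffer);
    }

    // Reads a stream to its end and closes it.
    static byte[] readAll(InputStream in) {
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        InputStream oneShot = new ByteArrayInputStream(
                "important message".getBytes(StandardCharsets.UTF_8));
        BufferingSketch buffered = new BufferingSketch(oneShot);
        // The content can now be read as often as needed.
        System.out.println(new String(readAll(buffered.getContent()), StandardCharsets.UTF_8));
        System.out.println(new String(readAll(buffered.getContent()), StandardCharsets.UTF_8));
    }
}
]]></programlisting>
<para>
The price of repeatability is memory: the whole content is held in a byte array, so
this approach suits small to medium bodies better than large streams.
</para>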
</section>
</section>
</section>
<section>
<title>HTTP protocol processors</title>
<para>
An HTTP protocol interceptor is a routine that implements a specific aspect of the HTTP
protocol. Usually protocol interceptors are expected to act upon one specific header or a
group of related headers of the incoming message or populate the outgoing message with one
specific header or a group of related headers. Protocol interceptors can also manipulate
content entities enclosed with messages; transparent content compression / decompression
being a good example. Usually this is accomplished by using the 'Decorator' pattern where
a wrapper entity class is used to decorate the original entity. Several protocol
interceptors can be combined to form one logical unit.
</para>
<para>
An HTTP protocol processor is a collection of protocol interceptors that implements the
'Chain of Responsibility' pattern, where each individual protocol interceptor is expected
to work on the particular aspect of the HTTP protocol it is responsible for.
</para>
<para>
Usually the order in which interceptors are executed should not matter as long as they do
not depend on a particular state of the execution context. If protocol interceptors have
interdependencies and therefore must be executed in a particular order, they should be
added to the protocol processor in the same sequence as their expected execution order.
</para>
<para>
Protocol interceptors must be implemented to be thread-safe. Like servlets, protocol
interceptors should not use instance variables unless access to those variables is
synchronized.
</para>
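<para>
The following plain-JDK sketch models these rules with hypothetical types
(<classname>ProcessorSketch</classname>, <classname>Message</classname>,
<interfacename>Interceptor</interfacename>) that are not part of HttpCore. The
interceptors run in the order in which they were added, they keep no instance state,
and the second one depends on a context attribute set by the first.
</para>
<programlisting><![CDATA[
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ProcessorSketch {

    // Hypothetical message model: nothing but a bag of headers.
    static final class Message {
        final Map<String, String> headers = new LinkedHashMap<>();
    }

    // Hypothetical interceptor contract: each one handles a single protocol aspect.
    interface Interceptor {
        void process(Message message, Map<String, Object> context);
    }

    private final List<Interceptor> chain = new ArrayList<>();

    ProcessorSketch add(Interceptor interceptor) {
        chain.add(interceptor);
        return this;
    }

    // Runs the interceptors in the order in which they were added.
    void process(Message message, Map<String, Object> context) {
        for (Interceptor interceptor : chain) {
            interceptor.process(message, context);
        }
    }

    // Stateless interceptors: all state lives in the message or in the context.
    static final Interceptor RESOLVE_TARGET = (message, context) ->
            context.putIfAbsent("target-host", "localhost");

    static final Interceptor ADD_HOST = (message, context) -> {
        Object host = context.get("target-host");
        if (host != null) {
            message.headers.put("Host", host.toString());
        }
    };

    public static void main(String[] args) {
        ProcessorSketch processor = new ProcessorSketch()
                .add(RESOLVE_TARGET) // must come first: ADD_HOST depends on it
                .add(ADD_HOST);
        Message request = new Message();
        processor.process(request, new HashMap<>());
        System.out.println(request.headers); // {Host=localhost}
    }
}
]]></programlisting>
<para>
Adding the two interceptors in the opposite order would leave the request without a
<literal>Host</literal> header in this sketch, which is why interdependent
interceptors have to be registered in their expected execution order.
</para>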
<section>
<title>Standard protocol interceptors</title>
<para>
HttpCore comes with a number of the most essential protocol interceptors for client and
server HTTP processing.
</para>
<section>
<title><classname>RequestContent</classname></title>
<para>
<classname>RequestContent</classname> is the most important interceptor for
outgoing requests. It is responsible for delimiting content length by adding
the <literal>Content-Length</literal> or <literal>Transfer-Encoding</literal> headers
based on the properties of the enclosed entity and the protocol version. This
interceptor is required for correct functioning of client side protocol processors.
</para>
</section>
<section>
<title><classname>ResponseContent</classname></title>
<para>
<classname>ResponseContent</classname> is the most important interceptor for
outgoing responses. It is responsible for delimiting content length by adding
<literal>Content-Length</literal> or <literal>Transfer-Encoding</literal> headers
based on the properties of the enclosed entity and the protocol version. This
interceptor is required for correct functioning of server side protocol processors.
</para>
</section>
<section>
<title><classname>RequestConnControl</classname></title>
<para>
<classname>RequestConnControl</classname> is responsible for adding the
<literal>Connection</literal> header to the outgoing requests, which is essential
for managing persistence of <literal>HTTP/1.0</literal> connections. This
interceptor is recommended for client side protocol processors.
</para>
</section>
<section>
<title><classname>ResponseConnControl</classname></title>
<para>
<classname>ResponseConnControl</classname> is responsible for adding
the <literal>Connection</literal> header to the outgoing responses, which is essential
for managing persistence of <literal>HTTP/1.0</literal> connections. This
interceptor is recommended for server side protocol processors.
</para>
</section>
<section>
<title><classname>RequestDate</classname></title>
<para>
<classname>RequestDate</classname> is responsible for adding the
<literal>Date</literal> header to the outgoing requests. This interceptor is
optional for client side protocol processors.
</para>
</section>
<section>
<title><classname>ResponseDate</classname></title>
<para>
<classname>ResponseDate</classname> is responsible for adding the
<literal>Date</literal> header to the outgoing responses. This interceptor is
recommended for server side protocol processors.
</para>
</section>
<section>
<title><classname>RequestExpectContinue</classname></title>
<para>
<classname>RequestExpectContinue</classname> is responsible for enabling the
'expect-continue' handshake by adding the <literal>Expect</literal> header. This
interceptor is recommended for client side protocol processors.
</para>
</section>
<section>
<title><classname>RequestTargetHost</classname></title>
<para>
<classname>RequestTargetHost</classname> is responsible for adding the
<literal>Host</literal> header. This interceptor is required for client side
protocol processors.
</para>
</section>
<section>
<title><classname>RequestUserAgent</classname></title>
<para>
<classname>RequestUserAgent</classname> is responsible for adding the
<literal>User-Agent</literal> header. This interceptor is recommended for client
side protocol processors.
</para>
</section>
<section>
<title><classname>ResponseServer</classname></title>
<para>
<classname>ResponseServer</classname> is responsible for adding the
<literal>Server</literal> header. This interceptor is recommended for server side
protocol processors.
</para>
</section>
</section>
<section>
<title>Working with protocol processors</title>
<para>
Usually HTTP protocol processors are used to pre-process incoming messages prior to
executing application specific processing logic and to post-process outgoing messages.
</para>
<programlisting><![CDATA[
HttpProcessor httpproc = HttpProcessorBuilder.create()
    // Required protocol interceptors
    .add(new RequestContent())
    .add(new RequestTargetHost())
    // Recommended protocol interceptors
    .add(new RequestConnControl())
    .add(new RequestUserAgent("MyAgent-HTTP/1.1"))
    // Optional protocol interceptors
    .add(new RequestExpectContinue(true))
    .build();
HttpCoreContext context = HttpCoreContext.create();
HttpRequest request = new BasicHttpRequest("GET", "/");
httpproc.process(request, context);
]]></programlisting>
<para>
Send the request to the target host and get a response.
</para>
<programlisting><![CDATA[
HttpResponse response = <...>
httpproc.process(response, context);
]]></programlisting>
<para>
Please note that the <classname>BasicHttpProcessor</classname> class does not synchronize
access to its internal structures and therefore may not be thread-safe.
</para>
</section>
</section>
<section>
<title>HTTP execution context</title>
<para>HTTP was originally designed as a stateless, request-response oriented protocol.
However, real-world applications often need to persist state information
through several logically related request-response exchanges. To enable
applications to maintain a processing state, HttpCore allows HTTP messages to be
executed within a particular execution context, referred to as the HTTP context. Multiple
logically related messages can participate in a logical session if the same context is
reused between consecutive requests. HTTP context functions similarly to
a <interfacename>java.util.Map&lt;String, Object&gt;</interfacename>. It is
simply a collection of logically related named values.</para>
<para>Please note that <interfacename>HttpContext</interfacename> can contain arbitrary
objects and therefore may be unsafe to share between multiple threads. Care must be taken
to ensure that <interfacename>HttpContext</interfacename> instances are
accessed by only one thread at a time.</para>
<section>
<title>Context sharing</title>
<para>
Protocol interceptors can collaborate by sharing information - such as a processing
state - through an HTTP execution context. HTTP context is a structure that can be
used to map an attribute name to an attribute value. Internally HTTP context
implementations are usually backed by a <classname>HashMap</classname>. The primary
purpose of the HTTP context is to facilitate information sharing among various
logically related components. HTTP context can be used to store a processing state for
one message or several consecutive messages. Multiple logically related messages can
participate in a logical session if the same context is reused between consecutive
messages.
</para>
<programlisting><![CDATA[
HttpProcessor httpproc = HttpProcessorBuilder.create()
    .add(new HttpRequestInterceptor() {
        public void process(
                HttpRequest request,
                HttpContext context) throws HttpException, IOException {
            String id = (String) context.getAttribute("session-id");
            if (id != null) {
                request.addHeader("Session-ID", id);
            }
        }
    })
    .build();
HttpCoreContext context = HttpCoreContext.create();
HttpRequest request = new BasicHttpRequest("GET", "/");
httpproc.process(request, context);
]]></programlisting>
</section>
</section>
</chapter>