| ------ |
| Guide to Advanced HTTP Wagon Configuration |
| ------ |
| John Casey |
| ------ |
| 22 June 2009 |
| ------ |
| |
| Advanced Configuration of the HttpClient HTTP Wagon |
| |
| *Notice on Maven Versioning and Availability |
| |
| **Maven 2.2.0 |
| |
| In <<Maven 2.2.0>>, the HttpClient-based wagon was the default implementation for HTTP/HTTPS transfers. The remainder of this |
| document deals specifically with the differences between the HttpClient- and Sun-based HTTP wagons. |
| |
| **Maven 2.2.1 |
| |
| Due to several critical issues introduced by the HttpClient-based HTTP wagon, <<Maven 2.2.1>> reverted to |
| the Sun implementation (a.k.a. 'lightweight') of the HTTP wagon as the default for HTTP/HTTPS transfers. |
| The issues with the HttpClient-based wagon were mainly related to checksums, transfer timeouts, and NTLM proxies, |
| and were the primary reason for releasing <<2.2.1>> in the first place. |
| |
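| <<However>>, starting in <<Maven 2.2.1>> you have a choice: you can use the default wagon implementation for a given |
| protocol, or you can select an alternative wagon <<<provider>>> on a per-protocol basis. Since most of the configuration |
| described in this guide (everything under <<<\<httpConfiguration\>>>>) applies only to the HttpClient-based wagon, users of |
| Maven 2.2.1 and later need to select that provider for it to take effect. |
| For more information, see the {{{./guide-wagon-providers.html}Guide to Wagon Providers}} \[3\]. |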
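| |
| Here is a minimal sketch that selects the HttpClient-based wagon for a single server. It assumes the <<<\<wagonProvider\>>>> |
| element and the <<<httpclient>>> provider name described in the Guide to Wagon Providers \[3\]. <<<my-server>>> is a |
| placeholder id. Check that guide for the exact syntax your Maven version supports. |
| |
| +---+ |
| <settings> |
|   <servers> |
|     <server> |
|       <id>my-server</id> |
|       <configuration> |
|         <!-- Assumed: selects the HttpClient-based wagon for this server --> |
|         <wagonProvider>httpclient</wagonProvider> |
|       </configuration> |
|     </server> |
|   </servers> |
| </settings> |
| +---+ |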
| |
| *Introduction |
| |
| Using the HttpClient-based HTTP wagon, you have much more control over the configuration used to access HTTP-based |
| Maven repositories. For starters, you have fine-grained control over which HTTP headers are used when resolving artifacts. |
| You can also configure a wide range of parameters that control the behavior of HttpClient itself. Best of all, you can |
| apply these headers and parameters to all requests, or only to individual request types (Maven issues GET, HEAD, and PUT |
| requests for different parts of the artifact-management subsystem). |
| |
| *The Basics |
| |
| Without any special configuration, Maven's HTTP wagon will use some default HTTP headers and client parameters when managing |
| artifacts. The default headers are: |
| |
| +---+ |
| Cache-control: no-cache |
| Cache-store: no-store |
| Pragma: no-cache |
| Expires: 0 |
| Accept-Encoding: gzip |
| +---+ |
| |
| In addition, PUT requests made with the HTTP wagon will use the following HttpClient parameter: |
| |
| +---+ |
| http.protocol.expect-continue=true |
| +---+ |
| |
| According to the HttpClient documentation \[2\], this parameter provides the following functionality: |
| |
| ----- |
| Activates 'Expect: 100-Continue' handshake for the entity enclosing methods. |
| The 'Expect: 100-Continue' handshake allows a client that is sending a request |
| message with a request body to determine if the origin server is willing to |
| accept the request (based on the request headers) before the client sends the |
| request body. |
| |
| The use of the 'Expect: 100-continue' handshake can result in noticeable performance |
| improvement for entity enclosing requests (such as POST and PUT) that require |
| the target server's authentication. |
| |
| 'Expect: 100-continue' handshake should be used with caution, as it may cause |
| problems with HTTP servers and proxies that do not support HTTP/1.1 protocol. |
| ----- |
| |
| Without this setting, PUT requests that require authentication will transfer their entire payload to the server before that server |
| issues an authentication challenge. In order to complete the PUT request, the client must then re-send the payload with the proper |
| credentials specified in the HTTP headers. This results in twice the bandwidth usage, and twice the time to transfer each artifact. |
| |
| Another option to avoid this double transfer is what's known as preemptive authentication, which involves sending the authentication |
| headers along with the original PUT request. However, there are a few potential issues with this approach. For one thing, in the event |
| you have an unused <<<\<server\>>>> entry that specifies an invalid username/password combination, some servers may respond with |
| a <<<401 Unauthorized>>> even if the server doesn't actually require any authentication for the request. In addition, blindly sending |
| authentication credentials with every request regardless of whether the server has made a challenge can result in a security hole, |
| since the server may not make provisions to secure credentials for paths that don't require authentication. |
| |
| We'll discuss preemptive authentication in another example, below. |
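| |
| The HttpClient documentation quoted above warns that the Expect handshake can cause problems with servers and proxies |
| that do not support HTTP/1.1. If you run into this, the sketch below turns the handshake off for PUT requests. It uses the |
| boolean <<<%b>>> value syntax described under Non-String Parameter Values below, and it assumes that a value you set |
| explicitly takes precedence over the wagon's default. <<<my-server>>> is a placeholder id. Without the handshake, |
| authenticated uploads may be transferred twice, as described above. |
| |
| +---+ |
| <settings> |
|   <servers> |
|     <server> |
|       <id>my-server</id> |
|       <configuration> |
|         <httpConfiguration> |
|           <put> |
|             <params> |
|               <param> |
|                 <!-- Assumed to override the wagon's default of true for PUT --> |
|                 <name>http.protocol.expect-continue</name> |
|                 <value>%b,false</value> |
|               </param> |
|             </params> |
|           </put> |
|         </httpConfiguration> |
|       </configuration> |
|     </server> |
|   </servers> |
| </settings> |
| +---+ |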
| |
| *Configuring GET, HEAD, PUT, or All of the Above |
| |
| In all of the examples below, keep in mind that HTTP settings can be configured either for all requests made to a given |
| server or for a single HTTP method. To configure all methods for a server, use the following section of the <<<settings.xml>>> file: |
| |
| +---+ |
| <settings> |
| [...] |
| <servers> |
| <server> |
| <id>the-server</id> |
| <configuration> |
| <httpConfiguration> |
| <all> |
| [ Your configuration here. ] |
| </all> |
| </httpConfiguration> |
| </configuration> |
| </server> |
| </servers> |
| </settings> |
| +---+ |
| |
| On the other hand, if the default configuration works for most requests (say, HEAD and GET requests, which are used to check |
| for the existence of a file and to retrieve a file, respectively), you may only need to configure the PUT method: |
| |
| +---+ |
| <settings> |
| [...] |
| <servers> |
| <server> |
| <id>the-server</id> |
| <configuration> |
| <httpConfiguration> |
| <put> |
| [ Your configuration here. ] |
| </put> |
| </httpConfiguration> |
| </configuration> |
| </server> |
| </servers> |
| </settings> |
| +---+ |
| |
| For completeness, the other two sections are <<<\<get\>>>> for GET requests and <<<\<head\>>>> for HEAD requests. I know that's |
| going to be hard to remember... |
| |
| *Taking Control of Your HTTP Headers |
| |
| As you may have noticed above, the default HTTP headers do have the potential to cause problems. For instance, some web |
| servers serve files that are already GZip-compressed with a <<<Content-Encoding: gzip>>> response header, even though the |
| response itself isn't being sent with an additional layer of GZip compression. If the client sends the <<<Accept-Encoding: gzip>>> |
| header, the client itself may decompress the GZipped file <during the transfer> and write the decompressed content to |
| the local disk under the original filename. This can be misleading, to say the least, and can use up an inordinate amount |
| of disk space on the local computer. |
| |
| To turn off this behavior, we'll simply disable the default headers. Then we'll re-specify the other headers we are still |
| interested in, replacing <<<gzip>>> in the <<<Accept-Encoding>>> header, like this: |
| |
| +---+ |
| <settings> |
| [...] |
| <servers> |
| <server> |
| <id>openssl</id> |
| <configuration> |
| <httpConfiguration> |
| <get> |
| <useDefaultHeaders>false</useDefaultHeaders> |
| <headers> |
| <header> |
| <name>Cache-control</name> |
| <value>no-cache</value> |
| </header> |
| <header> |
| <name>Cache-store</name> |
| <value>no-store</value> |
| </header> |
| <header> |
| <name>Pragma</name> |
| <value>no-cache</value> |
| </header> |
| <header> |
| <name>Expires</name> |
| <value>0</value> |
| </header> |
| <header> |
| <name>Accept-Encoding</name> |
| <value>*</value> |
| </header> |
| </headers> |
| </get> |
| </httpConfiguration> |
| </configuration> |
| </server> |
| [...] |
| </servers> |
| [...] |
| </settings> |
| +---+ |
| |
| *Fine-Tuning HttpClient Parameters |
| |
| Going beyond the power of HTTP request headers, HttpClient provides a host of other configuration options. In most cases, |
| you won't need to customize these. But in case you do, Maven lets you specify your own fine-grained configuration |
| for HttpClient. Again, you can specify these parameter customizations per method (HEAD, GET, or PUT), or for all methods of |
| interacting with a given server. For a complete list of supported parameters, see the HttpClient preference guide \[2\] in |
| the Resources section below. |
| |
| **Non-String Parameter Values |
| |
| Many of the configuration parameters for HttpClient have simple string values; however, there are important exceptions to |
| this. In some cases, you may need to specify boolean, integer, or long values. In others, you may even need to specify |
| a collection of string values. You can specify these using a simple formatting syntax, as follows: |
| |
| [[1]] <<booleans:>> <<<%b,\<value\>>>> |
| |
| [[2]] <<integer:>> <<<%i,\<value\>>>> |
| |
| [[3]] <<long:>> <<<%l,\<value\>>>> (yes, that's an 'L', not a '1') |
| |
| [[4]] <<double:>> <<<%d,\<value\>>>> |
| |
| [[5]] <<collection of strings:>> <<<%c,\<value1\>,\<value2\>,\<value3\>,...>>>, which could also be specified as: |
| |
| +---+ |
| %c, |
| <value1>, |
| <value2>, |
| <value3>, |
| ... |
| +---+ |
| |
| [] |
| |
| As you may have noticed, this syntax is similar to the format-and-data strategy used by functions like <<<sprintf()>>> |
| in many languages. The syntax has been chosen with this similarity in mind, to make it a little more intuitive to use. |
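| |
| The sketch below shows a non-boolean value. It sets an integer parameter for all methods. It assumes that |
| <<<http.socket.timeout>>>, which the HttpClient 3.x preference guide \[2\] documents as the socket (read) timeout in |
| milliseconds, is honored when applied through the wagon. The value and the <<<my-server>>> id are illustrative only. |
| |
| +---+ |
| <settings> |
|   <servers> |
|     <server> |
|       <id>my-server</id> |
|       <configuration> |
|         <httpConfiguration> |
|           <all> |
|             <params> |
|               <param> |
|                 <!-- Integer value: wait up to 60 seconds for data on an open socket --> |
|                 <name>http.socket.timeout</name> |
|                 <value>%i,60000</value> |
|               </param> |
|             </params> |
|           </all> |
|         </httpConfiguration> |
|       </configuration> |
|     </server> |
|   </servers> |
| </settings> |
| +---+ |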
| |
| **Example: Using Preemptive Authentication |
| |
| Using the above syntax, we can configure preemptive authentication for PUT requests using the boolean HttpClient parameter |
| <<<http.authentication.preemptive>>>, like this: |
| |
| +---+ |
| <settings> |
| <servers> |
| <server> |
| <id>my-server</id> |
| <configuration> |
| <httpConfiguration> |
| <put> |
| <params> |
| <param> |
| <name>http.authentication.preemptive</name> |
| <value>%b,true</value> |
| </param> |
| </params> |
| </put> |
| </httpConfiguration> |
| </configuration> |
| </server> |
| </servers> |
| </settings> |
| +---+ |
| |
| **Ignoring Cookies |
| |
| As in the example above, telling HttpClient to ignore cookies for all request methods is a simple matter of |
| configuring the <<<http.protocol.cookie-policy>>> parameter (it uses a regular string value, so no special syntax |
| is required): |
| |
| +---+ |
| <settings> |
| <servers> |
| <server> |
| <id>my-server</id> |
| <configuration> |
| <httpConfiguration> |
| <all> |
| <params> |
| <param> |
| <name>http.protocol.cookie-policy</name> |
| <value>ignore</value> |
| </param> |
| </params> |
| </all> |
| </httpConfiguration> |
| </configuration> |
| </server> |
| </servers> |
| </settings> |
| +---+ |
| |
| The configuration above can be useful when the repository uses cookies (like the session cookies that are often |
| mistakenly turned on or left on in application servers) alongside HTTP redirection. In these cases, it becomes far more |
| likely that the cookie issued by the application server will use a <<<Path>>> that is inconsistent with the one used by |
| the client to access the server. If you have this problem, and know that you don't need this session cookie, you can |
| ignore cookies from this server with the above configuration. |
| |
| *Support for General-Wagon Configuration Standards |
| |
| It should be noted that the general wagon configuration options that predate this new, fine-grained approach are still |
| supported by the HttpClient-based HTTP wagon. These include the configuration of HTTP headers and connection timeouts. |
| Let's examine each of these briefly: |
| |
| **HTTP Headers |
| |
| In all HTTP Wagon implementations, you can add your own HTTP headers like this: |
| |
| +---+ |
| <settings> |
| <servers> |
| <server> |
| <id>my-server</id> |
| <configuration> |
| <httpHeaders> |
| <httpHeader> |
| <name>Foo</name> |
| <value>Bar</value> |
| </httpHeader> |
| </httpHeaders> |
| </configuration> |
| </server> |
| </servers> |
| </settings> |
| +---+ |
| |
| It's important to understand that the above approach doesn't allow you to turn off any of the default HTTP headers, nor |
| does it allow you to specify headers on a per-method basis. However, this configuration remains available in both the |
| lightweight and HttpClient-based wagon implementations. |
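| |
| If you do need per-method headers, the sketch below uses the fine-grained <<<\<httpConfiguration\>>>> syntax from |
| earlier in this guide to send the same <<<Foo>>> header with GET requests only. This assumes the HttpClient-based wagon |
| is in use. It also assumes that <<<\<useDefaultHeaders\>>>> defaults to <<<true>>> when omitted, so the default headers |
| are still sent alongside <<<Foo>>>. |
| |
| +---+ |
| <settings> |
|   <servers> |
|     <server> |
|       <id>my-server</id> |
|       <configuration> |
|         <httpConfiguration> |
|           <get> |
|             <!-- Added only to GET requests; default headers assumed to be kept --> |
|             <headers> |
|               <header> |
|                 <name>Foo</name> |
|                 <value>Bar</value> |
|               </header> |
|             </headers> |
|           </get> |
|         </httpConfiguration> |
|       </configuration> |
|     </server> |
|   </servers> |
| </settings> |
| +---+ |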
| |
| **Connection Timeouts |
| |
| All wagon implementations that extend the <<<AbstractWagon>>> class, including those for SCP, HTTP, FTP, and more, |
| support a connection timeout, which tells Maven how long to wait before giving up on a connection that has not |
| responded. This option is preserved in the HttpClient-based wagon. That wagon also provides a fine-grained |
| alternative that lets you specify timeouts per method for a given server. The old configuration option, which is |
| still supported, looks like this: |
| |
| +---+ |
| <settings> |
| <servers> |
| <server> |
| <id>my-server</id> |
| <configuration> |
| <timeout>6000</timeout> <!-- milliseconds --> |
| </configuration> |
| </server> |
| </servers> |
| </settings> |
| +---+ |
| |
| ...while the new configuration option looks more like this: |
| |
| +---+ |
| <settings> |
| <servers> |
| <server> |
| <id>my-server</id> |
| <configuration> |
| <httpConfiguration> |
| <put> |
| <connectionTimeout>10000</connectionTimeout> <!-- milliseconds --> |
| </put> |
| </httpConfiguration> |
| </configuration> |
| </server> |
| </servers> |
| </settings> |
| +---+ |
| |
| If all you need is a per-server timeout configuration, you still have the option to use the old <<<\<timeout\>>>> |
| parameter. If you need separate timeout preferences for each HTTP method, you can use a configuration like the one |
| directly above. |
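| |
| For example, the sketch below gives downloads a shorter connection timeout than uploads by configuring the |
| <<<\<get\>>>> and <<<\<put\>>>> sections separately. The timeout values are illustrative only, and <<<my-server>>> is a |
| placeholder id. |
| |
| +---+ |
| <settings> |
|   <servers> |
|     <server> |
|       <id>my-server</id> |
|       <configuration> |
|         <httpConfiguration> |
|           <get> |
|             <connectionTimeout>5000</connectionTimeout> <!-- milliseconds --> |
|           </get> |
|           <put> |
|             <connectionTimeout>30000</connectionTimeout> <!-- milliseconds --> |
|           </put> |
|         </httpConfiguration> |
|       </configuration> |
|     </server> |
|   </servers> |
| </settings> |
| +---+ |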
| |
| *Resources |
| |
| [[1]] {{{http://hc.apache.org/httpclient-3.x/}HttpClient website}} |
| |
| [[2]] {{{http://hc.apache.org/httpclient-3.x/preference-api.html}HttpClient preference architecture and configuration guide}} |
| |
| [[3]] {{{./guide-wagon-providers.html}Guide to Wagon Providers}} |
| |
| [] |