blob: 9d23093fae3ce1ac11bce448af00bd34e39623c4 [file] [log] [blame]
<?xml version="1.0"?>
<!--
This is the authentication configuration file for protocol-httpclient.
Different credentials for different authentication scopes can be
configured in this file. If a set of credentials is configured for a
particular authentication scope (i.e. particular host, port number,
scheme and realm), then that set of credentials would be sent only to
servers falling under the specified authentication scope. Apart from
this at most one set of credentials can be configured as 'default'.
When authentication is required to fetch a resource from a web-server,
the authentication-scope is determined from the host, port, scheme and
realm (if present) obtained from the URL of the page and the
authentication headers in the HTTP response. If it matches any
'authscope' in this configuration file, then the 'credentials' for
that 'authscope' is used for authentication. Otherwise, it would use
the 'default' set of credentials (with an exception which is described
in the next paragraph), if present. If any attribute is missing, it
would match all values for that attribute.
If there are several pages having different authentication realms and
schemes on the same web-server (same host and port, but different
realms and schemes), and credentials for one or more of the realms and
schemes for that web-server is specified, then the 'default'
credentials would be ignored completely for that web-server (for that
host and port). So, credentials to handle all realms and schemes for
that server may be specified explicitly by adding an extra 'authscope'
tag with the 'realm' and 'scheme' attributes missing for that server.
This is demonstrated by the last 'authscope' tag for 'example:8080' in
the following example.
Example:-
<credentials username="susam" password="masus">
<default realm="sso"/>
<authscope host="192.168.101.33" port="80" realm="login"/>
<authscope host="example" port="8080" realm="blogs"/>
<authscope host="example" port="8080" realm="wiki"/>
<authscope host="example" port="80" realm="quiz" scheme="NTLM"/>
</credentials>
<credentials username="admin" password="nimda">
<authscope host="example" port="8080"/>
</credentials>
In the above example, 'example:8080' server has pages with multiple
authentication realms. The first set of credentials would be used for
'blogs' and 'wiki' authentication realms. The second set of
credentials would be used for all other realms. For 'login' realm of
'192.168.101.33', the first set of credentials would be used. For any
other realm of '192.168.101.33' authentication would not be done. For
the NTLM authentication required by 'example:80', the first set of
credentials would be used. For 'sso' realms of all other servers, the
first set of credentials would be used, since it is configured as
'default'.
NTLM does not use the notion of realms. The domain name may be
specified as the value for 'realm' attribute in case of NTLM.
More information on Basic, Digest and NTLM authentication
support can be located at https://wiki.apache.org/nutch/HttpAuthenticationSchemes
HTTP-POST Authentication Support
Http Form-based Authentication is a very common used authentication
mechanism to protect web resources. We extend the 'auth-configuration'
to include information about http form authentication properties as shown
in the following example:
Example:-
<credentials authMethod="formAuth"
loginUrl="http://localhost:44444/Account/Login.aspx"
loginFormId="ctl01"
loginRedirect="true">
<loginPostData>
<field name="ctl00$MainContent$LoginUser$UserName"
value="admin"/>
<field name="ctl00$MainContent$LoginUser$Password"
value="admin123"/>
</loginPostData>
<additionalPostHeaders>
<field name="User-Agent"
value="Mozilla/5.0 ... Firefox/35.0" />
</additionalPostHeaders>
<removedFormFields>
<field name="ctl00$MainContent$LoginUser$RememberMe"/>
</removedFormFields>
<loginCookie>
<policy>BROWSER_COMPATIBILITY</policy>
</loginCookie>
</credentials>
it is critical that the following fields are substituted:
* loginUrl - the URL containing the actual <form>
* loginFormId - the <form id="$formId" attribute value
(or the 'name' attribute if no form is referenced by 'id' attribute)
* loginRedirect - if http post login returns redirect code: 301 or 302,
and value is true, Http Client will automatically follow the redirect.
* <field name="ctl00$MainContent$LoginUser$UserName" value="admin"
- the <input name"name" and user defined username value used to represent
the field and username respectively
* <field name="ctl00$MainContent$LoginUser$Password" value="admin123"
- the <input name"name" and user defined password value used to represent
the field and password respectively
* <field name="ctl00$MainContent$LoginUser$RememberMe"/>
- form element attributes for which we wish to skip fields
* <policy> value from <loginCookie> is a constant value symbol from
org.apache.commons.httpclient.cookie.CookiePolicy, like BROWSER_COMPATIBILITY,
DEFAULT, RFC_2109, etc.
More information on HTTP POST can be located at
https://wiki.apache.org/nutch/HttpPostAuthentication
-->
<auth-configuration>
</auth-configuration>