| <?xml version="1.0"?> |
| <!-- |
| This is the authentication configuration file for protocol-httpclient. |
| Different credentials for different authentication scopes can be |
| configured in this file. If a set of credentials is configured for a |
| particular authentication scope (i.e. particular host, port number, |
| scheme and realm), then that set of credentials would be sent only to |
| servers falling under the specified authentication scope. In addition, |
| at most one set of credentials can be configured as 'default'. |
| |
| When authentication is required to fetch a resource from a web-server, |
| the authentication-scope is determined from the host, port, scheme and |
| realm (if present) obtained from the URL of the page and the |
| authentication headers in the HTTP response. If it matches any |
| 'authscope' in this configuration file, then the 'credentials' for |
| that 'authscope' are used for authentication. Otherwise, the 'default' |
| set of credentials is used, if present (with an exception which is |
| described in the next paragraph). If any attribute is missing from an |
| 'authscope', it matches all values for that attribute. |
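| |
| As a small illustration of the missing-attribute rule (the host name |
| 'intranet' and the credentials shown here are placeholders, not values |
| taken from any real setup), an 'authscope' that gives only 'host' and |
| 'port' should match every realm and every scheme on that server: |
| |
| <credentials username="user" password="secret"> |
|   <authscope host="intranet" port="80"/> |
| </credentials> |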
| |
| If there are several pages having different authentication realms and |
| schemes on the same web-server (same host and port, but different |
| realms and schemes), and credentials for one or more of the realms and |
| schemes for that web-server are specified, then the 'default' |
| credentials are ignored completely for that web-server (for that |
| host and port). So, credentials to handle all realms and schemes for |
| that server may be specified explicitly by adding an extra 'authscope' |
| tag with the 'realm' and 'scheme' attributes missing for that server. |
| This is demonstrated by the last 'authscope' tag for 'example:8080' in |
| the following example. |
| |
| Example:- |
| <credentials username="susam" password="masus"> |
|   <default realm="sso"/> |
|   <authscope host="192.168.101.33" port="80" realm="login"/> |
|   <authscope host="example" port="8080" realm="blogs"/> |
|   <authscope host="example" port="8080" realm="wiki"/> |
|   <authscope host="example" port="80" realm="quiz" scheme="NTLM"/> |
| </credentials> |
| <credentials username="admin" password="nimda"> |
|   <authscope host="example" port="8080"/> |
| </credentials> |
| |
| In the above example, the 'example:8080' server has pages with multiple |
| authentication realms. The first set of credentials would be used for |
| the 'blogs' and 'wiki' authentication realms. The second set of |
| credentials would be used for all other realms on that server. For the |
| 'login' realm of '192.168.101.33', the first set of credentials would |
| be used. For any other realm of '192.168.101.33', authentication would |
| not be done. For |
| the NTLM authentication required by 'example:80', the first set of |
| credentials would be used. For 'sso' realms of all other servers, the |
| first set of credentials would be used, since it is configured as |
| 'default'. |
| |
| NTLM does not use the notion of realms. For NTLM, the domain name may |
| be specified as the value of the 'realm' attribute. |
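| |
| For instance, a sketch assuming a hypothetical NTLM domain named |
| 'EXAMPLEDOMAIN' and placeholder credentials: |
| |
| <credentials username="user" password="secret"> |
|   <authscope host="example" port="80" realm="EXAMPLEDOMAIN" scheme="NTLM"/> |
| </credentials> |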
| |
| More information on Basic, Digest and NTLM authentication |
| support can be located at https://wiki.apache.org/nutch/HttpAuthenticationSchemes |
| |
| HTTP-POST Authentication Support |
| HTTP form-based authentication is a very commonly used mechanism for |
| protecting web resources. The 'auth-configuration' format is extended |
| to include HTTP form authentication properties, as shown in the |
| following example: |
| |
| Example:- |
| <credentials authMethod="formAuth" |
|              loginUrl="http://localhost:44444/Account/Login.aspx" |
|              loginFormId="ctl01" |
|              loginRedirect="true"> |
|   <loginPostData> |
|     <field name="ctl00$MainContent$LoginUser$UserName" |
|            value="admin"/> |
|     <field name="ctl00$MainContent$LoginUser$Password" |
|            value="admin123"/> |
|   </loginPostData> |
|   <additionalPostHeaders> |
|     <field name="User-Agent" |
|            value="Mozilla/5.0 ... Firefox/35.0" /> |
|   </additionalPostHeaders> |
|   <removedFormFields> |
|     <field name="ctl00$MainContent$LoginUser$RememberMe"/> |
|   </removedFormFields> |
|   <loginCookie> |
|     <policy>BROWSER_COMPATIBILITY</policy> |
|   </loginCookie> |
| </credentials> |
| |
| It is critical that the following values are substituted: |
| * loginUrl - the URL of the page containing the actual <form> |
| * loginFormId - the value of the <form> element's 'id' attribute |
| (or of its 'name' attribute if the form has no 'id' attribute) |
| * loginRedirect - if the value is true and the HTTP POST login returns |
| redirect code 301 or 302, the HTTP client automatically follows the |
| redirect. |
| * <field name="ctl00$MainContent$LoginUser$UserName" value="admin"/> |
| - 'name' is the 'name' attribute of the username <input> element and |
| 'value' is the username to submit |
| * <field name="ctl00$MainContent$LoginUser$Password" value="admin123"/> |
| - 'name' is the 'name' attribute of the password <input> element and |
| 'value' is the password to submit |
| * <field name="ctl00$MainContent$LoginUser$RememberMe"/> |
| - a form field, identified by its 'name' attribute, that should be |
| left out of the submitted login data |
| * <policy> in <loginCookie> - the name of a constant from |
| org.apache.commons.httpclient.cookie.CookiePolicy, such as |
| BROWSER_COMPATIBILITY, DEFAULT or RFC_2109. |
| |
| More information on HTTP POST can be located at |
| https://wiki.apache.org/nutch/HttpPostAuthentication |
| |
| --> |
| |
| <auth-configuration> |
| |
| </auth-configuration> |