commit | 7df3c203f36738f9139d625dd9db3287cd44af6a | |
---|---|---|
author | Daniel Gruno <humbedooh@apache.org> | Wed Mar 03 12:41:19 2021 +0100 |
committer | GitHub <noreply@github.com> | Wed Mar 03 12:41:19 2021 +0100 |
tree | af57f645b5f76e5712026730661ed8f842099bac | |
parent | 513075030021c381957d7ad42de63b1dedca2201 | |
Add HTTP request headline, only use async write if supported

Async writing currently is unsupported on Linux 4.18 and above. Thus, if we're on Linux and below (or not Linux), fall back to sync writing.
Aardvark acts as a middleman between frontend web servers and (typically) ticket submission services such as JIRA or Bugzilla, and intercepts all data sent. POST data is scanned for known offending words that are common in spam, and if the data is found to be spam, the request is blocked. Aardvark keeps an internal list of offending IPs, and will block any subsequent POST requests from those IPs (until restarted).
Aardvark is written in Python 3 and uses aiohttp for its server/client capabilities.
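The blocking flow described above can be sketched roughly as follows. This is a minimal illustration, not Aardvark's actual code: the pattern list, the `blocked_ips` set and the `should_block` function are hypothetical names, and the real patterns come from Aardvark's configuration.

```python
import re

# Hypothetical offending patterns, for illustration only.
OFFENDING_PATTERNS = [
    re.compile(p, re.IGNORECASE) for p in (r"cheap pills", r"casino bonus")
]

# Kept in memory only, so the list is empty again after a restart.
blocked_ips = set()


def should_block(client_ip: str, method: str, body: str) -> bool:
    """Return True if this request should be rejected."""
    if method != "POST":
        return False  # only POST data is scanned in this sketch
    if client_ip in blocked_ips:
        return True  # known offender: block without rescanning
    if any(p.search(body) for p in OFFENDING_PATTERNS):
        blocked_ips.add(client_ip)  # remember the offender
        return True
    return False
```

Once an IP has sent one offending POST, every later POST from it is rejected, even if the later body is clean.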
port
: Which port to listen on for scans. For security reasons, Aardvark will bind to localhost. Default is 1729.
proxy_url
: The backend service to proxy to if the request is sane.
ipheader
: The header to look for the client's IP in. Typically X-Forwarded-For.
naive_spam_threshold
: This is the spam score threshold for the naïve scanner, spamfilter.py. It uses a pre-generated English corpus for detecting spam.
spamurls
: Specific honey-pot URLs that trigger a block regardless of the action.
ignoreurls
: Specific URLs that are exempt from spam detection.
postmatches
: A list of keywords and/or regexes that, if matched, will block the request.
multimatch
: A combination blocker. If a required keyword or regex is matched, the request will be blocked only if one or more auxiliary keywords/regexes are also matched.

Aardvark contains a very naïve spam scanner in spamfilter.py that uses a very simplified Bayes-esque formula for determining whether something is spam. It is enabled for form data only, and can be disabled entirely by setting enable_naive_scan to false. It has a built-in corpus with ham and spam in English, and works...sometimes :)
It is very much a work in progress, but should be safe to have enabled.
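To give a feel for what a Bayes-style scorer does, here is a small sketch. It is not the formula used in spamfilter.py: the corpus is made up, the scoring is plain naive Bayes log-odds with add-one smoothing, and the `threshold` and `enabled` parameters only loosely mirror naive_spam_threshold and enable_naive_scan (the real threshold scale may differ).

```python
import math
import re
from collections import Counter

# Tiny made-up corpus; the real scanner ships its own English corpus.
HAM = ["the build fails on the main branch", "please review the attached patch"]
SPAM = ["buy cheap pills online now", "win cheap casino bonus now"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


ham_counts = Counter(w for doc in HAM for w in tokenize(doc))
spam_counts = Counter(w for doc in SPAM for w in tokenize(doc))
vocab = set(ham_counts) | set(spam_counts)


def spam_score(text):
    """Sum of per-word log-odds; positive means more spam-like."""
    ham_total = sum(ham_counts.values()) + len(vocab)
    spam_total = sum(spam_counts.values()) + len(vocab)
    score = 0.0
    for word in tokenize(text):
        if word not in vocab:
            continue  # unseen words carry no evidence either way
        p_spam = (spam_counts[word] + 1) / spam_total  # add-one smoothing
        p_ham = (ham_counts[word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score


def is_spam(text, threshold=0.0, enabled=True):
    """Apply the (assumed) threshold; a disabled scanner flags nothing."""
    return enabled and spam_score(text) > threshold
```

With a corpus this small the scores are crude, which is roughly why such a scanner only works "sometimes": its verdict depends heavily on how well the corpus matches the traffic it sees.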
To enable as a pipservice, add the following minimal hiera yaml to your node config:
pipservice:
  aardvark-proxy:
    tag: main
Follow these steps to run manually (assuming you have pipenv installed):
git clone https://github.com/apache/infrastructure-aardvark-proxy.git aardvark-proxy
cd aardvark-proxy
pipenv install -r requirements.txt
pipenv run python3 aardvark.py
As Aardvark is a proxy middleman for specific purposes, you will typically want a web server in front of it. The example below relays all POST requests for /foo/bar through Aardvark, while letting GET and all other requests go directly to the backend service.
Assuming Aardvark is listening on port 1729 and the real backend service is on port 8080:
# Send all POST requests through Aardvark
RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteRule ^/(.*)$ http://localhost:1729/$1 [P]
# Rest goes to backend directly
ProxyPass / http://localhost:8080/foo/bar/
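The routing decision made by these rules can be expressed as a small function, which is handy for checking where a given request would end up. This is only a sketch of the rules as written; the `upstream_for` name is hypothetical and the addresses are the ones assumed in the example.

```python
# Addresses taken from the example configuration above.
AARDVARK = "http://localhost:1729"
BACKEND = "http://localhost:8080/foo/bar"


def upstream_for(method: str, path: str) -> str:
    """Return the URL a request would be proxied to under the rules above."""
    if method == "POST":
        # RewriteRule ^/(.*)$ http://localhost:1729/$1 [P]
        return AARDVARK + path
    # ProxyPass / http://localhost:8080/foo/bar/
    return BACKEND + path
```

For example, `upstream_for("POST", "/submit")` yields the Aardvark address, while a GET for the same path maps straight to the backend.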