blob: 0c5318accfa33ef1df6a30d1b4ee725f202fa068 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Setting up Apache Kibble
========================
.. toctree::
:maxdepth: 2
:caption: Contents:
****************************
Understanding the Components
****************************
Kibble consists of three major components:
- web application - this is the user facing part of Apache Kibble. Via this
ui users can create organizations, configure scanners and most importantly
view and analyze the data.
- scanners - as the name suggest are application designed to work
with a specific type of resource (a git repo, a mailing list, a JIRA
instance etc) and push compiled data objects to the Kibble Server.
Some resources only have one scanner plugin, while others may have
multiple plugins capable of dealing with specific aspects of a
resource.
- database - an instance of ElasticSearch used by both web application and
scanners to share the information.
The following diagram shows Kibble architecture:
.. figure:: _static/images/kibble-architecture.png
**********************
Component Requirements
**********************
################
Server Component
################
The Kibble Server is a hub for scanners, and as such, is only ever needed on
one machine. It is recommended that, for large instances of kibble, you place
the application on a machine or VM with sufficient resources to handle the
database load and memory requirements.
As a rule of thumb, the Server does not require a lot of disk space
(enough to hold the compiled database), but it does require CPU and RAM.
The scanners require more disk space, but can operate with limited CPU
and RAM.
As an example, let us examine the Apache Kibble demo instance:
- 100 sources (git repos, mailing lists, bug trackers and so on)
- 3,5 million source objects currently (commits, emails, tickets etc)
- 10 concurrent users (actual people uing the web UI)
The recommended minimal specs for the Server component on an instance of
this size would be approximately 4-8GB RAM, 4 cores and at least 10GB
disk space. As this is a centralized component, you will want to spec
this to be able to efficiently deal with the entire database in memory
for best performance.
#################
Scanner Component
#################
The scanner components can either consist of one instance, or be spread
out in a clustered setup. Thus, the requirements can be spread out on
multiple machines or VMs. Scanners will auto-adjust the scanning speed
to match the number of CPU cores available to it; a scanner with two
cores available will run two simultaneous jobs, whereas a scanner with
eight cores will run eight simultaneous jobs to speed up processing.
A scanner will typically require somewhere between 512 and 1GB of memory,
and thus can safely run on a VM with 2GB memory (or less).
********************
Source Code Location
********************
.. This needs to change once we have released Kibble
*Apache Kibble does not currently have any releases.*
*You are however welcome to try out the development version.*
For the time being, we recommend that you use the ``main`` branch for
testing Kibble. All source code can be found in our repository at:
https://github.com/apache/kibble
*********************
Installing Kibble
*********************
###############
Pre-requisites
###############
Before you install the Kibble, please ensure you have the
following components installed and set up:
- Python 3.8
- git binaries (GPL License)
- cloc version 1.76 or later (GPL License)
- An ElasticSearch instance, version 6.x or newer (5.x is supported for
existing databases, but not for new setups). Does not have to be on
the same machine, but it may help speed up processing.
- A web server of your choice (Apache HTTP Server, NGINX, lighttp etc)
###########################################
Configuring and Priming the Kibble Instance
###########################################
Once you have the components installed and Kibble downloaded, you will
need to prime the ElasticSearch instance and create a configuration file.
To install ``kibble`` do the following
::
git clone https://github.com/apache/kibble.git
cd kibble
pip install .
As a good practice it is recommended to use virtual environment for installation.
Once ``kibble`` is installed you may wish to adjust the ``kibble.ini`` configuration
file, especially the ``elasticsearch`` section which is required to connect to database.
Then you can run the following command to configure the database and create initial
administrator account for the UI:
::
kibble setup --autoadmin --skiponexist
#####################
Setting up the Web UI
#####################
Once you have finished the initial setup, you will need to enable the
web UI. Kibble is built as a WSGI application, and as such you can
use mod_wsgi for apache, or proxy to Gunicorn. In this example, we will
be using the Apache HTTP Server and proxy to Gunicorn:
- Make sure you have mod_proxy and mod_proxy_http loaded (on
debian/ubuntu, you would run: `a2enmod proxy_http`)
- Set up a virtual host in Apache:
::
<VirtualHost *:80>
# Set this to your domain, or add kibble.localhost to /etc/hosts
ServerName kibble.localhost
DocumentRoot /var/www/kibble/ui/
# Proxy to gunicorn for /api/ below:
ProxyPass /api/ http://localhost:8000/api/
</VirtualHost>
- Launch gunicorn as a daemon on port 8000 (if your distro calls
gunicorn for Python3 `gunicorn3`, make sure you use that instead):
::
cd /var/www/kibble/api/
gunicorn -w 10 -b 127.0.0.1:8000 -t 120 -D kibble.api.handler:application
Once httpd is (re)started, you should be able to browse to your new
Kibble instance.
######################
Configuring a Scanners
######################
Scanners are configured via ``kibble.ini`` configuration file.
Remember that the scanner must have enough disk space to fully store
any resources you may be scanning. If you are scanning a large git repository,
the scanner should have sufficient disk space to store it locally.
If you plan to make use of the optional text analysis features of
Kibble, you should also configure the API service you will be using
(Watson/Azure/picoAPI etc).
##############################
Balancing Load Across Machines
##############################
If you wish to spread out the analysis load over several machines/VMs,
you can do so by specifying a ``scanner.balance`` on each node. The balance
directive uses the syntax X/Y, where Y is the total number of nodes in
your scanner cluster, and X is the ID of the current scanner. Thus, if
you have decided to use four machines for scanning, the first would have
a balance of 1/4, the next would be 2/4, then 3/4 and finally 4/4 on the
last machine. This will balance the load and storage requirements evenly
across all machines.
.. _runscan:
###############
Running a Scan
###############
Once you have both scanners and the data server set up, you can begin
scanning resources for data. Please refer to :ref:`configdatasources`
for how to set up various resources for scanning via the Web UI.
Scans can be initiated manually, but you may want to set up a cron job to
handle daily scans of resources. To start a scan on a scanner machine,
run the following:
::
kibble scan
This will load all plugins and use them in a sensible order on each
resource that matches the appropriate type. The collected data will be
pushed to the main data server and be available for visualizations
instantly.
It may be worth your while to run the scanner inside a timer wrapper,
as such: ``time kibble scan`` in order to gauge the
amount of time a scan will take, and adjusting your cron jobs to match
this.
************
Docker Image
************
If you want to spin up a development instance of Apache Kibble you can do:
::
docker-compose -f docker-compose-dev.yaml run kibble setup --autoadmin --skiponexist
docker-compose -f docker-compose-dev.yaml up ui
The ui should be available under ``http://0.0.0.0:8000`` or ``http://localhost:8000``.
To log in you can use the dummy admin account ``admin@kibble`` and password ``kibbleAdmin``.
You can also start only the API server:
::
docker-compose -f docker-compose-dev.yaml up api
To trigger scanners run:
::
docker-compose -f docker-compose-dev.yaml run kibble scan