This example program shows how to use the Kudu Python API to load data generated by an external program into a new or existing Kudu table.
Make sure the Kudu client library is installed and the kudu Python bindings are available. If the client library and the Python bindings are installed in a non-standard location, you'll need to set the corresponding environment variables to point to those directories.
In addition, you'll need the dstat program, which should be available from your distribution's package repository.
In this example, dstat generates data about the system load and writes it to a named pipe, which the Python program then reads.
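The named-pipe hand-off described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in kudu_dstat.py: the helper names (parse_dstat_line, read_pipe_lines, demo) are hypothetical, the samples are assumed to arrive as comma-separated values with an epoch timestamp first, and the column order is borrowed from the Impala table definition in this README. The demo replaces the real dstat process with a thread that writes two fake lines.

```python
import os
import tempfile
import threading

# Column order assumed to match the Impala table definition in this README.
COLUMNS = ["ts", "usr", "sys", "idl", "wai", "hiq", "siq", "read", "writ",
           "recv", "send", "in", "out", "int", "csw"]


def parse_dstat_line(line):
    """Turn one comma-separated sample into a dict, or None for non-data lines."""
    fields = [f.strip().strip('"') for f in line.strip().split(",")]
    if len(fields) != len(COLUMNS):
        return None
    try:
        values = [float(f) for f in fields]
    except ValueError:
        return None  # header or title line, not a sample
    row = dict(zip(COLUMNS, values))
    row["ts"] = int(row["ts"])  # the key column is a BIGINT
    return row


def read_pipe_lines(path):
    """Yield lines from a named pipe until the writing side closes it."""
    with open(path, "r") as pipe:
        for line in pipe:
            yield line


def demo():
    """Feed two fake lines through a FIFO in place of a real dstat process."""
    workdir = tempfile.mkdtemp()
    fifo = os.path.join(workdir, "dstat.fifo")
    os.mkfifo(fifo)  # POSIX only

    def fake_dstat():
        with open(fifo, "w") as out:
            out.write('"epoch","usr","sys"\n')  # header, skipped by the parser
            out.write("1700000000.5,1,2,97,0,0,0,0,4096,10,20,0,0,300,500\n")

    writer = threading.Thread(target=fake_dstat)
    writer.start()
    rows = [r for r in map(parse_dstat_line, read_pipe_lines(fifo)) if r]
    writer.join()
    os.remove(fifo)
    os.rmdir(workdir)
    return rows
```

Calling demo() returns a list with one parsed row. In a real run, the writer thread would be replaced by a dstat process whose output is directed at the pipe.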
To execute this script, simply run:
python kudu_dstat.py
This creates the table, assuming a kudu-master is running locally. You can view information about the table in the Kudu Web UI at http://localhost:8051. The program runs until it is terminated with Ctrl-C.
To drop the table in Kudu and start fresh, start the program with:
python kudu_dstat.py drop
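The create-or-drop behaviour described above could look roughly like the sketch below. It is not taken from kudu_dstat.py, which may use different calls. The helper names open_table and connect_and_open are hypothetical. The client calls (kudu.connect, kudu.schema_builder, Partitioning, table_exists, delete_table, create_table, table) follow the kudu-python client as currently documented. Hash partitioning on ts with 2 buckets is an assumption made only so the example is complete.

```python
DSTAT_TABLE = "dstat"
# Value columns as listed in the Impala table definition in this README.
VALUE_COLUMNS = ["usr", "sys", "idl", "wai", "hiq", "siq", "read", "writ",
                 "recv", "send", "in", "out", "int", "csw"]


def open_table(client, name, schema, partitioning, drop=False):
    """Return the named table, dropping and/or creating it first as needed."""
    if drop and client.table_exists(name):
        client.delete_table(name)
    if not client.table_exists(name):
        client.create_table(name, schema, partitioning)
    return client.table(name)


def connect_and_open(host="127.0.0.1", port=7051, drop=False):
    """Connect to a local kudu-master and open the dstat table."""
    # Imported here so open_table() can be exercised without the bindings.
    import kudu
    from kudu.client import Partitioning

    builder = kudu.schema_builder()
    builder.add_column("ts").type(kudu.int64).nullable(False).primary_key()
    for column in VALUE_COLUMNS:
        builder.add_column(column, type_=kudu.float)
    schema = builder.build()

    # Assumption: 2 hash buckets on the key column; tune for a real cluster.
    partitioning = Partitioning().add_hash_partitions(
        column_names=["ts"], num_buckets=2)

    client = kudu.connect(host=host, port=port)
    table = open_table(client, DSTAT_TABLE, schema, partitioning, drop=drop)
    return client, table
```

A caller could pass drop=True when the first command-line argument is drop, which mirrors the invocation shown above.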
To query the data via Impala, create an external Impala table that maps to the Kudu table by running the following statement in impala-shell:
CREATE EXTERNAL TABLE dstat (
  `ts` BIGINT,
  `usr` FLOAT,
  `sys` FLOAT,
  `idl` FLOAT,
  `wai` FLOAT,
  `hiq` FLOAT,
  `siq` FLOAT,
  `read` FLOAT,
  `writ` FLOAT,
  `recv` FLOAT,
  `send` FLOAT,
  `in` FLOAT,
  `out` FLOAT,
  `int` FLOAT,
  `csw` FLOAT
)
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'dstat',
  'kudu.master_addresses' = '127.0.0.1:7051',
  'kudu.key_columns' = 'ts'
);
Now you can query your local system's load using:
-- How many rows are stored right now?
select count(*) from dstat;

-- Average load in 10s windows
select (ts - ts % 10) as mod_ts, avg(usr), avg(sys), avg(idl)
from dstat
group by mod_ts
order by mod_ts;
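The windowing expression in the second query rounds each timestamp down to the nearest multiple of 10, so all samples from the same 10-second window share one group key. The same arithmetic in plain Python, as a small worked example with made-up sample values (the function name average_by_window is hypothetical):

```python
from collections import defaultdict


def average_by_window(rows, width=10):
    """Group (ts, value) pairs into width-second windows and average each one."""
    buckets = defaultdict(list)
    for ts, value in rows:
        # Same expression as in the SQL: round ts down to a multiple of width.
        buckets[ts - ts % width].append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}


# Made-up (ts, usr) samples: two fall in the first window, one in the next.
samples = [(1700000001, 10.0), (1700000009, 20.0), (1700000010, 40.0)]
print(average_by_window(samples))
# prints {1700000000: 15.0, 1700000010: 40.0}
```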