This document describes how to start online processing of user profiles. It assumes that Apache Eagle has been installed and that the Eagle service is started.
Step 1: Start Apache Spark if it is not already running
Step 2: Start the offline scheduler
Option 1: command line
$ cd <eagle-home>
$ bin/eagle-userprofile-scheduler.sh --site sandbox start
Option 2: start via Apache Ambari
Step 3: Generate a model
Two options are provided for starting the user profile detection topology.
Option 1: command line
Submit the userProfiles topology if it is not already listed on the topology UI:
$ bin/eagle-topology.sh --main org.apache.eagle.security.userprofile.UserProfileDetectionMain --config conf/sandbox-userprofile-topology.conf start
Option 2: Apache Ambari
userprofile-validate.txt: a file containing data points that you can use to test the models
Copy the files (downloaded in the previous step) into a location in the sandbox, for example: /usr/hdp/current/eagle/lib/userprofile/data/
Modify <eagle-home>/conf/sandbox-userprofile-scheduler.conf
Update training-audit-path to the path of the training data sample (the path you used for Step 1.a)
Update detection-audit-path to the path of the validation data (the path you used for Step 1.b)
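After editing the file, a quick check that both settings are present can be sketched as follows. This is only an illustration: show_audit_paths is a hypothetical helper name, and it assumes each key appears on its own line in the conf file.

```shell
# Hypothetical helper: print the lines of a conf file that mention
# training-audit-path or detection-audit-path.
# Assumption: each key sits on its own line in the file.
show_audit_paths() {
  grep -E 'training-audit-path|detection-audit-path' "$1"
}

# Example call (replace <eagle-home> with your install directory):
#   show_audit_paths <eagle-home>/conf/sandbox-userprofile-scheduler.conf
```

If fewer than two lines are printed, one of the keys is missing or misspelled in the file.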
Run the ML training program from the Eagle UI
Produce Apache Kafka data using the contents of the validation file (Step 1.b)
Run the command (assuming the Eagle configuration uses the Kafka topic sandbox_hdfs_audit_log):
./kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic sandbox_hdfs_audit_log
Paste a few lines of data from the validation file into kafka-console-producer
Check http://localhost:9099/eagle-service/#/dam/alertList for generated alerts
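As an alternative to pasting lines by hand, the first few lines of the validation file can be piped into the console producer, which reads messages from standard input. The sketch below is an assumption-laden illustration: sample_lines is a hypothetical helper, and the file location assumes the example directory used earlier.

```shell
# Hypothetical helper: print the first N lines of a file (default 5).
sample_lines() {
  head -n "${2:-5}" "$1"
}

# Example call, run from the directory containing kafka-console-producer.sh.
# Assumes the validation file was copied to the example location above
# and that the Kafka broker is running:
#   sample_lines /usr/hdp/current/eagle/lib/userprofile/data/userprofile-validate.txt 5 \
#     | ./kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic sandbox_hdfs_audit_log
```

Sending only a handful of lines keeps the test small, so any resulting alerts are easy to match against the input.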