tutorial-userprofile.md

layout: doc title: “User Profile Tutorial” permalink: /docs/tutorial/userprofile.html

This document will introduce how to start the online processing on user profiles. Assume Apache Eagle has been installed and Eagle service is started.

User Profile Offline Training

Step 1: Start Apache Spark if not started
Step 2: start offline scheduler
- Option 1: command line
```
$ cd <eagle-home>/bin
$ bin/eagle-userprofile-scheduler.sh --site sandbox start
```
- Option 2: start via Apache Ambari
Step 3: generate a model

User Profile Online Detection

Two options to start the topology are provided.

Option 1: command line

submit userProfiles topology if it's not on topology UI

$ bin/eagle-topology.sh --main org.apache.eagle.security.userprofile.UserProfileDetectionMain --config conf/sandbox-userprofile-topology.conf start

Option 2: Apache Ambari

Evaluate User Profile in Sandbox

Prepare sample data for ML training and validation sample data

a. Download following sample data to be used for training
- user1.hdfs-audit.2015-10-11-00.txt
- user1.hdfs-audit.2015-10-11-01.txt
b. Downlaod userprofile-validate.txtfile which contains data points that you can try to test the models

Copy the files (downloaded in the previous step) into a location in sandbox For example: /usr/hdp/current/eagle/lib/userprofile/data/
Modify <Eagle-home>/conf/sandbox-userprofile-scheduler.conf update training-audit-path to set to the path for training data sample (the path you used for Step 1.a) update detection-audit-path to set to the path for validation (the path you used for Step 1.b)
Run ML training program from eagle UI
Produce Apache Kafka data using the contents from validate file (Step 1.b) Run the command (assuming the eagle configuration uses Kafka topic sandbox_hdfs_audit_log)
```
 ./kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic sandbox_hdfs_audit_log
```
Paste few lines of data from file validate onto kafka-console-producer Check http://localhost:9099/eagle-service/#/dam/alertList for generated alerts