HDFS Data Activity Monitoring

Monitor Requirements

This application monitors user activity on HDFS through the HDFS audit log. When abnormal user activity is detected, an alert is sent within seconds. The pipeline consists of the following stages:

  • Kafka ingest: this application consumes data from Kafka, so users must first stream the audit log into Kafka.

  • Data re-processing: this stage includes the raw log parser, the IP zone joiner, and the sensitivity information joiner.

  • Kafka sink: the parsed data flows back into Kafka, where it is consumed by the alert engine.

  • Policy evaluation: the alert engine (hosted in the Alert Engine app) evaluates each event against the user-defined policies. An alert is generated when an event matches a policy.
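The re-processing and policy evaluation stages above can be illustrated with a minimal Python sketch. It is not the application's actual implementation: the sample line follows the common tab-separated key=value layout of HDFS audit logs, while the lookup tables, the output field names (user, host, securityZone, sensitivityType) and the example policy are assumptions made for illustration only. Kafka ingest and sink are left out so that the example stays self-contained.

```python
import re

# Illustrative audit line: a log4j-style header followed by tab-separated key=value pairs.
SAMPLE = ("2015-04-24 12:51:31,798 INFO FSNamesystem.audit: "
          "allowed=true\tugi=hdfs (auth:SIMPLE)\tip=/10.0.2.15\t"
          "cmd=open\tsrc=/tmp/private/data.csv\tdst=null\tperm=null\tproto=rpc")

# Hypothetical lookup tables standing in for the IP zone and sensitivity joiners.
IP_ZONES = {"10.0.2.15": "SECURITY"}
SENSITIVITY = {"/tmp/private": "PRIVATE"}

# Date, time, log level and logger name, then the key=value body.
HEADER = re.compile(r"^(?P<timestamp>\S+ \S+) \S+ \S+ (?P<body>.*)$")


def parse_audit_line(line):
    """Raw log parser: turn one audit line into a dict, or None if it does not match."""
    match = HEADER.match(line)
    if match is None:
        return None
    event = {"timestamp": match.group("timestamp")}
    for pair in match.group("body").split("\t"):
        key, sep, value = pair.partition("=")
        if sep:
            event[key.strip()] = value.strip()
    # Derived fields (names are assumptions): short user name and bare IP address.
    event["user"] = event.get("ugi", "").split(" ")[0]
    event["host"] = event.get("ip", "").lstrip("/")
    return event


def enrich(event):
    """Joiners: attach a security zone by IP and a sensitivity type by path prefix."""
    event["securityZone"] = IP_ZONES.get(event["host"], "UNKNOWN")
    src = event.get("src", "")
    event["sensitivityType"] = next(
        (label for prefix, label in SENSITIVITY.items() if src.startswith(prefix)),
        "NONE",
    )
    return event


def violates_policy(event):
    """Hypothetical policy: alert on any 'open' command against a PRIVATE path."""
    return event["sensitivityType"] == "PRIVATE" and event.get("cmd") == "open"


event = enrich(parse_audit_line(SAMPLE))
if violates_policy(event):
    print("ALERT:", event["user"], event["cmd"], event["src"])
```

In the real pipeline the enriched event would be written back to Kafka and the policy would be evaluated by the alert engine rather than in the same process; the sketch only shows how the fields relate to each other.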

HDFSAUDITLOG

Setup & Installation

  • Choose a site to install this application on, for example ‘sandbox’.

  • Install the “Hdfs Audit Log Monitor” app step by step.

    Install Step 2