---
layout: doc
title: "Data Classification Tutorial"
permalink: /docs/tutorial/classification-0.3.0.html
---

The Apache Eagle data classification feature provides the ability to classify data with different levels of sensitivity. Currently this feature is available ONLY for applications monitoring HDFS, Apache Hive, and Apache HBase, e.g. HdfsAuditLog, HiveQueryLog, and HBaseSecurityLog.

The main contents of this page are:

  • Connection Configuration
  • Data Classification

Connection Configuration

To monitor a remote cluster, first make sure the connection to that cluster is configured. For more details, please refer to Site Management.
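As an illustration, connecting Eagle to an HDFS cluster boils down to telling it where the namenode is. A minimal sketch of such a connection configuration is below; the endpoint is an assumption for a sandbox deployment, and the exact keys for secured (Kerberos) clusters depend on your Eagle version, so verify them against the Site Management page:

```json
{
  "fs.defaultFS": "hdfs://sandbox.hortonworks.com:8020"
}
```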

Data Classification

After the connection is configured, data classification involves two parts: the first part describes how to add/remove sensitivity marks on files/directories; the second part shows how to monitor these sensitive data with policies. In the following, we take HdfsAuditLog as an example.

Part 1: Sensitivity Edit

  • Add sensitivity marks to files/directories.

    • Basic: label sensitive files directly (recommended)

      (screenshots: HDFS classification)

    • Advanced: import a JSON file or JSON content

      (screenshots: HDFS classification)

  • Remove sensitivity marks from files/directories.

    • Basic: remove the label directly

      (screenshots: HDFS classification)

    • Advanced: delete in batch

      (screenshot: HDFS classification)
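The Advanced import step above expects the sensitivity marks as JSON. A minimal sketch of such a payload is shown below; the field and tag names (`site`, `filedir`, `sensitivityType`) follow the shape used by Eagle's file sensitivity entities, but treat them as assumptions and check the exact schema for your Eagle version:

```json
[
  {
    "tags": {
      "site": "sandbox",
      "filedir": "/tmp/private"
    },
    "sensitivityType": "PRIVATE"
  }
]
```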

Part 2: Sensitivity Usage in Policy Definition

You can mark a particular folder/file as "PRIVATE". Once this label is in place, you can create policies that reference it.

For example: the following policy monitors all the operations to resources with sensitivity type “PRIVATE”.

(screenshot: sensitivity type policy)
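Such a policy can be expressed as a stream query in the Siddhi-style syntax Eagle uses for policy definitions. A sketch is below, assuming the HDFS audit log stream is named `hdfsAuditLogEventStream` and carries a `sensitivityType` field; confirm both names against your deployment's stream metadata:

```sql
from hdfsAuditLogEventStream[sensitivityType == 'PRIVATE']
select *
insert into outputStream;
```

This matches every operation on resources marked "PRIVATE"; narrowing the filter (e.g. adding a condition on the `cmd` field, if present) restricts alerts to specific operations such as deletes.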