After Apache Eagle has been deployed (see the Deployment guide), enter the deployment directory and use the command below to control the Apache Eagle server.
./bin/eagle-server.sh start|stop|status
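For example, a typical start/verify/stop sequence from the deployment directory looks like this:

```bash
# Start the Eagle server from the deployment directory
./bin/eagle-server.sh start

# Check whether the server is running
./bin/eagle-server.sh status

# Stop the server when it is no longer needed
./bin/eagle-server.sh stop
```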
After starting the Eagle server, open http://<EAGLE_SERVER_HOST>:<EAGLE_SERVER_PORT>/ in a browser to access the Eagle web UI.
This is the typical Web Interface (WI for short) after setting up your Eagle monitoring environment. The WI mainly consists of the main panel on the right and the function menu on the left.
This is the aggregated view of the configured sites and applications. It shows the sites that have been created, how many applications are installed for each site, and the alerts generated from that cluster. You can click the “More info” link to view the details of a particular site.
The “Widgets” section is customizable; if an application developer has registered the application on the Home page, it will show up in the “Widgets” section, giving you a shortcut directly to the application home. Please check the application developer guide for how to register applications as home widgets.
Eagle has an extensible framework for dynamically adding new monitoring applications to the Eagle environment. It also ships with some built-in big data monitoring applications.
Go to “Integration” -> “Applications” to see the list of available monitoring applications from which you can choose to monitor your services.
The “Application” column shows the display name of an application; “Streams” is the logical name of the data stream produced from the monitored source after pre-processing, which will be consumed by the Alert Engine.
At the moment, the built-in applications below are shipped with Apache Eagle. Please refer to each application's documentation to understand how to configure it.
Application | Description |
---|---|
Topology Health Check | This application monitors the service health of HDFS, HBase and YARN. You get alerted once a master or slave role crashes. |
Hadoop JMX Metrics Monitoring | This application monitors JMX metrics from the master nodes of HDFS, HBase and YARN, e.g. the NameNode, HBase Master and YARN Resource Manager. |
HDFS Audit Log Monitor | This application monitors data operations in HDFS to detect sensitive data access and malicious operations, protecting against data leaks or data loss. |
HBase Audit Log Monitor | Similar to the HDFS Audit Log Monitor, this application monitors data operations in HBase. |
Map Reduce History Job | This application collects MapReduce history job counters from the YARN history server and the job running history from the HDFS log directory. |
Map Reduce Running Job | This application collects running MapReduce job counters using the YARN REST API. |
Hadoop Queue Monitor | This application collects resource scheduling and utilization information from YARN. |
MR Metrics Aggregation | This application aggregates job counters and resource utilization over a given period of time (daily, weekly or monthly). |
Job Performance Monitor Web | This application only contains the frontend and depends on Map Reduce History Job and Map Reduce Running Job. |
Alert Engine | Alert Engine is a special application used to process the output data from other applications. |
To enable a real monitoring use case, you have to create a site first, install an application into that site, and finally start the application. The site concept is used to group running applications and avoid conflicts between them.
Go to “Integration” -> “Sites” to see a table listing the managed sites.
Click “New Site” at the bottom right of the Sites page and fill in the information in the site creation dialog.
The “Site Id” must be unique. After creation, the new site appears on the Sites page.
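If you prefer to script the setup, a site can usually also be created through the Eagle server's REST API. The sketch below is illustrative only: the /rest/sites endpoint and the JSON field names (siteId, siteName, description) are assumptions that may differ between Eagle versions, so please verify them against your deployment.

```bash
# Hypothetical sketch: create a "sandbox" site via the REST API.
# The endpoint and JSON field names are assumptions and may differ in your Eagle version.
curl -X POST "http://<EAGLE_SERVER_HOST>:<EAGLE_SERVER_PORT>/rest/sites" \
  -H "Content-Type: application/json" \
  -d '{
        "siteId": "sandbox",
        "siteName": "Sandbox",
        "description": "Sandbox cluster used for testing"
      }'
```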
Click the “Edit” button or the Site column in the Sites table to open the site configuration page, where you can install monitoring applications.
Choose the application you want to install; there will usually be some settings to fill in, e.g. the HDFS NameNode address or the Zookeeper address and port. Please check each application's documentation for how to configure it.
After installation, you can start or stop the application with the start and stop buttons. Check the “Status” column for the running status; a healthy application usually shows “INITIALIZED” or “RUNNING”.
After setting up the monitoring applications, you probably want to define alert policies against the monitored data, so that you get notified whenever a violation occurs. Eagle has a centralized place for policy definition.
Go to “Alert” -> “Policies” to review the defined policies and control whether each policy is enabled:
You can apply the following actions to a policy:
- enable a policy
- disable a policy
- edit a policy
- purge a policy
To create a new policy, click “Alert” -> “Define Policy”; you can also enter the policy definition page by editing an existing policy. Afterwards, you can go to the policy list and enable the policy dynamically.
In this section, you can define the alert publishment method by clicking “+Add Publisher”.
You can reuse the publishment method of an existing policy or create a new publisher.
There are four built-in publisher types:
EmailPublisher: org.apache.eagle.alert.engine.publisher.impl.AlertEmailPublisher
KafkaPublisher: org.apache.eagle.alert.engine.publisher.impl.AlertKafkaPublisher
SlackPublisher: org.apache.eagle.alert.engine.publisher.impl.AlertSlackPublisher
EagleStoragePlugin: org.apache.eagle.alert.engine.publisher.impl.AlertEagleStoragePlugin
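To give a sense of what a publisher definition looks like, the hedged sketch below registers an email publisher through the alert metadata REST API. The /rest/metadata/publishments endpoint, the policy name and the field names (policyIds, properties, etc.) are assumptions for illustration; check the Alert Engine documentation for the authoritative schema of your version.

```bash
# Hypothetical sketch: register an email publisher bound to one policy.
# Endpoint, policy name and field names are assumptions; verify against your Eagle version.
curl -X POST "http://<EAGLE_SERVER_HOST>:<EAGLE_SERVER_PORT>/rest/metadata/publishments" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "failed_job_email_publisher",
        "type": "org.apache.eagle.alert.engine.publisher.impl.AlertEmailPublisher",
        "policyIds": ["map_reduce_failed_job_policy"],
        "properties": {
          "sender": "eagle@localhost",
          "recipients": "ops-team@example.com",
          "subject": "MapReduce failed job alert"
        }
      }'
```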
Currently, we support SiddhiQL (please see the Siddhi Query Language Specification here).
To explain how stream data is processed, let us take the policy below as an example:
from map_reduce_failed_job_stream[site=="sandbox" and currentState=="FAILED"] select * group by jobId insert into map_reduce_failed_job_stream_out
This policy consists of the following parts:
Source: from map_reduce_failed_job_stream
Filter: [site=="sandbox" and currentState=="FAILED"]
Projection: select *
GroupBy: group by jobId
Destination: insert into map_reduce_failed_job_stream_out
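Besides the “Define Policy” page, a policy like the one above can typically be submitted to the alert metadata store via REST. The sketch below is hedged: the /rest/metadata/policies endpoint and the JSON layout (inputStreams, outputStreams, definition, parallelismHint) are assumptions drawn from the Alert Engine metadata model and may differ in your version.

```bash
# Hypothetical sketch: submit the example policy through the metadata REST API.
# Endpoint and JSON field names are assumptions; verify against your Eagle version.
curl -X POST "http://<EAGLE_SERVER_HOST>:<EAGLE_SERVER_PORT>/rest/metadata/policies" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "map_reduce_failed_job_policy",
        "description": "Alert when a MapReduce job fails on the sandbox site",
        "inputStreams": ["map_reduce_failed_job_stream"],
        "outputStreams": ["map_reduce_failed_job_stream_out"],
        "definition": {
          "type": "siddhi",
          "value": "from map_reduce_failed_job_stream[site==\"sandbox\" and currentState==\"FAILED\"] select * group by jobId insert into map_reduce_failed_job_stream_out"
        },
        "parallelismHint": 2
      }'
```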
Source streams (schemas) are defined by applications, and applications write stream data to a data sink (currently, Kafka is the supported data sink).
```xml
<streams>
  <stream>
    <streamId>map_reduce_failed_job_stream</streamId>
    <description>Map Reduce Failed Job Stream</description>
    <validate>true</validate>
    <columns>
      <column>
        <name>site</name>
        <type>string</type>
      </column>
      ...
      <column>
        <name>jobId</name>
        <type>string</type>
      </column>
      <column>
        <name>currentState</name>
        <type>string</type>
      </column>
    </columns>
  </stream>
</streams>
```
After a policy is defined, the Alert Engine creates a Siddhi execution runtime for the policy (and loads the stream schema from the metadata store). Since the Siddhi execution runtime knows the stream schema, it can process the stream data and perform the evaluation.
After setting up sites and applications, you can find the site entry on the home page or in the “Sites” menu.
Here is a site home example. After entering the site home, the left menu is replaced by dashboard links for the applications of that site, so you can switch between application dashboards quickly. The right panel contains the icons of the applications installed in this site, provided the application has a dashboard defined. You can click an application icon or link to go to that application's dashboard home. Please check the application documentation for how to use its monitoring dashboard.
Eagle stores the alerts generated by all applications in its database, so you can check your application alerts from the Eagle WI.
Go to “Alert” -> “Alerts” to find the alerts table.