
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Falcon Regression

This project has 2 modules:

  1. merlin: contains all the system tests for falcon
  2. merlin-core: contains all the utils used by merlin

Requirements

In addition to the falcon server and prism, running the full falcon regression suite requires three clusters. Each of these clusters must have:

  • hadoop
  • oozie
  • hive
  • hcat

For specific tests, it may be possible to run without all clusters and components.

Prior to running tests, Merlin.properties needs to be created and populated with cluster details.

Configuring Merlin.properties

Merlin.properties must be created before running the falcon regression tests. The file must be placed at the location:

falcon/falcon-regression/merlin/src/main/resources/Merlin.properties

Populate it with prism-related properties:

#prism properties
prism.oozie_url = http://node-1.example.com:11000/oozie/
prism.oozie_location = /usr/lib/oozie/bin
prism.qa_host = node-1.example.com
prism.service_user = falcon
prism.hadoop_url = node-1.example.com:8020
prism.hadoop_location = /usr/lib/hadoop/bin/hadoop
prism.hostname = http://node-1.example.com:15000
prism.storeLocation = hdfs://node-1.example.com:8020/apps/falcon

Specify the clusters that you will be using for testing:

servers = cluster1,cluster2,cluster3

For each cluster, specify its properties:

#cluster1 properties
cluster1.oozie_url = http://node-1.example.com:11000/oozie/
cluster1.oozie_location = /usr/lib/oozie/bin
cluster1.qa_host = node-1.example.com
cluster1.service_user = falcon
cluster1.password = rgautam
cluster1.hadoop_url = node-1.example.com:8020
cluster1.hadoop_location = /usr/lib/hadoop/bin/hadoop
cluster1.hostname = http://node-1.example.com:15000
cluster1.cluster_readonly = webhdfs://node-1.example.com:50070
cluster1.cluster_execute = node-1.example.com:8032
cluster1.cluster_write = hdfs://node-1.example.com:8020
cluster1.activemq_url = tcp://node-1.example.com:61616?daemon=true
cluster1.storeLocation = hdfs://node-1.example.com:8020/apps/falcon
cluster1.colo = default
cluster1.namenode.kerberos.principal = nn/node-1.example.com@none
cluster1.hive.metastore.kerberos.principal = hive/node-1.example.com@none
cluster1.hcat_endpoint = thrift://node-1.example.com:9083
cluster1.service_stop_cmd = /usr/lib/falcon/bin/falcon-stop
cluster1.service_start_cmd = /usr/lib/falcon/bin/falcon-start

To avoid cleaning the root tests dir before every test, set:

clean_tests_dir=false

Setting up HDFS Dirs

On all clusters, as the user that started the falcon server, run:

hdfs dfs -mkdir -p  /tmp/falcon-regression-staging
hdfs dfs -chmod 777 /tmp/falcon-regression-staging
hdfs dfs -mkdir -p  /tmp/falcon-regression-working
hdfs dfs -chmod 755 /tmp/falcon-regression-working

Running Tests

After creating the Merlin.properties file, you can run the following commands to run the tests:

cd falcon-regression
mvn clean test -Phadoop-2

Profiles Supported: hadoop-2

To run a specific test:

mvn clean test -Phadoop-2 -Dtest=EmbeddedPigScriptTest

If you want to use a specific version of any component, it can be specified using -D, for example:

mvn clean test -Phadoop-2 -Doozie.version=4.1.0 -Dhadoop.version=2.6.0

Security Tests:

ACL tests require multiple user accounts to be set up:

other.user.name=root
falcon.super.user.name=falcon
falcon.super2.user.name=falcon2

ACL tests also require the group name of the current user:

current_user.group.name=users

For testing with kerberos, set the keytab properties for the different users:

current_user_keytab=/home/qa/hadoopqa/keytabs/qa.headless.keytab
falcon.super.user.keytab=/home/qa/hadoopqa/keytabs/falcon.headless.keytab
falcon.super2.user.keytab=/home/qa/hadoopqa/keytabs/falcon2.headless.keytab
other.user.keytab=/home/qa/hadoopqa/keytabs/root.headless.keytab

Adding tests to falcon regression:

If you wish to contribute to falcon regression, it's as easy as it gets. All test classes must be added to the directory:

falcon/falcon-regression/merlin/src/test/java

This directory contains sub-directories such as prism, ui, security, etc., which contain tests specific to those aspects of falcon. Any general test can be added directly to the parent directory above. If you wish to write a series of tests for a new feature, feel free to create a new sub-directory. Your test can use the various process/feed/cluster/workflow templates present in:

falcon/falcon-regression/merlin/src/test/resources

or you can add your own bundle of XMLs to this directory. Please avoid duplicating existing resources.

Each test class can contain multiple related tests. Let us look at a sample test class; refer to the comments in the code for guidance:

    //The License header must be added to each test class

    /**
     * Licensed to the Apache Software Foundation (ASF) under one
     * or more contributor license agreements.  See the NOTICE file
     * distributed with this work for additional information
     * regarding copyright ownership.  The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance
     * with the License.  You may obtain a copy of the License at
     *
     *     http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */

    package org.apache.falcon.regression;


    import org.apache.falcon.regression.core.bundle.Bundle;
    import org.apache.falcon.regression.core.helpers.ColoHelper;
    import org.apache.falcon.regression.core.response.ServiceResponse;
    import org.apache.falcon.regression.core.util.AssertUtil;
    import org.apache.falcon.regression.core.util.BundleUtil;
    import org.apache.falcon.regression.testHelper.BaseTestClass;
    import org.testng.annotations.AfterMethod;
    import org.testng.annotations.BeforeMethod;
    import org.testng.annotations.Test;

    @Test(groups = "embedded")

    //Every test class must extend BaseTestClass. This class makes the
    //properties defined in Merlin.properties available to the test.

    public class FeedSubmitTest extends BaseTestClass {

        private ColoHelper cluster = servers.get(0);
        private String feed;

        @BeforeMethod(alwaysRun = true)
        public void setUp() throws Exception {

        //Several Util classes are available, such as BundleUtil, which is
        //used here to read the ELBundle present in falcon/falcon-regression/merlin/src/test/resources

            bundles[0] = BundleUtil.readELBundle();
            bundles[0].generateUniqueBundle();
            bundles[0] = new Bundle(bundles[0], cluster);

            //submit the cluster
            ServiceResponse response =
                prism.getClusterHelper().submitEntity(bundles[0].getClusters().get(0));
            AssertUtil.assertSucceeded(response);
            feed = bundles[0].getInputFeedFromBundle();
        }

        @AfterMethod(alwaysRun = true)
        public void tearDown() {
            removeBundles();
        }

        //Java docs must be added for each test function, explaining what the function does

        /**
         * Submit correctly adjusted feed. Response should reflect success.
         *
         * @throws Exception
         */
        @Test(groups = {"singleCluster"})
        public void submitValidFeed() throws Exception {
            ServiceResponse response = prism.getFeedHelper().submitEntity(feed);
            AssertUtil.assertSucceeded(response);
        }

        /**
         * Submit and remove feed. Try to submit it again. Response should reflect success.
         *
         * @throws Exception
         */
        @Test(groups = {"singleCluster"})
        public void submitValidFeedPostDeletion() throws Exception {
            ServiceResponse response = prism.getFeedHelper().submitEntity(feed);
            AssertUtil.assertSucceeded(response);

            response = prism.getFeedHelper().delete(feed);
            AssertUtil.assertSucceeded(response);
            response = prism.getFeedHelper().submitEntity(feed);
            AssertUtil.assertSucceeded(response);
        }
    }
  • This class, as the name suggests, tests the feed submission aspect of Falcon. It contains multiple test functions, all of which are test cases for the same feature. This organisation of the code must be maintained.

  • In order to manipulate feeds, processes and clusters in the various tests, objects of the classes FeedMerlin, ProcessMerlin and ClusterMerlin can be used. There are existing functions which use these objects, such as setProcessInput, setFeedValidity, setProcessConcurrency, setInputFeedPeriodicity etc. in Bundle.java, which should serve your purpose well enough (a short sketch follows this list).

  • As for utils, you can use the functions in HadoopUtil to create HDFS dirs, delete them and add data on HDFS, OozieUtil to query Oozie for coordinator/workflow status, TimeUtil to get lists of dates and directories to aid in data creation, HCatUtil for HCatalog-related utilities, and many others that make writing tests easy.

  • Coding conventions are strictly followed. Use the checkstyle XML present in falcon/checkstyle/src/main/resources/falcon in your project to avoid checkstyle errors.
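
As a rough sketch of how these helpers fit together, the hypothetical class below adjusts the bundle in setUp before submitting an entity. The class name is made up, and the argument lists passed to setInputFeedPeriodicity and setProcessConcurrency (as well as the Frequency.TimeUnit import) are assumptions made for illustration only; check Bundle.java and the util classes for the actual signatures.

    //License header omitted here for brevity; real test classes must include it.

    package org.apache.falcon.regression;

    import org.apache.falcon.entity.v0.Frequency.TimeUnit;
    import org.apache.falcon.regression.core.bundle.Bundle;
    import org.apache.falcon.regression.core.helpers.ColoHelper;
    import org.apache.falcon.regression.core.response.ServiceResponse;
    import org.apache.falcon.regression.core.util.AssertUtil;
    import org.apache.falcon.regression.core.util.BundleUtil;
    import org.apache.falcon.regression.testHelper.BaseTestClass;
    import org.testng.annotations.AfterMethod;
    import org.testng.annotations.BeforeMethod;
    import org.testng.annotations.Test;

    //Hypothetical example: the helper method names come from Bundle.java,
    //but their argument lists below are assumptions made for illustration.
    @Test(groups = "embedded")
    public class FeedPeriodicityExampleTest extends BaseTestClass {

        private ColoHelper cluster = servers.get(0);
        private String feed;

        @BeforeMethod(alwaysRun = true)
        public void setUp() throws Exception {
            bundles[0] = BundleUtil.readELBundle();
            bundles[0].generateUniqueBundle();
            bundles[0] = new Bundle(bundles[0], cluster);

            //Tweak the entities before submission (assumed signatures):
            bundles[0].setInputFeedPeriodicity(5, TimeUnit.minutes);
            bundles[0].setProcessConcurrency(2);

            //Submit the cluster, then pick up the adjusted feed
            ServiceResponse response =
                prism.getClusterHelper().submitEntity(bundles[0].getClusters().get(0));
            AssertUtil.assertSucceeded(response);
            feed = bundles[0].getInputFeedFromBundle();
        }

        @AfterMethod(alwaysRun = true)
        public void tearDown() {
            removeBundles();
        }

        /**
         * Submit the adjusted feed. Response should reflect success.
         *
         * @throws Exception
         */
        @Test(groups = {"singleCluster"})
        public void submitAdjustedFeed() throws Exception {
            AssertUtil.assertSucceeded(prism.getFeedHelper().submitEntity(feed));
        }
    }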

Testing on Windows

Some tests switch users to run commands as a different user. The location of the binary used to switch users is configurable:

windows.su.binary=ExecuteAs.exe

Automatic capture of oozie logs

For full falcon regression runs, it might be desirable to pull all oozie job info and logs at the end of the test run. This can be done by configuring Merlin.properties:

log.capture.oozie = true
log.capture.oozie.skip_info = false
log.capture.oozie.skip_log = true
log.capture.location = ../

Dumping entities generated by falcon


Add -Dmerlin.dump.staging to the maven command. For example:

mvn clean test -Phadoop-2 -Dmerlin.dump.staging=true