FALCON-2196 User Extensions Documentation

Author: sandeep <sandysmdl@gmail.com>
Author: sandeep <sandeep@kickseed.corp.inmobi.com>

Reviewers: Pallavi Rao <pallavi.rao@inmobi.com>, Pracheer Agarwal <pracheeragarwal@gmail.com>

Closes #381 from sandeepSamudrala/FALCON-2196 and squashes the following commits:

362dcec [sandeep] FALCON-2196 Incorporated review comments
485f38c [sandeep] FALCON-2196 Incorporated review comments
b525b58 [sandeep] FALCON-2196 User Extensions Documentation
68b1a51 [sandeep] Merge branch 'master' of https://github.com/apache/falcon into FALCON-2196
cf7ea21 [sandeep] Merge branch 'master' of https://github.com/apache/falcon into FALCON-2196
79e8d64 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
7de7798 [sandeep] go -b FALCON-2263Merge branch 'master' of https://github.com/apache/falcon
c5da0a2 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
7e16263 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
a234d94 [sandeep] FALCON-2231 Incorporated review comments and small fixes for duplicate submission and colo addition to schedule command
26e3350 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
73fbf75 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
cc28658 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
089b10d [sandeep] Merge branch 'master' of https://github.com/apache/falcon
456d4ee [sandeep] Merge branch 'master' of https://github.com/apache/falcon
0cf9af6 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
4a2e23e [sandeep] Merge branch 'master' of https://github.com/apache/falcon
b1546ed [sandeep] Merge branch 'master' of https://github.com/apache/falcon
0a433fb [sandeep] Merge branch 'master' of https://github.com/apache/falcon
194f36a [sandeep] Merge branch 'master' of https://github.com/apache/falcon
e0ad358 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
f96a084 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
9cf36e9 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
bbca081 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
48f6afa [sandeep] Merge branch 'master' of https://github.com/apache/falcon
250cc46 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
d0393e9 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
a178805 [sandeep] Merge branch 'master' of https://github.com/apache/falcon
d6dc8bf [sandeep] Merge branch 'master' of https://github.com/apache/falcon
1bb8d3c [sandeep] Merge branch 'master' of https://github.com/apache/falcon
c065566 [sandeep] reverting last line changes made
1a4dcd2 [sandeep] rebased and resolved the conflicts from master
271318b [sandeep] FALCON-2097. Adding UT to the new method for getting next instance time with Delay.
a94d4fe [sandeep] rebasing from master
9e68a57 [sandeep] FALCON-298. Feed update with replication delay creates holes
diff --git a/docs/src/site/twiki/Extensions.twiki b/docs/src/site/twiki/Extensions.twiki
index a3fed4e..2871d3c 100644
--- a/docs/src/site/twiki/Extensions.twiki
+++ b/docs/src/site/twiki/Extensions.twiki
@@ -2,64 +2,11 @@
 
 ---++ Overview
 
-A Falcon extension is a static process template with parameterized workflow to realize a specific use case and enable non-programmers to capture and re-use very complex business logic. Extensions are defined in server space. Objective of the extension is to solve a standard data management function that can be invoked as a tool using the standard Falcon features (REST API, CLI and UI access) supporting standard falcon features.
+A Falcon extension is a solution template comprising entities that solve a specific use case and enable non-programmers to capture and re-use very complex business logic. The objective of an extension is to solve a standard data processing/management function that can be invoked as a tool using standard Falcon features (REST API, CLI and UI access).
 
-For example:
+Falcon currently supports two types of extensions:
 
-   * Replicating directories from one HDFS cluster to another (not timed partitions)
-   * Replicating hive metadata (database, table, views, etc.)
-   * Replicating between HDFS and Hive - either way
-   * Data masking etc.
+   * [[TrustedExtensions][Trusted Extensions]]
 
----++ Proposal
+   * [[UserExtensions][User Extensions]]
 
-Falcon provides a Process abstraction that encapsulates the configuration for a user workflow with scheduling controls. All extensions can be modeled as a Process and its dependent feeds with in Falcon which executes the user
-workflow periodically. The process and its associated workflow are parameterized. The user will provide properties which are <name, value> pairs that are substituted by falcon before scheduling it. Falcon translates these extensions
-as a process entity by replacing the parameters in the workflow definition.
-
----++ Falcon extension artifacts to manage extensions
-
-Extension artifacts are published in addons/extensions. Artifacts are expected to be installed on HDFS at "extension.store.uri" path defined in startup properties. Each extension is expected to ahve the below artifacts
-   * json file under META directory lists all the required and optional parameters/arguments for scheduling extension job
-   * process entity template to be scheduled under resources directory
-   * parameterized workflow under resources directory
-   * required libs under the libs directory
-   * README describing the functionality achieved by extension
-
-REST API and CLI support has been added for extension artifact management on HDFS. Please Refer to [[falconcli/FalconCLI][Falcon CLI]] and [[restapi/ResourceList][REST API]] for more details.
-
----++ CLI and REST API support
-REST APIs and CLI support has been added to manage extension jobs and instances.
-
-Please Refer to [[falconcli/FalconCLI][Falcon CLI]] and [[restapi/ResourceList][REST API]] for more details on usage of CLI and REST API's for extension jobs and instances management.
-
----++ Metrics
-HDFS mirroring and Hive mirroring extensions will capture the replication metrics like TIMETAKEN, BYTESCOPIED, COPY (number of files copied) for an instance and populate to the GraphDB.
-
----++ Sample extensions
-
-Sample extensions are published in addons/extensions
-
----++ Types of extensions
-   * [[HDFSMirroring][HDFS mirroring extension]]
-   * [[HiveMirroring][Hive mirroring extension]]
-   * [[HdfsSnapshotMirroring][HDFS snapshot based mirroring]]
-
----++ Packaging and installation
-
-This feature is enabled by default but could be disabled by removing the following from startup properties:
-<verbatim>
-config name: *.application.services
-config value: org.apache.falcon.extensions.ExtensionService
-</verbatim>
-
-ExtensionService should be added before ConfigurationStore in startup properties for application services configuration.
-For manual installation user is expected to update "extension.store.uri" property defined in startup properties with
-HDFS path where the extension artifacts will be copied to.
-Extension artifacts in addons/extensions are packaged in falcon. For manual installation once the Falcon Server is setup user is expected to copy the extension artifacts under {falcon-server-dir}/extensions to HDFS at "extension.store.uri" path defined in startup properties and then restart Falcon.
-
----++ Migration
-Recipes framework and HDFS mirroring capability was added in Apache Falcon 0.6.0 release and it was client side logic. With 0.10 release its moved to server side and renamed as server side extensions. Client side recipes only had CLI support and expected certain pre steps to get it working. This is no longer required in 0.10 release as new CLI and REST API support has been provided.
-
-Migrating to 0.10 release and above is not backward compatible for Recipes. If user is migrating to 0.10 release and above then old Recipe setup and CLI's won't work. For manual installation user is expected to copy Extension artifacts to HDFS. Please refer "Packaging and installation" section above for more details.
-Please Refer to [[falconcli/FalconCLI][Falcon CLI]] and [[restapi/ResourceList][REST API]] for more details on usage of CLI and REST API's for extension jobs and instances management.
diff --git a/docs/src/site/twiki/TrustedExtensions.twiki b/docs/src/site/twiki/TrustedExtensions.twiki
new file mode 100644
index 0000000..cfe825c
--- /dev/null
+++ b/docs/src/site/twiki/TrustedExtensions.twiki
@@ -0,0 +1,63 @@
+---+ Falcon Trusted Extensions
+
+---++ Overview
+
+A Falcon trusted extension is a static process template with a parameterized workflow that realizes a specific use case and enables non-programmers to capture and re-use very complex business logic. Trusted extensions are defined in server space. A trusted extension solves a standard data management function that can be invoked as a tool using standard Falcon features (REST API, CLI and UI access).
+
+Falcon's trusted extensions provide a Process abstraction that encapsulates the configuration for a user workflow with scheduling controls. All extensions can be modeled as a Process and its dependent feeds within Falcon, which executes the user
+workflow periodically. The process and its associated workflow are parameterized. The user provides properties as <name, value> pairs that are substituted by Falcon before scheduling. Falcon translates these extensions
+into a process entity by replacing the parameters in the workflow definition.
+
+For example:
+
+   * Replicating directories from one HDFS cluster to another (not timed partitions)
+   * Replicating Hive metadata (database, table, views, etc.)
+   * Replicating between HDFS and Hive - either way
+   * Data masking etc.
+
+---++ Falcon extension artifacts to manage trusted extensions
+
+Extension artifacts are published in addons/extensions. Artifacts are expected to be installed on HDFS at the "extension.store.uri" path defined in the startup properties. Each extension is expected to have the below artifacts:
+   * a JSON file under the META directory listing all the required and optional parameters/arguments for scheduling an extension job
+   * a process entity template, to be scheduled, under the resources directory
+   * a parameterized workflow under the resources directory
+   * required libs under the libs directory
+   * a README describing the functionality achieved by the extension
+
+REST API and CLI support has been added for extension artifact management on HDFS. Please refer to [[falconcli/FalconCLI][Falcon CLI]] and [[restapi/ResourceList][REST API]] for more details.
+
+---++ CLI and REST API support
+REST API and CLI support have been added to manage extension jobs and instances.
+
+Please refer to [[falconcli/FalconCLI][Falcon CLI]] and [[restapi/ResourceList][REST API]] for more details on the usage of the CLI and REST APIs for extension job and instance management.
+
+---++ Metrics
+HDFS mirroring and Hive mirroring extensions will capture replication metrics such as TIMETAKEN, BYTESCOPIED and COPY (number of files copied) for an instance and populate them in the GraphDB.
+
+---++ Sample trusted extensions
+
+Sample extensions are published in addons/extensions.
+
+---++ Types of extensions
+   * [[HDFSMirroring][HDFS mirroring extension]]
+   * [[HiveMirroring][Hive mirroring extension]]
+   * [[HdfsSnapshotMirroring][HDFS snapshot based mirroring]]
+
+---++ Packaging and installation
+
+This feature is enabled by default but could be disabled by removing the following from startup properties:
+<verbatim>
+config name: *.application.services
+config value: org.apache.falcon.extensions.ExtensionService
+</verbatim>
+
+ExtensionService should be added before ConfigurationStore in the application services configuration of the startup properties.
+For manual installation, the user is expected to update the "extension.store.uri" property defined in the startup properties with the
+HDFS path where the extension artifacts will be copied to.
+Extension artifacts in addons/extensions are packaged in Falcon. For manual installation, once the Falcon server is set up, the user is expected to copy the extension artifacts under {falcon-server-dir}/extensions to HDFS at the "extension.store.uri" path defined in the startup properties and then restart Falcon.
+
+---++ Migration
+The Recipes framework and HDFS mirroring capability were added in the Apache Falcon 0.6.0 release as client-side logic. With the 0.10 release, they moved to the server side and were renamed server-side extensions. Client-side recipes only had CLI support and expected certain pre-steps to get them working. This is no longer required in the 0.10 release, as new CLI and REST API support has been provided.
+
+Migrating to the 0.10 release and above is not backward compatible for recipes. If a user migrates to the 0.10 release or above, the old recipe setup and CLIs won't work. For manual installation, the user is expected to copy the extension artifacts to HDFS. Please refer to the "Packaging and installation" section above for more details.
+Please refer to [[falconcli/FalconCLI][Falcon CLI]] and [[restapi/ResourceList][REST API]] for more details on the usage of the CLI and REST APIs for extension job and instance management.
diff --git a/docs/src/site/twiki/UserExtensions.twiki b/docs/src/site/twiki/UserExtensions.twiki
new file mode 100644
index 0000000..99a5354
--- /dev/null
+++ b/docs/src/site/twiki/UserExtensions.twiki
@@ -0,0 +1,257 @@
+---+ Falcon User Extensions
+
+---++ Overview
+
+A Falcon user extension is a solution template, comprising Falcon entities, that a Falcon user can build. The extension so built can be made available to and discovered by other users sharing the same Falcon installation.
+Extension users can instantiate the solution template and deploy it by providing the relevant parameters.
+
+   * [[#Extension_Developer][Extension Developer]]
+
+   * [[#Extension_User][Extension User]]
+
+
+---++ Extension Developer
+This user is the developer of a user extension. Following are the various actions a Falcon extension developer performs:
+
+   * Implements the Extension Builder SPI. The SPI covers the following high-level behaviour:
+      * Validate the user-supplied parameters.
+      * Generate entities from the user-supplied parameters.
+      * Get dataset schemas if the extension generates an output feed.
+   * Packages the extension as per the guidelines of the framework. See [[#Packaging_structure_for_an_extension][Packaging Structure]] for more details.
+      * README, user param template.
+      * build-time jars, resources etc.
+   * Provides, in the packaged jars or resources, the canonical class name of the Extension Builder SPI implementation under META-INF/services.
+        Example: org.apache.falcon.ExtensionExample (needs to be the entry in META-INF/services/org.apache.falcon.extensions.ExtensionBuilder)
+   * Can use FalconUnit to test an instantiation of the extension.
+   * Registers/unregisters/disables an extension with Falcon. See [[#Extension_Registration_API][Extension Registration API]] for more details.
+
+---+++ Extension Registration API
+---++++ Extension Registration:
+An extension developer has to register the extension with Falcon in order to make it discoverable by other users. Registering an extension enables it by default.
+
+---++++ Command:
+falcon extension -register <extension name> -path <location> [-description <short description>]
+
+---++++ Sample Command:
+falcon extension -register data-quality-check -path hdfs://<nn:port>/projects/abc/data-quality-extension [-description "Ensures data complies with schema"]
+
+Note that the location will be under the user's space.
+
+
+---+++ Deregistration of a user extension
+An extension developer can unregister an extension. This is as good as deleting an extension; however, Falcon will not delete the extension package (in the HDFS location). Falcon will delete all references to the extension. Falcon will validate that there are no instances of this extension at the time of unregistering; if any are present, un-registration will fail.
+
+---++++ Command:
+falcon extension -unregister <extension name>
+
+---++++ Sample Command:
+falcon extension -unregister data-quality-check
+
+---+++ Disabling a user extension
+An extension developer can disable an extension. Existing instances of this extension will not be affected. When disabled, new instances of the extension cannot be created. Basically, submit and update operations will not be possible, but a user can delete an existing extension job.
+
+---++++ Command:
+falcon extension -disable <extension name>
+---++++ Sample Command:
+falcon extension -disable data-quality-check
+
+---+++ Enabling a previously disabled user extension
+An extension developer can enable a previously disabled extension. When enabled, new instances of the extension can be created. Submit, update and delete operations on an extension job can be performed.
+
+---++++ Command:
+falcon extension -enable <extension name>
+---++++ Sample Command:
+falcon extension -enable data-quality-check
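The registration lifecycle above (register, disable, enable, unregister) can be sketched as a short shell sequence. This is an illustrative sketch only: the extension name and HDFS path are placeholders, and the commands are echoed as a dry run rather than executed, since they assume a live Falcon server and the documented CLI flags.

```shell
#!/bin/sh
# Dry-run sketch of the user-extension registration lifecycle.
# The extension name and HDFS path are illustrative placeholders.
EXT_NAME="data-quality-check"
EXT_PATH="hdfs://nn:8020/projects/abc/data-quality-extension"   # placeholder namenode/path

# Echo each command instead of executing it; drop the 'echo' on a host
# where the falcon CLI is installed and the server is reachable.
run() { echo "falcon $*"; }

run extension -register "$EXT_NAME" -path "$EXT_PATH" -description "Ensures data complies with schema"
run extension -disable "$EXT_NAME"      # block new instantiations; existing jobs are unaffected
run extension -enable "$EXT_NAME"       # allow new instantiations again
run extension -unregister "$EXT_NAME"   # fails if extension jobs still exist
```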
+
+
+---+++ Packaging structure for an extension
+The packaging structure for a user extension will continue to be consistent with that of trusted extensions:
+<verbatim>
+|-- <extension name>
+    |-- README // This file will be 'cat'-ed when the extension user issues the "describe" command.
+    |-- META
+        |-- properties file // This file will be 'cat'-ed when the extension user issues the "definition" command.
+    |-- libs // Libraries to be used during build time (instantiation of extension) and runtime
+        |-- build
+    |-- resources (OPTIONAL) // Other resource files to be used during build time (instantiation of extension)
+        |-- build
+</verbatim>
+The jar supplied in the build directory should be packaged with the file META-INF/services/org.apache.falcon.extensions.ExtensionBuilder,
+containing the full class name of the class that implements ExtensionBuilder.
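As a concrete illustration, the layout and the service-loader entry can be created like this. The extension name, builder class and file names are hypothetical; only the directory structure and the META-INF/services file name come from the text above.

```shell
#!/bin/sh
# Sketch: build a hypothetical user-extension layout plus the
# service-loader entry its builder jar must bundle. All names below
# except the META-INF/services file name are illustrative.
set -e
WORK=$(mktemp -d)
EXT="data-quality-check"
BUILDER_CLASS="org.example.falcon.DataQualityExtensionBuilder"   # hypothetical ExtensionBuilder implementation

# Packaging structure described above
mkdir -p "$WORK/$EXT/META" "$WORK/$EXT/libs/build" "$WORK/$EXT/resources/build"
echo "Checks that data complies with a schema." > "$WORK/$EXT/README"
: > "$WORK/$EXT/META/$EXT-properties.json"

# The jar under libs/build must carry this file; its single line names
# the class implementing the SPI, for java.util.ServiceLoader to find.
SERVICES="$WORK/jar-src/META-INF/services"
mkdir -p "$SERVICES"
echo "$BUILDER_CLASS" > "$SERVICES/org.apache.falcon.extensions.ExtensionBuilder"

find "$WORK/$EXT" | sort
```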
+---++++Command: falcon extension -definition -name <extension name>
+Sample Output:
+<verbatim>
+##### Start configuration:
+
+pipeline.name: "my business name"
+xml {
+  schema.path: "/hdfs/path/to/schema.xml"
+  metadata.path: "/hdfs/path/to/metadata.xml"
+}
+start: "2016-01-01T00:00Z"
+end: "2017-01-01T00:00Z"
+queue: "the queue the pipeline will run in"
+falcon-cluster {
+  clusters {
+    cluster1: {} // empty object will inherit start and end dates from above
+    cluster2: {start: "2016-06-01T00:00Z"} // cluster2 was added in between
+    cluster3: {end: "2016-06-01T00:00Z"} // cluster3 was taken out in between
+    }
+  }
+# A note on path.prefix used below. The path of a falcon feed is constructed
+</verbatim>
+
+
+
+---++ Extension User
+The extension user is the one who uses and instantiates the extensions that are available.
+
+---+++ Enumerate Extensions (including trusted extensions)
+---++++ Command: falcon extension -enumerate
+Sample Output (could be tabular format too, instead of plain JSON):
+<verbatim>
+{
+  "extensions": [
+    {
+      "name": "hdfs-mirroring",
+      "type": "Trusted extension",
+      "description": "This extension implements replicating arbitrary directories on HDFS from one Hadoop cluster to another Hadoop cluster. This piggybacks on the replication solution in Falcon, which uses the DistCp tool.",
+      "location": "fs://<falcon-home>/extensions/hdfs-mirroring"
+    },
+    {
+      "name": "data-quality-check",
+      "type": "User extension",
+      "description": "This extension allows you to check if the data complies with the schema.",
+      "location": "hdfs://<nn:port>/projects/<user-dir>/falcon-extensions/…"
+    }
+  ],
+  "totalResults": 2
+}
+</verbatim>
+---+++ Get description/details of an extension
+Displays the README supplied by the extension developer "as is".
+---++++ Command: falcon extension -describe -extensionName <extension name>
+<verbatim>
+Sample Command : falcon extension -describe -extensionName data-quality-check
+Sample Output :
+The data quality extension allows you...
+
+</verbatim>
+
+---+++ Details of an extension
+Gives the details that the extension developer supplied during extension registration.
+---++++Command: falcon extension -detail -extensionName <extension name>
+<verbatim>
+Sample Command : falcon extension -detail -extensionName hdfs-mirroring
+Sample Output :
+{
+      "name": "hdfs-mirroring",
+      "type": "Trusted extension",
+      "description": "This extension implements replicating arbitrary directories on HDFS from one Hadoop cluster to another Hadoop cluster. This piggy backs on replication solution in Falcon which uses the DistCp tool.",
+      "location": "fs://<falcon-home>/extensions/hdfs-mirroring"
+}
+</verbatim>
+
+---+++ Extension instantiation validation
+Validates the configs supplied for an instantiation of the given extension.
+<verbatim>
+Sample Command: falcon extension -validate -jobName dataQualityTestJob -extensionName data-quality-check -file extensions/hdfs-mirroring/META/hdfs-mirroring-properties.json
+</verbatim>
+
+---+++ List of the instantiations of an extension
+Lists all the instances of a given extension. The extension name is optional; when it is not passed, all extension jobs are listed.
+---++++Command: bin/falcon extension -list -extensionName <extension name>
+<verbatim>
+Sample Output (could be tabular format too, instead of plain JSON):
+{
+  "jobs": [
+    {
+      "name": "data-quality-abc",
+      "feeds": ["feed1", "feed2"],
+      "processes": ["process1", "process2"]
+    },
+    {
+      "name": "data-quality-xyz",
+      "feeds": ["feed3", "feed4"],
+      "processes": ["process3", "process4"]
+    }
+  ],
+  "totalResults": 2
+}
+</verbatim>
+
+---+++ Instantiate an extension (Idempotent)
+A user can submit an instance of the extension. This generates the entity definitions and submits all entities of the extension to the Prism server.
+
+Note that only feeds and processes (and not clusters) are generated as part of instantiation. Also, these entities will be tagged by the framework to the extension job, so that all entities related to an extension job can be identified and tracked.
+
+Commands:
+falcon extension -submit -extensionName <extension name> -jobName <name to be used in all the extension entities> -file <path to extension parameters>
+falcon extension -submitAndSchedule -extensionName <extension name> -jobName <name to be used in all the extension entities> -file <path to extension parameters>
+<verbatim>
+Sample Output:
+falcon/default/Submit successful (feed) out-feed-abc
+submit/falcon/default/Submit successful (process) pig-process-abc
+</verbatim>
+
+---+++ Details of an extension job
+Gets the details of a particular job:
+---++++Command: bin/falcon extension -detail -jobName <jobName>
+<verbatim>
+Sample Output :
+    {
+      "name": "data-quality-abc",
+      "feeds": ["feed1", "feed2"],
+      "processes": ["process1", "process2"]
+    }
+</verbatim>
+
+---+++ Schedule an Extension Job (Idempotent)
+A submitted extension job can be scheduled.
+As of now there is no support for scheduling in specific colos in distributed mode.
+---++++Commands: falcon extension -schedule -jobName <name to be used in all the extension entities>
+<verbatim>
+Sample Output:
+falcon/default/resume successful (feed) out-feed-abc
+submit/falcon/default/resume successful (process) pig-process-abc
+</verbatim>
+
+---+++ Update an extension job
+Lets the user change properties and/or update/re-generate an instance of an extension (Idempotent).
+Users have the option to update a particular instance of an extension and can supply new parameters during the update. However, an update will always update the entities to use the latest libs of the extension.
+---++++Commands:
+<verbatim>
+falcon extension -update -jobName <name to be used in all the extension entities> -file <path to extension parameters>
+Sample Output:
+falcon/default/Update successful (feed) out-feed-abc
+submit/falcon/default/Update successful (process) pig-process-abc
+</verbatim>
+
+---+++ Delete an Extension Job (Idempotent)
+Entities that are part of an extension job can only be deleted by deleting the extension job.
+All the write operations (submit, submitAndSchedule, update and delete) on entities that are part of an extension job can be done only via the extension job APIs.
+---++++Commands: falcon extension -delete -jobName <name to be used in all the extension entities>
+<verbatim>
+Sample Output:
+falcon/default/Delete successful (feed) out-feed-abc
+submit/falcon/default/Delete successful (process) pig-process-abc
+</verbatim>
+
+---+++ Suspend an Extension Job (Idempotent)
+Suspending an extension job will suspend all the entities that are part of the extension.
+As of now there is no support for suspending in specific colos in distributed mode.
+---++++Commands: falcon extension -suspend -jobName <name to be used in all the extension entities>
+<verbatim>
+Sample Output:
+falcon/default/suspend successful (feed) out-feed-abc
+submit/falcon/default/suspend successful (process) pig-process-abc
+</verbatim>
+
+---+++ Resume an Extension Job (Idempotent)
+Resuming an extension job will resume all the entities that are part of the extension.
+As of now there is no support for resuming in specific colos in distributed mode.
+---++++Commands: falcon extension -resume -jobName <name to be used in all the extension entities>
+<verbatim>
+Sample Output:
+falcon/default/resume successful (feed) out-feed-abc
+submit/falcon/default/resume successful (process) pig-process-abc
+</verbatim>
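Taken together, the job-level commands above form a typical lifecycle. The sketch below is a dry run under assumptions: the job name, extension name and properties-file path are placeholders, and a stub stands in for the real falcon CLI.

```shell
#!/bin/sh
# Dry-run sketch of an extension job's lifecycle using the commands
# documented above. Job name, extension name and properties path are
# illustrative placeholders.
JOB="data-quality-abc"
EXT="data-quality-check"
PROPS="/path/to/extension-properties.json"   # placeholder

# Stub so the sketch runs anywhere; remove it where the real CLI exists.
falcon() { echo "falcon $*"; }

falcon extension -submit -extensionName "$EXT" -jobName "$JOB" -file "$PROPS"
falcon extension -schedule -jobName "$JOB"
falcon extension -suspend -jobName "$JOB"    # suspends every entity in the job
falcon extension -resume -jobName "$JOB"     # resumes every entity in the job
falcon extension -update -jobName "$JOB" -file "$PROPS"   # re-generates entities with the latest libs
falcon extension -delete -jobName "$JOB"     # removes all entities of the job
```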
+
+All instance-level operations (on instances of an entity) will have to be performed via Falcon instance operations.
\ No newline at end of file