A tool for extract-transform-load (ETL) operations driven by a resource tree configuration.
One-shot data transformations often require sample code to be written and executed. This small tool set aims to make such transformations possible with proven, reusable blocks called pipes, which stream resources from one to the next.
         getOutputBinding
               ^
               |
 getInput  +---+---+   getOutput
           |       |
      +----> Pipe  +---->
           |       |
           +-------+
A sling pipe is essentially a sling resource stream:
At the moment, there are 3 types of pipes to consider:
A Plumber OSGi service is provided to help retrieve and execute pipes.
A pipe configuration is a JCR node with:
sling:resourceType property, which must be a pipe type registered by the plumber
name property, used in bindings as an id and as the key for the output bindings (the default value being a value map of the current output resource). If no name is provided, the node name is used.
path property, which, if configured, overrides the upstream pipe's output as input
expr property, the expression through which the pipe executes (depending on the type)
additionalBinding is a node you can add to set "global" bindings (property=value) for the pipe execution
additionalScripts is a multi-value property declaring scripts that can be reused in expressions
conf optional child node that contains additional configuration of the pipe (depending on the type)

The base pipe is a rather dummy pipe: it outputs what is in its input (that is, what is configured in path). It is mostly handy for testing, and for providing basic functionality to the pipes that inherit from it.
sling:resourceType is slingPipes/base
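
To illustrate the configuration properties above, here is a minimal JavaScript sketch of what a base pipe does with name and path. It is not the tool's implementation (the tool runs inside Sling): the in-memory repository object, the basePipe function and the sample path are hypothetical stand-ins, and skipping paths that have no resource behind them is an assumption.

```javascript
// Hypothetical in-memory stand-in for repository content: path -> value map.
const repository = {
  "/content/sample": { title: "Hello", author: "jdoe" },
};

// Sketch of a base pipe: it outputs its input and feeds the bindings.
// nodeName: name of the configuration node; config: its properties;
// upstreamPaths: output of the previous pipe, if any; bindings: shared map.
function* basePipe(nodeName, config, upstreamPaths, bindings) {
  // The node name is used when no name property is provided.
  const name = config.name || nodeName;
  // A configured path overrides the upstream pipe's output as input.
  const input = config.path ? [config.path] : upstreamPaths;
  for (const path of input) {
    // Assumption: a path with no resource behind it produces no output.
    if (!(path in repository)) continue;
    // Default output binding: the value map of the current output resource.
    bindings[name] = repository[path];
    // Special "path" binding: path.<pipeName> is the pipe's current resource path.
    bindings.path = bindings.path || {};
    bindings.path[name] = path;
    yield path;
  }
}

const bindings = {};
const config = { "sling:resourceType": "slingPipes/base", path: "/content/sample" };
const output = [...basePipe("dummy", config, [], bindings)];
console.log(output, bindings.dummy.title, bindings.path.dummy);
```

Using a generator keeps the streaming flavour of a pipe: a downstream consumer pulls one resource at a time, and the bindings always describe the resource currently being output.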

The slingQuery pipe executes $(getInput()).children(expression).
sling:resourceType is slingPipes/slingQuery
expr mandatory property, contains the Sling Query expression through which getInput()'s children are computed into getOutput()

The JSON pipe feeds bindings with remote JSON.
sling:resourceType is slingPipes/json
expr mandatory property, contains the URL that will be called; the returned JSON is sent to the output bindings, and getOutput = getInput. An empty or failing URL blocks the pipe at that point.

The multiProperty pipe iterates through the values of the input multi-value property and writes them to bindings.
sling:resourceType is slingPipes/multiProperty
path should be the path of a multi-value property

The xpath pipe retrieves the resources resulting from an XPath query.
sling:resourceType is slingPipes/xpath
expr should be a valid XPath query

The authorizable pipe retrieves the authorizable resource corresponding to the id passed in the expression or, if none is found (or the expression is void), the one at the input path. It outputs the found authorizable's resource.
sling:resourceType is slingPipes/authorizable
expr should be an authorizable id, or void (but then the input should be an authorizable)
autoCreateGroup (boolean) if an authorizable id is given but the authorizable is not present, creates a group with that id (in that case, considered a write pipe)
addMembers (stringified JSON array) if the authorizable is a group, adds the instantiated members to it (in that case, considered a write pipe)
addToGroup (expression) adds the found authorizable to the instantiated group (in that case, considered a write pipe)
bindMembers (boolean) if the found authorizable is a group, binds the members (in that case, considered a write pipe)

The parent pipe outputs the parent resource of the input resource.
sling:resourceType is slingPipes/parent

The filter pipe outputs the input resource if it matches its configuration.
sling:resourceType is slingPipes/filter
conf node tree that will be tested against the current input of the pipe: each /conf/sub@prop=value triggers a test on the ./sub@prop property of the current input, checking whether its value matches the value regex. If the special slingPipesFilter_noChildren=${true} property is present, with its value instantiated as a true boolean, the filter passes if the corresponding node has no children.

The container pipe assembles a sequence of pipes.
sling:resourceType is slingPipes/container
conf node contains the child pipes' configurations, which are configured in the order they are found (note that you should use sling:OrderedFolder)

The reference pipe executes the pipe referenced in its path property.
sling:resourceType is slingPipes/reference
path path of the referenced pipe

The write pipe writes the given properties to the current input.
sling:resourceType is slingPipes/write
conf node tree that will be copied to the current input of the pipe: each node's property names and values are written to the input resource. The input resource is then output.

The mv pipe performs a JCR move of the current input to the target path (can be a node or a property).
sling:resourceType is slingPipes/mv
expr target path; note that the parent path must exist

The rm pipe removes the input resource and returns its parent, regardless of whether the resource is a node or a property.
sling:resourceType is slingPipes/rm
conf node tree used to filter which properties and subtrees, relative to the current resource, are removed. A subnode is considered for removal if it has no property configured, nor any child.

The path pipe gets or creates the path given in the expression.
sling:resourceType is slingPipes/path
nodeType node type of the intermediate nodes to create
autosave whether to save at each creation (this makes things slow, but sometimes you don't have a choice)

To make things interesting, most of the configurations are JavaScript template strings, hence valid JS expressions that reuse bindings (from the configuration, or from other pipes).
The following configurations are evaluated:
path
expr
You can use the names of previous pipes in the pipe container, or the special binding path, where path.previousPipe is the path of the current resource of the previous pipe named previousPipe.
Global bindings can be set at pipe execution, and external scripts can be added to the execution as well (see pipe configurations).
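
The template-string evaluation described above can be sketched in plain JavaScript. This is only an illustration of the idea, not how the tool evaluates expressions internally: the instantiate helper and the sample binding values are hypothetical, and the sketch assumes that expressions contain no backticks and that binding names are valid identifiers.

```javascript
// Hypothetical helper, not part of the tool: evaluates a pipe expression
// as a JavaScript template string against a map of bindings.
function instantiate(expr, bindings) {
  const names = Object.keys(bindings);
  const values = names.map((name) => bindings[name]);
  // Binding names become function parameters and the expression becomes
  // the body of a template literal, so ${...} parts see the bindings.
  const evaluate = new Function(...names, "return `" + expr + "`;");
  return evaluate(...values);
}

// Sample bindings: "path" holds each pipe's current resource path,
// "profile" is the value map output by a pipe named profile.
const bindings = {
  path: { badge: "/etc/badges/badges-admin/jcr:content/gold" },
  profile: { gender: "female", fullName: "Jane Doe" },
};

const profilesPath = instantiate("${path.badge}/profiles", bindings);
const prefixedName = instantiate(
  "${(profile.gender === 'female' ? 'Ms ' + profile.fullName : 'Mr ' + profile.fullName)}",
  bindings
);
console.log(profilesPath);
console.log(prefixedName);
```

Text outside ${...} is kept verbatim, which is why a value such as ${path.badge}/profiles yields a path, while a bare string without ${...} is passed through unchanged.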
For now, pipes can be executed through GET (read) or POST (read/write) requests.
Either post to a plumber resource (etc/pipes in this example), passing the pipe's path as the path parameter:
curl -u admin:admin -F "path=/etc/pipes/mySamplePipe" http://localhost:8080/etc/pipes.json
or execute the request directly on the pipe's path:
curl -u admin:admin http://localhost:8080/etc/pipes/mySamplePipe.json
Either request returns the paths of the resources that have gone through the output of the configured pipe.
bindings
You can add, as the bindings parameter, a JSON object of the global bindings you want for the execution of the pipe, e.g.
curl -u admin:admin -F "path=/etc/pipes/test" -F "bindings={testBinding:'foo'}" http://localhost:4502/etc/pipes.json
which returns something like
["/one/output/resource", "another/one"]
writer
You can add, as the writer parameter, a JSON object used as a pattern for the result you want. The values of the JSON object are expressions and can reuse each subpipe's binding. Note that this works only if the called pipe is a container pipe, e.g.
curl -g -u admin:admin 'http://localhost:4502/etc/pipes/users.json?writer={"user":"${user.fullName}"}'
(the URL is single-quoted so that the shell leaves ${...} untouched, and -g stops curl from interpreting the braces)
which returns something similar to
[{'user':'John Smith','path':'/home/users/q/q123jk1UAZS'},{'user':'John Doe','path':'/home/users/q/q153jk1UAZS'}]
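
Because the writer value contains braces, quotes and a dollar sign, it is easy to mangle on a command line. As a hedged alternative, the request URL can be built programmatically so that the JSON is percent-encoded. The buildWriterUrl helper below is hypothetical, and the host, port and pipe path are simply the placeholders from the example above.

```javascript
// Hypothetical helper: builds the GET URL for a container pipe with a
// "writer" parameter, letting the URL API take care of the encoding.
function buildWriterUrl(pipeUrl, writer) {
  const url = new URL(pipeUrl);
  // searchParams.set percent-encodes the JSON text, so braces, quotes
  // and "$" cannot be misread by a shell or an HTTP client.
  url.searchParams.set("writer", JSON.stringify(writer));
  return url.toString();
}

// The value is an ordinary string here (double quotes, not backticks),
// so ${user.fullName} is sent as-is and evaluated by the pipe, not by JS.
const writerUrl = buildWriterUrl(
  "http://localhost:4502/etc/pipes/users.json",
  { user: "${user.fullName}" }
);
console.log(writerUrl);
```

The resulting string can be passed to any HTTP client together with the credentials used in the curl examples.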
dryRun
If the dryRun parameter is set to true and the executed pipe is supposed to modify content, it logs (as best it can) the changes it would have made, without actually making them.
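
The request parameters described above (path, bindings, dryRun) can also be assembled in code. The sketch below only builds the form fields; buildExecutionParams is a hypothetical helper, and it assumes that the server reads URL-encoded form fields the same way as the multipart fields that curl -F sends, which has not been verified here.

```javascript
// Hypothetical helper: collects the form fields for executing a pipe
// through a plumber resource such as /etc/pipes.json.
function buildExecutionParams(pipePath, options = {}) {
  const params = new URLSearchParams();
  // Path of the pipe configuration to execute.
  params.set("path", pipePath);
  // Optional global bindings, serialized as a JSON object.
  if (options.bindings) {
    params.set("bindings", JSON.stringify(options.bindings));
  }
  // Optional dry run: changes are logged instead of being written.
  if (options.dryRun) {
    params.set("dryRun", "true");
  }
  return params;
}

const params = buildExecutionParams("/etc/pipes/test", {
  bindings: { testBinding: "foo" },
  dryRun: true,
});
console.log(params.toString());
```

The returned object can be used as the body of a POST request to the plumber resource. Trying a write pipe with dryRun first and checking the log before running it for real is a cautious way to use it.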
This pipe parses all profile nodes and prefixes each full name with "Mr" or "Ms" depending on gender.
{
  "sling:resourceType": "slingPipes/container",
  "name": "Dummy User prefix Sample",
  "jcr:description": "prefix all full names of profile with \"Mr\" or \"Ms\" depending on gender",
  "conf": {
    "profile": {
      "sling:resourceType": "slingPipes/slingQuery",
      "expr": "nt:unstructured#profile",
      "path": "/home/users"
    },
    "writeFullName": {
      "sling:resourceType": "slingPipes/write",
      "conf": {
        "fullName": "${(profile.gender === 'female' ? 'Ms ' + profile.fullName : 'Mr ' + profile.fullName)}",
        "generatedBy": "slingPipes"
      }
    }
  }
}
{
  "jcr:primaryType": "sling:Folder",
  "jcr:description": "move badge<->user relationship from badge MV property to a user MV property",
  "name": "badges",
  "sling:resourceType": "slingPipes/container",
  "conf": {
    "jcr:primaryType": "sling:OrderedFolder",
    "badge": {
      "jcr:primaryType": "sling:Folder",
      "jcr:description": "outputs all badge component resources",
      "expr": "[sling:resourceType=myApp/components/badge]",
      "path": "/etc/badges/badges-admin/jcr:content",
      "sling:resourceType": "slingPipes/slingQuery"
    },
    "profile": {
      "jcr:primaryType": "sling:Folder",
      "jcr:description": "retrieve all user ids from a mv property",
      "path": "${path.badge}/profiles",
      "sling:resourceType": "slingPipes/multiProperty"
    },
    "user": {
      "jcr:primaryType": "sling:OrderedFolder",
      "jcr:description": "outputs user resource",
      "expr": "profile",
      "sling:resourceType": "slingPipes/authorizable"
    },
    "write": {
      "jcr:primaryType": "sling:OrderedFolder",
      "jcr:description": "patches the badge path to the badges property of the user profile",
      "path": "${path.user}/profile",
      "sling:resourceType": "slingPipes/write",
      "conf": {
        "jcr:primaryType": "nt:unstructured",
        "badges": "+[${path.badge}]"
      }
    }
  }
}
This use case completes repository profiles with data from an external system (one that has a JSON API).
{
  "jcr:primaryType": "nt:unstructured",
  "jcr:description": "this pipe retrieves json info from an external system and writes them to the user profile, uses moment.js, it distributes modified resources using publish distribution agent",
  "sling:resourceType": "slingPipes/container",
  "distribution.agent": "publish",
  "additionalScripts": "/etc/source/moment.js",
  "conf": {
    "jcr:primaryType": "sling:OrderedFolder",
    "profile": {
      "jcr:primaryType": "sling:OrderedFolder",
      "expr": "/jcr:root/home/users//element(profile,nt:unstructured)[@uid]",
      "jcr:description": "query all user profile nodes",
      "sling:resourceType": "slingPipes/xpath"
    },
    "json": {
      "jcr:primaryType": "sling:OrderedFolder",
      "expr": "${(profile.uid ? 'https://my.external.system.corp.com/profiles/' + profile.uid.substr(0,2) + '/' + profile.uid + '.json' : '')}",
      "jcr:description": "retrieves json information relative to the given profile, if the uid is not found, expr is empty: the pipe will do nothing",
      "sling:resourceType": "slingPipes/json"
    },
    "write": {
      "jcr:primaryType": "sling:OrderedFolder",
      "path": "${path.profile}",
      "jcr:description": "write json information to the profile node",
      "sling:resourceType": "slingPipes/write",
      "conf": {
        "jcr:primaryType": "sling:OrderedFolder",
        "jcr:createdBy": "admin",
        "background": "${json.opt('background')}",
        "about": "${json.opt('about')}",
        "jcr:created": "Fri Jul 03 2015 15:32:22 GMT+0200",
        "birthday": "${(json.opt('birthday') ? moment(json.opt('birthday'), \"MMMM DD\").toDate() : '')}",
        "mobile": "${json.opt('mobile')}"
      }
    }
  }
}
{
  "jcr:primaryType": "nt:unstructured",
  "jcr:description": "this pipe removes user with bad property in their profile",
  "sling:resourceType": "slingPipes/container",
  "conf": {
    "jcr:primaryType": "sling:OrderedFolder",
    "profile": {
      "jcr:primaryType": "sling:OrderedFolder",
      "expr": "/jcr:root/home/users//element(profile,nt:unstructured)[@bad]",
      "jcr:description": "query all user profile nodes with bad properties",
      "sling:resourceType": "slingPipes/xpath"
    },
    "parent": {
      "jcr:primaryType": "sling:OrderedFolder",
      "jcr:description": "get the parent node (user node)",
      "sling:resourceType": "slingPipes/parent"
    },
    "rm": {
      "jcr:primaryType": "sling:OrderedFolder",
      "jcr:description": "remove it",
      "sling:resourceType": "slingPipes/rm"
    }
  }
}
Some other samples are in https://github.com/npeltier/sling-pipes/tree/master/src/test/
To run this tool on a Sling instance, you need: