One of the most important meta files of a vault checkout or a content package is the filter.xml, which is present in the META-INF/vault directory. The filter.xml is used to load and initialize the WorkspaceFilter. The workspace filter defines what parts of the JCR repository are imported or exported during the respective operations through vlt or package management.
The filter.xml consists of a set of filter elements, each with a root attribute and an optional list of include and exclude child elements.
Example:

    <workspaceFilter version="1.0">
        <filter root="/apps/project1" />
        <filter root="/etc/project1">
            <exclude pattern=".*\.gif" />
            <include pattern="/etc/project1/static(/.*)?" />
        </filter>
        <filter root="/etc/map" mode="merge" />
    </workspaceFilter>

The filter elements are independent of each other and define include and exclude patterns for subtrees. The root of a subtree is defined by the root attribute, which must be an absolute path.
The filter element can have an optional mode attribute, which specifies the import mode used when importing content. The following values are possible:
“replace” : This is the normal behavior. Existing content is replaced completely by the imported content, i.e. is overridden or deleted accordingly.
“merge” : Existing content is not modified, i.e. only new content is added and none is deleted or modified.
“update” : Existing content is updated, new content is added and none is deleted.
For a more detailed description of the import mode, see here
The include and exclude elements allow more fine-grained filtering of the subtree during import and export. They have a mandatory pattern attribute, which has the format of a regexp. The regexp is matched against the full path of the respective or potential JCR node, so it must start either with / (absolute regex) or with a wildcard (relative regex).
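To illustrate why a pattern has to start with / or with a wildcard, here is a minimal Python sketch. It assumes that a pattern must match the entire path, which is modelled here with `re.fullmatch`; that is an assumption about the matching semantics, not a quote of FileVault's code, and the helper name `matches` is hypothetical.

```python
import re

def matches(pattern: str, path: str) -> bool:
    """Return True if the pattern matches the whole path (assumed semantics)."""
    return re.fullmatch(pattern, path) is not None

path = "/etc/project1/static/logo.gif"

# Absolute regex: starts with "/" and spells out the path from the root.
print(matches(r"/etc/project1/static(/.*)?", path))  # True
# Relative regex: starts with a wildcard, so any leading path is accepted.
print(matches(r".*\.gif", path))                     # True
# Neither absolute nor wildcard-prefixed: it cannot match a full path.
print(matches(r"logo\.gif", path))                   # False
```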
The order of the include and exclude elements is important. The paths are tested sequentially against all patterns, and the type of the last matching element determines whether the path is included or not. One caveat is that the type of the first pattern defines the default behavior, so that the filter is more natural to write: if the first pattern is an include, the default is exclude, and vice versa.
The following example only includes the nodes in /tmp that end with .gif.

    <filter root="/tmp">
        <include pattern=".*\.gif"/>
    </filter>

The following example includes all nodes in /tmp except those that end with .gif.

    <filter root="/tmp">
        <exclude pattern=".*\.gif"/>
    </filter>

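The ordering rule (last matching pattern wins, first pattern sets the default) can be sketched in a few lines of Python. This is a simplified model for illustration only, not FileVault's implementation: the function name `is_included`, the representation of rules as `(kind, regex)` pairs, and the use of `re.fullmatch` for whole-path matching are all assumptions.

```python
import re

def is_included(path, root, rules):
    """Decide whether `path` is included by a single filter element.

    `rules` is an ordered list of ("include" | "exclude", regex) pairs.
    """
    # Paths outside the subtree of `root` are not covered by this filter.
    if path != root and not path.startswith(root.rstrip("/") + "/"):
        return False
    # Assumption: without any patterns the whole subtree is included.
    if not rules:
        return True
    # The type of the first pattern sets the default to its opposite.
    included = rules[0][0] != "include"
    # Test every pattern in order; the last one that matches decides.
    for kind, pattern in rules:
        if re.fullmatch(pattern, path):
            included = (kind == "include")
    return included

# The /etc/project1 filter from the first example on this page.
project1 = [("exclude", r".*\.gif"), ("include", r"/etc/project1/static(/.*)?")]
print(is_included("/etc/project1/static/logo.gif", "/etc/project1", project1))  # True
print(is_included("/etc/project1/images/logo.gif", "/etc/project1", project1))  # False
print(is_included("/etc/project1/config", "/etc/project1", project1))           # True
```

In this model the GIF below static is excluded by the first pattern and then re-included by the second, which is why the order of the elements matters.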
Since FileVault 3.1.28 (JCRVLT-120) it is possible to filter not only on node level but also to include/exclude only certain properties below a certain node, by setting the attribute matchProperties on the exclude/include element to true.

    <filter root="/tmp">
        <exclude pattern="/tmp/property1" matchProperties="true"/>
    </filter>

Then the pattern is matched against property paths instead of node paths. If the attribute matchProperties is not set, all properties below the given node paths are included/excluded. Otherwise, the excluded properties are not contained in the exported package and are not touched in the repository during import.
When exporting content into the filesystem or a content package, the workspace filter defines which nodes are serialized. It is important to know that only the nodes that match the filter are actually traversed, which can lead to unexpected results.
For example:

    <filter root="/tmp">
        <include pattern="/tmp/a(/.*)?"/>
        <include pattern="/tmp/b/c(/.*)?"/>
    </filter>

This will include the /tmp/a subtree, but not the /tmp/b/c subtree, since /tmp/b does not match the filter and is therefore not traversed.
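The traversal effect can be reproduced with a small Python model. This is only a sketch under stated assumptions: the repository is a hypothetical nested dict of child nodes, the helper names `is_included` and `export_paths` are invented for illustration, patterns are matched against the whole path with `re.fullmatch`, and traversal always starts at the filter root.

```python
import re

def is_included(path, rules):
    """Last matching pattern wins; the first pattern's type sets the default."""
    included = rules[0][0] != "include" if rules else True
    for kind, pattern in rules:
        if re.fullmatch(pattern, path):
            included = (kind == "include")
    return included

def export_paths(root, children, rules):
    """Collect exported node paths, descending only into matching nodes."""
    exported = []
    for name, grandchildren in sorted(children.items()):
        path = f"{root}/{name}"
        if is_included(path, rules):
            exported.append(path)
            exported.extend(export_paths(path, grandchildren, rules))
        # A node that does not match is skipped together with its subtree.
    return exported

# Hypothetical repository content below /tmp.
tmp_children = {"a": {"x": {}}, "b": {"c": {"y": {}}}}
rules = [("include", r"/tmp/a(/.*)?"), ("include", r"/tmp/b/c(/.*)?")]
print(export_paths("/tmp", tmp_children, rules))
# ['/tmp/a', '/tmp/a/x']
```

Although /tmp/b/c would match the second pattern, the model never reaches it, because /tmp/b itself matches neither pattern and the walk stops there.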
There is one exception: if all the patterns are relative (i.e. don't start with a slash), then the algorithm is:
When importing (i.e. installing) content packages into a repository, the workspace filter defines which nodes are deserialized and overwritten in the repository. Nodes/properties that are covered by a filter rule but not contained in the content to be imported are removed from the repository.
The exact rules are outlined below:
Item covered by filter rule | Item contained in the Content Package | Item contained in the Repository (prior to Import/Installation) | State of Item in Repository after Import/Installation |
---|---|---|---|
no | yes | yes | not touched |
no | no | yes | not touched |
no | yes | no | deserialized from content package (for backwards compatibility reasons), this should not be used, i.e. all items in the content package should always be covered by some filter rule |
no | no | no | not existing (not touched) |
yes | yes | yes | overwritten |
yes | no | yes | removed |
yes | yes | no | deserialized from content package |
yes | no | no | not existing |
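The table can also be read as a small decision function. The following Python sketch simply transcribes the eight rows; the function name `state_after_import` and its parameter names are made up for illustration and are not part of FileVault.

```python
def state_after_import(covered: bool, in_package: bool, in_repository: bool) -> str:
    """Return the state of an item after import, following the table above."""
    if covered:
        if in_package:
            # The package wins for every item covered by a filter rule.
            return "overwritten" if in_repository else "deserialized from content package"
        # Covered but missing from the package: the item is removed if it existed.
        return "removed" if in_repository else "not existing"
    if in_repository:
        # Items not covered by any rule are left alone.
        return "not touched"
    if in_package:
        # Kept for backwards compatibility only; packages should not rely on it.
        return "deserialized from content package"
    return "not existing"

print(state_after_import(covered=True, in_package=False, in_repository=True))  # removed
print(state_after_import(covered=False, in_package=True, in_repository=True))  # not touched
```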
Content Package Filter

    <filter root="/tmp">
        <include pattern="/tmp/a(/.*)?"/>
        <include pattern="/tmp/b(/.*)?"/>
        <exclude pattern="/tmp/b/property1" matchProperties="true"/>
        <include pattern="/tmp/c(/.*)?"/>
    </filter>

Content Package Serialized Content

    + /jcr_root/
      + tmp/
        + a/
          - property1="new"
        + b/
          - property1="new"
          - property2="new"

Repository Content (prior to Import/Installation)

    + /tmp/
      + b/
        - property1="old"
        - property2="old"
      + c/
        - property1="old"

Repository Content (after Import/Installation)

    + /tmp/
      + a/
        - property1="new"
      + b/
        - property1="old"
        - property2="new"

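The example can be replayed with a short Python model to check the resulting state against the import rules. This is a sketch under several assumptions, not FileVault's code: properties are keyed by their full path, node patterns and matchProperties patterns are evaluated as two separate rule lists, patterns must match the whole path (`re.fullmatch`), and the names `last_match` and `install` are hypothetical.

```python
import re

def last_match(path, rules):
    """Last matching pattern wins; the first pattern's type sets the default."""
    result = rules[0][0] != "include" if rules else True
    for kind, pattern in rules:
        if re.fullmatch(pattern, path):
            result = (kind == "include")
    return result

def install(package, repository, node_rules, property_rules):
    """Return the repository properties after import, keyed by property path."""
    result = {}
    for path in sorted(set(package) | set(repository)):
        node_path = path.rsplit("/", 1)[0]
        covered = last_match(node_path, node_rules) and last_match(path, property_rules)
        if covered:
            if path in package:
                result[path] = package[path]   # overwritten or newly created
            # Covered but missing from the package: removed.
        elif path in repository:
            result[path] = repository[path]    # not covered: not touched
        else:
            result[path] = package[path]       # uncovered package item (legacy behavior)
    return result

node_rules = [("include", r"/tmp/a(/.*)?"), ("include", r"/tmp/b(/.*)?"),
              ("include", r"/tmp/c(/.*)?")]
property_rules = [("exclude", r"/tmp/b/property1")]  # matchProperties="true"
package = {"/tmp/a/property1": "new", "/tmp/b/property1": "new", "/tmp/b/property2": "new"}
repository = {"/tmp/b/property1": "old", "/tmp/b/property2": "old", "/tmp/c/property1": "old"}

print(install(package, repository, node_rules, property_rules))
# {'/tmp/a/property1': 'new', '/tmp/b/property1': 'old', '/tmp/b/property2': 'new'}
```

In this model /tmp/b/property1 keeps its old value because the property exclude takes it out of the filter's coverage, while /tmp/c/property1 is removed because it is covered but absent from the package.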