| <!DOCTYPE html> |
| |
| |
| <!-- |
| | Generated by Apache Maven Doxia Site Renderer 2.0.0 from src/site/markdown/development/write_file_processor.md at 2025-09-23 |
| | Rendered using Apache Maven Fluido Skin 2.1.0 |
| --> |
| <html xmlns="http://www.w3.org/1999/xhtml" lang="en"> |
| <head> |
| <meta charset="UTF-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <meta name="generator" content="Apache Maven Doxia Site Renderer 2.0.0" /> |
| <title>Writing a new File Processor – Apache RAT™ Core</title> |
| <link rel="stylesheet" href="../css/apache-maven-fluido-2.1.0.min.css" /> |
| <link rel="stylesheet" href="../css/site.css" /> |
| <link rel="stylesheet" href="../css/print.css" media="print" /> |
| <script src="../js/apache-maven-fluido-2.1.0.min.js"></script> |
| <link href="https://creadur.apache.org/font/matesc.css" type="text/css" rel="stylesheet" /> |
| </head> |
| <body> |
| <div class="container-fluid container-fluid-top"> |
| <header> |
| <div id="banner"> |
| <div class="pull-left"><div id="bannerLeft"><h1><a href="https://www.apache.org/"><img src="https://www.apache.org/img/asf_logo.png" alt="The Apache Software Foundation" /> Apache RAT</a></h1></div></div> |
| <div class="pull-right"></div> |
| <div class="clear"><hr/></div> |
| </div> |
| |
| <div id="breadcrumbs"> |
| <ul class="breadcrumb"> |
| <li id="publishDate">Last Published: 2025-09-23<span class="divider">|</span> |
| </li> |
| <li id="projectVersion">Version: 0.17-SNAPSHOT<span class="divider">|</span></li> |
| </ul> |
| </div> |
| </header> |
| <div class="row-fluid"> |
| <main id="bodyColumn" class="span10"> |
| <!--- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <section><a id="Writing_a_new_File_Processor"></a> |
| <h1>Writing a new File Processor</h1> |
| <blockquote> |
| <section><a id="Required_Knowledge"></a> |
| <h2>Required Knowledge</h2> |
| <p>Knowledge of the following topics is recommended:</p> |
| <ul> |
| |
| <li><a href="document_name.html">DocumentName</a>: The DocumentName class that is used to identify files.</li> |
| <li>RAT <a href="../exclusion_expression.html">Exclude Expressions</a>: The expressions that are used to match file names.</li> |
| </ul> |
| </blockquote> |
| <p>A file processor locates files with a specific name in the directory tree and reads file patterns from them; those patterns are translated into RAT include or exclude expressions. These files are normally found in the directory tree, and their restrictions normally apply only to files at the same directory level as the processed file or below. Processing such a file yields an <code>org.apache.rat.config.exclusion.MatcherSet</code>, which holds the files to be explicitly included together with the files to be excluded. MatcherSets are built by an <code>org.apache.rat.config.exclusion.MatcherSet.Builder</code>.</p></section><section><a id="MatcherSet"></a> |
| <h2>MatcherSet</h2> |
| <p>The matcher set comprises two collections of patterns, one to include and one to exclude. These collections are implemented as DocumentNameMatcher instances. The DocumentNameMatcher patterns are fully qualified to the directory in which the document specified by the DocumentName is found.</p> |
| <p>The order of the match patterns is retained. Multiple MatcherSets may be combined into a single MatcherSet.</p></section><section><a id="DocumentNameMatcher"></a> |
| <h2>DocumentNameMatcher</h2> |
| <p>The document name matcher is, as the name says, used to determine if a document name is matched. It comprises a <code>Predicate</code> to match the file name, the name of the DocumentNameMatcher and a flag to indicate if the matcher is a collection of matchers.</p> |
| <p>The name provides feedback that identifies where a restriction comes from. For example, the pattern “/**/foo.txt” may use the pattern itself as the name of the DocumentNameMatcher, while a DocumentNameMatcher of exclusions generated by an exclude file called <code>/MyExcludeFile</code> may be called “excluded /MyExcludeFile”.</p> |
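<p>The pairing of a predicate with a descriptive name can be illustrated with a small, self-contained sketch. It does not use the RAT classes: <code>NamedMatcher</code>, <code>matches</code> and <code>explain</code> are hypothetical names invented for this illustration, and the <code>endsWith</code> tests are simplified stand-ins for real pattern matching.</p>

```java
import java.util.function.Predicate;

public class NamedMatcherSketch {

    /** Hypothetical stand-in for a named matcher: a predicate plus a name used for feedback. */
    public static final class NamedMatcher {
        private final String name;
        private final Predicate<String> predicate;

        public NamedMatcher(String name, Predicate<String> predicate) {
            this.name = name;
            this.predicate = predicate;
        }

        /** Returns true if the predicate accepts the document name. */
        public boolean matches(String documentName) {
            return predicate.test(documentName);
        }

        /** Reports the result together with the matcher name, so the origin of the restriction is visible. */
        public String explain(String documentName) {
            return name + ": " + matches(documentName) + " for " + documentName;
        }
    }

    public static void main(String[] args) {
        // A matcher named after its pattern (the endsWith test only approximates the glob).
        NamedMatcher fromPattern = new NamedMatcher("/**/foo.txt", n -> n.endsWith("/foo.txt"));
        // A matcher named after the file its exclusions were read from.
        NamedMatcher fromFile = new NamedMatcher("excluded /MyExcludeFile", n -> n.endsWith(".log"));

        System.out.println(fromPattern.explain("/src/foo.txt"));
        System.out.println(fromFile.explain("/src/foo.txt"));
    }
}
```

<p>In this sketch the name carries no matching logic; it only makes the report readable when several matchers are evaluated against the same document name.</p>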
| <p>Multiple DocumentNameMatchers may be combined using the <code>DocumentNameMatcher.Or</code> or <code>DocumentNameMatcher.And</code> classes. Additionally, DocumentNameMatchers may be negated by use of the <code>DocumentNameMatcher.Not</code> class.</p></section><section><a id="AbstractFileProcessorBuilder"></a> |
| <h2>AbstractFileProcessorBuilder</h2> |
| <p>In many cases a file processor should process multiple files in the source tree, for example the <code>.gitignore</code> or <code>.hgignore</code> files. To implement a file processor that performs a walk down the source tree the <code>AbstractFileProcessorBuilder</code> is used.</p> |
| <p>The <code>AbstractFileProcessorBuilder</code> constructor takes a file name, one or more comment prefixes, and a flag to indicate whether the file name should be listed in the exclude list. The file name is normally that of a file that is hidden on Linux systems, such as “.gitignore” or “.hgignore”. The <code>AbstractFileProcessorBuilder</code> scans the directories looking for files with the specified name. Each file that is found is passed to the <code>process(DocumentName)</code> method, which reads the document and returns a MatcherSet.</p> |
| <p>Classes that extend the <code>AbstractFileProcessorBuilder</code> have two main extension points: <code>modifyEntry(DocumentName, String)</code> and <code>process(DocumentName)</code>.</p><section><a id="Extension_Points"></a> |
| <h3>Extension Points</h3><section><a id="modifyEntry"></a> |
| <h4>modifyEntry</h4> |
| <p>The <code>modifyEntry</code> method accepts the source <code>DocumentName</code> and a non-comment string. It is expected to process the string and return an exclude expression or null if the line does not result in an exclude expression. The default implementation simply returns the string argument.</p> |
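<p>The contract described above (a non-comment line goes in, an exclude expression or <code>null</code> comes out) can be sketched without the RAT classes. Everything in this block is an assumption made for illustration: the class name, the plain <code>String</code> used in place of a <code>DocumentName</code>, and the two rewriting rules (blank lines yield <code>null</code>, a trailing slash becomes a directory wildcard) are not taken from any RAT implementation.</p>

```java
public class ModifyEntrySketch {

    /**
     * Hypothetical modifyEntry-style hook.
     *
     * @param sourceName name of the file the entry was read from (unused in this sketch)
     * @param entry a non-comment line from that file
     * @return an exclude expression, or null if the line produces none
     */
    public static String modifyEntry(String sourceName, String entry) {
        String trimmed = entry.trim();
        if (trimmed.isEmpty()) {
            // A blank line does not result in an exclude expression.
            return null;
        }
        if (trimmed.endsWith("/")) {
            // Invented rule: treat a trailing slash as "everything below this directory".
            return trimmed + "**";
        }
        // Default behaviour: the entry is already a usable expression.
        return trimmed;
    }

    public static void main(String[] args) {
        System.out.println(modifyEntry("/project/.myignore", "  build/ "));
        System.out.println(modifyEntry("/project/.myignore", "*.log"));
        System.out.println(modifyEntry("/project/.myignore", "   "));
    }
}
```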
| <p>An example of <code>modifyEntry</code> is found in the <code>BazaarIgnoreBuilder</code> where lines that start with “RE:” are regular expressions and all other lines are standard exclude patterns. The <code>BazaarIgnoreBuilder.modifyEntry</code> method converts “RE:” prefixed strings into the standard exclude regular expression string.</p></section><section><a id="process"></a> |
| <h4>process</h4> |
| <p>In many cases the process method does not need to be modified. In general the process method:</p> |
| <ul> |
| |
| <li>Opens a File on the <code>DocumentName</code></li> |
| <li>Reads each line in the file</li> |
| <li>Calls <code>modifyEntry</code> on the line.</li> |
| <li>If the result is not null: |
| <ul> |
| |
| <li>Uses the <code>FileProcessor.localizePattern()</code> to create a DocumentName for the pattern with the baseName specified as the name of the file being read.</li> |
| <li>Stores the new document name in the list of names being returned.</li> |
| </ul></li> |
| <li>Repeats until all the lines in the input file have been read.</li> |
| </ul> |
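<p>The steps above can be sketched as a plain loop. This is a simplified illustration under stated assumptions, not the RAT implementation: the input is a list of lines rather than a file opened from a <code>DocumentName</code>, patterns are plain strings, and <code>modifyEntry</code>, <code>localize</code> and <code>process</code> here are hypothetical helpers written for this example.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProcessSketch {

    /** Hypothetical per-line hook: returns the entry unchanged, or null for an empty entry. */
    public static String modifyEntry(String entry) {
        return entry.isEmpty() ? null : entry;
    }

    /** Hypothetical localization: qualifies a pattern with the directory of the file being read. */
    public static String localize(String baseDir, String pattern) {
        String relative = pattern.startsWith("/") ? pattern.substring(1) : pattern;
        return baseDir.endsWith("/") ? baseDir + relative : baseDir + "/" + relative;
    }

    /** Reads every line, skips comments, and collects the localized patterns in file order. */
    public static List<String> process(String baseDir, List<String> lines, List<String> commentPrefixes) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Lines starting with any of the comment prefixes are ignored.
            boolean isComment = commentPrefixes.stream().anyMatch(trimmed::startsWith);
            if (isComment) {
                continue;
            }
            String entry = modifyEntry(trimmed);
            if (entry != null) {
                result.add(localize(baseDir, entry));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# build output", "target/**", "", "*.log");
        process("/project/sub", lines, Arrays.asList("#")).forEach(System.out::println);
    }
}
```

<p>Keeping the per-line hook separate from the loop mirrors the split between the two extension points: most variations only need a different hook, while the loop itself stays the same.</p>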
| <p>Classes that override the <code>process</code> method generally do so because they have some special cases. For example the <code>GitIgnoreBuilder</code> has some specific rules about when to add wildcard paths and when the paths are literal. Thus a special process is required.</p></section></section></section><section><a id="Theory_of_Operation"></a> |
| <h2>Theory of Operation</h2> |
| <p>The AbstractFileProcessorBuilder creates a MatcherSet for each instance of the target file it finds in the source tree. Those MatcherSets are organized into levels based on how far down the tree the target file is. MatcherSets generated from files in the root of the tree are at level 0, those from files found in a subdirectory of root are at level 1, those from subdirectories of subdirectories of root are at level 2, and so on.</p> |
| <p>The builder constructs a list of MatcherSets with the MatcherSets from the deepest level combined followed by the MatcherSets from the next deepest level and so on to the shallowest level. This ensures that later files override earlier files.</p> |
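<p>The deepest-first ordering can be illustrated with a short sketch. It assumes the matcher sets have already been grouped by level in a map and uses strings as stand-ins for MatcherSets; the class and method names are hypothetical and not part of RAT.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LevelOrderSketch {

    /** Flattens the per-level groups into one list, deepest level first, shallowest last. */
    public static List<String> deepestFirst(Map<Integer, List<String>> byLevel) {
        // A TreeMap with a reversed comparator iterates from the highest level number down.
        TreeMap<Integer, List<String>> sorted = new TreeMap<>(Comparator.reverseOrder());
        sorted.putAll(byLevel);
        List<String> ordered = new ArrayList<>();
        for (List<String> sets : sorted.values()) {
            ordered.addAll(sets);
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> byLevel = new HashMap<>();
        byLevel.put(0, Arrays.asList("/.myignore"));
        byLevel.put(2, Arrays.asList("/a/b/.myignore"));
        byLevel.put(1, Arrays.asList("/a/.myignore"));
        deepestFirst(byLevel).forEach(System.out::println);
    }
}
```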
| <p>If files outside the source tree need to be processed, the builder must override the <code>process</code> method to add the processed files at the appropriate level. An example of this can be seen in the <code>GitIgnoreBuilder</code> code, where a global ignore file is added at level -1 because it must be processed after all the explicit includes and excludes found in the source tree.</p></section><section><a id="Debugging"></a> |
| <h2>Debugging</h2> |
| <p>Debugging a DocumentNameMatcher might be difficult due to the nested Predicate nature of the structure. However, the <code>decompose()</code> method provides a view into the inner operation of the class without having to execute a stepwise debugging session.</p> |
| <p>Given a candidate document name that needs to be checked, the following code block outputs the call tree of the DocumentNameMatcher and shows the result of each test.</p> |
| |
| <pre><code class="nohighlight nocode"> DocumentNameMatcher matcher = ...; |
| DocumentName candidate = DocumentName.builder() |
| .setName(dirName+"/dir1/file1.log") |
| .setBaseName(dirName).build(); |
| System.out.println("Decomposition for " + candidate); |
| matcher.decompose(candidate).forEach(System.out::println); |
| </code></pre> |
| <p>The result will list the name of the test, the result of the test, the name of the document being tested, and the predicate being executed. If the predicate is a CompoundPredicate then each of the matchers from the CompoundPredicate will be decomposed as well. The result is a display of all the predicates and an indication of which one, if any, fired.</p></section><section><a id="Examples"></a> |
| <h2>Examples</h2> |
| <p>All the examples below use <code>/testName</code> as the candidate name to match.</p><section><a id="FileFilter"></a> |
| <h3>FileFilter</h3> |
| <p>A DocumentNameMatcher created as: <code>DocumentNameMatcher matcher1 = new DocumentNameMatcher("FileFilterTest", new NameFileFilter("File.name"));</code></p> |
| <p>will produce:</p> |
| |
| <pre><code class="nohighlight nocode">FileFilterTest: >>false<< /testName |
| NameFileFilter(File.name) >>false<< |
| </code></pre></section><section><a id="Multiple_patterns"></a> |
| <h3>Multiple patterns</h3> |
| <p>A DocumentNameMatcher created as: <code>DocumentNameMatcher matcher2 = new DocumentNameMatcher("MatchPatternsTest", MatchPatterns.from("/", "**/test1*", "**/*Name"));</code></p> |
| <p>will produce:</p> |
| |
| <pre><code class="nohighlight nocode">MatchPatternsTest: >>true<< /testName |
| **/test1*: >>false<< |
| org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@465232e9 >>false<< |
| **/*Name: >>true<< |
| org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@798162bc >>true<< |
| </code></pre></section><section><a id="Combined_patterns"></a> |
| <h3>Combined patterns</h3> |
| <p>If the two matchers above are combined into a single DocumentNameMatcher as: <code>DocumentNameMatcher.matcherSet(matcher1, matcher2);</code></p> |
| <p>it will produce:</p> |
| |
| <pre><code class="nohighlight nocode">matcherSet(FileFilterTest, MatchPatternsTest): >>false<< /testName |
| FileFilterTest: >>false<< |
| NameFileFilter(File.name) >>false<< |
| MatchPatternsTest: >>true<< |
| **/test1*: >>false<< |
| org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@6f36c2f0 >>false<< |
| **/*Name: >>true<< |
| org.apache.rat.document.DocumentNameMatcher$1$$Lambda/0x00007f0c3c141f58@f58853c >>true<< |
| </code></pre></section></section></section> </main> |
| </div> |
| </div> |
| <hr/> |
| <footer> |
| <div class="container-fluid"> |
| <div class="row-fluid"> |
| Copyright © 2016-2025 The Apache Software Foundation, Licensed under the Apache License, Version 2.0. |
| Apache Creadur, Creadur, Apache RAT, Apache Tentacles, Apache Whisker, Apache and the ASF logo are trademarks |
| of The Apache Software Foundation. |
| Oracle and Java are registered trademarks of Oracle and/or its affiliates. |
| All other marks mentioned may be trademarks or registered trademarks of their respective owners. |
| </div> |
| </div> |
| </footer> |
| </body> |
| </html> |