content/apt/plugin-developers/common-bugs.apt - maven-site - Git at Google

  ------
  Common Bugs and Pitfalls
  ------
  Benjamin Bentmann
  ------
  2008-08-29
  ------

 ~~ Licensed to the Apache Software Foundation (ASF) under one
 ~~ or more contributor license agreements.  See the NOTICE file
 ~~ distributed with this work for additional information
 ~~ regarding copyright ownership.  The ASF licenses this file
 ~~ to you under the Apache License, Version 2.0 (the
 ~~ "License"); you may not use this file except in compliance
 ~~ with the License.  You may obtain a copy of the License at
 ~~
 ~~   http://www.apache.org/licenses/LICENSE-2.0
 ~~
 ~~ Unless required by applicable law or agreed to in writing,
 ~~ software distributed under the License is distributed on an
 ~~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 ~~ KIND, either express or implied.  See the License for the
 ~~ specific language governing permissions and limitations
 ~~ under the License.

 ~~ NOTE: For help with the syntax of this file, see:
 ~~ http://maven.apache.org/doxia/references/apt-format.html

 Common Bugs and Pitfalls

   Maven is not the smallest project in terms of source code and has as such already suffered from many bugs. Having a
   closer look at all the issues revealed some coding problems that had widespread among the various subcomponents. This
   document lists these commonly occurring anti patterns in order to help the Maven community to prevent rather than fix
   bugs. Note that the primary focus is on pointing out problems that are subtle in their nature rather than giving a
   comprehensive guide for Java or Maven development.

   * {{{Reading_and_Writing_Text_Files}Reading and Writing Text Files}}

   * {{{Converting_between_URLs_and_Filesystem_Paths}Converting between URLs and Filesystem Paths}}

   * {{{Handling_Strings_Case-insensitively}Handling Strings Case-insensitively}}

   * {{{Creating_Resource_Bundle_Families}Creating Resource Bundle Families}}

   * {{{Using_System_Properties}Using System Properties}}

   * {{{Using_Shutdown_Hooks}Using Shutdown Hooks}}

   * {{{Resolving_Relative_Paths}Resolving Relative Paths}}

   * {{{Determining_the_Output_Directory_for_a_Site_Report}Determining the Output Directory for a Site Report}}

   * {{{Retrieving_the_Mojo_Logger}Retrieving the Mojo Logger}}

   * {{{Depending_on_Plexus_Utilities_1.1}Depending on Plexus Utilities 1.1+}}

 * {Reading and Writing Text Files}

   Textual content is composed of characters while file systems merely store byte streams. A file encoding (aka charset)
   is used to convert between bytes and characters. The challenge is using the right file encoding.

   The JVM has this notion of a default encoding (given by the <<<file.encoding>>> property) which it derives from a
   system's locale. While this might be a convenient feature sometimes, using this default encoding for a project build
   is in general a bad idea: The build output will depend on the machine/developer who runs the build. As such, usage
   of the default encoding threatens the dream of a reproducible build.

   For example, if developer A has UTF-8 as default encoding while developer B uses ISO-8859-1, text files are very
   likely to get messed up during resource filtering or similar tasks.

   Therefore, developers should avoid any direct or indirect usage of the classes/methods that simply employ the
   platform's default encoding. For instance, <<<FileWriter>>> and <<<FileReader>>> should usually be avoided:

 +---
 /*
  * FIXME: This assumes the source file is using the platform's default encoding.
  */
 Reader reader = new FileReader( javaFile );
 +---

   Instead, the classes <<<OutputStreamWriter>>> and <<<OutputStreamReader>>> can be used in combination with an
   explicit encoding value. This encoding value can be retrieved from a mojo parameter such that the user can configure
   the plugin to fit his/her needs.

   To save the user from configuring each plugin individually, conventions have been established that allow a user to
   centrally configure the file encoding per POM. Plugin developers should respect these conventions whereever possible:

   * {{{https://cwiki.apache.org/confluence/display/MAVEN/POM+Element+for+Source+File+Encoding}Source File Encoding}}

   * {{{http://cwiki.apache.org/confluence/display/MAVENOLD/Reporting+Encoding+Configuration}Report Output Encoding}}

   []

   Finally note that XML files require special handling because they are equipped with an encoding declaration in the
   XML prolog. Reading or writing XML files with an encoding that does not match their XML prolog's <<<encoding>>>
   attribute is a bad idea:

 +---
 /*
  * FIXME: This assumes the XML encoding declaration matches the platform's default encoding.
  */
 Writer writer = new FileWriter( xmlFile );
 writer.write( xmlContent );
 +---

   To ease the correct processing of XML files, developers are encouraged to use
   <<<{{{https://codehaus-plexus.github.io/plexus-utils/apidocs/org/codehaus/plexus/util/ReaderFactory.html#newXmlReader(java.io.File)}ReaderFactory.newXmlReader()}}>>>
   and
   <<<{{{https://codehaus-plexus.github.io/plexus-utils/apidocs/org/codehaus/plexus/util/WriterFactory.html#newXmlWriter(java.io.File)}WriterFactory.newXmlWriter()}}>>>
   from the Plexus Utilities.

 * {Converting between URLs and Filesystem Paths}

   URLs and filesystem paths are really two different things and converting between them is not trivial. The main source
   of problems is that different encoding rules apply for the strings that make up a URL or filesystem path. For example,
   consider the following code snippet and its associated console output:

 +---
 File file = new File( "foo bar+foo" );
 URL url = file.toURI().toURL();

 System.out.println( file.toURL() );
 > file:/C:/temp/foo bar+foo

 System.out.println( url );
 > file:/C:/temp/foo%20bar+foo

 System.out.println( url.getPath() );
 > /C:/temp/foo%20bar+foo

 System.out.println( URLDecoder.decode( url.getPath(), "UTF-8" ) );
 > /C:/temp/foo bar foo
 +---

   First of all, please note that
   <<<{{{http://java.sun.com/javase/6/docs/api/java/io/File.html#toURL()}File.toURL()}}>>> does not escape the space
   character (and others). This yields an invalid URL, as per
   {{{http://www.faqs.org/rfcs/rfc2396.html}RFC 2396, section 2.4.3 "Excluded US-ASCII Characters"}}. The class
   <<<java.net.URL>>> will silently accept such invalid URLs, in contrast <<<java.net.URI>>> will not (see also
   <<<{{{http://java.sun.com/javase/6/docs/api/java/net/URL.html#toURI()}URL.toURI()}}>>>). For this reason,
   <<<File.toURL()>>> has been deprecated and should be replaced with <<<File.toURI().toURL()>>>.

   Next, <<<{{{http://java.sun.com/javase/6/docs/api/java/net/URL.html#getPath()}URL.getPath()}}>>> does in general not
   return a string that can be used as a filesystem path. It returns a substring of the URL and as such can contain
   escape sequences. The prominent example is the space character which will show up as "%20". People sometimes hack
   around this by means of <<<replace("%20", " ")>>> but that does simply not cover all cases. It's worth to mention
   that on the other hand the related method
   <<<{{{http://java.sun.com/javase/6/docs/api/java/net/URI.html#getPath()}URI.getPath()}}>>> does decode escapes but
   still the result is not a filesystem path (compare the source for the constructor <<<File(URI)>>>). To summarize,
   the following idiom is to be avoided:

 +---
 URL url = new URL( "file:/C:/Program%20Files/Java/bin/java.exe" );

 /*
  * FIXME: This does not decode percent encoded characters.
  */
 File path = new File( url.getPath() );
 +---

   To decode a URL, people sometimes also choose
   <<<{{{http://java.sun.com/javase/6/docs/api/java/net/URLDecoder.html}java.net.URLDecoder}}>>>. The pitfall with this
   class is that is actually performs HTML form decoding which is yet another encoding and not the same as the URL
   encoding (compare the last paragraph in class javadoc about
   <<<{{{http://java.sun.com/javase/6/docs/api/java/net/URL.html}java.net.URL}}>>>). For instance, a <<<URLDecoder>>>
   will erroneously convert the character "+" into a space as illustrated by the last sysout in the example above.

   In an ideal world, code targetting JRE 1.4+ could easily avoid these problems by using the constructor
   <<<{{{http://java.sun.com/javase/6/docs/api/java/io/File.html#File(java.net.URI)}File(URI)}}>>> as suggested by the
   following snippet:

 +---
 URL url = new URL( "file:/C:/Documents and Settings/user/.m2/settings.xml" );

 /*
  * FIXME: This assumes the URL is fully compliant with RFC 3986.
  */
 File path = new File( new URI( url.toExternalForm() ) );
 +---

   The remaining source of frustration is the conversion from <<<URL>>> to <<<URI>>>. As already said, the <<<URL>>>
   class accepts malformed URLs which will make the constructor of <<<URI>>> throw an exception. And indeed, class
   loaders from Sun JREs up to Java 1.4 will deliver malformed URLs when queried for a resource. Likewise, the class
   loaders employed by Maven 2.x deliver malformed resource URLs regardless of the JRE version (see
   {{{https://issues.apache.org/jira/browse/MNG-3607}MNG-3607}}).

   For all these reasons, it is recommended to use
   <<<{{{http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html#toFile(java.net.URL)}FileUtils.toFile()}}>>>
   from Commons IO or
   <<<{{{https://codehaus-plexus.github.io/plexus-utils/apidocs/org/codehaus/plexus/util/FileUtils.html#toFile(java.net.URL)}FileUtils.toFile()}}>>>
   from a recent Plexus Utilities.

 * {Handling Strings Case-insensitively}

   When developers need to compare strings without regard to case or want to realize a map with case-insensitive string
   keys, they often employ
   <<<{{{http://java.sun.com/javase/6/docs/api/java/lang/String.html#toLowerCase()}String.toLowerCase()}}>>> or
   <<<{{{http://java.sun.com/javase/6/docs/api/java/lang/String.html#toUpperCase()}String.toUpperCase()}}>>> to create
   a "normalized" string before doing a simple <<<String.equals()>>>. Now, the <<<to*Case()>>> methods are overloaded:
   One takes no arguments and one takes a <<<Locale>>> object.

   The gotcha with the arg-less methods is that their output depends on the default locale of the JVM but the default
   locale is out of control of the developer. That means the string expected by the developer (who runs/tests his code
   in a JVM using locale <<<xy>>>) does not necessarily match the string seen by another user (that runs a JVM with
   locale <<<ab>>>). For example, the comparison shown in the next code snippet is likely to fail for systems with
   default locale Turkish because Turkish has unusual casing rules for the characters "i" and "I":

 +---
 /*
  * FIXME: This assumes the casing rules of the current platform
  * match the rules for the English locale.
  */
 if ( "info".equals( debugLevel.toLowerCase() ) )
     logger.info( message );
 +---

   For case-insensitive string comparisons which should be locale-insensitive, the method
   <<<{{{http://java.sun.com/javase/6/docs/api/java/lang/String.html#equalsIgnoreCase(java.lang.String)}String.equalsIgnoreCase()}}>>>
   should be used instead. If only a substring like a prefix/suffix should be compared, the method
   <<<{{{http://java.sun.com/javase/6/docs/api/java/lang/String.html#regionMatches(boolean,%20int,%20java.lang.String,%20int,%20int)}String.regionMatches()}}>>>
   can be used instead.

   If the usage of <<<String.to*Case()>>> cannot be avoided, the overloaded version taking a <<<Locale>>> object should
   be used, passing in
   <<<{{{http://java.sun.com/javase/6/docs/api/java/util/Locale.html#ENGLISH}Locale.ENGLISH}}>>>. The resulting code
   will still run on Non-English systems, the parameter only locks down the casing rules used for the string comparison
   such that the code delivers the same results on all platforms.

 * {Creating Resource Bundle Families}

   Especially reporting plugins employ resource bundles to support internationalization. One language (usually English)
   is provided as the fallback/default language in the base resource bundle. Due to the lookup strategy performed by
   <<<{{{http://java.sun.com/javase/6/docs/api/java/util/ResourceBundle.html#getBundle(java.lang.String,%20java.util.Locale,%20java.lang.ClassLoader)}ResourceBundle.getBundle()}}>>>,
   one must always provide a dedicated resource bundle for this default language, too. This bundle should be empty
   because it inherits the strings via the parent chain from the base bundle, but it must exist.

   The following example illustrates this requirement. Imagine the broken resource bundle family shown below which is
   intended to provide localization for English, German and French:

 +---
 src/
 +- main/
    +- resources/
       +- mymojo-report.properties
       +- mymojo-report_de.properties
       +- mymojo-report_fr.properties
 +---

   Now, if a resource bundle is to be looked up for English on a JVM whose default locale happens to be French, the
   bundle <<<mymojo-report_fr.properties>>> will be loaded instead of the intended bundle <<<mymojo-report.properties>>>.

   Reporting plugins that suffer from this bug can easily be detected by executing <<<mvn site -D locales=xy,en>>> where
   <<<xy>>> denotes any other language code supported by the particular plugin. Specifying <<<xy>>> as the first locale
   will have the Maven Site Plugin change the JVM's default locale to <<<xy>>> which in turn causes the lookup for
   <<<en>>> to fail as outlined above unless the plugin has a dedicated resource bundle for English.

 * {Using System Properties}

   Maven's command line supports the definition of system properties via arguments of the form <<<-D key=value>>>.
   While these properties are called system properties, plugins should never use
   <<<{{{http://java.sun.com/javase/6/docs/api/java/lang/System.html#getProperty(java.lang.String)}System.getProperty()}}>>>
   and related methods to query these properties. For example, the following code snippet will not work reliably when
   Maven is embedded, say into an IDE or a CI server:

 +---
 public MyMojo extends AbstractMojo
 {
     public void execute()
     {
         /*
          * FIXME: This prevents proper embedding into IDEs or CI systems.
          */
         String value = System.getProperty( "maven.test.skip" );
     }
 }
 +---

   The problem is that the properties managed by the <<<System>>> class are global, i.e. shared among all threads in the
   current JVM. To prevent conflicts with other code running in the same JVM, Maven plugins should instead query the
   execution properties. These can be obtained from
   <<<{{{https://maven.apache.org/ref/current/maven-core/apidocs/org/apache/maven/execution/MavenSession.html#getExecutionProperties()}MavenSession.getExecutionProperties()}}>>>.

 * {Using Shutdown Hooks}

   People occasionally employ shutdown hooks to perform cleanup tasks, e.g. to delete temporary files as shown in the
   example below:

 +---
 public MyMojo extends AbstractMojo
 {
     public void execute()
     {
         File tempFile = File.createTempFile( "temp", null );
         /*
          * FIXME: This assumes the JVM terminates soon after
          * the Maven build has finished.
          */
         tempFile.deleteOnExit();
     }
 }
 +---

   The problem is that the JVM executing Maven can be running much longer than the actual Maven build. Of course, this
   does not apply to the standalone invocation of Maven from the command line. However, it affects the embedded usage of
   Maven in IDEs or CI servers. In those cases, the cleanup tasks will be deferred, too. If the JVM is then executing a
   bunch of other Maven builds, many such cleanup tasks can sum up, eating up resources of the JVM.

   For this reason, plugin developers should avoid usage of shutdown hooks and rather use <<<try>>>/<<<finally>>> blocks
   to perform cleanup as soon as the resources are no longer needed.

 * {Resolving Relative Paths}

   It is common practice for users of Maven to specify relative paths in the POM, not to mention that the Super POM does
   so, too. The intention is to resolve such relative paths against the base directory of the current project. In other
   words, the paths <<<target/classes>>> and <<<$\{basedir\}/target/classes>>> should resolve to the same directory for a
   given POM.

   Unfortunately, the class <<<{{{http://java.sun.com/javase/6/docs/api/java/io/File.html}java.io.File}}>>> does not
   resolve relative paths against the project's base directory. As mentioned in its class javadoc, it resolves relative
   paths against the current working directory. In plain English: Unless a Maven component has complete control over the
   current working directory, any usage of <<<java.io.File>>> in combination with a relative path is a bug.

   At first glance, one might be tempted to argue that the project base directory is equal to the current working
   directory. However, this assumption is generally not true. Consider the following scenarios:

   [[a]] Reactor Builds

         When a child module is build during a reactor build, the current working directory is usually the base directory of the
         parent project, not the base directory of the current module. That is the most common scenario where users are
         faced with the bug.

   [[b]] Embedded Maven Invocations

         Other tools, most notably IDEs, that run Maven under the hood may have set the current working directory to
         their installation folder or whatever they like.

   [[c]] Maven Invocations using the <<<-f>>> Switch

         While it is surely an uncommon use-case, the user is free to invoke Maven from an arbitrary working directory
         by specifying an absolute path like <<<mvn -f /home/me/projects/demo/pom.xml>>>.

   []

   Hence this example code is prone to misbehave:

 +---
 public MyMojo extends AbstractMojo
 {
     /**
      * @parameter
      */
     private String outputDirectory;

     public void execute()
     {
         /*
          * FIXME: This will resolve relative paths like "target/classes" against
          * the user's working directory instead of the project's base directory.
          */
         File outputDir = new File( outputDirectory ).getAbsoluteFile();
     }
 }
 +---

   In order to guarantee reliable builds, Maven and its plugins must manually resolve relative paths against the
   project's base directory. A simple idiom like the following will do just fine:

 +---
 File file = new File( path );
 if ( !file.isAbsolute() )
 {
     file = new File( project.getBasedir(), file );
 }
 +---

   Many Maven plugins can get this resolution automatically if they declare their affected mojo parameters of type
   <<<java.io.File>>> instead of <<<java.lang.String>>>. This subtle difference in parameter types will trigger a
   feature known as <path translation>, i.e. the Maven core will automatically resolve relative paths when it pumps the
   XML configuration into a mojo.

 * {Determining the Output Directory for a Site Report}

   Most reporting plugins inherit from <<<AbstractMavenReport>>>. In doing so, they need to implement the inherited but
   abstract method <<<getOutputDirectory()>>>. To implement this method, plugins usually declare a field named
   <<<outputDirectory>>> which they return in the method. Nothing wrong so far.

   Now, some plugins need to create additional files in the report output directory that accompany the report generated
   via the sink interface. While it is tempting to use either the method <<<getOutputDirectory()>>> or the field
   <<<outputDirectory>>> directly in order to setup a path for the output files, this leads most likely to a bug. More
   precisely, those plugins will not properly output files when run by the Maven Site Plugin as part of the site
   lifecycle. This is best noticed when the output directory for the site is configured directly in the Maven Site Plugin
   such that it deviates from the expression <<<$\{project.reporting.outputDirectory\}>>> that the plugins use by default.
   Multi-language site generation is another scenario to exploit this bug which is illustrated below:

 +---
 public MyReportMojo extends AbstractMavenReport
 {
     /**
      * @parameter default-value="${project.reporting.outputDirectory}"
      */
     private File outputDirectory;

     protected String getOutputDirectory()
     {
         return outputDirectory.getAbsolutePath();
     }

     public void executeReport( Locale locale )
     {
         /*
          * FIXME: This assumes the mojo parameter reflects the effective
          * output directory as used by the Maven Site Plugin.
          */
         outputDirectory.mkdirs();
     }
 }
 +---

   There are in principal two situations in which a report mojo could be invoked. The mojo might be run directly from
   the command line or the default build lifecycle or it might be run indirectly as part of the site generation along
   with other report mojos. The glaring difference between these two invocations is the way the output directory is
   controlled. In the first case, the parameter <<<outputDirectory>>> from the mojo itself is used. In the second case
   however, the Maven Site Plugin takes over control and will set the output directory according to its own configuration
   by calling <<<MavenReport.setReportOutputDirectory()>>> on the reports being generated.

   Therefore, developers should always use <<<MavenReport.getReportOutputDirectory()>>> if they need to query the
   effective output directory for the report. The implementation of <<<AbstractMavenReport.getOutputDirectory()>>>
   is only intended as a fallback in case the mojo is not run as part of the site generation.

 * {Retrieving the Mojo Logger}

   Maven employs an IoC container named Plexus to setup a plugin's mojos before their execution. In other words,
   components required by a mojo will be provided by means of dependency injection, more precisely field injection. The
   important point to keep in mind is that this field injection happens <after> the mojo's constructor has finished.
   This means that references to injected components are invalid during the construction time of the mojo.

   For example, the next snippet tries to retrieve the mojo logger during construction time but the mojo logger is an
   injected component and as such has not been properly initialized yet:

 +---
 public MyMojo extends AbstractMojo
 {
     /*
      * FIXME: This will retrieve a wrong logger instead of the intended mojo logger.
      */
     private Log log = getLog();

     public void execute()
     {
         log.debug( "..." );
     }
 }
 +---

   In case of the logger, the above mojo will simply use a default console logger, i.e. the code defect is not
   immediately noticeable by a <<<NullPointerException>>>. This default logger will however use a different message
   format for its output and also outputs debug messages even if Maven's debug mode was not enabled. For this reason,
   developers must not try to cache the logger during construction time. The method <<<getLog()>>> is fast enough and
   can simply be called whenever one needs it.
 +---