  ---
  Maven Repository Synchronization Refactor: Summary of Changes
  ---
  John Casey
  ---
  2005-April-06
  ---
  
~~ Copyright 2006 The Apache Software Foundation.
~~
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~      http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

~~ NOTE: For help with the syntax of this file, see:
~~ http://maven.apache.org/guides/mini/guide-apt-format.html

Summary of Changes for the Maven Repository Synchronization Process

*Abstract

  In order to support the impending release of maven2 from a production-ready
  repository on ibiblio.org, several things had to be changed. Most importantly,
  we had to somehow find a way to synchronize the maven1 repository and feeds
  with maven2's repository, and find a way to integrate this conversion process
  with the synchronization already taking place on beaver.codehaus.org.
  
  What follows is a description of the changes I made to the original maven1 
  synchronization process in order to accommodate maven2's release.
  
*Conversion

  First, we needed a reliable tool to convert a maven1 repository into a maven2
  repository. There are several tasks involved in this process:
  
  [[1]] Parsing artifact paths for artifact information.
  
  [[2]] Moving artifacts from source repo to target repo, reformatting the
        relative artifact paths along the way (to conform with the new repo
        layout for m2).
  
  [[3]] Translating m1 POMs into m2 POMs, and creating skeletal POMs where they
        were missing, using the artifact information parsed in [1] above.
  
  [[4]] Repairing and/or moving MD5 checksums for each artifact from source to
        target repository.
        
  [[5]] Preserving a good log of errors encountered during the conversion
        process, for later auditing.
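
  Tasks [1] and [2] above can be illustrated with a minimal sketch. It is not
  repoclean's actual code. It assumes the usual maven1 layout
  (groupId/type-directory/artifactId-version.ext) and the maven2 layout
  (group/path/artifactId/version/artifactId-version.ext). The names
  <<<ArtifactInfo>>>, <<<parse_m1_path>>> and <<<to_m2_path>>> are
  hypothetical, and the rule used to split artifactId from version is only a
  heuristic.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class ArtifactInfo:
    """Artifact coordinates recovered from a maven1 relative path (hypothetical type)."""
    group_id: str
    artifact_id: str
    version: str
    extension: str


# Assumed heuristic: the version starts at the first "-" followed by a digit,
# and the extension is whatever follows the last "." in the file name.
_FILE_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d.*)\.(?P<ext>[A-Za-z0-9]+)$")


def parse_m1_path(relative_path: str) -> Optional[ArtifactInfo]:
    """Parse 'groupId/<type>s/artifactId-version.ext'; return None if it does not fit."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 3:
        return None
    group_id, _type_dir, file_name = parts
    match = _FILE_NAME.match(file_name)
    if match is None:
        return None
    return ArtifactInfo(group_id, match["artifact"], match["version"], match["ext"])


def to_m2_path(info: ArtifactInfo) -> str:
    """Build the maven2 relative path: dots in the groupId become directories."""
    group_dir = info.group_id.replace(".", "/")
    file_name = f"{info.artifact_id}-{info.version}.{info.extension}"
    return f"{group_dir}/{info.artifact_id}/{info.version}/{file_name}"
```

  Under these assumptions, <<<commons-lang/jars/commons-lang-2.0.jar>>> maps to
  <<<commons-lang/commons-lang/2.0/commons-lang-2.0.jar>>>. Paths that do not
  match the heuristic return <<<None>>>, which is the kind of case a conversion
  tool would record in its error log rather than guess at.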
        
  Since I had limited time in which to implement a solution, and wasn't very
  familiar with the pre-existing repository conversion tools made by Carlos
  et al., I decided to design my own solution to the problem and worry about
  merging with other tools later.
  
  The solution I created is called repoclean, and can be found in
  <<<maven-components/sandbox/repoclean>>>. It's a plexus application, with some
  basic bash shell scripts used to install and run the application. The steps
  enumerated above were implemented as separate components, then stitched
  together by a controller component, which the Main class uses as its entry
  point into the application.
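
  As an illustration of what one such component might do, here is a rough
  sketch of the checksum step (task [4]). This is not repoclean's code, which
  is a plexus application. The function names are hypothetical, and the
  behavior shown (always writing a freshly computed MD5 to the target, and
  reporting whether the source checksum was valid) is an assumption about
  what "repairing" means.

```python
import hashlib
from pathlib import Path


def md5_hex(path: Path) -> str:
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair_checksum(artifact: Path, source_md5: Path, target_md5: Path) -> bool:
    """Write a correct checksum file to the target repository.

    Returns True if the source checksum existed and matched the artifact,
    False if it was missing or wrong (a case worth logging for later audit).
    """
    actual = md5_hex(artifact)
    recorded = None
    if source_md5.is_file():
        # Checksum files sometimes carry a file name after the digest.
        tokens = source_md5.read_text(errors="replace").split()
        recorded = tokens[0].lower() if tokens else None
    target_md5.parent.mkdir(parents=True, exist_ok=True)
    target_md5.write_text(actual + "\n")
    return recorded == actual
```

  Returning a flag rather than raising keeps the step usable from a
  controller that wants to continue with the next artifact and collect
  problems into a report.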
  
  As a final point, the reporting takes place both at the entire-process level
  for operations such as artifact discovery, and at the per-artifact level. A
  report is only written in the event of an error or warning, and per-artifact
  reports are mentioned in the entire-process report if they contained an error.
  In the event that an error was detected, the entire-process report should be
  mailed to the m2-dev list with a subject similar to: <<[REPOCLEAN] Error(s)
  occurred while converting the repository>>. Other reports can be found in the
  reports directory of the sync work directory (mentioned below).
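
  The reporting rules described above can be sketched as follows. This is an
  illustration only. The <<<Report>>> class and <<<finish>>> function are
  hypothetical names, the file format is invented, and the actual mailing
  step is left out.

```python
from pathlib import Path
from typing import List


class Report:
    """Collects messages; only written to disk if it has errors or warnings."""

    def __init__(self, path: Path):
        self.path = path
        self.warnings: List[str] = []
        self.errors: List[str] = []

    def warn(self, message: str) -> None:
        self.warnings.append(message)

    def error(self, message: str) -> None:
        self.errors.append(message)

    @property
    def has_error(self) -> bool:
        return bool(self.errors)

    def write_if_needed(self) -> bool:
        """Write the report file; return False (and write nothing) if it is empty."""
        if not (self.errors or self.warnings):
            return False
        lines = [f"ERROR: {m}" for m in self.errors]
        lines += [f"WARNING: {m}" for m in self.warnings]
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text("\n".join(lines) + "\n")
        return True


def finish(process_report: Report, artifact_reports: List[Report]) -> bool:
    """Flush all reports; return True if the process report should be mailed."""
    for report in artifact_reports:
        report.write_if_needed()
        if report.has_error:
            # Per-artifact reports are mentioned only when they contain an error.
            process_report.error(f"See artifact report: {report.path}")
    process_report.write_if_needed()
    return process_report.has_error
```

  A caller would create one <<<Report>>> per artifact plus one for the whole
  run, then mail the process-level file only when <<<finish>>> returns true.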
  
*Synchronization

  The existing synchronization process only maintained a maven1 repository
  from a set of feeds. In order to refactor this into a maintenance process for
  both maven1 and maven2 repositories, I had to make a few minor changes.
  
  To make this process easier to understand, I moved the tools suite into
  $HOME/repository-tools, and moved the synchronization work directory (the
  directory into which all feeds are copied, and which the outbound rsync
  uses as its source) into $HOME/repository-staging. The tools suite in
  $HOME/repository-tools does NOT contain the only copy of syncopate and the
  outbound rsync script; it contains only the copies I made and modified for
  the new synchronization process. Keeping the originals was an insurance
  policy, to allow rollback.
  
  As mentioned above, the changes to the existing process were minor. They
  mainly consisted of reconfiguring syncopate and the outbound rsync script to
  use the new directory structures, and adding a controller script which is
  called from cron and which injects a call to repoclean into the middle of
  the process. The controller script consolidates all synchronization logic
  in the repository-tools directory, so that it is exposed as a set of scripts
  which can be maintained as a unit. As a result, the crontab entry is now very
  simple: it references only the controller script.
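
  The sequencing done by the controller script can be sketched roughly as
  below. The real controller is a shell script, and this is not a
  transcription of it. The entry-point names <<<sync>>> and
  <<<repoclean.sh>>> are hypothetical placeholders (only the tool directories
  and the rsync script name come from this document), and stopping at the
  first failing step is an assumption.

```python
import os
import subprocess
from typing import Callable, List, Sequence


def build_steps(tools_dir: str) -> List[List[str]]:
    """Return the three commands to run, in order."""
    return [
        # Hypothetical entry-point name inside the syncopate directory.
        [os.path.join(tools_dir, "syncopate", "sync")],
        # Hypothetical entry-point name inside the repoclean directory.
        [os.path.join(tools_dir, "repoclean", "repoclean.sh")],
        # Script name as given in this document.
        [os.path.join(tools_dir, "ibiblio-sync", "synchronize-codehaus-to-ibiblio.sh")],
    ]


def run_steps(
    steps: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run each step in order; stop and return its status if one fails (assumed policy)."""
    for command in steps:
        status = run(command)
        if status != 0:
            return status
    return 0
```

  A cron-invoked wrapper would call
  <<<run_steps(build_steps(os.path.expanduser("~/repository-tools")))>>> and
  exit with the returned status. Passing the runner as a parameter makes it
  possible to dry-run the sequence without touching the repositories.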
  
  The new synchronization process executes the following operations:
  
  [[1]] Run syncopate to collect new artifacts from the feeder repositories.
  
        <<Syncopate location:>> $HOME/repository-tools/syncopate

        <<Target repository location:>> $HOME/repository-staging/to-ibiblio/maven
        
  [[2]] Run repoclean to convert any newly added or updated artifacts to the
        maven2 repository work directory.

        <<Repoclean location:>> $HOME/repository-tools/repoclean

        <<Source repository location:>> $HOME/repository-staging/to-ibiblio/maven

        <<Target repository location:>> $HOME/repository-staging/to-ibiblio/maven2
        
  [[3]] Run the rsync to ibiblio.
  
        <<Rsync script location:>> $HOME/repository-tools/ibiblio-sync/synchronize-codehaus-to-ibiblio.sh
        
        <<NOTE:>> This is accomplished as two separate rsync operations, to
        avoid adding unwanted directories to the outbound rsync (they
        would land in /public/html on ibiblio, which is a big no-no).
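
  The idea behind the two separate rsync operations can be sketched as
  follows. This is not the contents of synchronize-codehaus-to-ibiblio.sh.
  It assumes that the two operations correspond to the maven and maven2
  directories under to-ibiblio, and the destination string is a placeholder
  supplied by the caller. Only the standard <<<rsync -a>>> option is used.

```python
from typing import List


def build_rsync_commands(staging_root: str, destination: str) -> List[List[str]]:
    """Build one rsync command per repository directory.

    Syncing each directory by name, instead of the whole staging root,
    means nothing else that happens to sit in the staging root is sent.
    """
    commands = []
    for repo in ("maven", "maven2"):
        # Trailing slash on the source: copy the directory's contents.
        source = f"{staging_root.rstrip('/')}/{repo}/"
        target = f"{destination.rstrip('/')}/{repo}/"
        commands.append(["rsync", "-a", source, target])
    return commands
```

  Each returned list can be handed to <<<subprocess.call>>> as-is. Building
  the commands separately from running them also makes it easy to print and
  review them before the first real run.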
   
  All of the old synchronization tooling is still in place, with the exception
  of the old versions of the canonical repositories, which were removed to
  keep our space usage on beaver.codehaus.org to a minimum.