---
Maven Repository Synchronization Refactor: Summary of Changes
---
John Casey
---
2005-April-06
---
~~ Copyright 2006 The Apache Software Foundation.
~~
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.
~~ NOTE: For help with the syntax of this file, see:
~~ http://maven.apache.org/guides/mini/guide-apt-format.html
Summary of Changes for the Maven Repository Synchronization Process
*Abstract
To support the impending release of maven2 from a production-ready
repository on ibiblio.org, several things had to change. Most importantly,
we had to find a way to synchronize the maven1 repository and feeds
with maven2's repository, and to integrate this conversion process
with the synchronization already taking place on beaver.codehaus.org.
What follows is a description of the changes I made to the original maven1
synchronization process in order to accommodate maven2's release.
*Conversion
First, we needed a reliable tool to convert a maven1 repository into a maven2
repository. There are several tasks involved in this process:
[[1]] Parsing artifact paths for artifact information.
[[2]] Moving artifacts from source repo to target repo, reformatting the
relative artifact paths along the way (to conform with the new repo
layout for m2).
[[3]] Translating m1 POMs into m2 POMs, and creating skeletal POMs where they
were missing, using the artifact information parsed in [1] above.
[[4]] Repairing and/or moving MD5 checksums for each artifact from source to
target repository.
[[5]] Preserving a good log of errors encountered during the conversion
process, for later auditing.
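As an illustration of tasks [1] and [2], the sketch below parses a maven1
artifact path and computes the matching maven2 path. It is not repoclean's
actual code. It assumes the usual maven1 layout of
groupId/<type>s/artifactId-version.ext and the maven2 layout of
group/as/directories/artifactId/version/artifactId-version.ext. The names
Artifact, parse_m1_path and to_m2_path are hypothetical, and the rule used to
split artifactId from version (the first hyphen followed by a digit) is a
heuristic chosen for this example.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Artifact:
    """Artifact information recovered from a repository path."""
    group_id: str
    artifact_id: str
    version: str
    extension: str


# Assumption: the version begins at the first hyphen followed by a digit.
_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d.*)$")


def parse_m1_path(path: str) -> Artifact:
    """Parse a relative maven1 path such as 'group/jars/name-1.0.jar'."""
    parts = PurePosixPath(path).parts
    if len(parts) != 3:
        raise ValueError(f"not a maven1 artifact path: {path}")
    group_id, _type_dir, filename = parts
    # Simplification: only the last dot-separated suffix is the extension.
    stem, dot, extension = filename.rpartition(".")
    match = _NAME.match(stem) if dot else None
    if match is None:
        raise ValueError(f"cannot split artifactId and version: {filename}")
    return Artifact(group_id, match["artifact"], match["version"], extension)


def to_m2_path(artifact: Artifact) -> str:
    """Build the relative maven2 path for an artifact."""
    group_dir = artifact.group_id.replace(".", "/")
    filename = f"{artifact.artifact_id}-{artifact.version}.{artifact.extension}"
    return f"{group_dir}/{artifact.artifact_id}/{artifact.version}/{filename}"
```

Under these assumptions, commons-lang/jars/commons-lang-2.0.jar maps to
commons-lang/commons-lang/2.0/commons-lang-2.0.jar. Paths that do not fit the
pattern raise an error, which a real converter would record in its log
(task [5]) rather than abort on.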
Since I had limited time in which to implement a solution, and was not
familiar with the pre-existing repository conversion tools made by Carlos
et al., I decided to design my own solution to the problem, and worry about
merging with other tools later.
The solution I created is called repoclean, and can be found in
<<<maven-components/sandbox/repoclean>>>. It is a plexus application, with some
basic bash shell scripts used to install and run it. The steps enumerated
above were implemented as separate components, then stitched together by a
controller component, which the Main class uses as its entry point.
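To give a feel for one such component, here is a minimal sketch of the
checksum step. It is not taken from repoclean. It assumes the artifact has
already been copied to the target repository, that checksums live in a
sibling file named <artifact>.md5, and that a missing or mismatched checksum
is simply regenerated from the artifact's bytes. The function names and the
returned status strings are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_hex(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair_checksum(source_artifact: Path, target_artifact: Path) -> str:
    """Write <target>.md5 and report how the source checksum compared.

    Returns 'copied' if the source checksum was valid, 'repaired' if it was
    wrong, and 'created' if the source had no checksum file at all.
    """
    actual = md5_hex(target_artifact)
    source_md5 = source_artifact.with_name(source_artifact.name + ".md5")
    target_md5 = target_artifact.with_name(target_artifact.name + ".md5")
    if not source_md5.exists():
        status = "created"
    else:
        # Checksum files may hold just the digest or "digest  filename".
        tokens = source_md5.read_text().split()
        recorded = tokens[0].lower() if tokens else ""
        status = "copied" if recorded == actual else "repaired"
    target_md5.write_text(actual + "\n")
    return status
```

Returning a status rather than printing lets the caller decide whether a
repaired checksum counts as a warning in the per-artifact report.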
Finally, reporting takes place at two levels: the entire-process level, for
operations such as artifact discovery, and the per-artifact level. A report is
written only when an error or warning occurs, and the entire-process report
mentions each per-artifact report that contains an error.
In the event that an error was detected, the entire-process report should be
mailed to the m2-dev list with a subject similar to: <<[REPOCLEAN] Error(s)
occurred while converting the repository>>. Other reports can be found in the
reports directory of the sync work directory (mentioned below).
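The reporting rules above can be sketched as follows. This is an
illustration, not repoclean's implementation. The Report class, the two
functions and the text layout are hypothetical, and the sketch assumes that
a failed per-artifact report is by itself enough to cause the entire-process
report to be written.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Report:
    """Messages collected for the whole process or for one artifact."""
    name: str
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def process_report_text(process: Report,
                        artifact_reports: List[Report]) -> Optional[str]:
    """Return the entire-process report, or None if nothing should be written."""
    # Only per-artifact reports that contain an error are mentioned.
    failed = [r.name for r in artifact_reports if r.errors]
    if not (process.errors or process.warnings or failed):
        return None
    lines = [f"Report: {process.name}"]
    lines += [f"ERROR: {message}" for message in process.errors]
    lines += [f"WARNING: {message}" for message in process.warnings]
    lines += [f"See per-artifact report: {name}" for name in failed]
    return "\n".join(lines)


def needs_mail(process: Report, artifact_reports: List[Report]) -> bool:
    """Mail is sent only when an error (not just a warning) was detected."""
    return bool(process.errors) or any(r.errors for r in artifact_reports)
```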
*Synchronization
The existing synchronization process maintained only a maven1 repository
from a set of feeds. To refactor it into a maintenance process for
both maven1 and maven2 repositories, I had to make a few minor changes.
To make this process easier to understand, I moved the tools suite into
$HOME/repository-tools. I moved the synchronization work directory (the
directory into which all feeds copy, and which the outbound rsync uses
as its source) into $HOME/repository-staging. The tools suite (in
$HOME/repository-tools) does NOT contain the only copy of syncopate and the
outbound rsync script, only the copies I made and modified for the new
synchronization process. Leaving the originals untouched was an insurance
policy to allow rollback.
As I said, I made some minor changes to the existing process. These mainly
consisted of reconfiguring syncopate and the outbound rsync script to use the
new directory structures, along with adding a control script which would be
called from cron, and which would inject a call to repoclean into the middle
of the process. The new controller script was used to consolidate all
synchronization logic into the repository-tools directory, and expose it all
equally as scripts to be maintained as a unit. Now, the crontab entry is very
simple, only referencing the controller script.
The new synchronization process executes the following operations:
[[1]] Run syncopate to collect new artifacts from the feeder repositories.
<<Syncopate location:>> $HOME/repository-tools/syncopate
<<Target repository location:>> $HOME/repository-staging/to-ibiblio/maven
[[2]] Run repoclean to convert any newly added or updated artifacts to the
maven2 repository work directory.
<<Repoclean location:>> $HOME/repository-tools/repoclean
<<Source repository location:>> $HOME/repository-staging/to-ibiblio/maven
<<Target repository location:>> $HOME/repository-staging/to-ibiblio/maven2
[[3]] Run the rsync to ibiblio.
<<Rsync script location:>> $HOME/repository-tools/ibiblio-sync/synchronize-codehaus-to-ibiblio.sh
<<*NOTE:>> This is accomplished as two separate rsync operations, to
avoid unwanted directories being added to the outbound rsync (which
would land in /public/html on ibiblio...a big no-no).
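The three operations above could be driven by a controller along the
following lines. The real controller is a shell script called from cron, so
this is only a sketch of the control flow. The directory layout and the
rsync script name come from the list above. The entry-script names
run-syncopate.sh and run-repoclean.sh, the repoclean argument order, and the
choice to stop at the first failing step are all assumptions made for
illustration.

```python
import os
import subprocess
from typing import Callable, List, Sequence


def build_steps(home: str) -> List[List[str]]:
    """Return the commands for the three synchronization steps, in order."""
    tools = os.path.join(home, "repository-tools")
    staging = os.path.join(home, "repository-staging", "to-ibiblio")
    m1_repo = os.path.join(staging, "maven")
    m2_repo = os.path.join(staging, "maven2")
    return [
        # Hypothetical entry script: the source gives only the directory.
        [os.path.join(tools, "syncopate", "run-syncopate.sh")],
        # Hypothetical entry script and argument order (source, then target).
        [os.path.join(tools, "repoclean", "run-repoclean.sh"), m1_repo, m2_repo],
        # Script path as given in the step list above.
        [os.path.join(tools, "ibiblio-sync",
                      "synchronize-codehaus-to-ibiblio.sh")],
    ]


def run_pipeline(steps: Sequence[Sequence[str]],
                 run: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run each step in order; stop and return the first non-zero status."""
    for command in steps:
        status = run(command)
        if status != 0:
            # Assumption: a failed step must not be followed by the rsync.
            return status
    return 0
```

Passing the runner as a parameter keeps the sketch testable without touching
a real repository. Calling run_pipeline(build_steps(os.environ["HOME"]))
would execute the commands through subprocess.call.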
All of the old synchronization tooling is still in place, with the exception
of the old copies of the canonical repositories, which were removed to keep
our space usage on beaver.codehaus.org to a minimum.