| --- |
| Maven Repository Synchronization Refactor: Summary of Changes |
| --- |
| John Casey |
| --- |
| 2005-April-06 |
| --- |
| |
| ~~ Copyright 2006 The Apache Software Foundation. |
| ~~ |
| ~~ Licensed under the Apache License, Version 2.0 (the "License"); |
| ~~ you may not use this file except in compliance with the License. |
| ~~ You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. |
| |
| ~~ NOTE: For help with the syntax of this file, see: |
| ~~ http://maven.apache.org/guides/mini/guide-apt-format.html |
| |
| Summary of Changes for the Maven Repository Synchronization Process |
| |
| *Abstract |
| |
| In order to support the impending release of maven2 from a production-ready |
| repository on ibiblio.org, several things had to be changed. Most importantly, |
| we had to somehow find a way to synchronize the maven1 repository and feeds |
| with maven2's repository, and find a way to integrate this conversion process |
| with the synchronization already taking place on beaver.codehaus.org. |
| |
| What follows is a description of the changes I made to the original maven1 |
| synchronization process in order to accommodate maven2's release. |
| |
| *Conversion |
| |
| First, we needed a reliable tool to convert a maven1 repository into a maven2 |
| repository. There are several tasks involved in this process: |
| |
| [[1]] Parsing artifact paths for artifact information. |
| |
| [[2]] Moving artifacts from source repo to target repo, reformatting the |
| relative artifact paths along the way (to conform with the new repo |
| layout for m2). |
| |
| [[3]] Translating m1 POMs into m2 POMs, and creating skeletal POMs where they |
| were missing, using the artifact information parsed in [1] above. |
| |
| [[4]] Repairing and/or moving MD5 checksums for each artifact from source to |
| target repository. |
| |
| [[5]] Preserving a good log of errors encountered during the conversion |
| process, for later auditing. |
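
Tasks 1 and 2 above can be illustrated with a minimal sketch. It is not taken from repoclean; it only assumes the legacy layout `<groupId>/<type>s/<artifactId>-<version>.<ext>` and the m2 layout `<group/as/dirs>/<artifactId>/<version>/<artifactId>-<version>.<ext>`. The names `ArtifactInfo`, `parse_m1_path` and `to_m2_path` are hypothetical, and the rule used to split the artifact id from the version (the version starts at the first hyphen followed by a digit) is a simplifying assumption that will misread some real file names.

```python
import re
from typing import NamedTuple, Optional


class ArtifactInfo(NamedTuple):
    """Artifact coordinates recovered from a repository-relative path."""
    group_id: str
    artifact_id: str
    version: str
    extension: str


# Legacy (maven1) layout: <groupId>/<type>s/<artifactId>-<version>.<ext>
# Assumption: the version begins at the first hyphen followed by a digit.
_M1_PATH = re.compile(
    r"^(?P<group>[^/]+)/(?P<type>[^/]+)s/"
    r"(?P<artifact>.+?)-(?P<version>\d[^/]*)\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_m1_path(path: str) -> Optional[ArtifactInfo]:
    """Return the artifact information for an m1 path, or None if it does not parse."""
    match = _M1_PATH.match(path)
    if match is None:
        return None
    return ArtifactInfo(
        match["group"], match["artifact"], match["version"], match["ext"]
    )


def to_m2_path(info: ArtifactInfo) -> str:
    """Build the m2 relative path: dots in the group id become directories."""
    group_dir = info.group_id.replace(".", "/")
    file_name = f"{info.artifact_id}-{info.version}.{info.extension}"
    return f"{group_dir}/{info.artifact_id}/{info.version}/{file_name}"
```

A path that does not parse yields `None`, which is the kind of case a conversion tool would want to record in its error log rather than guess at.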
| |
| Since I had limited time in which to implement a solution, and little |
| familiarity with the pre-existing repository conversion tools made by Carlos |
| et al., I decided to design my own solution to the problem and worry about |
| merging with other tools later. |
| |
| The solution I have created is called repoclean, and can be found in |
| <<<maven-components/sandbox/repoclean>>>. It's a plexus application, with some |
| basic bash shell scripts used to install and run the application. The steps |
| enumerated above were implemented as separate components, then stitched |
| together with a Main class and controller component which serves as the entry |
| point for Main. |
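
As an illustration of what one such component might do, here is a minimal sketch of the checksum step (task 4 in the list above). It is not repoclean's code. It assumes that each checksum lives next to its artifact in a file named `<artifact>.md5` whose first token is the hex digest; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair_checksum(source_artifact: Path, target_artifact: Path) -> bool:
    """Write a correct .md5 file next to the target artifact.

    Returns True when the source checksum was missing or did not match,
    i.e. when the checksum had to be repaired rather than simply moved.
    """
    actual = md5_of(target_artifact)

    # Assumption: the checksum file is named "<artifact file name>.md5".
    source_md5 = source_artifact.with_name(source_artifact.name + ".md5")
    recorded = None
    if source_md5.is_file():
        tokens = source_md5.read_text().split()
        recorded = tokens[0].lower() if tokens else None

    target_md5 = target_artifact.with_name(target_artifact.name + ".md5")
    target_md5.write_text(actual + "\n")
    return recorded != actual
```

The boolean return value gives the caller something to put in a per-artifact report when a checksum had to be regenerated.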
| |
| As a final point, the reporting takes place both at the entire-process level |
| for operations such as artifact discovery, and at the per-artifact level. A |
| report is only written in the event of an error or warning, and per-artifact |
| reports are mentioned in the entire-process report if they contained an error. |
| In the event that an error was detected, the entire-process report should be |
| mailed to the m2-dev list with a subject similar to: <<[REPOCLEAN] Error(s) |
| occurred while converting the repository>>. Other reports can be found in the |
| reports directory of the sync work directory (mentioned below). |
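
The reporting rules above can be sketched as follows. This is only an illustration of the described behaviour, not repoclean's reporter: the `Report` class, the `finish` function and the report file naming are all hypothetical, and mailing is left to the caller.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class Report:
    """Collects messages for the whole process or for a single artifact."""
    name: str
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    def has_error(self) -> bool:
        return bool(self.errors)

    def write(self, reports_dir: Path) -> Optional[Path]:
        """Write the report only if it holds an error or a warning."""
        if not (self.errors or self.warnings):
            return None
        reports_dir.mkdir(parents=True, exist_ok=True)
        path = reports_dir / f"{self.name}.report.txt"  # hypothetical naming
        lines = [f"ERROR: {m}" for m in self.errors]
        lines += [f"WARNING: {m}" for m in self.warnings]
        path.write_text("\n".join(lines) + "\n")
        return path


def finish(process: Report, artifacts: List[Report], reports_dir: Path) -> bool:
    """Write all reports; return True when the process report should be mailed."""
    for report in artifacts:
        written = report.write(reports_dir)
        # Only per-artifact reports containing an error are mentioned upstream.
        if written is not None and report.has_error():
            process.errors.append(f"see per-artifact report: {written.name}")
    process.write(reports_dir)
    return process.has_error()
```

A run with only warnings still leaves per-artifact reports in the reports directory, but `finish` returns `False`, so nothing would be mailed.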
| |
| *Synchronization |
| |
| Now, the synchronization process as-is was only maintaining a maven1 repository |
| from a set of feeds. In order to refactor this into a maintenance process for |
| both maven1 and maven2 repositories, I had to make a few minor changes. |
| |
| In order to aid in understanding this process, I moved the tools suite into |
| $HOME/repository-tools. I moved the synchronization work directory (the |
| directory into which all feeds will copy, and which the outbound rsync will |
| use as a source) into $HOME/repository-staging. The tools suite (in |
| $HOME/repository-tools) does NOT contain the only copy of syncopate and the |
| outbound rsync script, only the copies I made and modified for the new |
| synchronization process. Keeping the originals in place was an insurance |
| policy to allow rollback. |
| |
| As I said, I made some minor changes to the existing process. These mainly |
| consisted of reconfiguring syncopate and the outbound rsync script to use the |
| new directory structures, along with adding a control script which would be |
| called from cron, and which would inject a call to repoclean into the middle |
| of the process. The new controller script was used to consolidate all |
| synchronization logic into the repository-tools directory, and expose it all |
| equally as scripts to be maintained as a unit. Now, the crontab entry is very |
| simple, only referencing the controller script. |
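
The shape of such a controller can be sketched as below. The real controller is a shell script in the repository-tools directory; this Python version only illustrates the ordering (syncopate, then repoclean, then the outbound rsync). The launcher names `run-syncopate.sh` and `run-repoclean.sh`, the arguments passed to repoclean, and the choice to stop at the first failing step are assumptions made for the example. Only the directory names and the rsync script path come from this document.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# A runner takes a command line and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def build_steps(home: Path) -> List[List[str]]:
    """Return the three commands of the synchronization run, in order."""
    tools = home / "repository-tools"
    staging = home / "repository-staging" / "to-ibiblio"
    return [
        # Hypothetical launcher name for syncopate.
        [str(tools / "syncopate" / "run-syncopate.sh")],
        # Hypothetical launcher name and arguments for repoclean.
        [
            str(tools / "repoclean" / "run-repoclean.sh"),
            str(staging / "maven"),
            str(staging / "maven2"),
        ],
        # Outbound rsync script, at the path given in this document.
        [str(tools / "ibiblio-sync" / "synchronize-codehaus-to-ibiblio.sh")],
    ]


def run_sync(home: Path, runner: Runner = subprocess.call) -> int:
    """Run each step in order; stop at the first non-zero exit status."""
    for command in build_steps(home):
        status = runner(command)
        if status != 0:
            return status
    return 0
```

Passing the runner in as a parameter keeps the ordering logic testable without touching the real tools; a cron-driven entry point would simply call `run_sync(Path.home())`.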
| |
| The new synchronization process executes the following operations: |
| |
| [[1]] Run syncopate to collect new artifacts from the feeder repositories. |
| |
| <<Syncopate location:>> $HOME/repository-tools/syncopate |
| <<Target repository location:>> $HOME/repository-staging/to-ibiblio/maven |
| |
| [[2]] Run repoclean to convert any newly added or updated artifacts to the |
| maven2 repository work directory. |
| |
| <<Repoclean location:>> $HOME/repository-tools/repoclean |
| <<Source repository location:>> $HOME/repository-staging/to-ibiblio/maven |
| <<Target repository location:>> $HOME/repository-staging/to-ibiblio/maven2 |
| |
| [[3]] Run the rsync to ibiblio. |
| |
| <<Rsync script location:>> $HOME/repository-tools/ibiblio-sync/synchronize-codehaus-to-ibiblio.sh |
| |
| <<*NOTE:>> This is accomplished as two separate rsync operations, to |
| avoid unwanted directories being added to the outbound rsync (which |
| would land in /public/html on ibiblio...a big no-no). |
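
The idea behind the two separate rsync operations can be sketched as below: one command per repository directory, so that nothing else under the staging directory is ever sent. This is not the content of synchronize-codehaus-to-ibiblio.sh. The function name, the `-az` flags and the remote root used in the usage line are assumptions for illustration only.

```python
from pathlib import PurePosixPath
from typing import List


def build_rsync_commands(staging: PurePosixPath, remote_root: str) -> List[List[str]]:
    """Build one rsync command per repository under <staging>/to-ibiblio.

    Syncing "maven" and "maven2" individually, instead of their parent
    directory, keeps any other directory in the work area out of the
    outbound transfer.
    """
    source_root = staging / "to-ibiblio"
    remote = remote_root.rstrip("/")
    commands = []
    for repo in ("maven", "maven2"):
        # Trailing slashes: copy the directory's contents into the target.
        commands.append(
            ["rsync", "-az", f"{source_root / repo}/", f"{remote}/{repo}/"]
        )
    return commands
```

For example, `build_rsync_commands(PurePosixPath("/home/sync/repository-staging"), "user@remote.example:/some/root")` yields two commands, one for each repository, with a placeholder remote root.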
| |
| All of the old synchronization stuff is still in place, with the exception of |
| the old versions of the canonical repositories, which were removed to keep |
| our space usage on beaver.codehaus.org to a minimum. |