| ---- |
| Metadata Control Model |
| ---- |
| |
| Metadata Content Model |
| |
| The metadata repository stores all known information about a repository in a common format that other plugins can |
| understand, and that eventually external applications will be able to query. |
| |
| The content model is designed such that it models the most likely structure of the data both for storage and |
| retrieval. For example, audit logs are stored by the time they occur, not grouped under an action. |
| |
| * Content Model Structure |
| |
| The following is a sample tree that represents the content model: |
| |
| ---- |
| . |
| `-- repositories/ |
| `-- central/ |
| |-- config/ |
| | |-- name= |
| | |-- storageUrl= |
| | `-- uri= |
| |-- content/ |
| | `-- org/ |
| | `-- apache/ |
| | |-- archiva/ |
| | | `-- platform/ |
| | | |-- scanner/ |
| | | | |-- 1.0-SNAPSHOT/ |
| | | | | |-- scanner-1.0-20091120.012345-1.pom/ |
| | | | | | |-- asc= |
| | | | | | |-- created= |
| | | | | | |-- fileCreated= |
| | | | | | |-- fileLastModified= |
| | | | | | |-- maven:buildNumber= |
| | | | | | |-- maven:classifier |
| | | | | | |-- maven:timestamp= |
| | | | | | |-- maven:type= |
| | | | | | |-- md5= |
| | | | | | |-- sha1= |
| | | | | | |-- size= |
| | | | | | |-- updated= |
| | | | | | `-- version= |
| | | | | |-- ciManagement.system= |
| | | | | |-- ciManagement.url= |
| | | | | |-- created= |
| | | | | |-- dependencies.0.artifactId= |
| | | | | |-- dependencies.0.classifier= |
| | | | | |-- dependencies.0.groupId= |
| | | | | |-- dependencies.0.optional= |
| | | | | |-- dependencies.0.scope= |
| | | | | |-- dependencies.0.systemPath= |
| | | | | |-- dependencies.0.type= |
| | | | | |-- dependencies.0.version= |
| | | | | |-- description= |
| | | | | |-- individuals.0.email= |
| | | | | |-- individuals.0.name= |
| | | | | |-- individuals.0.properties.scmId= |
| | | | | |-- individuals.0.roles.0= |
| | | | | |-- individuals.0.timezone= |
| | | | | |-- issueManagement.system= |
| | | | | |-- issueManagement.url= |
| | | | | |-- licenses.0.name= |
| | | | | |-- licenses.0.url= |
| | | | | |-- mailingLists.0.mainArchiveUrl= |
| | | | | |-- mailingLists.0.name= |
| | | | | |-- mailingLists.0.otherArchives.0= |
| | | | | |-- mailingLists.0.postAddress= |
| | | | | |-- mailingLists.0.subscribeAddress= |
| | | | | |-- mailingLists.0.unsubscribeAddress= |
| | | | | |-- maven:buildExtensions.0.artifactId= |
| | | | | |-- maven:buildExtensions.0.groupId= |
| | | | | |-- maven:buildExtensions.0.version= |
| | | | | |-- maven:packaging= |
| | | | | |-- maven:parent.artifactId= |
| | | | | |-- maven:parent.groupId= |
| | | | | |-- maven:parent.version= |
| | | | | |-- maven:plugins.0.artifactId= |
| | | | | |-- maven:plugins.0.groupId= |
| | | | | |-- maven:plugins.0.reporting= |
| | | | | |-- maven:plugins.0.version= |
| | | | | |-- maven:properties.mavenVersion= |
| | | | | |-- maven:repositories.0.id= |
| | | | | |-- maven:repositories.0.layout= |
| | | | | |-- maven:repositories.0.name= |
| | | | | |-- maven:repositories.0.plugins= |
| | | | | |-- maven:repositories.0.releases= |
| | | | | |-- maven:repositories.0.snapshots= |
| | | | | |-- maven:repositories.0.url= |
| | | | | |-- name= |
| | | | | |-- organization.favicon= |
| | | | | |-- organization.logo= |
| | | | | |-- organization.name= |
| | | | | |-- organization.url= |
| | | | | |-- relocatedTo.namespace= |
| | | | | |-- relocatedTo.project= |
| | | | | |-- relocatedTo.projectVersion= |
| | | | | |-- scm.connection= |
| | | | | |-- scm.developerConnection= |
| | | | | |-- scm.url= |
| | | | | |-- updated= |
| | | | | `-- url= |
| | | | `-- maven:artifactId= |
| | | `-- maven:groupId= |
| | `-- maven/ |
| | `-- plugins/ |
| | |-- maven:groupId= |
| | |-- maven:plugins.compiler.artifactId= |
| | `-- maven:plugins.compiler.name= |
| |-- facets/ |
| | |-- org.apache.archiva.audit/ |
| | | `-- 2010/ |
| | | `-- 01/ |
| | | `-- 19/ |
| | | `-- 093600.000/ |
| | | |-- action= |
| | | |-- artifact.id= |
| | | |-- artifact.namespace= |
| | | |-- artifact.projectId= |
| | | |-- artifact.version= |
| | | |-- remoteIP= |
| | | `-- user= |
| | |-- org.apache.archiva.metadata.repository.stats/ |
| | | `-- 2009/ |
| | | `-- 12/ |
| | | `-- 03/ |
| | | `-- 090000.000/ |
| | | |-- scanEndTime= |
| | | |-- scanStartTime= |
| | | |-- totalArtifactCount= |
| | | |-- totalArtifactFileSize= |
| | | |-- totalFileCount= |
| | | |-- totalGroupCount= |
| | | `-- totalProjectCount= |
| | `-- org.apache.archiva.reports/ |
| `-- references/ |
| `-- org/ |
| `-- apache/ |
| `-- archiva/ |
| |-- parent/ |
| | `-- 1/ |
| | `-- references/ |
| | `-- org/ |
| | `-- apache/ |
| | `-- archiva/ |
| | |-- platform/ |
| | | `-- scanner/ |
| | | `-- 1.0-SNAPSHOT/ |
| | | `-- referenceType=parent |
| | `-- web/ |
| | `-- webapp/ |
| | `-- 1.0-SNAPSHOT/ |
| | `-- referenceType=parent |
| `-- platform/ |
| `-- scanner/ |
| `-- 1.0-SNAPSHOT/ |
| `-- references/ |
| `-- org/ |
| `-- apache/ |
| `-- archiva/ |
| `-- web/ |
| `-- webapp/ |
| `-- 1.0-SNAPSHOT/ |
| `-- referenceType=dependency |
| ---- |
| |
| ~~ To update - run "tree --dirsfirst -F" on the unpacked content-model.zip from the sandbox |
| |
| This uses a typical content repository structure, where there is a path to a particular node (the last paths in |
| the structure above), and nodes can have properties and values (shown as <<<property=value>>> above). |
| |
| Properties with '.' may be nested in other representations such as Java models or XML, if appropriate - this is |
| the decision of the content repository persistence implementation. |
| |
| Additionally, while some information is stored at the most generic level in the metadata repository (eg |
| <<<maven:groupId>>>, <<<maven:artifactId>>>), for convenience when loaded by the implementation it may all be pushed |
| into the project version's information. The metadata repository implementation can decide how best to store and |
| retrieve the information. |
| |
| <Note:> Some of the properties have been put in place temporarily but need to be revisited - for example the use |
| of index counters for the lists of Maven POM information are not ideal, and some Maven specific aspects of |
| the dependencies should become faceted content |
| |
| The following sections walk through parts of the tree. |
| |
| ** Configuration section |
| |
| <Note:> The configuration section is not currently implemented in the code. It should be shadowed to a file on the |
| file system for easy editing and pre-configuration outside the server. A possible implementation is to use |
| the same storage and resolution mechanism to access the configuration so that this can be achieved, and it |
| can be loaded on the fly, etc. |
| |
| It is desirable to be able to access and modify all configuration through the same interfaces, so it is also stored |
| in the content repository. |
| |
| Each repository will have it's own metadata, but there also needs to be a server-level configuration for other parts |
| of the system. |
| |
| ** Content section |
| |
| The content section houses the information directly about the artifacts in the repository. As described in the |
| {{{./terminology.html} Terminology}} document, artifacts are described by the following coordinates (with the values |
| shown from the example above): |
| |
| * Namespace (<<<org.apache.archiva.platform>>>) |
| |
| * Project ID (<<<scanner>>>) |
| |
| * Project version (<<<1.0-SNAPSHOT>>>) |
| |
| * Artifact ID (<<<scanner-1.0-20091120.012345-1.pom>>>) |
| |
| [] |
| |
| Namespaces are of arbitrary depth, and are project namespaces, not to be confused with JCR's item/node namespaces. |
| A separate namespace and project identifier are retained to allow '.' in the project identifier without splitting, |
| while still allowing splitting on '.' in the namespace, when determining the most appropriate path for an artifact |
| in the content repository. The namespace may be null if there isn't one. |
| |
| Projects are very simple entities. They do not have subprojects - if such modeling needs to be done, then we |
| would create a "products" tree (or similar) that will map what "Archiva 1.0" contains as a collection of project |
| version nodes, for example. |
| |
| Each artifact in the repository will contain an entry, though not necessarily every file. For example, in a Maven |
| repository it is known that the <<<.md5>>>, <<<.sha1>>> and <<<.asc>>> files represent metadata about the artifact |
| of the same name, so that is attached to that node instead. |
| |
| Metadata is stored at the level most appropriate to that piece of information. This means that in a Maven |
| repository, while both the POM and other artifact(s) are considered be separate artifacts, they all share the |
| information in the POM that is stored at the project version or even project level. We only keep one set of project |
| information for a version - this differs from Maven's storage of one POM per snapshot. The Maven 2 module will take |
| the latest snapshot data and use that. Those that need Maven's behaviour should retrieve the POM directly. |
| |
| Note that artifact data is not stored in the metadata repository (there is no data= property on the file). The |
| information here is enough to locate the file in the original storage when it is requested. |
| |
| The following describes some of the metadata at each level. Note that the Maven extensions are covered here - these |
| are optional, and they wouldn't be present on a non-Maven storage repository. Likewise, plugins may store |
| additional metadata for each artifact. |
| |
| *** Namespace Metadata |
| |
| * <<<maven:groupId>>> - the Maven group ID |
| |
| *** Project Metadata |
| |
| * <<<maven:artifactId>>> - the Maven artifact ID |
| |
| *** Project Version Metadata |
| |
| * <<<created>>> - when the metadata was added to the repository (see [1] below) |
| |
| * <<<updated>>> - when the metadata was last updated (see [1] below) |
| |
| * <<<name>>> - human-readable project name |
| |
| * <<<description>>> - a human-readable description of this project |
| |
| * <<<url>>> - a URL to the project's documentation or other information |
| |
| * <<<organization[].*>>> - information about the organization |
| |
| * <<<licenses[].*>>> - the license the project source code is available under |
| |
| * <<<issueManagement.*>>> - the issue tracker used by the application |
| |
| * <<<ciManagement.*>>> - continuous integration server information |
| |
| * <<<dependencies[].*>>> - other projects that this project version depends on. Note that currently this contains |
| Maven specifics that are expected to be abstracted out into the Maven extensions |
| |
| * <<<individuals[].*>>> - participants in the development of, or otherwise associated with, the project |
| |
| * <<<scm.*>>> - information about the SCM used to store the project source code |
| |
| * <<<relocatedTo.*>>> - co-ordinates that this artifact has been relocated to |
| |
| * <<<maven:packaging>>> - the packaging value in the Maven POM |
| |
| * <<<maven:parent.*>>> - a reference to the Maven parent POM |
| |
| * <<<maven.plugins[].*>>> - references to build plugins in a Maven POM |
| |
| * <<<maven.repositories[].*>>> - references to other repositories in a Maven POM |
| |
| * <<<maven:buildExtensions[].*>>> - references to build extensions in a Maven POM |
| |
| * <<<maven:properties.*>>> - properties stored in a Maven POM |
| |
| [] |
| |
| Footnotes: |
| |
| [[1]] created/updated timestamps may be maintained by the metadata repository implementation for the metadata |
| itself. Timestamps for individual files are stored as additional properties (<<<fileCreated>>>, |
| <<<fileLastModified>>>). It may make sense to add a "discovered" timestamp if an artifact is known to be |
| created at a different time to which it is added to the metadata repository. |
| |
| ** Facets Section |
| |
| The facets section allows storage of other repository metadata for specific plugins. Each is named by the plugin's |
| unique identifier. |
| |
| *** Audit Logs (<<<org.apache.archiva.audit>>>) |
| |
| Audit logs are stored hierarchically by name, breaking down the date until getting to the timestamp of a particular |
| event. The event details are stored as properties of that node. Presently filtering by an action or other field |
| would require querying the content repository. |
| |
| * <<<action>>> - the action that was taken, such as uploading an artifact |
| |
| * <<<artifact.*>>> - the co-ordinates of the artifact affected |
| |
| * <<<remoteIP>>> - the IP address of the person executing the action, if applicable |
| |
| * <<<user>>> - the user affecting the action, if applicable |
| |
| [] |
| |
| A future possibility is to store audit metadata on artifacts themselves (who uploaded, when, and how), or whether it |
| was discovered by scanning. While this duplicates some information, it would reduce the need to query by a certain |
| artifact ID and the nodes could be lined referentially. |
| |
| Audit metadata may also need to be extended to other nodes such as configuration. In this case, it may make sense |
| to alter the artifact reference to a content repository path instead, or to utilise a native mechanism of the |
| content repository. |
| |
| *** Repository Statistics (<<<org.apache.archiva.metadata.repository.stats>>>) |
| |
| Like audit logs, repository statistics are stored by timestamp, marking the time a scan started. The results are |
| stored as properties of the scan: |
| |
| * <<<scanStartTime>>>, <<<scanEndTime>>> - when the scan ran from and until |
| |
| * <<<total*>>> - the statistics gathered about certain totals in the repository |
| |
| [] |
| |
| The current approach of tying statistics to the scanning process is not optimal, as it cannot be 'live'. We may |
| later determine if any of the stats can be derived by functions of the content repository rather than storing and |
| trying to keep them up to date. Historical data might be retained by versioning and taking a snapshot at a given |
| point in time. |
| |
| *** Problem Reports (<<<org.apache.archiva.reports>>>) |
| |
| While not shown above, the problem reporting plugin similarly stores a facet of information, recording particular |
| issues noticed in the repository such as invalid Maven POMs, etc. |
| |
| ** References Section |
| |
| The references section contains information about references to a given artifact. It is the inverse of the |
| dependency relationship. |
| |
| References are stored outside the main model so that their creation doesn't imply a "stub" model - we know if the |
| project exists whether a reference is created or not. References need not infer referential integrity. |
| |