<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia Site Renderer 1.8.1
| Rendered using Apache Maven Fluido Skin 1.6
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta http-equiv="Content-Language" content="en" />
<title>Archiva :: Modules &#x2013; Metadata Content Model</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.6.min.css" />
<link rel="stylesheet" href="./css/site.css" />
<link rel="stylesheet" href="./css/print.css" media="print" />
<script type="text/javascript" src="./js/apache-maven-fluido-1.6.min.js"></script>
<link rel="stylesheet" type="text/css" href="https://archiva.apache.org/css/site.css" />
</head>
<body class="topBarEnabled">
<div class="container">
<div id="banner">
<div class="pull-left"><a href="http://archiva.apache.org/index.html" id="bannerLeft"><img src="http://archiva.apache.org/images/archiva.png" alt="Apache Archiva"/></a></div>
<div class="pull-right"><a href="http://www.apache.org/" id="bannerRight"><img src="https://www.apache.org/images/asf_logo_wide_2016.png" alt="Apache Software Foundation"/></a></div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2019-11-30</li>
<li id="projectVersion" class="pull-right">Version: 3.0.0-SNAPSHOT</li>
</ul>
</div>
<div id="bodyColumn" >
<div class="section">
<h2><a name="Metadata_Content_Model"></a>Metadata Content Model</h2>
<p>The metadata repository stores all known information about a repository in a common format that other plugins can understand, and that eventually external applications will be able to query.</p>
<p>The content model is designed to mirror the most likely structure of the data for both storage and retrieval. For example, audit logs are stored by the time they occur, not grouped under an action.</p>
<div class="section">
<h3><a name="Content_Model_Structure"></a>Content Model Structure</h3>
<p>The following is a sample tree that represents the content model:</p>
<div>
<pre>.
`-- repositories/
    `-- central/
        |-- config/
        |   |-- name=
        |   |-- storageUrl=
        |   `-- uri=
        |-- content/
        |   `-- org/
        |       `-- apache/
        |           |-- archiva/
        |           |   `-- platform/
        |           |       |-- scanner/
        |           |       |   |-- 1.0-SNAPSHOT/
        |           |       |   |   |-- scanner-1.0-20091120.012345-1.pom/
        |           |       |   |   |   |-- asc=
        |           |       |   |   |   |-- created=
        |           |       |   |   |   |-- fileCreated=
        |           |       |   |   |   |-- fileLastModified=
        |           |       |   |   |   |-- maven:buildNumber=
        |           |       |   |   |   |-- maven:classifier=
        |           |       |   |   |   |-- maven:timestamp=
        |           |       |   |   |   |-- maven:type=
        |           |       |   |   |   |-- md5=
        |           |       |   |   |   |-- sha1=
        |           |       |   |   |   |-- size=
        |           |       |   |   |   |-- updated=
        |           |       |   |   |   `-- version=
        |           |       |   |   |-- ciManagement.system=
        |           |       |   |   |-- ciManagement.url=
        |           |       |   |   |-- created=
        |           |       |   |   |-- dependencies.0.artifactId=
        |           |       |   |   |-- dependencies.0.classifier=
        |           |       |   |   |-- dependencies.0.groupId=
        |           |       |   |   |-- dependencies.0.optional=
        |           |       |   |   |-- dependencies.0.scope=
        |           |       |   |   |-- dependencies.0.systemPath=
        |           |       |   |   |-- dependencies.0.type=
        |           |       |   |   |-- dependencies.0.version=
        |           |       |   |   |-- description=
        |           |       |   |   |-- individuals.0.email=
        |           |       |   |   |-- individuals.0.name=
        |           |       |   |   |-- individuals.0.properties.scmId=
        |           |       |   |   |-- individuals.0.roles.0=
        |           |       |   |   |-- individuals.0.timezone=
        |           |       |   |   |-- issueManagement.system=
        |           |       |   |   |-- issueManagement.url=
        |           |       |   |   |-- licenses.0.name=
        |           |       |   |   |-- licenses.0.url=
        |           |       |   |   |-- mailingLists.0.mainArchiveUrl=
        |           |       |   |   |-- mailingLists.0.name=
        |           |       |   |   |-- mailingLists.0.otherArchives.0=
        |           |       |   |   |-- mailingLists.0.postAddress=
        |           |       |   |   |-- mailingLists.0.subscribeAddress=
        |           |       |   |   |-- mailingLists.0.unsubscribeAddress=
        |           |       |   |   |-- maven:buildExtensions.0.artifactId=
        |           |       |   |   |-- maven:buildExtensions.0.groupId=
        |           |       |   |   |-- maven:buildExtensions.0.version=
        |           |       |   |   |-- maven:packaging=
        |           |       |   |   |-- maven:parent.artifactId=
        |           |       |   |   |-- maven:parent.groupId=
        |           |       |   |   |-- maven:parent.version=
        |           |       |   |   |-- maven:plugins.0.artifactId=
        |           |       |   |   |-- maven:plugins.0.groupId=
        |           |       |   |   |-- maven:plugins.0.reporting=
        |           |       |   |   |-- maven:plugins.0.version=
        |           |       |   |   |-- maven:properties.mavenVersion=
        |           |       |   |   |-- maven:repositories.0.id=
        |           |       |   |   |-- maven:repositories.0.layout=
        |           |       |   |   |-- maven:repositories.0.name=
        |           |       |   |   |-- maven:repositories.0.plugins=
        |           |       |   |   |-- maven:repositories.0.releases=
        |           |       |   |   |-- maven:repositories.0.snapshots=
        |           |       |   |   |-- maven:repositories.0.url=
        |           |       |   |   |-- name=
        |           |       |   |   |-- organization.favicon=
        |           |       |   |   |-- organization.logo=
        |           |       |   |   |-- organization.name=
        |           |       |   |   |-- organization.url=
        |           |       |   |   |-- relocatedTo.namespace=
        |           |       |   |   |-- relocatedTo.project=
        |           |       |   |   |-- relocatedTo.projectVersion=
        |           |       |   |   |-- scm.connection=
        |           |       |   |   |-- scm.developerConnection=
        |           |       |   |   |-- scm.url=
        |           |       |   |   |-- updated=
        |           |       |   |   `-- url=
        |           |       |   `-- maven:artifactId=
        |           |       `-- maven:groupId=
        |           `-- maven/
        |               `-- plugins/
        |                   |-- maven:groupId=
        |                   |-- maven:plugins.compiler.artifactId=
        |                   `-- maven:plugins.compiler.name=
        |-- facets/
        |   |-- org.apache.archiva.audit/
        |   |   `-- 2010/
        |   |       `-- 01/
        |   |           `-- 19/
        |   |               `-- 093600.000/
        |   |                   |-- action=
        |   |                   |-- artifact.id=
        |   |                   |-- artifact.namespace=
        |   |                   |-- artifact.projectId=
        |   |                   |-- artifact.version=
        |   |                   |-- remoteIP=
        |   |                   `-- user=
        |   |-- org.apache.archiva.metadata.repository.stats/
        |   |   `-- 2009/
        |   |       `-- 12/
        |   |           `-- 03/
        |   |               `-- 090000.000/
        |   |                   |-- scanEndTime=
        |   |                   |-- scanStartTime=
        |   |                   |-- totalArtifactCount=
        |   |                   |-- totalArtifactFileSize=
        |   |                   |-- totalFileCount=
        |   |                   |-- totalGroupCount=
        |   |                   `-- totalProjectCount=
        |   `-- org.apache.archiva.reports/
        `-- references/
            `-- org/
                `-- apache/
                    `-- archiva/
                        |-- parent/
                        |   `-- 1/
                        |       `-- references/
                        |           `-- org/
                        |               `-- apache/
                        |                   `-- archiva/
                        |                       |-- platform/
                        |                       |   `-- scanner/
                        |                       |       `-- 1.0-SNAPSHOT/
                        |                       |           `-- referenceType=parent
                        |                       `-- web/
                        |                           `-- webapp/
                        |                               `-- 1.0-SNAPSHOT/
                        |                                   `-- referenceType=parent
                        `-- platform/
                            `-- scanner/
                                `-- 1.0-SNAPSHOT/
                                    `-- references/
                                        `-- org/
                                            `-- apache/
                                                `-- archiva/
                                                    `-- web/
                                                        `-- webapp/
                                                            `-- 1.0-SNAPSHOT/
                                                                `-- referenceType=dependency</pre></div>
<p>This uses a typical content repository structure, where there is a path to a particular node (the last paths in the structure above), and nodes can have properties and values (shown as <tt>property=value</tt> above).</p>
<p>Properties with '.' may be nested in other representations such as Java models or XML, if appropriate - this is the decision of the content repository persistence implementation.</p>
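<p>As an illustration of the nesting described above, the following minimal Python sketch expands dotted property names into nested dictionaries. It is not taken from the Archiva code base: the function name <tt>nest_properties</tt> and the sample values are hypothetical, and a real persistence implementation may choose a different representation.</p>

```python
def nest_properties(flat):
    """Expand dotted property names into nested dictionaries.

    Assumption for this sketch: no name is both a leaf and a prefix
    of another name (e.g. "scm" and "scm.url" never appear together).
    """
    nested = {}
    for name, value in flat.items():
        parts = name.split(".")
        node = nested
        # Walk (and create) the intermediate levels, then set the leaf.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Hypothetical property values; only the property names come from the tree above.
flat = {
    "name": "Scanner",
    "scm.url": "http://example.invalid/scm",
    "licenses.0.name": "Example License",
    "licenses.0.url": "http://example.invalid/license",
}
nested = nest_properties(flat)
print(nested["licenses"]["0"]["name"])
```

<p>List indexes such as <tt>0</tt> are kept as dictionary keys here; an implementation mapping to Java models or XML would more likely turn them into list positions.</p>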
<p>Additionally, while some information is stored at the most generic level in the metadata repository (e.g. <tt>maven:groupId</tt>, <tt>maven:artifactId</tt>), for convenience it may all be pushed into the project version's information when loaded by the implementation. The metadata repository implementation can decide how best to store and retrieve the information.</p>
<p><i>Note:</i> Some of the properties have been put in place temporarily but need to be revisited. For example, the use of index counters for the lists of Maven POM information is not ideal, and some Maven-specific aspects of the dependencies should become faceted content.</p>
<p>The following sections walk through parts of the tree.</p>
<div class="section">
<h4><a name="Configuration_section"></a>Configuration section</h4>
<p><i>Note:</i> The configuration section is not currently implemented in the code. It should be shadowed to a file on the file system for easy editing and pre-configuration outside the server. A possible implementation is to use the same storage and resolution mechanism to access the configuration so that this can be achieved, and it can be loaded on the fly, etc.</p>
<p>It is desirable to be able to access and modify all configuration through the same interfaces, so it is also stored in the content repository.</p>
<p>Each repository will have its own metadata, but there also needs to be a server-level configuration for other parts of the system.</p></div>
<div class="section">
<h4><a name="Content_section"></a>Content section</h4>
<p>The content section houses the information directly about the artifacts in the repository. As described in the <a href="./terminology.html">Terminology</a> document, artifacts are described by the following coordinates (with the values shown from the example above):</p>
<ul>
<li>Namespace (<tt>org.apache.archiva.platform</tt>)</li>
<li>Project ID (<tt>scanner</tt>)</li>
<li>Project version (<tt>1.0-SNAPSHOT</tt>)</li>
<li>Artifact ID (<tt>scanner-1.0-20091120.012345-1.pom</tt>)</li></ul>
<p>Namespaces are of arbitrary depth, and are project namespaces, not to be confused with JCR's item/node namespaces. A separate namespace and project identifier are retained to allow '.' in the project identifier without splitting, while still allowing splitting on '.' in the namespace, when determining the most appropriate path for an artifact in the content repository. The namespace may be null if there isn't one.</p>
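<p>The path rule described above can be sketched as follows. This is a hedged illustration only: the function <tt>content_path</tt> is hypothetical, and the leading <tt>repositories/.../content</tt> segments are taken from the sample tree rather than from any documented API.</p>

```python
def content_path(repository_id, namespace, project_id, project_version, artifact_id):
    """Build a content node path from artifact coordinates.

    Only the namespace is split on '.'; the project ID is kept whole,
    so it may itself contain '.' characters.
    """
    segments = ["repositories", repository_id, "content"]
    if namespace:  # the namespace may be null (None in this sketch)
        segments.extend(namespace.split("."))
    segments.extend([project_id, project_version, artifact_id])
    return "/".join(segments)


# Coordinates from the example tree above.
path = content_path(
    "central",
    "org.apache.archiva.platform",
    "scanner",
    "1.0-SNAPSHOT",
    "scanner-1.0-20091120.012345-1.pom",
)
print(path)

# Hypothetical project with a '.' in its ID and no namespace.
print(content_path("central", None, "my.project", "1.0", "my.project-1.0.jar"))
```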
<p>Projects are very simple entities. They do not have subprojects - if such modeling needs to be done, then we would create a &quot;products&quot; tree (or similar) that will map what &quot;Archiva 1.0&quot; contains as a collection of project version nodes, for example.</p>
<p>Each artifact in the repository will have an entry, though not necessarily every file. For example, in a Maven repository it is known that the <tt>.md5</tt>, <tt>.sha1</tt> and <tt>.asc</tt> files represent metadata about the artifact of the same name, so that information is attached to the artifact's node instead.</p>
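<p>A minimal sketch of this grouping, assuming the files are available as a mapping of file name to text content. The function <tt>attach_sidecars</tt> and the placeholder values are hypothetical and do not reflect how Archiva's scanner is actually implemented.</p>

```python
import os.path

# Files that describe another file of the same name in a Maven repository.
SIDECAR_EXTENSIONS = (".md5", ".sha1", ".asc")


def attach_sidecars(files):
    """Return one node per artifact, with sidecar contents as properties."""
    nodes = {}
    # First pass: every non-sidecar file becomes an artifact node.
    for name in files:
        if not name.endswith(SIDECAR_EXTENSIONS):
            nodes[name] = {}
    # Second pass: attach each sidecar to the artifact it describes.
    for name, content in files.items():
        base, ext = os.path.splitext(name)
        if ext in SIDECAR_EXTENSIONS and base in nodes:
            nodes[base][ext[1:]] = content.strip()
    return nodes


# Placeholder contents; real files would hold checksums and a signature.
files = {
    "scanner-1.0-20091120.012345-1.pom": "(pom content)",
    "scanner-1.0-20091120.012345-1.pom.md5": "placeholder-md5\n",
    "scanner-1.0-20091120.012345-1.pom.sha1": "placeholder-sha1\n",
    "scanner-1.0-20091120.012345-1.pom.asc": "placeholder-signature\n",
}
nodes = attach_sidecars(files)
print(sorted(nodes["scanner-1.0-20091120.012345-1.pom"]))
```

<p>Sidecar files without a matching artifact are ignored in this sketch; whether they should instead be reported is left open.</p>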
<p>Metadata is stored at the level most appropriate to that piece of information. This means that in a Maven repository, while both the POM and other artifact(s) are considered to be separate artifacts, they all share the information in the POM that is stored at the project version or even project level. We only keep one set of project information for a version - this differs from Maven's storage of one POM per snapshot. The Maven 2 module will take the latest snapshot data and use that. Those that need Maven's behaviour should retrieve the POM directly.</p>
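<p>One way "the latest snapshot" could be chosen is by comparing the timestamp and build number embedded in Maven's timestamped snapshot file names. The sketch below is an assumption about how such a selection might work, not a description of the Maven 2 module; <tt>latest_snapshot</tt> and the second file name are hypothetical.</p>

```python
import re

# Matches names such as scanner-1.0-20091120.012345-1.pom, with an
# optional classifier between the build number and the extension.
SNAPSHOT_FILE = re.compile(r"^.+-(\d{8}\.\d{6})-(\d+)(?:-[^.]+)?\..+$")


def latest_snapshot(artifact_ids):
    """Return the artifact ID with the newest (timestamp, build number)."""
    candidates = []
    for artifact_id in artifact_ids:
        match = SNAPSHOT_FILE.match(artifact_id)
        if match:
            # Timestamps are fixed width, so string comparison is chronological.
            key = (match.group(1), int(match.group(2)))
            candidates.append((key, artifact_id))
    if not candidates:
        return None
    return max(candidates)[1]


latest = latest_snapshot([
    "scanner-1.0-20091120.012345-1.pom",
    "scanner-1.0-20091121.101010-2.pom",  # hypothetical later build
])
print(latest)
```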
<p>Note that artifact data is not stored in the metadata repository (there is no data= property on the file). The information here is enough to locate the file in the original storage when it is requested.</p>
<p>The following describes some of the metadata at each level. Note that the Maven extensions are covered here - these are optional, and they wouldn't be present on a non-Maven storage repository. Likewise, plugins may store additional metadata for each artifact.</p>
<div class="section">
<h5><a name="Namespace_Metadata"></a>Namespace Metadata</h5>
<ul>
<li><tt>maven:groupId</tt> - the Maven group ID</li></ul></div>
<div class="section">
<h5><a name="Project_Metadata"></a>Project Metadata</h5>
<ul>
<li><tt>maven:artifactId</tt> - the Maven artifact ID</li></ul></div>
<div class="section">
<h5><a name="Project_Version_Metadata"></a>Project Version Metadata</h5>
<ul>
<li><tt>created</tt> - when the metadata was added to the repository (see [1] below)</li>
<li><tt>updated</tt> - when the metadata was last updated (see [1] below)</li>
<li><tt>name</tt> - human-readable project name</li>
<li><tt>description</tt> - a human-readable description of this project</li>
<li><tt>url</tt> - a URL to the project's documentation or other information</li>
<li><tt>organization.*</tt> - information about the organization</li>
<li><tt>licenses[].*</tt> - the licenses the project source code is available under</li>
<li><tt>issueManagement.*</tt> - the issue tracker used by the application</li>
<li><tt>ciManagement.*</tt> - continuous integration server information</li>
<li><tt>dependencies[].*</tt> - other projects that this project version depends on. Note that currently this contains Maven specifics that are expected to be abstracted out into the Maven extensions</li>
<li><tt>individuals[].*</tt> - participants in the development of, or otherwise associated with, the project</li>
<li><tt>scm.*</tt> - information about the SCM used to store the project source code</li>
<li><tt>relocatedTo.*</tt> - co-ordinates that this artifact has been relocated to</li>
<li><tt>maven:packaging</tt> - the packaging value in the Maven POM</li>
<li><tt>maven:parent.*</tt> - a reference to the Maven parent POM</li>
<li><tt>maven:plugins[].*</tt> - references to build plugins in a Maven POM</li>
<li><tt>maven:repositories[].*</tt> - references to other repositories in a Maven POM</li>
<li><tt>maven:buildExtensions[].*</tt> - references to build extensions in a Maven POM</li>
<li><tt>maven:properties.*</tt> - properties stored in a Maven POM</li></ul>
<p>Footnotes:</p>
<ol style="list-style-type: decimal">
<li>created/updated timestamps may be maintained by the metadata repository implementation for the metadata itself. Timestamps for individual files are stored as additional properties (<tt>fileCreated</tt>, <tt>fileLastModified</tt>). It may make sense to add a &quot;discovered&quot; timestamp if an artifact is known to have been created at a different time from when it was added to the metadata repository.</li></ol></div></div>
<div class="section">
<h4><a name="Facets_Section"></a>Facets Section</h4>
<p>The facets section allows storage of other repository metadata for specific plugins. Each is named by the plugin's unique identifier.</p>
<div class="section">
<h5><a name="Audit_Logs_.28org.apache.archiva.audit.29"></a>Audit Logs (<tt>org.apache.archiva.audit</tt>)</h5>
<p>Audit logs are stored hierarchically by name, breaking down the date until getting to the timestamp of a particular event. The event details are stored as properties of that node. Presently filtering by an action or other field would require querying the content repository.</p>
<ul>
<li><tt>action</tt> - the action that was taken, such as uploading an artifact</li>
<li><tt>artifact.*</tt> - the co-ordinates of the artifact affected</li>
<li><tt>remoteIP</tt> - the IP address of the person executing the action, if applicable</li>
<li><tt>user</tt> - the user performing the action, if applicable</li></ul>
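<p>The date breakdown used for audit events can be sketched as below. The path layout follows the sample tree; the function <tt>audit_event_path</tt> is hypothetical and only illustrates how a timestamp might map to a node path.</p>

```python
from datetime import datetime

AUDIT_FACET = "org.apache.archiva.audit"


def audit_event_path(repository_id, when):
    """Build the node path for an audit event from its timestamp."""
    millis = when.microsecond // 1000
    return "/".join([
        "repositories", repository_id, "facets", AUDIT_FACET,
        when.strftime("%Y"),  # year
        when.strftime("%m"),  # month
        when.strftime("%d"),  # day
        when.strftime("%H%M%S") + ".%03d" % millis,  # time of the event
    ])


# The event shown in the sample tree: 2010-01-19 09:36:00.000.
path = audit_event_path("central", datetime(2010, 1, 19, 9, 36, 0))
print(path)
```

<p>Because the path encodes only the time, listing events for a given day is a simple tree walk, whereas filtering by action still requires a query, as noted above.</p>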
<p>A future possibility is to store audit metadata on the artifacts themselves (who uploaded them, when, and how, or whether they were discovered by scanning). While this duplicates some information, it would reduce the need to query by a certain artifact ID, and the nodes could be linked referentially.</p>
<p>Audit metadata may also need to be extended to other nodes such as configuration. In this case, it may make sense to alter the artifact reference to a content repository path instead, or to utilise a native mechanism of the content repository.</p></div>
<div class="section">
<h5><a name="Repository_Statistics_.28org.apache.archiva.metadata.repository.stats.29"></a>Repository Statistics (<tt>org.apache.archiva.metadata.repository.stats</tt>)</h5>
<p>Like audit logs, repository statistics are stored by timestamp, marking the time a scan started. The results are stored as properties of the scan:</p>
<ul>
<li><tt>scanStartTime</tt>, <tt>scanEndTime</tt> - when the scan ran from and until</li>
<li><tt>total*</tt> - the statistics gathered about certain totals in the repository</li></ul>
<p>The current approach of tying statistics to the scanning process is not optimal, as it cannot be 'live'. We may later determine if any of the stats can be derived by functions of the content repository rather than storing and trying to keep them up to date. Historical data might be retained by versioning and taking a snapshot at a given point in time. </p></div>
<div class="section">
<h5><a name="Problem_Reports_.28org.apache.archiva.reports.29"></a>Problem Reports (<tt>org.apache.archiva.reports</tt>)</h5>
<p>While not shown above, the problem reporting plugin similarly stores a facet of information, recording particular issues noticed in the repository, such as invalid Maven POMs.</p></div></div>
<div class="section">
<h4><a name="References_Section"></a>References Section</h4>
<p>The references section contains information about references to a given artifact. It is the inverse of the dependency relationship.</p>
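<p>To illustrate the inversion, the following sketch derives the references from a set of dependency and parent relationships matching the sample tree. The <tt>namespace:project:version</tt> key notation and the function <tt>build_references</tt> are assumptions made for this example only.</p>

```python
def build_references(dependencies):
    """Invert 'source depends on target' into 'target is referenced by source'.

    dependencies maps a project version to a list of
    (target project version, reference type) pairs.
    """
    references = {}
    for source, targets in dependencies.items():
        for target, reference_type in targets:
            references.setdefault(target, {})[source] = reference_type
    return references


# Relationships corresponding to the references section of the sample tree.
dependencies = {
    "org.apache.archiva.platform:scanner:1.0-SNAPSHOT": [
        ("org.apache.archiva:parent:1", "parent"),
    ],
    "org.apache.archiva.web:webapp:1.0-SNAPSHOT": [
        ("org.apache.archiva:parent:1", "parent"),
        ("org.apache.archiva.platform:scanner:1.0-SNAPSHOT", "dependency"),
    ],
}
references = build_references(dependencies)
for target in sorted(references):
    print(target, sorted(references[target].items()))
```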
<p>References are stored outside the main model so that their creation doesn't imply a &quot;stub&quot; model - we know if the project exists whether a reference is created or not. References need not imply referential integrity.</p></div></div></div>
</div>
</div>
<hr/>
<footer>
<div class="container">
<div class="row">
<p>Copyright &copy;2006&#x2013;2019
<a href="http://www.apache.org/">The Apache Software Foundation</a>.
All rights reserved.</p>
</div>
<p id="poweredBy" class="pull-right"> <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"><img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /></a>
</p>
</div>
</footer>
</body>
</html>