====
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
====
Maven Artifact is supposed to be a general artifact mechanism for retrieving, installing, and deploying artifacts
to repositories. Maven Artifact was originally carved out of Maven proper and as such carries a lot of baggage
which prevents it from being used generally: it carries many notions that are very specific to Maven itself. Artifacts
currently have a notion of scope, classifiers, and behavioral attributes such as whether scopes should be inherited.
For any mechanism to work generally, these baked-in notions need to be removed, vetted, and then made compatible with
the notions currently in Maven. A list of things that should not be in the Artifact:
* scope
* classifier
* dependency filter
* dependency trail
* resolved
* released
* optional
* available versions
These are all attributes of the target system.
Removal of the ArtifactFactory
3 February 2008 (Sunday)
I have removed the factory and left only a small set of constructors (which I would like to reduce to one) so that you
have a valid artifact after construction. I have also started to hide the VersionRange creation: you just pass in
a string, and the constructor for the DefaultArtifact will do the right thing. This will ultimately need to be more
pluggable as different versioning strategies emerge, but variations on the theme, like Maven and OSGi versioning, will
have their own subclasses and tools to operate on the graphs of dependencies.
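The single-constructor idea above might look roughly like the sketch below. The class and field names are illustrative only, not the real DefaultArtifact internals; the point is that the version-string-versus-range decision happens once, in the constructor, instead of in if/else blocks scattered across callers:

```java
// Hypothetical sketch of a single-constructor artifact: the version string
// is inspected once, here, so callers never touch VersionRange directly.
public class SketchArtifact {
    private final String groupId;
    private final String artifactId;
    private final String version;   // plain recommended version, if one was given
    private final String rangeSpec; // raw range expression, if one was given

    public SketchArtifact(String groupId, String artifactId, String versionSpec) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        // A range expression starts with '[' or '(' in Maven's range syntax;
        // anything else is treated as a plain recommended version.
        if (versionSpec.startsWith("[") || versionSpec.startsWith("(")) {
            this.rangeSpec = versionSpec;
            this.version = null;
        } else {
            this.rangeSpec = null;
            this.version = versionSpec;
        }
    }

    public boolean hasRange() { return rangeSpec != null; }
    public String getVersion() { return version; }
    public String getRangeSpec() { return rangeSpec; }
}
```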
4 February 2008 (Monday)
Some notes taken from a mailing-list discussion with John about classifiers:
John:
I'd tend to disagree about classifier not being a 'core' part of the artifact system...it distinguishes a main
artifact from one of its derivatives, and serves as a pretty foundational part of how we retrieve artifacts from existing
remote repositories. Without it, I doubt that you can reconstruct the path to some existing artifacts (like sources or javadocs)
reliably without bastardizing the version string.
We can see that the artifact system has certain inescapable identity attributes. Scope is obviously more related
to how an artifact is used, since you can't see any trace of scope in the artifact as it's been deployed to a remote
repository. Classifier, however, doesn't fit that criterion: it's not a usage marker, but an identity marker.
The rest I agree with.
Jason:
This is where I think you've already baked in what you think about Maven. Look at how we deploy our derivative
artifacts right now: we don't track any of it in the metadata when we deploy. We toss them up there, like javadocs
or sources, and things hope they are there. I think what's more important is that the coordinate be unique and that
we have a way to associate whatever artifacts together in a scalable way. So you say "I want to associate this artifact
with that one, and this is how I would like to record that relationship in the metadata." Subsequently you can query
the metadata and know these relationships. We currently don't do this; it generally boils down to a bunch of
coordinates in the repository and how we choose to relate them via the metadata. We have all sorts of problems with
classifiers currently because they were an ad hoc method of association. A general model of association would be a
superset of what we currently do for classifiers. I agree we need a mechanism for association; I don't think
classifiers have worked all that well.
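The general association model argued for here could be as small as the sketch below, where the repository metadata records typed links between coordinates and a classifier like "sources" becomes just one relation among many. All names are hypothetical; nothing like this exists in the current metadata:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a general association model: instead of a baked-in
// classifier field on Artifact, the metadata records typed links between
// coordinates, and "sources"/"javadoc" become ordinary relations.
public class AssociationIndex {
    public record Association(String from, String to, String relation) {}

    private final List<Association> associations = new ArrayList<>();

    public void associate(String from, String to, String relation) {
        associations.add(new Association(from, to, relation));
    }

    // Query everything related to a coordinate by a given relation.
    public List<String> related(String from, String relation) {
        List<String> result = new ArrayList<>();
        for (Association a : associations) {
            if (a.from().equals(from) && a.relation().equals(relation)) {
                result.add(a.to());
            }
        }
        return result;
    }
}
```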
5 February 2008 (Tuesday)
The rework of the artifact resolution mechanism is an attempt to entirely separate 1) the process of retrieving metadata
into a tree, 2) converting the tree to a graph by a process of conflict resolution, 3) retrieving the complete set
of artifacts, and ultimately 4) doing something with the retrieved set in a particular fashion, like making a classpath.
Currently we have an incremental processing model that doesn't let a complete graph be formed for analysis,
which greatly complicates the process, whereas having a graph and using standard graph analysis and graph
optimization techniques is the only reasonable way forward. There should be no doubt about what needs to be retrieved once the
analysis is complete. We could actually create an aggregate request where instructions are sent to retrieve everything
required; the server could stream all the artifacts back in one shot.
What Oleg is attempting to do is create a working solution for 1) and 2) above. Along with the implementation we also
have a visualization tool that will help us determine what exactly the correct analysis is. The beauty of this is that
regardless of the analysis we arrive at, the complete set can be modeled and we can start working on
the optimized retrieval mechanism. We still need to do some work to separate out 4), as we're already doing some classpath
calculations that we will need to further decouple, but that should be relatively straightforward.
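The four-phase separation described above can be sketched as a set of small interfaces composed into a pipeline. These names are illustrative, not the real maven-artifact API; the tree/graph are simplified to flat lists purely to show the shape of the handoffs:

```java
import java.util.List;

// Illustrative sketch of the four separated phases: collect metadata into a
// tree, resolve it to a graph, retrieve the complete set, then consume it
// (e.g. as a classpath). The interface names are hypothetical.
public class ResolutionPipeline {
    interface TreeCollector { List<String> collect(String root); }       // 1) metadata -> tree
    interface GraphResolver { List<String> resolve(List<String> tree); } // 2) tree -> graph
    interface SetRetriever  { List<String> retrieve(List<String> g); }   // 3) fetch artifacts
    interface SetConsumer   { String consume(List<String> artifacts); }  // 4) e.g. classpath

    // Each phase only sees the previous phase's output, so any of them
    // can be swapped out independently.
    public static String run(String root, TreeCollector c, GraphResolver r,
                             SetRetriever f, SetConsumer k) {
        return k.consume(f.retrieve(r.resolve(c.collect(root))));
    }
}
```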
7 February 2008 (Thursday)
The number of methods in the artifact factory is simply insane: each type that we ended up with in Maven just started
being effectively hard-coded in the factory, which is totally unscalable, and any new types with handlers become a nightmare
to maintain. I have reduced everything to two constructors in the DefaultArtifact, and I would like to reduce it to
one. Right now I have to account for needing to use a version string or creating a range, which is completely confusing
to anyone using the API. You should just need one constructor with a version string, and everything else should be taken
care of for you. Right now there are bits of code all over the place that do the if/else VersionRange detection.
inheritedScope goes away entirely from the model when a graph is used, because the scope selected will be a function of
how the graph is processed.
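Computing scope as a function of graph processing might look like the sketch below: instead of storing an inheritedScope on the artifact, the effective scope of a node is derived from the edge used to reach it. The mediation rule here is a deliberate simplification of Maven's actual scope table, shown only to make the idea concrete:

```java
// Hypothetical sketch: with a full graph available, "inherited scope"
// disappears because effective scope is computed per path at analysis time.
public class ScopeMediation {
    // Simplified rule: anything reached through a test- or provided-scoped
    // edge is pulled into that scope; otherwise the declared scope stands.
    // Maven's real mediation table has more cases than this.
    public static String effectiveScope(String parentScope, String declaredScope) {
        if (parentScope.equals("test") || parentScope.equals("provided")) {
            return parentScope;
        }
        return declaredScope;
    }
}
```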
24 May 2008
1. Retrieval & Storage
There is the task of retrieving a set of resources from a data source atomically. Simple, safe retrieval. Period. This has nothing to do with
dependency management per se, but it is the basis of any safe and reliable dependency management system. We need to deal with repository corruption
and recovery as well. The method employed by GIT, with hierarchical checksums, provides an efficient means to detect where in a repository corruption
has occurred, so that the problem can be corrected, shunted around, or simply brought to the user's attention.
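The hierarchical-checksum idea can be sketched in a few lines: a directory's checksum covers its children's checksums, so a single top-level comparison says whether anything in the repository is corrupt, and descending the tree localizes where. This is an illustrative Merkle-style sketch, not how GIT or any repository layout actually stores its hashes:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;

// Sketch of hierarchical checksums: leaf hashes cover file contents,
// and a directory hash covers its children's hashes, so any change
// anywhere bubbles up to the root.
public class MerkleCheck {
    static String sha256(byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Checksum of a directory = checksum over its children's checksums.
    static String dirHash(List<String> childHashes) throws Exception {
        return sha256(String.join("\n", childHashes).getBytes(StandardCharsets.UTF_8));
    }
}
```

Comparing a stored root hash against a freshly recomputed one detects corruption in O(1); walking down only the mismatching branches finds the damaged file without rehashing the whole repository.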
2. Representation Processing
There is the task of processing the representation of an artifact. In the case of Maven, an artifact's representation is encapsulated in
a POM. If the representation refers to other representations, i.e. dependencies, then these have to be taken into account as well. The system
may allow transitive processing, and this is where the real power of a dependency management system comes into play. The representations are
gathered into a tree structure, and the flavour of the system imparts special processing on this tree to yield a graph.
Once the representation has been processed and we have a graph, we fall back on the retrieval mechanism to place the desired artifacts in
the storage system. Ultimately from this graph, according to the desired purpose, we have a set of artifacts that we can do something with.
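One example of the "special processing" that turns the gathered tree into a graph is conflict mediation. The sketch below shows a nearest-wins pass, where a breadth-first walk lets the shallowest version of each artifact win; the Node shape is illustrative, not the real metadata model, and other flavours of system would plug in different rules here:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of one possible tree -> graph pass: breadth-first traversal in
// which the first (shallowest) version seen for each artifact id wins,
// mimicking nearest-wins conflict mediation.
public class NearestWins {
    public record Node(String id, String version, List<Node> children) {}

    public static Map<String, String> resolve(Node root) {
        Map<String, String> selected = new HashMap<>();
        ArrayDeque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            // BFS order means the nearest occurrence is seen first;
            // putIfAbsent discards deeper, conflicting versions.
            selected.putIfAbsent(n.id(), n.version());
            queue.addAll(n.children());
        }
        return selected;
    }
}
```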
Processing
I have come to the conclusion that providing the necessary support for version ranges cannot be done without a SAT solver, as we are
approaching an NP-complete problem; we are going to end up with an approximation, and all the heavy lifting is already being done by SAT4J.
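To see why range resolution maps onto SAT: each (artifact, version) pair becomes a boolean variable, "pick exactly one version per artifact" and "if you pick X you must satisfy its dependency ranges" become clauses, and satisfiability means a consistent selection exists. A real implementation would hand these clauses to SAT4J; the tiny brute-force search below only demonstrates the encoding and would never scale:

```java
import java.util.List;

// Toy CNF solver used only to illustrate the encoding of version selection
// as SAT. Each int is a 1-based variable; negative means negated.
public class VersionSat {
    public static boolean[] solve(int numVars, List<int[]> clauses) {
        // Brute force over all 2^numVars assignments; SAT4J would do this
        // with real search and learning.
        for (int mask = 0; mask < (1 << numVars); mask++) {
            boolean[] assign = new boolean[numVars + 1];
            for (int v = 1; v <= numVars; v++) {
                assign[v] = ((mask >> (v - 1)) & 1) != 0;
            }
            if (satisfies(assign, clauses)) return assign;
        }
        return null; // unsatisfiable: no consistent version selection exists
    }

    static boolean satisfies(boolean[] assign, List<int[]> clauses) {
        for (int[] clause : clauses) {
            boolean ok = false;
            for (int lit : clause) {
                if (lit > 0 ? assign[lit] : !assign[-lit]) { ok = true; break; }
            }
            if (!ok) return false;
        }
        return true;
    }
}
```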
The Quality of providers
The new processing model will allow the complete manipulation of the resultant set either in memory or in a location that does not pollute the
local repository. So any provider must guarantee the safe retrieval of the set of artifacts from remote sources, but must also guarantee the safe
placement of that set into the local repository; if the user aborts or the machine crashes, the provider must supply the process a means to
clean up anything half-baked. We need to provide a single place where a journal can be written, which can be easily
inspected and acted on if partially complete operations have been detected. We know what can go wrong, and every possible measure
needs to be taken to make sure the repository cannot be corrupted.
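The journalled placement described above might be sketched as follows, with made-up file names: the artifact is staged outside its final coordinate, a journal entry records the intent first, and only an atomic move publishes the file, so a surviving journal entry on restart marks a half-baked operation to clean up:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of journalled, atomic placement into the local repository.
// The coordinate here is a plain file name; a real repository has a
// nested layout, but the journal-then-move pattern is the same.
public class JournaledInstall {
    public static void install(Path repo, String name, byte[] data) throws Exception {
        Path journal = repo.resolve("journal.log");
        Path staging = repo.resolve(name + ".part");
        Path target  = repo.resolve(name);
        // 1) record the intent before touching the final location
        Files.writeString(journal, "INSTALLING " + name + System.lineSeparator());
        // 2) stage the full content next to the target, then publish it
        //    with an atomic move so readers never see a partial file
        Files.write(staging, data);
        Files.move(staging, target, StandardCopyOption.ATOMIC_MOVE);
        // 3) clear the journal only after the move succeeded; an entry
        //    found on restart identifies an operation to roll back
        Files.deleteIfExists(journal);
    }
}
```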