blob: 7569046351972f4c2562d522e6ded45307ff12e6 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
*******************
SCM tools in Allura
*******************
The interface API and most of the controller and view structure of
code repository type apps is defined in the base classes in the
Allura package, which consists of the classes in the following
packages:
* `allura.lib.repository` for the base application and implementation classes
* `allura.controllers.repository` for the base controllers
* `allura.model.repository` for the repo metadata models
* `allura.model.repo_refresh` for the repo metadata refresh logic
Application and Implementation
------------------------------
The `Application` structure for SCM apps follows the normal pattern for
Allura applications, though they should inherit from
`allura.lib.repository.RepositoryApp` instead of `allura.app.Application`.
The apps are then responsible for implementing subclasses of
`allura.lib.repository.Repository` and `allura.lib.repository.RepositoryImplementation`.
The `Repository` subclass is responsible for implementing tool-specific
logic using the metadata models and proxying the rest of its logic to the
`RepositoryImplementation` subclass, which talks directly to the underlying
SCM tool, such as `GitPython`, `Mercurial`, or `pysvn`.
Historically, more was done in the `Repository` subclass using the metadata
models, but we are trying to move away from those toward making the SCM apps
thin wrappers around the underlying SCM tool (see `Indexless`_, below).
Controller / View Dispatch
--------------------------
All of the SCM apps use the base controllers in `allura.controllers.repository`
with only minimal customization through subclassing (primarily to
override the template for displaying instructions for setting up a newly
created repository), so the dispatch for all SCM apps follows the same
pattern.
The root controller for SCM apps is `allura.controllers.repository.RepoRootController`.
This controller has views for repo-level actions, such as forking and merging,
and the SCM app attaches a refs and a commits (ci) controller to dispatch symbolic
references and explicit commit IDs, respectively. (This should be refactored to be
done in the `RepoRootController` so that the dispatch can be followed more easily.)
(Also, `ForgeSVN` actually eschews this class and uses `BranchBrowser` directly as
its root, in order to tweak the URL slightly, but it monkeypatches the relevant
views over, so the dispatch ends up working more or less the same.)
The refs controller, `allura.controllers.repository.RefsController`, handles
symbolic references, and in particular handles custom escape handling to detect
what is part of the ref name vs part of the remainder of the URL to dispatch.
This is then handed off to the `BranchBrowserClass` which is a pointer to
the implementation within the specific SCM app which handles the empty
repo instructions or hands it back to the generic commits controller.
The commits controller, `allura.controllers.repository.CommitsController`,
originally only handled explicit commit IDs, but was modified to allow for
persistent symbolic refs in the URL ("/p/allura/git/ci/master/", e.g.),
it was changed to have the same escape parsing logic as the refs controller.
Regardless, it just parses out the reference / ID from the URL and hands
off to the commit browser.
The commit browser, `allura.controllers.repository.CommitBrowser`, holds
the views related to a specific commit, such as viewing the commit details
(message and changes), a log of the commit history starting with the commit,
or creating a snapshot of the code as of the commit. It also has a "tree"
endpoint for browsing the file system tree as of the commit.
The tree browser, `allura.controllers.repository.TreeBrowser`, holds the
view for viewing the file system contents at a specific path for a given
commit. The only complication here is that, instead of parsing out the
entire tree path at once, it recursively dispatches to itself to build
up the path a piece at a time. Tree browsing also depends on the
`Last Commit Logic`_ to create the data needed to display the last
commit that touched each file or directory within the given directory.
Last Commit Logic
-----------------
Determining which commit was the last to touch a given set of files or
directories can be complicated, depending on the specific SCM tool.
Git and Mercurial require manually walking up the commit history to
discover this information, while SVN can return it all from a single
`info2` command (though the SVN call will be significantly slower than
any individual call to Git or Mercurial). Because this can sometimes
be costly to generate, it is cached via the `allura.model.repository.LastCommit`
model. This will generate the data on demand by calling the underlying
SCM tool, if necessary, but the data is currently pre-generated during
the post-push refresh logic for Git and Mercurial.
The overall logic for generating this data for Git and Mercurial is as follows:
1. All items modified in the current commit get their info from the
current commit
2. The info for the remaining items is found:
* If there is a `LastCommit` record for the parent directory, all of
the remaining items get their info from the previous `LastCommit`
* Otherwise, the list of remaining items is sent to the SCM implementation,
which repeatedly asks the SCM for the last commit to touch any of the
items for which we are missing information, removing items from the list
as commits are reported as having modified them
3. Once all of the items have information, or if a processing timeout is reached,
the gathered information is saved in the `LastCommit` model and returned
Indexless
---------
Currently, there are model classes which encapsulate SCM metadata
(such as commits, file system structure, etc) in a generic (agnostic to
the underlying tool implementation) way. However, this means that we're
duplicating in mongo a lot of data that is already tracked by the
underlying SCM tool, and this data must also be indexed for new repos
and after every subsequent push before the commits or files are browsable
via the web interface.
To minimize this duplication of data and reduce or eliminate the delay
between commits being pushed and them being visible, we are trying to
move toward a lightweight API layer that requests the data from the
underlying SCM tool directly, with intelligent caching at the points
and in the format that makes the most sense to make rendering the SCM
pages as fast as possible.