Implementing Incomplete Directory Support in SVN
#########################################################################
### ###
### Note: Although this feature was called "incomplete directories" ###
### while under development, we might want to call it something ###
### else when it goes live. "Incomplete" makes it sound like ###
### there's something wrong with the directory, something missing. ###
### Perhaps "sparse directories" or "partial directories" would be ###
### less user-frightening. ###
### ###
#########################################################################
Contents
========
1. Design
2. User Interface
3. Examples
4. Implementation Strategy
5. Current Status
1. Design
=========
This design document started out as a post by Eric Gillespie:
http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=117053
From: Eric Gillespie <epg@pretzelnet.org>
To: dev@subversion.tigris.org
Subject: [PROPOSAL] Incomplete working copies (issue #695)
Date: Thu, 22 Jun 2006 22:35:06 -0700
Message-ID: <25668.1151040906@gould.diplodocus.org>
[The design has evolved since then; the text below is not exactly
the same as what Eric posted, but has the same general ideas.]
I'd like to propose a new solution to this issue, and hopefully get
it into 1.5. What I'm really looking for is the kind of
flexibility Perforce has with its client specs in which parts of a
tree you check out.
I don't think Ben Reser's proposal
(http://svn.haxx.se/dev/archive-2005-07/0398.shtml) covers this.
Using his first example, there is no way to avoid pulling in
trunk/foo/images/another-big-dir when it is added.
This is based on an idea from Karl Fogel.
Implementing Incomplete Directory Support in SVN
==================================================
Many users have very large trees of which they only want to
check out certain parts. Today, checkout -N is not up to this task.
This proposal introduces the --depth option to the checkout,
switch, and update subcommands as a replacement for -N, which
allows working copies to have very specific contents, leaving out
everything the user does not want.
This is similar to Perforce's client specs, but without the ability
to have a repository entry have a different name in the working
copy. We actually already have this capability in switch.
Depth:
We have a new "depth" field in .svn/entries, which has (currently)
four possible values: depth-empty, depth-files, depth-immediates,
and depth-infinity. Only this_dir entries may have depths other
than depth-infinity.
depth-empty ------> Updates will not pull in any files or
                    subdirectories not already present.
depth-files ------> Updates will pull in any files not already
                    present, but not subdirectories.
depth-immediates -> Updates will pull in any files or
                    subdirectories not already present; those
                    subdirectories' this_dir entries will
                    have depth-empty.
depth-infinity ---> Updates will pull in any files or
                    subdirectories not already present; those
                    subdirectories' this_dir entries will
                    have depth-infinity. Equivalent to
                    today's default update behavior.
The --depth option sets depth values as it updates the working
copy, setting any new subdirectories' this_dir depth values as
described above.
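
The four depth values can be read as a small decision table. Here is a
minimal Python sketch of that table; the names Depth, pulls_in, and
new_subdir_depth are made up for illustration and are not Subversion
APIs.

```python
from enum import Enum


class Depth(Enum):
    """The four depth values described above (illustrative names)."""
    EMPTY = "depth-empty"
    FILES = "depth-files"
    IMMEDIATES = "depth-immediates"
    INFINITY = "depth-infinity"


def pulls_in(parent_depth, kind):
    """Return True if an update adds a missing entry of this kind.

    kind is "file" or "dir"; parent_depth is the this_dir depth of the
    directory being updated.
    """
    if parent_depth is Depth.EMPTY:
        return False
    if parent_depth is Depth.FILES:
        return kind == "file"
    return True  # IMMEDIATES and INFINITY pull in files and subdirectories


def new_subdir_depth(parent_depth):
    """Return the this_dir depth given to a newly added subdirectory.

    Returns None when the parent's depth does not pull in subdirectories.
    """
    if parent_depth is Depth.IMMEDIATES:
        return Depth.EMPTY
    if parent_depth is Depth.INFINITY:
        return Depth.INFINITY
    return None
```

For example, pulls_in(Depth.FILES, "dir") is False, and
new_subdir_depth(Depth.IMMEDIATES) is Depth.EMPTY.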
2. User Interface
=================
Affected commands:
* checkout
* switch
* update
* status
* info
The -N option becomes a synonym for --depth=files for these commands.
This changes the existing -N behavior for these commands, but in a
trivial way (see below).
checkout without --depth or -N behaves the same as it does today.
switch and update without --depth or -N behave the same way as
today IFF the working copy is fully depth-infinity. switch and
update without --depth or -N will NOT change depth values
(exception: a missing directory specified on the command line will
be pulled in).
Thus, 'checkout' is identical to 'checkout --depth=infinity', but
'switch' and 'update' are not the same as 'switch --depth=infinity' and
'update --depth=infinity'. The former update entries according to
existing depth values, while the latter pull in everything.
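
The option rules above can be summarized as a small function. This is
only a sketch: requested_depth and its parameters are hypothetical
names, not part of the svn client. It assumes that an explicit --depth
wins if -N is also given, which this document does not specify. The
"missing target comes in at depth-infinity" case follows the
'svn up Awc/B' example in the next section.

```python
def requested_depth(command, depth_opt=None, non_recursive=False,
                    target_present=True):
    """Return the depth a command asks for.

    None means "leave every directory at the depth already recorded
    in the working copy".
    """
    if command not in ("checkout", "switch", "update"):
        raise ValueError("unsupported command: " + command)
    if depth_opt is not None:
        return depth_opt      # explicit --depth (assumed to win over -N)
    if non_recursive:
        return "files"        # -N is a synonym for --depth=files
    if command == "checkout" or not target_present:
        return "infinity"     # plain checkout, or a missing target
    return None               # plain switch/update keeps existing depths
```

So requested_depth("checkout") gives "infinity", while
requested_depth("update") gives None, matching the point that 'update'
and 'update --depth=infinity' are different operations.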
To get started, run checkout with --depth=empty or --depth=files.
If additional files or directories are desired, pull them in with
update commands using appropriate --depth options.
The 'svn status' command should list the depth status of directories,
in addition to whatever statuses are currently listed.
The 'svn info' command should list the depth, IFF invoked on a directory.
[I believe it already does, on this branch. -kfogel]
3. Examples
===========
svn co http://.../A

    Same as today; everything has depth-infinity.

svn co -N http://.../A

    Today, this creates a wc containing only mu. Now, this will be
    identical to 'svn co --depth=files http://.../A'.

svn co --depth=empty http://.../A Awc

    Creates wc Awc, but *empty*.

    Awc/.svn/entries          this_dir    depth-empty

svn co --depth=files http://.../A Awc1

    Creates wc Awc1 with all files (i.e., Awc1/mu) but no
    subdirectories.

    Awc1/.svn/entries         this_dir    depth-files
                              ...

svn co --depth=immediates http://.../A Awc2

    Creates wc Awc2 with all files and all subdirectories, but
    subdirectories are *empty*.

    Awc2/.svn/entries         this_dir    depth-immediates
                              B
                              C
    Awc2/B/.svn/entries       this_dir    depth-empty
    Awc2/C/.svn/entries       this_dir    depth-empty
    ...

svn up Awc/B

    Since B is not yet checked out, add it at depth infinity.

    Awc/.svn/entries          this_dir    depth-empty
                              B
    Awc/B/.svn/entries        this_dir    depth-infinity
                              ...
    Awc/B/E/.svn/entries      this_dir    depth-infinity
                              ...
    ...

svn up Awc

    Since A is already checked out, don't change its depth, just
    update it. B and everything under it is at depth-infinity,
    so it will be updated just as today.

svn up --depth=immediates Awc/D

    Since D is not yet checked out, add it at depth-immediates.

    Awc/.svn/entries          this_dir    depth-empty
                              B
                              D
    Awc/D/.svn/entries        this_dir    depth-immediates
                              ...
    Awc/D/G/.svn/entries      this_dir    depth-empty
    ...

svn up --depth=empty Awc/B/E

    Remove everything under E, but leave E as an empty directory
    since B is depth-infinity.

    Awc/.svn/entries          this_dir    depth-empty
                              B
                              D
    Awc/B/.svn/entries        this_dir    depth-infinity
                              ...
    Awc/B/E/.svn/entries      this_dir    depth-empty
    ...

svn up --depth=empty Awc/D

    Remove everything under D, and D itself since A is depth-empty.

    Awc/.svn/entries          this_dir    depth-empty
                              B

svn up Awc/D

    Bring D back at depth-infinity.

    Awc/.svn/entries          this_dir    depth-empty
                              ...
    Awc/D/.svn/entries        this_dir    depth-infinity
                              ...
    ...

svn up --depth=immediates Awc

    Bring in everything that's missing (C/ and mu) and empty all
    subdirectories (and set their this_dir to depth-empty).

    Awc/.svn/entries          this_dir    depth-immediates
                              B
                              C
    Awc/B/.svn/entries        this_dir    depth-empty
    Awc/C/.svn/entries        this_dir    depth-empty
    ...
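
The checkout examples at the start of this section can be modelled in
a few lines of Python. This is a toy model, not how libsvn_client
works: checkout() and TREE are hypothetical, and TREE is a cut-down
stand-in for the repository directory A (only mu, B, C, and E are named
in the examples; "lambda" and "alpha" are filler entries).

```python
# A toy repository tree: a dict is a directory, None marks a file.
TREE = {
    "mu": None,
    "B": {"lambda": None, "E": {"alpha": None}},
    "C": {},
}


def checkout(tree, depth, path="A"):
    """Model a fresh checkout of `tree` at the given depth.

    Returns (dirs, files): dirs maps each created directory to its
    this_dir depth, and files lists the created file paths.
    """
    dirs = {path: depth}
    files = []
    if depth == "empty":
        return dirs, files
    for name, child in sorted(tree.items()):
        child_path = path + "/" + name
        if child is None:
            files.append(child_path)          # every non-empty depth gets files
        elif depth in ("immediates", "infinity"):
            # immediates creates empty skeleton subdirs; infinity recurses fully
            child_depth = "empty" if depth == "immediates" else "infinity"
            sub_dirs, sub_files = checkout(child, child_depth, child_path)
            dirs.update(sub_dirs)
            files.extend(sub_files)
    return dirs, files
```

With this model, checkout(TREE, "immediates") yields A at "immediates",
A/B and A/C at "empty", and the single file A/mu, which mirrors the
Awc2 example above.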
4. Implementation Strategy
==========================
It would be nice if all this could be accomplished with just simple
tweaks to how we drive the update reporter (svn_ra_reporter2_t).
However, it looks like it's not going to be that easy.
Handling 'checkout --depth=empty' would be easy. It should get us
an empty directory at depth-empty, with no files and no subdirs,
and if we just report it as at HEAD every time, the server will
never send updates down (hmmm, this could be a problem for getting
dir property updates, though). Then any files or subdirs we have
explicitly included we can just report at their respective
revisions, and get proper updates; at least that'll work for the
depth infinity ones.
But consider 'checkout --depth=immediates'. The desired state is a
depth-immediates directory D, with all files up-to-date, and with
skeleton subdirs at depth-empty. Plain updates should preserve this
state of affairs.
If we report D as at its BASE revision, files at their BASE
revisions, and subdirs at HEAD, then:
- When new files appear in the repos, they'll get sent down (good)
- When new subdirs appear, they'll get sent down in full (bad)
But if we don't report subdirs as at HEAD, then the server will try to
update them (bad). And if we report D at HEAD, then the working copy
won't receive new files that have appeared in the repository since D's
BASE revision (note that we *can* get updates for files we already
have, though, by continuing to report them at their respective BASEs).
The same logic applies to subdirectories at depth-files or
depth-immediates.
So, I think this means that for efficient depth handling, we'll
need to have the client directly reporting the desired depth to the
server; i.e., extending the RA protocol.
Meanwhile, legacy servers will send back a bunch of information the
client doesn't want, and the client will just ignore it, and
everything will be slower than it needs to be, and people will
complain on the users@ list, and we'll tell them to upgrade their
servers, and they'll say they can't because they don't have control
over the server, and we'll say "So? This ain't no Grand Hotel!"
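
To make the reporting dilemma concrete, here is a deliberately crude
Python model of what a server would send down for directory D. It is
an illustration only: additions_sent() and its arguments are
hypothetical, and the real reporter/editor machinery is far more
involved. Without a depth, the only knob the client has is the
revision it reports for D; with a depth, the server can trim what it
sends.

```python
def additions_sent(reported_rev, head_rev, additions, depth=None):
    """Return the paths (relative to D) a server would send for D.

    additions lists (path, kind) pairs added to the repository since
    D's BASE revision.  depth=None models a server that only
    understands revisions.
    """
    if reported_rev >= head_rev:
        return []                      # D looks up to date: nothing is sent
    if depth is None or depth == "infinity":
        return [path for path, _kind in additions]
    sent = []
    for path, kind in additions:
        if "/" in path:
            continue                   # below an immediate child: not wanted
        if depth == "immediates" or (depth == "files" and kind == "file"):
            sent.append(path)
    return sent


NEW = [("new.txt", "file"), ("newdir", "dir"), ("newdir/deep.txt", "file")]
```

Reporting D at BASE with no depth (additions_sent(5, 9, NEW)) returns
all three paths, so the new subdir arrives in full. Reporting D at
HEAD (additions_sent(9, 9, NEW)) returns nothing, so new.txt is missed.
Only additions_sent(5, 9, NEW, "immediates") returns just new.txt and
newdir.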
5. Current Status
=================
http://svn.collab.net/repos/svn/branches/incomplete-directories/
contains the latest code.
*** The most important thing to know is that the branch code ***
*** implements an earlier three-depth scheme (0, 1, infinity) ***
*** and does not yet reflect the new four-depth universe. ***
A new enum type 'svn_depth_t' is defined in svn_types.h.
Both client and server side now understand the concept of depth,
and the basic update use cases handle depth. See depth_tests.py
for what is known to be working. (Many edge cases are not yet
handled correctly.)
On the client side, most of the svn_client.h interfaces that
formerly took 'svn_boolean_t recurse' now take 'svn_depth_t depth'.
Some of this recurse-becomes-depth change has propagated down into
libsvn_wc, which now stores a depth field in svn_wc_entry_t (and
therefore in .svn/entries). The update reporter knows to report
differing depths to the server, in the same way it already reports
differing revisions. In other words, take the concept of "mixed
revision" working copies and extend it to "mixed depth" working
copies.
On the server side, most of the significant changes are in
libsvn_repos/reporter.c. The code that receives update reports now
receives notice of paths that have different depths from their
parent, and of course the overall update operation has a global
depth, which applies whenever not shadowed by some local depth for
a given path.
The RA code on both sides knows how to send and receive depths; the
relevant svn_ra_* APIs now take depth arguments, which sometimes
supersede older 'recurse' booleans. In these cases, the RA layer
does the usual compatibility dance: receiving "recurse=FALSE" from
an older client causes the server to behave as if "depth=immediates"
had been transmitted.
Work remaining, in no particular order:
* There is still no compatibility code for new clients dealing
with old servers. This is the "legacy servers will send back
a bunch of information the client doesn't want" scenario
described in the 'Implementation Strategy' section above. The
client doesn't yet know how to ignore information it doesn't
want; it'll just do whatever the server tells it. I'm not sure
how useful the compatibility mode is, since it wouldn't really
shorten the wall clock time of the operations by much,
although it would still save the disk space.
* There's no interface for getting rid of stuff once you've
brought it into your working copy -- no "exclusion" interface,
in other words. So you can do this:
$ svn co --depth=empty http://.../repos/greek-tree/
$ cd greek-tree
$ svn up A
...and that will get you A/ at depth-infinity. But once you
no longer need A/, there's no way to do:
$ svn exclude A ## or whatever the command is named
In fact, you can't yet even do:
$ svn up --depth=empty A
...to at least "fold up" A/ and save the disk space.
* I've put a lot of "### TODO" comments on the branch. Do a
branch diff to see them.
* Certain APIs need to behave specially when passed
svn_depth_unknown: they need to treat it as meaning "go get
the depth from the working copy, and either use it directly or
calculate the appropriate depth based on it".
Right now, only svn_client_checkout3 really does this
properly (see r21910 and r21829). Other APIs that probably
should do it are:
svn_client_diff4()
svn_client_diff_summarize2()
svn_client_diff_summarize_peg2()
svn_client_diff_peg3()
svn_client_merge3()
svn_client_merge_peg3()
svn_client_update3()
* Some APIs still take recurse booleans. It's not clear to me
that all of these should be switched to depth, but the
question needs some more consideration:
svn_client_import() # Note: takes 'nonrecursive' right now
svn_client_revert() # Any point taking depth?
svn_client_commit4() # Is manual control of depth needed here?
svn_client_propset2() # Same question here.
svn_client_propget2() # Same question here.
svn_client_proplist2() # Same question here.
svn_client_resolved() # Same question here.
* Small bug in how depth is stored in entries file format:
Suppose we have A at depth:empty and A/B at depth:files.
A/.svn/entries will have a short "dir" entry for B.
Naturally, that entry does not mention B's revision or other
details, because that stuff should live in A/B/.svn/entries.
But for some reason, A/.svn/entries *does* list B's depth.
That's bad. It shouldn't talk about B's depth; one should go
look in A/B/.svn/entries for B's depth. I'm sure this is a
simple fix in the entry reading/writing code; I just haven't had
a chance to chase it down yet.
* I haven't done anything with 'svn status' yet, and I don't know
whether it would behave correctly w.r.t. depths out of the box.
Clearly, this needs investigation.
* I haven't done anything with either 'svn switch --depth' or
'svn switch' handling mixed-depth working copies automatically.
Probably some bits work right now, and other bits don't.
* All of my testing has been over svn:// and sometimes file://.
All the necessary changes are in for http:// as well, but they
are still untested.