| Sparse Directories Support in Subversion |
| (a.k.a. "sparse checkouts" / "incomplete directories") |
| |
| Contents |
| ======== |
| |
| 0. Goals |
| 1. Design |
| 2. User Interface |
| 3. Examples |
| 4. Implementation Strategy |
| 5. Compatibility Matters |
| 6. API Changes |
| 7. Work Remaining |
| |
| 0. Goals |
| ======== |
| |
| Many users have very large trees of which they only want to |
| checkout certain parts. In Subversion <= 1.4, 'checkout -N' is not |
| up to this task. |
| |
| Subversion 1.5 introduces the idea of "depth" (controlled via the |
| '--depth' and '--set-depth' options) as a replacement for mere |
| non-recursiveness (formerly controlled via the '-N' option). Depth |
| allows working copies to have exactly the contents the user wants, |
| leaving out everything else. |
| |
| 1. Design |
| ========= |
| |
| We have a new "depth" field in .svn/entries, which has (currently) |
| four possible values: depth-empty, depth-files, depth-immediates, |
| and depth-infinity. Only this_dir entries may have depths other |
| than depth-infinity. |
| |
| depth-empty ------> Updates will not pull in any files or |
| subdirectories not already present. |
| |
| depth-files ------> Updates will pull in any files not already |
| present, but not subdirectories. |
| |
| depth-immediates -> Updates will pull in any files or |
| subdirectories not already present; those |
| subdirectories' this_dir entries will |
| have depth-empty. |
| |
| depth-infinity ---> Updates will pull in any files or |
| subdirectories not already present; those |
| subdirectories' this_dir entries will |
| have depth-infinity. Equivalent to |
| today's default update behavior. |
| |
| The new '--depth' option limits how far an operation descends, and |
| the new '--set-depth' option changes the depth of a working copy tree. |
| |
| The new options are explained in more detail in 'Usage' below, but |
| these two concepts will aid understanding: |
| |
| "ambient depth" -----> The depth, or combination of depths, |
| of a given working copy. |
| |
| "requested depth" ---> The depth the user requested for a |
| particular operation (e.g., checkout, |
| update, switch). This is sometimes |
| called the "operational depth". |
| |
| When when running an operation in a working copy, the requested |
| depth never goes deeper than the ambient depth. For example, if |
| you run 'svn status --depth=infinity' in a working copy directory |
| that was checked out with '--depth=immediates', the status will go |
| as far as depth immediates. That is, it descends all the way |
| ("infinity") into what's available, but in this case what's |
| available is shallower than infinity. |
| |
| 2. User interface |
| ================= |
| |
| Quick start: |
| |
| Run checkout with --depth=empty or --depth=files. When you need |
| additional files or directories, pull them in with 'svn up NAME' |
| (passing --depth for directories as appropriate). |
| |
| Not-so-quick start: |
| |
| checkout without --depth or -N behaves the same as it does today, |
| which is the same as with --depth=infinity. |
| |
| checkout --depth=(empty|files|immediates) creates a working copy |
| that is (empty | has only files | has files and empty subdirs). |
| |
| Inside such a working copy, running 'svn up' by itself will update |
| only what is already present, but running 'svn up OMITTED_SUBDIR' |
| will cause OMITTED_SUBDIR to be brought in at depth-infinity, while |
| the rest of the parent working copy remains at its previous depth. |
| |
| The --depth option limits how far an operation recurses. The |
| operation will reach whatever is inside the intersection of the |
| ambient depth and the requested depth (see 'Design' section for |
| definitions). |
| |
| The --depth option never changes the depth of an existing dir. |
| Instead, use the new '--set-depth=NEW_DEPTH' option for that. |
| Right now, --set-depth can only extend (that is, make deeper) a |
| directory. (In the future, it will also be able to contract; see |
| issue #2843.) We disallow '--depth' and '--set-depth' together. |
| |
| The -N option has been deprecated, but still works: it simply maps |
| to one of --depth=files, --depth=empty, or --depth=immediates, |
| depending on context, for compatibility. For most commands, it's |
| --depth=files, but for status it's --depth=immediates, and for |
| revert and add it's --depth=empty, to be compatible with the |
| varying behaviors -N had across these commands. |
| |
| 'svn info' lists depth, iff invoked on a directory whose depth is |
| not the default (depth infinity). |
| |
| 3. Examples |
| =========== |
| |
| svn co http://.../A |
| |
| Same as today; everything has depth-infinity. |
| |
| svn co -N http://.../A |
| |
| Today, this creates wc containing only mu. Now, this will be |
| identical to 'svn co --depth=files /A'. |
| |
| svn co --depth=empty http://.../A Awc |
| |
| Creates wc Awc, but empty: no files, no subdirectories. |
| |
| Awc/.svn/entries this_dir depth-empty |
| |
| svn co --depth=files http://.../A Awc1 |
| |
| Creates wc Awc1 with all files (i.e., Awc1/mu) but no |
| subdirectories. |
| |
| Awc1/.svn/entries this_dir depth-files |
| ... |
| |
| svn co --depth=immediates http://.../A Awc2 |
| |
| Creates wc Awc2 with all files and all subdirectories, but |
| subdirectories are empty. |
| |
| Awc2/.svn/entries this_dir depth-immediates |
| B |
| C |
| Awc2/B/.svn/entries this_dir depth-empty |
| Awc2/C/.svn/entries this_dir depth-empty |
| ... |
| |
| svn up Awc/B: |
| |
| Since B is not yet checked out, add it at depth infinity. |
| |
| Awc/.svn/entries this_dir depth-empty |
| B |
| Awc/B/.svn/entries this_dir depth-infinity |
| ... |
| Awc/B/E/.svn/entries this_dir depth-infinity |
| ... |
| ... |
| |
| svn up Awc |
| |
| Since A is already checked out, don't change its depth, just |
| update it. B and everything under it is at depth-infinity, |
| so it will be updated just as today. |
| |
| svn up --depth=immediates Awc/D |
| |
| Since D is not yet checked out, add it at depth-immediates. |
| |
| Awc/.svn/entries this_dir depth-empty |
| B |
| D |
| Awc/D/.svn/entries this_dir depth-immediates |
| ... |
| Awc/D/G/.svn/entries this_dir depth-empty |
| ... |
| |
| svn up --depth=infinity Awc |
| |
| Update Awc at depth-empty, and Awc/B at depth-infinity, since |
| those are the ambient depths of those two directories already. |
| |
| Awc/.svn/entries this_dir depth-empty |
| ... |
| Awc/B/.svn/entries this_dir depth-infinity |
| ... |
| |
| svn up --set-depth=infinity Awc/D |
| |
| Pull everything into Awc/D, resulting in a subdirectory that is |
| just as if it had been pulled in with no --depth flag at all. |
| |
| Awc/.svn/entries this_dir depth-empty |
| B |
| D |
| Awc/D/.svn/entries this_dir depth-infinity |
| ... |
| ... |
| |
| svn up --set-depth=empty Awc/B/E |
| |
| Remove everything under E, but leave E as an empty directory |
| since B is depth-infinity. |
| |
| Awc/.svn/entries this_dir depth-infinity |
| B |
| D |
| Awc/B/.svn/entries this_dir depth-infinity |
| ... |
| Awc/B/E/.svn/entries this_dir depth-empty |
| ... |
| |
| svn up --set-depth=empty Awc/D |
| |
| Remove everything under D, and D itself since A is depth-empty. |
| |
| Awc/.svn/entries this_dir depth-empty |
| B |
| |
| svn up Awc/D |
| |
| Bring D back at depth-infinity. |
| |
| Awc/.svn/entries this_dir depth-empty |
| ... |
| Awc/D/.svn/entries this_dir depth-infinity |
| ... |
| ... |
| |
| svn up --set-depth=immediates Awc |
| |
| Bring in everything that's missing (C/ and mu) and empty all |
| subdirectories (and set their this_dir to depth-empty). |
| |
| Awc/.svn/entries this_dir depth-immediates |
| B |
| C |
| Awc/B/.svn/entries this_dir depth-empty |
| Awc/C/.svn/entries this_dir depth-empty |
| ... |
| |
| svn up --set-depth=files Awc |
| |
| Remove every subdirectories under Awc. but leave the files. |
| |
| Awc/.svn/entries this_dir depth-files |
| |
| |
| 4. Implementation Strategy |
| ========================== |
| |
| It would be nice if all this could be accomplished with just simple |
| tweaks to how we drive the update reporter (svn_ra_reporter2_t). |
| However, it's not that easy. |
| |
| Handling 'checkout --depth=empty' would be easy. It should get us |
| an empty directory at depth-empty, with no files and no subdirs, |
| and if we just report it as at HEAD every time, the server will |
| never send updates down (hmmm, this could be a problem for getting |
| dir property updates, though). Then any files or subdirs we have |
| explicitly included we can just report at their respective |
| revisions, and get proper updates; at least that'll work for the |
| depth infinity ones. |
| |
| But consider 'checkout --depth=immediates'. The desired state is a |
| depth-immediates directory D, with all files up-to-date, and with |
| skeleton subdirs at depth empty. Plain updates should preserve this |
| state of affairs. |
| |
| If we report D as at its BASE revision, files at their BASE |
| revisions, and subdirs at HEAD, then: |
| |
| - When new files appear in the repos, they'll get sent down (good) |
| - When new subdirs appear, they'll get sent down in full (bad) |
| |
| But if we don't report subdirs as at HEAD, then the server will try to |
| update them (bad). And if we report D at HEAD, then the working copy |
| won't receive new files that have appeared in the repository since D's |
| BASE revision (note that we *can* get updates for files we already |
| have, though, by continuing to report them at their respective BASEs). |
| |
| The same logic applies to subdirectories at depth-files or |
| depth-immediates. |
| |
| So, for efficient depth handling, the client directly reports the |
| desired depth to the server; i.e., we extend the RA protocol. |
| |
| Meanwhile, legacy servers will send back a bunch of information the |
| client doesn't want, and the client just ignores it, and the user |
| never knows except for the fact that everything seems slow (but |
| once their servers are upgraded, it'll speed up). |
| |
| 5. Compatibility Matters |
| ======================== |
| |
| This feature introduces two new concepts into the RA protocol which |
| will not be understood by older servers: |
| |
| * Reported Depths -- the depths associated with individual paths |
| included by the client in the description (via the |
| svn_ra_reporter_t) of its working copy state. |
| |
| * Requested Depth -- the single depth value used to limit the |
| scope of the server's response to the client. |
| |
| As such, it's useful to understand how these concepts will be |
| handled across the compatibility matrix of depth-aware and |
| non-depth-aware clients and servers. |
| |
| NOTE: in the sections below, it is not necessarily that case that a |
| value or state which is said to be "transmitted" literally has a |
| presence in the RA protocol. Some such bits of state have default |
| values in the protocol and can therefore be effectively transmitted |
| while not literally identifiable in a network trace of the |
| client-server traffic. |
| |
| Depth-aware Clients (DACs) |
| |
| DACs will transmit reported depths (with "infinity" as the |
| default) and will transmit a requested depth (with "unknown" as |
| the default). They will also -- for the sake of older, |
| non-depth-aware servers (NDASs) -- transmit a requested recurse |
| value derived from the requested depth: |
| |
| depth recurse |
| ----- ------- |
| empty no |
| files no |
| unknown yes |
| immediates yes |
| infinity yes |
| |
| When speaking to an NDAS, the requested recurse value is the |
| only thing the server understands , but is obviously more |
| "grainy" than the requested depth concept. The DAC, therefore, |
| must filter out any additional, unwanted data that the server |
| transmits in its response. (This filtering will happen in the |
| RA implementation itself so the RA APIs behave as expected |
| regardless of the server's pedigree.) |
| |
| When speaking to a depth-aware server (DAS), the requested |
| recurse value is ignored. A requested depth of "unknown" means |
| "only send information about the stuff in my report, |
| depth-aware-ily". Other requested depth values are honored by |
| the server properly, and the DAC must handle the transformation |
| of any working copy depths from their pre-update to their |
| post-update depths and content as described in `3. Examples'. |
| |
| Non-depth-aware Clients (NDACs) |
| |
| NDACs will never transmit reported depths and never transmit a |
| requested depth. But they will transmit a requested recurse |
| value (either "yes" or "no", with "yes" being the default). (A |
| DAS uses the presence of a requested depth in the actual protocol |
| to distinguish DACs from NDACs, and knows to ignore the |
| requested recurse value transmitted by a DAC.) |
| |
| When speaking to an NDAS, what happens happens. It's the past, |
| man -- you don't get to define the interaction this late in the |
| game! |
| |
| When speaking to a DAS, the not-reported depths are treated like |
| reported depths of "infinity", and the reported recurse values |
| "yes" and "no" map to depths of "infinity" and "files", |
| respectively. |
| |
| 6. API Changes |
| ============== |
| |
| A new enum type 'svn_depth_t depth' is defined in svn_types.h. |
| Both client and server side now understand the concept of depth, |
| and the basic update use cases handle depth. See depth_tests.py. |
| |
| On the client side, most of the svn_client.h interfaces that |
| formerly took 'svn_boolean_t recurse' have been revved and their |
| successors take 'svn_depth_t depth' instead. Each old API now |
| documents how it converts 'recurse' to 'depth'. |
| |
| Some of this recurse-becomes-depth change has propagated down into |
| libsvn_wc, which now stores a depth field in svn_wc_entry_t (and |
| therefore in .svn/entries). The update reporter knows to report |
| differing depths to the server, in the same way it already reports |
| differing revisions. In other words, take the concept of "mixed |
| revision" working copies and extend it to "mixed depth" working |
| copies. |
| |
| On the server side, most of the significant changes are in |
| libsvn_repos/reporter.c. The code that receives update reports now |
| receives notice of paths that have different depths from their |
| parent, and of course the overall update operation has a global |
| depth, which applies except when restricted by some shallower local |
| depth for a given path. |
| |
| The RA code on both sides knows how to send and receive depths; the |
| relevant svn_ra_* APIs now take depth arguments, which sometimes |
| supersede older 'recurse' booleans. In these cases, the RA layer |
| does the usual compatibility dance: receiving "recurse=FALSE" from |
| an older client causes the server to behave as if "depth=files" |
| had been transmitted. |
| |
| 7. Work Remaining |
| ================= |
| |
| * The list of outstanding issues is shown by this issue tracker query |
| (showing Summary fields that start with "[sparse-directories]"): |
| |
| <http://subversion.tigris.org/issues/buglist.cgi?component=subversion&issue_status=NEW&issue_status=STARTED&issue_status=REOPENED&short_desc=%5Bsparse-directories%5D&short_desc_type=casesubstring> |
| |
| * Update this doc (sections 1 & 6) to WC-NG. |
| |
| * Clarify whether an update should honour an incoming change that |
| adds a new node outside the ambient depth. For example, if a new |
| node 'A/foo' has been added in the repository, and directory 'A' is |
| depth-empty in the WC (and an 'A/foo' is not already present), |
| should the update add 'A/foo', or should it skip it on the basis |
| that it's outside the current ambient depth? If not, an update (or |
| merge) could delete an earlier node at the path 'A/foo' (that was |
| present in the WC despite being outside its parent's 'depth' |
| attribute) but could not then re-add a node of the same name in |
| order to perform both halves of an incoming replacement. Wouldn't |
| that be silly? |