| |
| "I have a cunning plan" |
| |
| or |
| |
| Entries Caching in the Access Batons |
| |
| |
| |
| 0. Preamble |
| -------- |
| |
| Issue 749 provides some history. The access batons now cache the |
| parsed entries file, as repeatedly reading, parsing and writing the |
| file proved to be a bottleneck. |
| |
| |
| 1. Caching Interface |
| ----------------- |
| |
| The basic functions to retrieve entries are svn_wc_entries_read and |
| svn_wc_entry. The function svn_wc__entries_write is used to update |
| the entries file on disk. The function svn_wc__entry_modify is |
| implemented in terms of svn_wc_entries_read and svn_wc__entries_write. |
| |
| 1.1 Write Caching Overview |
| |
| The update process runs as follows: |
| |
| 1. Lock the directory |
| 2. Read the entries file and cache in memory |
| 3. Start the wc update |
| 3.1 Start a directory update |
| 3.1.1 Start file update |
| 3.1.1.1 Write a log file specific to this item |
| 3.1.2 Finish file update |
| 3.2 Finish directory update |
| 3.3 Run log files |
| 3.3.1 Log file commands modify entries in memory |
| 3.4 Finish log files |
| 3.5 Flush entries to disk |
| 3.6 Remove log files |
| 4. Finish update |
| 5. Unlock directory |
| |
| Each directory update may contain multiple file updates, so when the |
| directory update is complete there may be multiple log files. While |
| the log files are being run, the entries modifications are cached in |
| memory and written once when the log files are complete. The reason |
| for accumulating multiple log files is that flushing the entries to |
| disk involves writing the entire entries file. If that were done |
| after each file, the total amount of entries data written would grow |
| quadratically with the number of files during a checkout. |
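| |
| As a rough illustration of that cost (a sketch, not Subversion code; |
| both function names are made up for this note), assume a directory |
| that ends up with N entries and an entries file whose size is |
| proportional to the number of entries it holds: |

```c
#include <assert.h>

/* Total entries written if the whole entries file is flushed after
   every file update.  The k-th flush rewrites k entries, so the total
   is 1 + 2 + ... + N, which grows with the square of N. */
unsigned long
entries_written_flush_per_file(unsigned long n_files)
{
  unsigned long total = 0;
  unsigned long k;

  for (k = 1; k <= n_files; k++)
    total += k;

  /* Closed form of the sum, valid while the product does not overflow. */
  assert(total == n_files * (n_files + 1) / 2);
  return total;
}

/* Total entries written if modifications are cached in memory and the
   entries file is flushed once, after all the log files have run. */
unsigned long
entries_written_flush_once(unsigned long n_files)
{
  return n_files;
}
```

| For a directory of 100 files the first scheme writes 5050 entries in |
| total, the second writes 100. |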
| |
| |
| 2. Interface Enhancements |
| ---------------------- |
| |
| 2.1 Entries Interface |
| |
| A lot of the entries interface has remained unchanged since the |
| pre-caching days, and it shows. Of particular concern is the |
| svn_wc_entries_read function, as this provides access to the raw data |
| within the cache. If the application carelessly modifies the data, |
| the cached entries can be corrupted. I would like to remove this |
| function. |
| |
| One use of svn_wc_entries_read is in svn_wc__entry_modify; this is |
| "within the entries code" and so is not a problem. |
| |
| Of the other uses of svn_wc_entries_read the most common is where the |
| application wants to iterate over all the entries in a directory. I |
| would like to see an interface something like this: |
| |
| typedef struct svn_wc_entry_iterator_t svn_wc_entry_iterator_t; |
| |
| svn_wc_entry_iterator_t * |
| svn_wc_entry_first(svn_wc_adm_access_t *adm_access, |
|                    apr_pool_t *pool); |
| |
| svn_wc_entry_iterator_t * |
| svn_wc_entry_next(svn_wc_entry_iterator_t *entry_iterator); |
| |
| const svn_wc_entry_t * |
| svn_wc_entry_iterator_entry(svn_wc_entry_iterator_t *entry_iterator); |
| |
| Note that this provides only const access to the entries; the |
| application cannot modify the cached data. All modifications would go |
| through svn_wc__entry_modify, and the access batons could keep track |
| of whether modifications have been made and not yet written to disk. |
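| |
| A minimal sketch of how a caller might use such an iterator follows. |
| It is self-contained, so entry_t, adm_access_t and entry_iterator_t |
| are hypothetical stand-ins for the real svn_wc types, and the caller |
| supplies the iterator storage where the proposed interface would |
| allocate it from an apr_pool_t: |

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for svn_wc_entry_t and svn_wc_adm_access_t. */
typedef struct entry_t
{
  const char *name;
  long revision;
} entry_t;

typedef struct adm_access_t
{
  const entry_t *entries;   /* cached entries, owned by the baton */
  size_t n_entries;
} adm_access_t;

/* Iterator state; the caller provides the storage in this sketch. */
typedef struct entry_iterator_t
{
  const adm_access_t *adm_access;
  size_t index;
} entry_iterator_t;

/* Position IT on the first entry, or return NULL if there is none. */
entry_iterator_t *
entry_first(entry_iterator_t *it, const adm_access_t *adm_access)
{
  if (adm_access->n_entries == 0)
    return NULL;
  it->adm_access = adm_access;
  it->index = 0;
  return it;
}

/* Advance IT, or return NULL when the entries are exhausted. */
entry_iterator_t *
entry_next(entry_iterator_t *it)
{
  assert(it != NULL);
  if (it->index + 1 >= it->adm_access->n_entries)
    return NULL;
  it->index++;
  return it;
}

/* Const access only: the caller cannot modify the cached entry. */
const entry_t *
entry_iterator_entry(const entry_iterator_t *it)
{
  return &it->adm_access->entries[it->index];
}

/* Example caller: the highest revision in the directory, or -1. */
long
max_revision(const adm_access_t *adm_access)
{
  entry_iterator_t storage;
  entry_iterator_t *it;
  long max = -1;

  for (it = entry_first(&storage, adm_access); it; it = entry_next(it))
    {
      const entry_t *entry = entry_iterator_entry(it);
      if (entry->revision > max)
        max = entry->revision;
    }
  return max;
}
```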
| |
| The other uses of svn_wc_entries_read tend to extract a single entry. |
| I hope these can be converted to use svn_wc_entry. One slight problem |
| is the use of svn_wc_entries_read to intentionally extract a |
| directory's entry from its parent. This is done because that's where |
| the "deleted" state is stored. I think the entry returned by |
| svn_wc_entry could contain this state. Why doesn't it? I don't know; |
| possibly it's an accident, or possibly it's intentional, since in the |
| past parsing two entries files would have been expensive. |
| |
| 2.2 Access Baton Interface |
| |
| I would also like to modify the access baton interface. At present |
| the open function detects and skips missing directories when opening a |
| directory hierarchy. I would like to record this information in the |
| access baton set, and modify the retrieve functions to include an |
| svn_boolean_t* parameter that gets set to TRUE when a request for a |
| missing directory is made. The advantage of doing this is that the |
| application could avoid making svn_io_check_path and svn_wc_check_wc |
| calls when the access baton already has the information. The function |
| prop_path_internal looks like a good candidate for this optimisation. |
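| |
| One possible shape for such a retrieve function is sketched below. |
| This is only an illustration: baton_t, baton_set_t and retrieve are |
| hypothetical names, plain arrays stand in for the real hash table, |
| and an int stands in for svn_boolean_t: |

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an access baton. */
typedef struct baton_t
{
  const char *path;
} baton_t;

/* Hypothetical baton set that also remembers which directories the
   open function found to be missing. */
typedef struct baton_set_t
{
  const baton_t *batons;
  size_t n_batons;
  const char *const *missing;
  size_t n_missing;
} baton_set_t;

/* Return the baton for PATH, or NULL if the set has none.  *MISSING is
   set non-zero only when PATH was recorded as a missing directory, so
   the caller can tell "missing" apart from "never opened" without
   going to disk. */
const baton_t *
retrieve(const baton_set_t *set, const char *path, int *missing)
{
  size_t i;

  assert(missing != NULL);
  *missing = 0;

  for (i = 0; i < set->n_batons; i++)
    if (strcmp(set->batons[i].path, path) == 0)
      return &set->batons[i];

  for (i = 0; i < set->n_missing; i++)
    if (strcmp(set->missing[i], path) == 0)
      {
        *missing = 1;
        break;
      }

  return NULL;
}
```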
| |
| |
| 3. Access Baton Sets |
| ----------------- |
| |
| Each access baton represents a directory. Access batons can associate |
| together in sets. Given an access baton in a set, it is possible to |
| retrieve any other access baton in the set. When an access baton in a |
| set is closed, all other access batons in the set that represent |
| subdirectories are also closed. The set is implemented as a hash |
| table "owned" by one baton in each set, but shared by all batons in |
| the set. |
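| |
| The closing rule can be sketched as follows. This is an illustration |
| under simplifying assumptions, not the real implementation: baton_t, |
| baton_set_t and close_baton are hypothetical names, an array stands |
| in for the shared hash table, and paths are '/'-separated: |

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an access baton. */
typedef struct baton_t
{
  const char *path;
  int is_open;
} baton_t;

/* Hypothetical stand-in for the shared set. */
typedef struct baton_set_t
{
  baton_t *batons;
  size_t n_batons;
} baton_set_t;

/* Non-zero if PATH is DIR itself or lies below DIR.  The separator
   check keeps "wc/srcs" from being treated as a child of "wc/src". */
static int
is_same_or_below(const char *dir, const char *path)
{
  size_t len = strlen(dir);

  return strncmp(dir, path, len) == 0
         && (path[len] == '\0' || path[len] == '/');
}

/* Close the baton for DIR together with every open baton in SET that
   represents a subdirectory of DIR.  Return the number closed. */
size_t
close_baton(baton_set_t *set, const char *dir)
{
  size_t i;
  size_t closed = 0;

  assert(set != NULL && dir != NULL);

  for (i = 0; i < set->n_batons; i++)
    if (set->batons[i].is_open
        && is_same_or_below(dir, set->batons[i].path))
      {
        set->batons[i].is_open = 0;
        closed++;
      }

  return closed;
}
```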
| |
| At present in the code, access batons are opened in a parent->child |
| order. This works well with the shared hash being owned by the first |
| baton in each set. There is code to detect if closing a baton will |
| destroy the hash while other batons are using it; as far as I know it |
| doesn't currently trigger. If it turns out that this needs to be |
| supported it should be possible to transfer the hash information to |
| another baton. |
| |
| |
| 4. Access Baton Conversion |
| ----------------------- |
| |
| Given a function |
| svn_error_t *foo (const char *path); |
| if PATH is always a directory then the change that gets made is usually |
| svn_error_t *foo (svn_wc_adm_access_t *adm_access); |
| Within foo, the original const char* can be obtained using |
| const char *svn_wc_adm_access_path(svn_wc_adm_access_t *adm_access); |
| |
| The above case sometimes occurs as |
| svn_error_t *foo (const char *name, const char *dir); |
| where NAME is a single path component, and DIR is a directory. Conversion |
| is again simple in this case: |
| svn_error_t *foo (const char *name, svn_wc_adm_access_t *adm_access); |
| |
| The more difficult case is |
| svn_error_t *foo (const char *path); |
| where PATH can be a file or a directory. This occurs a lot in the |
| current code. In the long term these may get converted to |
| svn_error_t *foo (const char *name, svn_wc_adm_access_t *adm_access); |
| where NAME is a single path component. However, this involves more |
| changes to the code calling foo than are strictly necessary, so |
| initially they get converted to |
| svn_error_t *foo (const char *path, svn_wc_adm_access_t *adm_access); |
| where PATH is passed unchanged and an additional access baton is |
| passed. This interface is less than ideal, since there is duplicate |
| information in the path and baton, but since it involves fewer changes |
| in the calling code it makes a reasonable intermediate step. |
| |
| |
| 5. Logging |
| ------- |
| |
| As well as caching, the other problem that needs to be addressed is the |
| issue of logging. Modifications to the working copy are supposed to |
| use the log file mechanism to ensure that multiple changes that need |
| to be atomic cannot be partially completed. If the individual changes |
| that may need to be logged are all forced to use an access baton, then |
| the access baton may be able to identify when the log file mechanism |
| should be used. Combine this with an access baton state that tracks |
| whether a log file is being run and we may be able to automatically |
| identify those places that are failing to use the log file mechanism. |
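| |
| A minimal sketch of that idea follows, assuming every entry |
| modification goes through the baton. None of these names exist in |
| the code; baton_t, log_run_begin, log_run_end and entry_modify are |
| hypothetical: |

```c
#include <assert.h>

/* Hypothetical access baton state for detecting unlogged changes. */
typedef struct baton_t
{
  int running_log;            /* non-zero while a log file is being run */
  unsigned unlogged_changes;  /* modifications made outside a log run */
} baton_t;

/* Mark the start of a log file run. */
void
log_run_begin(baton_t *baton)
{
  assert(!baton->running_log);
  baton->running_log = 1;
}

/* Mark the end of a log file run. */
void
log_run_end(baton_t *baton)
{
  assert(baton->running_log);
  baton->running_log = 0;
}

/* Every entry modification passes through here, so a change made while
   no log file is running can be counted (or reported) as a place that
   is failing to use the log file mechanism. */
void
entry_modify(baton_t *baton)
{
  if (!baton->running_log)
    baton->unlogged_changes++;

  /* ... the change to the cached entries would be applied here ... */
}
```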
| |
| |
| 6. Status |
| ------ |
| |
| Entries caching has been implemented. |
| |
| The interface changes (section 2) have not been started. |
| |
| The access baton conversion is complete in so far as passing batons is |
| concerned. The path->name signature changes (section 4) have not been |
| made. |
| |
| Automatic detection of failure to use a log file (section 5) has not |
| been started. |