|  |  | 
|  | "I have a cunning plan" | 
|  |  | 
|  | or | 
|  |  | 
|  | Entries Caching in the Access Batons | 
|  |  | 
|  |  | 
|  |  | 
|  | 0. Preamble | 
|  | -------- | 
|  |  | 
|  | Issue 749 provides some history.  The access batons now cache the | 
|  | parsed entries file, as repeatedly reading, parsing and writing the | 
|  | file proved to be a bottleneck. | 
|  |  | 
|  |  | 
|  | 1. Caching Interface | 
|  | ----------------- | 
|  |  | 
|  | The basic functions to retrieve entries are svn_wc_entries_read and | 
|  | svn_wc_entry.  The function svn_wc__entries_write is used to update | 
|  | the entries file on disk.  The function svn_wc__entry_modify is | 
|  | implemented in terms of entries_read and entries_write. | 
|  |  | 
|  | 1.1 Write Caching Overview | 
|  |  | 
|  | An overview of the update process. | 
|  |  | 
|  | 1. Lock the directory | 
|  | 2. Read the entries file and cache in memory | 
|  | 3. Start the wc update | 
|  |   3.1. Start a directory update | 
|  |     3.1.1. Start file update | 
|  |       3.1.1.1. Write a log file specific to this item | 
|  |     3.1.2. Finish file update | 
|  |   3.2. Finish directory update | 
|  |   3.3. Run log files | 
|  |     3.3.1. Log file commands modify entries in memory | 
|  |   3.4. Finish log files | 
|  |   3.5. Flush entries to disk | 
|  |   3.6. Remove log files | 
|  | 4. Finish update | 
|  | 5. Unlock directory | 
|  |  | 
|  | Each directory update may contain multiple file updates, so when the | 
|  | directory update is complete there may be multiple log files.  While | 
|  | the log files are being run the entries modifications are cached in | 
|  | memory, and they are written once when the log files are complete. | 
|  | The reason for accumulating multiple log files is that flushing the | 
|  | entries to disk involves writing the entire entries file.  If it were | 
|  | done after each file, the total amount of entries data written would | 
|  | grow quadratically with the number of files during a checkout. | 
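
The arithmetic behind this can be checked with a small sketch.  It
assumes a simplified checkout that adds N files to one directory, one
at a time, where each flush rewrites every entry currently in the
file.  The function names are hypothetical and not part of libsvn_wc.

```c
/* Entries written if the whole entries file is flushed after each of
   the N files is added: 1 + 2 + ... + N = N(N+1)/2. */
unsigned long
toy_entries_written_flush_per_file(unsigned long n)
{
  unsigned long total = 0;
  unsigned long i;

  for (i = 1; i <= n; i++)
    total += i;   /* the i-th flush rewrites all i entries so far */
  return total;
}

/* Entries written if modifications are cached in memory and the
   entries file is flushed once, after all log files have run. */
unsigned long
toy_entries_written_flush_once(unsigned long n)
{
  return n;
}
```

Under these assumptions a directory of 1000 files costs 500500 entry
writes when flushing per file, against 1000 when flushing once.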
|  |  | 
|  |  | 
|  | 2. Interface Enhancements | 
|  | ---------------------- | 
|  |  | 
|  | 2.1 Entries Interface | 
|  |  | 
|  | A lot of the entries interface has remained unchanged since the | 
|  | pre-caching days, and it shows.  Of particular concern is the | 
|  | svn_wc_entries_read function, as this provides access to the raw data | 
|  | within the cache.  If the application carelessly modifies the data | 
|  | things may go wrong.  I would like to remove this function. | 
|  |  | 
|  | One use of svn_wc_entries_read is in svn_wc__entry_modify; this is | 
|  | "within the entries code" and so is not a problem. | 
|  |  | 
|  | Of the other uses of svn_wc_entries_read the most common is where the | 
|  | application wants to iterate over all the entries in a directory. I | 
|  | would like to see an interface something like | 
|  |  | 
|  |    typedef struct svn_wc_entry_iterator_t svn_wc_entry_iterator_t; | 
|  |  | 
|  |    svn_wc_entry_iterator_t * | 
|  |    svn_wc_entry_first(svn_wc_adm_access_t *adm_access, | 
|  |                       apr_pool_t *pool); | 
|  |  | 
|  |    svn_wc_entry_iterator_t * | 
|  |    svn_wc_entry_next(svn_wc_entry_iterator_t *entry_iterator); | 
|  |  | 
|  |    const svn_wc_entry_t * | 
|  |    svn_wc_entry_iterator_entry(svn_wc_entry_iterator_t *entry_iterator); | 
|  |  | 
|  | Note that this provides only const access to the entries; the | 
|  | application cannot modify the cached data.  All modifications would go | 
|  | through svn_wc__entry_modify, and the access batons could keep track | 
|  | of whether modifications have been made and not yet written to disk. | 
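
A minimal sketch of how such an iterator might behave, using toy types
rather than the real ones.  Everything prefixed toy_ is hypothetical:
the entry and baton structs are stand-ins, the cache is a plain array,
and caller-supplied iterator storage replaces the pool allocation in
the proposed signature.

```c
#include <stddef.h>

/* Hypothetical stand-ins for svn_wc_entry_t and the access baton. */
typedef struct toy_entry_t {
  const char *name;
  long revision;
} toy_entry_t;

typedef struct toy_adm_access_t {
  toy_entry_t *entries;   /* cached entries, owned by the baton */
  size_t count;
  int dirty;              /* set when the cache differs from disk */
} toy_adm_access_t;

typedef struct toy_entry_iterator_t {
  const toy_adm_access_t *adm_access;
  size_t index;
} toy_entry_iterator_t;

/* Position ITER on the first entry; return NULL if there are none. */
toy_entry_iterator_t *
toy_entry_first(toy_entry_iterator_t *iter,
                const toy_adm_access_t *adm_access)
{
  if (adm_access->count == 0)
    return NULL;
  iter->adm_access = adm_access;
  iter->index = 0;
  return iter;
}

/* Advance ITER; return NULL once the last entry has been visited. */
toy_entry_iterator_t *
toy_entry_next(toy_entry_iterator_t *iter)
{
  if (iter->index + 1 >= iter->adm_access->count)
    return NULL;
  iter->index++;
  return iter;
}

/* Read-only view of the current entry. */
const toy_entry_t *
toy_entry_iterator_entry(const toy_entry_iterator_t *iter)
{
  return &iter->adm_access->entries[iter->index];
}

/* The only mutating path: change the entry and mark the cache dirty. */
void
toy_entry_modify(toy_adm_access_t *adm_access, size_t index, long revision)
{
  adm_access->entries[index].revision = revision;
  adm_access->dirty = 1;
}
```

Because the iterator only hands out const pointers, the dirty flag can
only change through toy_entry_modify, which is the property the
proposed interface is after.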
|  |  | 
|  | The other uses of svn_wc_entries_read tend to extract a single entry. | 
|  | I hope these can be converted to use svn_wc_entry.  One slight problem | 
|  | is the use of svn_wc_entries_read to intentionally extract a | 
|  | directory's entry from its parent.  This is done because that's where | 
|  | the "deleted" state is stored.  I think the entry returned by | 
|  | svn_wc_entry could contain this state.  Why doesn't it?  I don't know, | 
|  | possibly it's an accident, or possibly it's intentional as in the past | 
|  | parsing two entries files would have been expensive. | 
|  |  | 
|  | 2.2 Access Baton Interface | 
|  |  | 
|  | I would also like to modify the access baton interface.  At present | 
|  | the open function detects and skips missing directories when opening a | 
|  | directory hierarchy.  I would like to record this information in the | 
|  | access baton set, and modify the retrieve functions to include an | 
|  | svn_boolean_t* parameter that gets set TRUE when a request for a | 
|  | missing directory is made.  The advantage of doing this is that the | 
|  | application could avoid making svn_io_check_path and svn_wc_check_wc | 
|  | calls when the access baton already has the information.  The function | 
|  | prop_path_internal looks like a good candidate for this optimisation. | 
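
A rough sketch of the retrieve behaviour I have in mind, with toy
types.  The names are hypothetical, a flat array stands in for the
baton set, and an int stands in for svn_boolean_t.

```c
#include <stddef.h>
#include <string.h>

typedef struct toy_baton_t {
  const char *path;
  int missing;     /* recorded at open time: directory absent on disk */
} toy_baton_t;

typedef struct toy_baton_set_t {
  const toy_baton_t *batons;
  size_t count;
} toy_baton_set_t;

/* Return the baton for PATH, or NULL if there is no usable baton.
   *MISSING is set to 1 only when PATH was seen and skipped as a
   missing directory when the hierarchy was opened. */
const toy_baton_t *
toy_adm_retrieve(const toy_baton_set_t *set, const char *path, int *missing)
{
  size_t i;

  *missing = 0;
  for (i = 0; i < set->count; i++)
    if (strcmp(set->batons[i].path, path) == 0)
      {
        if (set->batons[i].missing)
          {
            *missing = 1;
            return NULL;
          }
        return &set->batons[i];
      }
  return NULL;
}
```

A caller that gets NULL with *MISSING set already knows the directory
is absent and has no need to check the disk again.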
|  |  | 
|  |  | 
|  | 3. Access Baton Sets | 
|  | ----------------- | 
|  |  | 
|  | Each access baton represents a directory.  Access batons can associate | 
|  | together in sets.  Given an access baton in a set, it is possible to | 
|  | retrieve any other access baton in the set.  When an access baton in a | 
|  | set is closed, all other access batons in the set that represent | 
|  | subdirectories are also closed.  The set is implemented as a hash | 
|  | table "owned" by one baton in each set, but shared by all batons in | 
|  | the set. | 
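
The closing rule can be sketched as follows.  This is a toy model, not
the real implementation: a flat array stands in for the shared hash
table, subdirectories are recognised by path prefix, and all names are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef struct toy_set_member_t {
  const char *path;
  int is_open;
} toy_set_member_t;

/* Return non-zero if CHILD is PARENT itself or lies below it. */
static int
toy_is_self_or_child(const char *parent, const char *child)
{
  size_t len = strlen(parent);

  if (strncmp(parent, child, len) != 0)
    return 0;
  return child[len] == '\0' || child[len] == '/';
}

/* Close the baton for PATH and every open baton in the set that
   represents a subdirectory of PATH.  Return the number closed. */
int
toy_set_close(toy_set_member_t *members, size_t count, const char *path)
{
  size_t i;
  int closed = 0;

  for (i = 0; i < count; i++)
    if (members[i].is_open && toy_is_self_or_child(path, members[i].path))
      {
        members[i].is_open = 0;
        closed++;
      }
  return closed;
}
```

Closing "wc/a" in a set holding "wc", "wc/a", "wc/a/b" and "wc/ab"
closes only the middle two; the sibling "wc/ab" shares a string prefix
but is not a subdirectory.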
|  |  | 
|  | At present in the code, access batons are opened in a parent->child | 
|  | order.  This works well with the shared hash being owned by the first | 
|  | baton in each set.  There is code to detect if closing a baton will | 
|  | destroy the hash while other batons are using it, but as far as I know | 
|  | it doesn't currently trigger.  If it turns out that this needs to be | 
|  | supported it should be possible to transfer the hash information to | 
|  | another baton. | 
|  |  | 
|  |  | 
|  | 4. Access Baton Conversion | 
|  | ----------------------- | 
|  |  | 
|  | Given a function | 
|  |    svn_error_t *foo (const char *path); | 
|  | if PATH is always a directory then the change that gets made is usually | 
|  |    svn_error_t *foo (svn_wc_adm_access_t *adm_access); | 
|  | Within foo, the original const char* can be obtained using | 
|  |    const char *svn_wc_adm_access_path(svn_wc_adm_access_t *adm_access); | 
|  |  | 
|  | The above case sometimes occurs as | 
|  |    svn_error_t *foo (const char *name, const char *dir); | 
|  | where NAME is a single path component, and DIR is a directory. Conversion | 
|  | is again simple in this case | 
|  |    svn_error_t *foo (const char *name, svn_wc_adm_access_t *adm_access); | 
|  |  | 
|  | The more difficult case is | 
|  |    svn_error_t *foo (const char *path); | 
|  | where PATH can be a file or a directory.  This occurs a lot in the | 
|  | current code. In the long term these may get converted to | 
|  |    svn_error_t *foo (const char *name, svn_wc_adm_access_t *adm_access); | 
|  | where NAME is a single path component.  However this involves more | 
|  | changes to the code calling foo than are strictly necessary, so | 
|  | initially they get converted to | 
|  |    svn_error_t *foo (const char *path, svn_wc_adm_access_t *adm_access); | 
|  | where PATH is passed unchanged and an additional access baton is | 
|  | passed.  This interface is less than ideal, since there is duplicate | 
|  | information in the path and baton, but since it involves fewer changes | 
|  | in the calling code it makes a reasonable intermediate step. | 
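
To illustrate the duplication, here is a sketch of how the NAME of the
long-term signature could be recovered from the PATH and baton of the
intermediate one.  The baton struct and function are hypothetical, and
the sketch assumes '/'-separated paths with no trailing separator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an access baton: just its directory. */
typedef struct toy_access_t {
  const char *path;
} toy_access_t;

/* Return the single path component NAME such that PATH is NAME inside
   the baton's directory, "" if PATH is the baton's directory itself,
   or NULL if PATH is not directly inside that directory. */
const char *
toy_name_in_baton(const char *path, const toy_access_t *adm_access)
{
  size_t len = strlen(adm_access->path);

  if (strncmp(path, adm_access->path, len) != 0)
    return NULL;
  if (path[len] == '\0')
    return "";
  if (path[len] != '/'
      || path[len + 1] == '\0'
      || strchr(path + len + 1, '/') != NULL)
    return NULL;
  return path + len + 1;
}
```

The NULL cases are exactly the calls where path and baton disagree,
which the intermediate signature cannot rule out and the NAME form
makes impossible to express.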
|  |  | 
|  |  | 
|  | 5. Logging | 
|  | ------- | 
|  |  | 
|  | As well as caching, the other problem that needs to be addressed is the | 
|  | issue of logging.  Modifications to the working copy are supposed to | 
|  | use the log file mechanism to ensure that multiple changes that need | 
|  | to be atomic cannot be partially completed.  If the individual changes | 
|  | that may need to be logged are all forced to use an access baton, then | 
|  | the access baton may be able to identify when the log file mechanism | 
|  | should be used.  Combine this with an access baton state that tracks | 
|  | whether a log file is being run and we may be able to automatically | 
|  | identify those places that are failing to use the log file mechanism. | 
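
One possible shape for that check, as a toy sketch.  Nothing here
exists in the code; the struct and function names are hypothetical,
and the sketch only counts offending modifications rather than
reporting where they came from.

```c
/* Hypothetical logging state carried by an access baton. */
typedef struct toy_log_baton_t {
  int running_log;        /* non-zero while a log file is being run */
  int unlogged_changes;   /* modifications made outside a log run */
} toy_log_baton_t;

void
toy_log_begin(toy_log_baton_t *baton)
{
  baton->running_log = 1;
}

void
toy_log_end(toy_log_baton_t *baton)
{
  baton->running_log = 0;
}

/* Called by every modification that ought to be logged.  Return 1 if
   it happened while a log file was being run, or 0 after recording it
   as a place that failed to use the log file mechanism. */
int
toy_note_modification(toy_log_baton_t *baton)
{
  if (baton->running_log)
    return 1;
  baton->unlogged_changes++;
  return 0;
}
```

A debug build could then complain whenever unlogged_changes is
non-zero at the time the baton is closed.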
|  |  | 
|  |  | 
|  | 6. Status | 
|  | ------ | 
|  |  | 
|  | Entries caching has been implemented. | 
|  |  | 
|  | The interface changes (section 2) have not been started. | 
|  |  | 
|  | The access baton conversion is complete in so far as passing batons is | 
|  | concerned.  The path->name signature changes (section 4) have not been | 
|  | made. | 
|  |  | 
|  | Automatic detection of failure to use a log file (section 5) has not | 
|  | been started. |