subversion/libsvn_fs_x/TODO - subversion - Git at Google


 TODO (see also DONE section below)
 ==================================

 Internal API cleanup
 --------------------

 During refactoring, some functions had to be declared in header files
 to make them available to other fsfs code.  We need to revisit those
 function definitions to turn them into a proper API that may be useful
 to other code (such as fsfs tools).


 Checksum all metadata elements
 ------------------------------

 All elements of an FS-X repository shall be guarded by checksums. That
 includes indexes, noderevs etc.  Larger data structures, such as index
 files, should have checksummed sub-elements such that corrupted parts
 may be identified and potentially repaired / circumvented in a meaningful
 way.

 Those checksums may be quite simple such as Adler32 because that meta-
 data can be cross-verified with other parts as well and acts only as a
 fallback to narrow down the affected parts.

 'svnadmin verify' shall check consistency based on those checksums.


 Port existing FSFS tools
 ------------------------

 fsfs-stats, fsfsverify.py and possibly others should have equivalents
 in the FS-X world.


 Optimize data ordering during pack
 ----------------------------------

 I/O optimized copy algorithms are yet to be implemented.  The current
 code is relatively slow as it performs quasi-random I/O on the
 input stream.


 TxDelta v2
 ----------

 Version 1 of txdelta turns out to be limited in its effectiveness for
 larger files when data gets inserted or removed.  For typical office
 documents (zip files), deltification often becomes ineffective.

 Version 2 shall introduce the following changes:

 - increase the delta window from 100kB to 1MB
 - use a sliding window instead of a fixed-sized one
 - use a slightly more efficient instruction encoding

 When introducing it,  we will make it an option at the txdelta interfaces
 (e.g. a format number).  The version will be indicated in the 'SVN\x1' /
 'SVN\x2' stream header.  While at it, (try to) fix the layering violations
 where those prefixes are being read or written.


 Large file storage
 ------------------

 Even most source code repositories contain large, hard to compress,
 hard to deltify binaries.  Reconstructing their content becomes very I/O
 intense and it "dilutes" the data in our pack files.  The latter makes
 e.g. caching, prefetching and packing less efficient.

 Once a representation exceeds a certain configured threshold (16M default),
 the fulltext of that item will be stored in a separate file.  This will
 be marked in the representation_t by an extra flag and future reps will
 not be deltified against it.  From that location, the data can be forwarded
 directly via SendFile and the fulltext caches will not be used for it.

 Note that by making the decision contingent upon the size of the deltified
 and packed representation,  all large data that benefit from these (i.e.
 have smaller increments) will still be stored within the rev and pack files.
 If a future representation is smaller than the threshold, it may be

 /* danielsh: so if we have a file which is 20MB over many revisions, it'll
 be stored in fulltext every single time unless the configured threshold is
 changed?  Wondering if that's the best solution... */


 Sorted binary directory representations
 ---------------------------------------

 Lookup of entries in a directory is a frequent operation when following
 cached paths.  The represents directories as arrays sorted by entry name
 to allow for binary search during that lookup.  However, all external
 representation uses hashes and the conversion is expensive.

 FS-X shall store directory representations sorted by element names and
 all use that array representation internally wherever appropriate.  This
 will minimize the conversion overhead for long directories, especially
 during transaction building.

 Moreover, switch from the key/value representation to a slightly tighter
 and easier to process binary representation (validity is already guaranteed
 by checksums).


 Star-Deltification
 ------------------

 Current implementation is incomplete. TODO: actually support & use base
 representations, optimize instruction table.

 Combine this with Txdelta 2 such that the corresponding windows from
 all representations get stored in a common star-delta container.


 Multiple pack stages
 --------------------

 FSFS only knows one packing level - the shard.  For repositories with
 a large number of revisions, it may be more efficient to start with small
 packs (10-ish) and later pack them into larger and larger ones.


 Open less files when opening a repository
 -----------------------------------------

 Opening a repository reads numerous files in db/ (besides several more in
 ../conf): uuid, current, format, fs-type, fsfs.conf, min-unpacked-rev, ...

 Combine most of them into one or two files (eg uuid|format(|fs-type?),
 current|min-unpacked-revprop).


 Sharded transaction directories
 -------------------------------

 Transaction directories contain 3 OS files per FS file modified in the
 transaction.  That doesn't scale well; find something better.


 DONE
 ====

 Turn into separate FS
 ---------------------

 Make FS-X a separate file system alongside BDB and FSFS.  Rip out all
 FSFS compatibility code.


 Logical addressing
 ------------------

 To allow for moving data structures around within the repository, we must
 replace the current absolute addressing using file offsets with a logical
 one.  All references will no take the form of (revision, index) pairs and
 a replacement to the format 6 manifest files will map that to actual file
 offsets.

 Having the need to map revision-local offsets to pack-file global offsets
 today already gives us some localized address mapping code that simply
 needs to be replaced.


 Optimize data ordering during pack
 ----------------------------------

 Replace today's simple concatenating shard packing process with a one
 placing fragments (representations and noderevs) from various revisions
 close to each other if they are likely needed to serve in the same request.

 We will optimize on a per-shard basis.  The general strategy is

 * place all change lists at the beginning of the pack file
   - strict revision order
   - place newest ones first
 * place all file properties reps next
   - place newer reps first
 * place all directory properties next
   - place newer reps first
 * place all root nodes and root directories
   - ordered newest rev -> oldest rev
   - place rep delta chains 'en block'
   - place root node in front of rep, if that rep has not already
     been placed as part of a rep delta chain
 * place remaining content as follows:
   - place node rev directly in front of their reps (where they have one)
   - start with the latest root directory not placed, yet
   - recurse to sub-folders first with, sorted by name
   - per folder, place files in naming order
   - place rep deltification chains in deltification order (new->old)
 * no fragments should be left but if they are, put them at the end


 Index pack files
 ----------------

 In addition to the manifest we need for the (revision, index) -> offset
 mapping, we also introduce an offset -> (revision, index, type) index
 file.  This will allow us to parse any data in a pack file without walking
 the DAG top down.


 Data prefetch
 -------------

 This builds on the previous.  The idea is that whenever a cache lookup
 fails,  we will not just read the single missing fragment but parse all
 data within the APR file buffer and put that into the cache.

 For maximum efficiency,  we will align the data blocks being read to
 multiples of the block size and allow that buffer size to be configured
 (where supported by APR).  The default block size will be raised to 64kB.


 Extend 'svnadmin verify'
 ------------------------

 Format 7 provides many extra chances to verify contents plus contains
 extra indexes that must be consistent with the pack / rev files.  We
 must extend the tests to cover all that.


 Containers
 ----------

 Extend the index format support containers, i.e. map a logical item index
 to (file offset, sub-index) pairs.  The whole container will be read and
 cached and the specific item later accessed from the whole structure.

 Use these containers for reps, noderevs and changes.  Provide specific
 data container types for each of these item types and different item
 types cannot be put into the same container.  Containers are binaries,
 i.e. there is no textual representations of their contents.

 This allows for significant space savings on disk due to deltification
 amongst e.g. revprops.  More importantly, it reduces the size of the
 runtime data structures within the cache *and* reduces the number of
 cache entries (the cache is can't handle items < 500 bytes very well).


 Packed change lists
 -------------------

 Change lists tend to be large, in some cases >20% of the repo.  Due to the
 new ordering of pack data,  the change lists can be the largest part of
 data to read for svn log.  Use our standard compression method to save
 70 .. 80% of the disk space.

 Packing will only be applied to binary representations of change lists
 to keep the number of possible combinations low.


 Star-Deltification
 ------------------

 Most node contents are smaller than 500k, i.e. less than Txdelta 2 window.
 Those contents shall be aggregated into star-delta containers upon pack.
 This will save significant amounts of disk space, particularly in case
 of heavy branching.  Also, the data extraction is independent of the
 number of deltas, i.e. delta chain length) within the same container.


 Support for arbitrary chars in path names
 -----------------------------------------

 FSFS's textual item representations breaks when path names contain
 newlines.  FS-X revisions shall escape all control chars (e.g. < 0x20)
 in path names when using them in textual item representations.

	TODO (see also DONE section below)
	==================================

	Internal API cleanup
	--------------------

	During refactoring, some functions had to be declared in header files
	to make them available to other fsfs code. We need to revisit those
	function definitions to turn them into a proper API that may be useful
	to other code (such as fsfs tools).


	Checksum all metadata elements
	------------------------------

	All elements of an FS-X repository shall be guarded by checksums. That
	includes indexes, noderevs etc. Larger data structures, such as index
	files, should have checksummed sub-elements such that corrupted parts
	may be identified and potentially repaired / circumvented in a meaningful
	way.

	Those checksums may be quite simple such as Adler32 because that meta-
	data can be cross-verified with other parts as well and acts only as a
	fallback to narrow down the affected parts.

	'svnadmin verify' shall check consistency based on those checksums.


	Port existing FSFS tools
	------------------------

	fsfs-stats, fsfsverify.py and possibly others should have equivalents
	in the FS-X world.


	Optimize data ordering during pack
	----------------------------------

	I/O optimized copy algorithms are yet to be implemented. The current
	code is relatively slow as it performs quasi-random I/O on the
	input stream.


	TxDelta v2
	----------

	Version 1 of txdelta turns out to be limited in its effectiveness for
	larger files when data gets inserted or removed. For typical office
	documents (zip files), deltification often becomes ineffective.

	Version 2 shall introduce the following changes:

	- increase the delta window from 100kB to 1MB
	- use a sliding window instead of a fixed-sized one
	- use a slightly more efficient instruction encoding

	When introducing it, we will make it an option at the txdelta interfaces
	(e.g. a format number). The version will be indicated in the 'SVN\x1' /
	'SVN\x2' stream header. While at it, (try to) fix the layering violations
	where those prefixes are being read or written.


	Large file storage
	------------------

	Even most source code repositories contain large, hard to compress,
	hard to deltify binaries. Reconstructing their content becomes very I/O
	intense and it "dilutes" the data in our pack files. The latter makes
	e.g. caching, prefetching and packing less efficient.

	Once a representation exceeds a certain configured threshold (16M default),
	the fulltext of that item will be stored in a separate file. This will
	be marked in the representation_t by an extra flag and future reps will
	not be deltified against it. From that location, the data can be forwarded
	directly via SendFile and the fulltext caches will not be used for it.

	Note that by making the decision contingent upon the size of the deltified
	and packed representation, all large data that benefit from these (i.e.
	have smaller increments) will still be stored within the rev and pack files.
	If a future representation is smaller than the threshold, it may be

	/* danielsh: so if we have a file which is 20MB over many revisions, it'll
	be stored in fulltext every single time unless the configured threshold is
	changed? Wondering if that's the best solution... */


	Sorted binary directory representations
	---------------------------------------

	Lookup of entries in a directory is a frequent operation when following
	cached paths. The represents directories as arrays sorted by entry name
	to allow for binary search during that lookup. However, all external
	representation uses hashes and the conversion is expensive.

	FS-X shall store directory representations sorted by element names and
	all use that array representation internally wherever appropriate. This
	will minimize the conversion overhead for long directories, especially
	during transaction building.

	Moreover, switch from the key/value representation to a slightly tighter
	and easier to process binary representation (validity is already guaranteed
	by checksums).


	Star-Deltification
	------------------

	Current implementation is incomplete. TODO: actually support & use base
	representations, optimize instruction table.

	Combine this with Txdelta 2 such that the corresponding windows from
	all representations get stored in a common star-delta container.


	Multiple pack stages
	--------------------

	FSFS only knows one packing level - the shard. For repositories with
	a large number of revisions, it may be more efficient to start with small
	packs (10-ish) and later pack them into larger and larger ones.


	Open less files when opening a repository
	-----------------------------------------

	Opening a repository reads numerous files in db/ (besides several more in
	../conf): uuid, current, format, fs-type, fsfs.conf, min-unpacked-rev, ...

	Combine most of them into one or two files (eg uuid\|format(\|fs-type?),
	current\|min-unpacked-revprop).


	Sharded transaction directories
	-------------------------------

	Transaction directories contain 3 OS files per FS file modified in the
	transaction. That doesn't scale well; find something better.


	DONE
	====

	Turn into separate FS
	---------------------

	Make FS-X a separate file system alongside BDB and FSFS. Rip out all
	FSFS compatibility code.


	Logical addressing
	------------------

	To allow for moving data structures around within the repository, we must
	replace the current absolute addressing using file offsets with a logical
	one. All references will no take the form of (revision, index) pairs and
	a replacement to the format 6 manifest files will map that to actual file
	offsets.

	Having the need to map revision-local offsets to pack-file global offsets
	today already gives us some localized address mapping code that simply
	needs to be replaced.


	Optimize data ordering during pack
	----------------------------------

	Replace today's simple concatenating shard packing process with a one
	placing fragments (representations and noderevs) from various revisions
	close to each other if they are likely needed to serve in the same request.

	We will optimize on a per-shard basis. The general strategy is

	* place all change lists at the beginning of the pack file
	- strict revision order
	- place newest ones first
	* place all file properties reps next
	- place newer reps first
	* place all directory properties next
	- place newer reps first
	* place all root nodes and root directories
	- ordered newest rev -> oldest rev
	- place rep delta chains 'en block'
	- place root node in front of rep, if that rep has not already
	been placed as part of a rep delta chain
	* place remaining content as follows:
	- place node rev directly in front of their reps (where they have one)
	- start with the latest root directory not placed, yet
	- recurse to sub-folders first with, sorted by name
	- per folder, place files in naming order
	- place rep deltification chains in deltification order (new->old)
	* no fragments should be left but if they are, put them at the end


	Index pack files
	----------------

	In addition to the manifest we need for the (revision, index) -> offset
	mapping, we also introduce an offset -> (revision, index, type) index
	file. This will allow us to parse any data in a pack file without walking
	the DAG top down.


	Data prefetch
	-------------

	This builds on the previous. The idea is that whenever a cache lookup
	fails, we will not just read the single missing fragment but parse all
	data within the APR file buffer and put that into the cache.

	For maximum efficiency, we will align the data blocks being read to
	multiples of the block size and allow that buffer size to be configured
	(where supported by APR). The default block size will be raised to 64kB.


	Extend 'svnadmin verify'
	------------------------

	Format 7 provides many extra chances to verify contents plus contains
	extra indexes that must be consistent with the pack / rev files. We
	must extend the tests to cover all that.


	Containers
	----------

	Extend the index format support containers, i.e. map a logical item index
	to (file offset, sub-index) pairs. The whole container will be read and
	cached and the specific item later accessed from the whole structure.

	Use these containers for reps, noderevs and changes. Provide specific
	data container types for each of these item types and different item
	types cannot be put into the same container. Containers are binaries,
	i.e. there is no textual representations of their contents.

	This allows for significant space savings on disk due to deltification
	amongst e.g. revprops. More importantly, it reduces the size of the
	runtime data structures within the cache and reduces the number of
	cache entries (the cache is can't handle items < 500 bytes very well).


	Packed change lists
	-------------------

	Change lists tend to be large, in some cases >20% of the repo. Due to the
	new ordering of pack data, the change lists can be the largest part of
	data to read for svn log. Use our standard compression method to save
	70 .. 80% of the disk space.

	Packing will only be applied to binary representations of change lists
	to keep the number of possible combinations low.


	Star-Deltification
	------------------

	Most node contents are smaller than 500k, i.e. less than Txdelta 2 window.
	Those contents shall be aggregated into star-delta containers upon pack.
	This will save significant amounts of disk space, particularly in case
	of heavy branching. Also, the data extraction is independent of the
	number of deltas, i.e. delta chain length) within the same container.


	Support for arbitrary chars in path names
	-----------------------------------------

	FSFS's textual item representations breaks when path names contain
	newlines. FS-X revisions shall escape all control chars (e.g. < 0x20)
	in path names when using them in textual item representations.