|  | Oh Most Noble and Fragrant Emacs, please be in -*- outline -*- mode! | 
|  |  | 
|  | Newline Conversion and Keyword Substitution | 
|  | =========================================== | 
|  |  | 
|  | * Newline conversion | 
|  |  | 
|  | We've finally settled on a proposal articulated by Greg Hudson, | 
|  | derived from Ben's original proposal plus much list discussion. | 
|  |  | 
|  | Here's Greg's mail with the proposal, then a few clarifying selections | 
|  | from the followup discussion: | 
|  |  | 
|  | Alright, I'll make a proposal which is like yours but (in my | 
|  | opinion) a little clearer.  First, let's look at the different use | 
|  | cases: | 
|  |  | 
|  | 1. The most common case--text files which want native line endings. | 
|  | These should be stored in the repository using LF line endings, | 
|  | and in the working dir using native line endings. | 
|  |  | 
|  | 2. Binary files.  These files we don't want to touch at all. | 
|  |  | 
|  | 3. Text files which, for one reason or another, want a specific | 
|  | line ending format regardless of platform.  These should be | 
|  | stored in the repository and in the working directory using the | 
|  | specified line ending.  We probably don't have to worry so much | 
|  | about data safety for these files since a particular, odd | 
|  | behavior has been specified for them. | 
|  |  | 
|  | There are, of course, a hundred different ways we could arrange the | 
|  | metadata.  I propose an "svn:newline-style" property with the | 
|  | possible values "none", "native", "LF", "CR", and "CRLF".  The | 
|  | values mean: | 
|  |  | 
|  | none: Use case 2.  don't do any newline translation | 
|  |  | 
|  | native: Use case 1.  Store with LF in repository, and with native | 
|  | line endings in the working copy. | 
|  |  | 
|  | LF, CR, CRLF: Use case 3.  Store with specified format in the | 
|  | repository and in the working copy. | 
|  |  | 
|  | On commit, we apply the following rules to transform the data | 
|  | committed to the server: | 
|  |  | 
|  | If newline-style is none, do nothing. | 
|  |  | 
|  | If newline-stle is native, translate <native newline style> -> LF. | 
|  | If we notice any CRs or LFs which aren't part of a native-style | 
|  | newline, abort the commit. | 
|  |  | 
|  | If newline-style is LF, CR, or CRLF, translate <native newline | 
|  | style> -> <requested newline style>.  If we notice any CRs or LFs | 
|  | which aren't part of a native-style newline and aren't part of a | 
|  | requested-style newline, abort the commit.  If the commit succeeds, | 
|  | apply the <native newline style> -> <requested newline style> | 
|  | translation to the working copy as well, so that it matches what we | 
|  | would get from a checkout of the new rev. | 
|  |  | 
|  | On checkout, we translate LF -> <native newline style> if | 
|  | newline-style is native; otherwise, we leave the file alone. | 
|  |  | 
|  | For now, let's say the default value of svn:newline-style is none. | 
|  | In the future, we'll want to think about things like how to enable | 
|  | newline-translation over the whole repository except for files | 
|  | which don't appear to be text. | 
|  |  | 
|  | I think that's a complete proposal.  Some possible variations: | 
|  |  | 
|  | Variation 1: If newline-style is native, on commit, translate | 
|  | <first newline style seen> -> LF.  If we see any CRs or LFs which | 
|  | don't match the first newline style seen, abort the commit. | 
|  |  | 
|  | Variation 2: If newline-style is native, before commit, examine the | 
|  | file to see if it uses only the native newline style.  If it | 
|  | doesn't, set the newline-style property to "none" and commit with | 
|  | no translation. | 
|  |  | 
|  | Variation 3: Combine variations 1 and 2; if newline-style is | 
|  | native, then if before commit, examine the file to see if it uses a | 
|  | single consistent newline style.  If it does, translate <that | 
|  | newline style> -> LF; if not, commit with newline-style set to | 
|  | "none" and no translation. | 
|  |  | 
|  | Variation 4: If newline-style is native, then on commit, we edit a | 
|  | property "svn:newline-conversion" to something like "CRLF LF" to | 
|  | show what conversion we did.  This enables mechanical reversal of | 
|  | the translation if the file is later determined to be binary. | 
|  | (Particularly useful with variations 1 or 3 where the transform | 
|  | might not be obvious from the platform where the file was checked | 
|  | in.) | 
|  |  | 
|  | We decided to hold off on doing any of the "variations".  We'll just | 
|  | do the basic proposal first, then see how well things work out. | 
|  |  | 
|  | Next, Greg responded to an observation by William Uther, in which | 
|  | William pointed out that the behavior is not reversible when | 
|  | svn:newline-style is LF or CRLF.  Greg agrees, but explains why that's | 
|  | okay: | 
|  |  | 
|  | On Fri, 2001-12-14 at 15:48, William Uther wrote: | 
|  | > --On Friday, 14 December 2001 1:16 PM -0500 Greg Hudson | 
|  | > <ghudson@MIT.EDU> wrote: | 
|  |  | 
|  | > >   If newline-style is LF, CR, or CRLF, translate <native | 
|  | > > newline style> -> <requested newline style>.  If we notice any | 
|  | > > CRs or LFs which aren't part of a native-style newline and | 
|  | > > aren't part of a requested-style newline, abort the commit.  If | 
|  | > > the commit succeeds, apply the <native newline style> -> | 
|  | > > <requested newline style> translation to the working copy as | 
|  | > > well, so that it matches what we would get from a checkout of | 
|  | > > the new rev. | 
|  | > | 
|  | > I don't think this preserves reversability.  If a file contains | 
|  | > BOTH <native-style newline> and <requested-style newline> then | 
|  | > you neet to abort.  If you translate just <native-style newline> | 
|  | > then you can't undo the transformation - you don't know which | 
|  | > newlines need to be untransformed. | 
|  |  | 
|  | This particular transform (for files marked CRLF, CR, or LF) is not | 
|  | reversible.  See where I said: | 
|  |  | 
|  | "We probably don't have to worry so much about data safety for | 
|  | these files since a particular, odd behavior has been specified | 
|  | for them." | 
|  |  | 
|  | However, let's add a possible variation to my proposal, for those | 
|  | who are still uncomfortable with data-destroying transformations | 
|  | applied to such flies: | 
|  |  | 
|  | Variation 5: If the file is marked CRLF, CR, or LF, we translate | 
|  | <native-style newline> to <requested-style newline> during commit, | 
|  | and abort the commit if we notice any kind of mixing of newline | 
|  | styles.  (Can also combine with variation 1.) | 
|  |  | 
|  | Colin Putney also followed up to William Uther's post, agreeing with | 
|  | Greg and explaining why in even more detail: | 
|  |  | 
|  | > I don't think this preserves reversability.  If a file contains | 
|  | > BOTH <native-style newline> and <requested-style newline> then | 
|  | > you neet to abort.  If you translate just <native-style newline> | 
|  | > then you can't undo the transformation - you don't know which | 
|  | > newlines need to be untransformed. | 
|  |  | 
|  | > Stated simply: You should only translate when the newline style | 
|  | > is entirely consistent.  Anything else removes the inconsistency | 
|  | > and hence loses information. | 
|  |  | 
|  | True, this scheme doesn't preserve reversibility. But in this case | 
|  | that's OK, because the newline-style decrees what the newline style | 
|  | must be. If there are native-style newlines mixed in with the | 
|  | requested-style newlines, this is probably the result of corruption | 
|  | by some native-newline-obsessive user tool. So the non-reversible | 
|  | transform will actually undo the corruption. | 
|  |  | 
|  | For example, the file foo.dsp, which has newline-style of | 
|  | CRLF. It's stored in the repository with CRLF newlines and on | 
|  | checkout, no transformation is done. If Linus checks out the file | 
|  | and edits it in an old version of emacs, any lines he adds will be | 
|  | terminated with a bare LF. Since this is his native style of | 
|  | newline, the transformation Greg described will undo this damage. | 
|  |  | 
|  | If the newline-style is set to a specific newline-style (ie. CR, | 
|  | LF, or CRLF), then we know that (1) the file is text, not binary, | 
|  | and (2), any other style of newline present is corruption. | 
|  |  | 
|  | A file should not be marked with a specific newline style unless | 
|  | (1) user does so explicitly, or (2) it matches some heuristic when | 
|  | it's added, *and* the file contents conform to that newline style. | 
|  |  | 
|  | So the only real possibility for corruption is if some user tool | 
|  | creates a binary file that matches a heuristic for a specific | 
|  | newline style. In our running example, William creates a vector | 
|  | graphics file called foo.dsp and adds it. By chance, this file | 
|  | happens to have CRLFs scattered though it, but no bare CRs, LFs, | 
|  | '\0' characters or other harbingers of binary files. On the commit, | 
|  | svn will notice the extension, set the newline-style to CRLF and | 
|  | send it to the repository. William may get an error if he tries to | 
|  | commit a change that introduces a bare CR or LF, but he won't | 
|  | corrupt the file. | 
|  |  | 
|  | Linus can corrupt the file if he makes a change that introduces a | 
|  | bare LF, which will get transformed into CRLF on | 
|  | commit. Alternatively, Madeleine (was that her name?) Can introduce | 
|  | a bare CR and commit, which will also corrupt the file. | 
|  |  | 
|  | That's a pretty long string of unlikely coincidences though, while | 
|  | the opposite case, where this transformation *fixes* corruption, is | 
|  | quite common. | 
|  |  | 
|  | Colin | 
|  |  | 
|  | Finally, Greg followed up, confirming again what he meant by that | 
|  | portion of the proposal, but also saying that he could go the other | 
|  | way on the question too: | 
|  |  | 
|  | > +1 on Greg Hudson's latest proposal -- and I think we're now | 
|  | > ready to Actually Do It. :-) | 
|  |  | 
|  | I hope so.  For a while I was afraid we had hit our first failure | 
|  | to achieve livable consensus.  My apologies for not realizing the | 
|  | reversability thing until two days and several thousand lines of | 
|  | misguided debate had already gone by. | 
|  |  | 
|  | > My assumption is that "in the working copy" means both text-base | 
|  | > and working file, for the sake of an efficient is-modified-p | 
|  | > test, and since the repository file is just an automatic | 
|  | > transform off the text-base anyway. | 
|  |  | 
|  | Actually, I was assuming that text-base would be a verbatim copy of | 
|  | the repository contents.  But that's kind of an implementation | 
|  | detail; let's leave that up to Ben (assuming he's doing the | 
|  | implementation). | 
|  |  | 
|  | > Otherwise, then the is-modified-p check has to be tweaked in a | 
|  | > way that will make modifiedness checks a lot slower in some | 
|  | > cases. | 
|  |  | 
|  | No... it just means that if the mod times force a contents check, | 
|  | you have to translate the text-base contents as you compare them | 
|  | against the normal contents.  That's "a teeny tiny bit slower," | 
|  | not, "a lot slower." | 
|  |  | 
|  | > The second sentence of the above paragraph isn't about allowing | 
|  | > mixed-style files.  It's saying that if the entire file is native | 
|  | > format, allow that (and transform when necessary), OR if the | 
|  | > entire file is in the requested style, then allow that too.  The | 
|  | > latter situation could happen if someone used a LF-style tool | 
|  | > under Windows, for example, so that when an LF-style file got | 
|  | > saved, the whole thing would be LF-style now, not native style. | 
|  | > No reason to disallow this. | 
|  | > | 
|  | > Right? | 
|  |  | 
|  | See my last message, as well as Colin Putney's argument.  In | 
|  | summary, that's not actually what I meant, but I don't really care | 
|  | either way. | 
|  |  | 
|  | We're implementing Greg's original meaning, since (as Colin Putney | 
|  | pointed out) the chance of "bad" data corruption is very small, and | 
|  | the chance that it would undo corruption is actually greater. | 
|  |  | 
|  |  | 
|  | * Keyword substitution | 
|  |  | 
|  | Quick summary: there's one property, named "svn:keywords".  It's value | 
|  | is a whitespace-separated list of keywords to expand.  Since keywords | 
|  | have both long and short forms, both ways are allowed in the value of | 
|  | the svn:keywords property.  Here are all the keywords: | 
|  |  | 
|  | "LastChangedBy"       "Author"    ---> either one expands author | 
|  | "LastChangedDate"     "Date"      ---> either one expands date | 
|  | "LastChangedRevision" "Rev"       ---> either one expands rev | 
|  | "HeadURL"             "URL"       ---> either one expands url | 
|  |  | 
|  | Here are some example values of the property: | 
|  |  | 
|  | "Rev LastChangedDate HeadURL" | 
|  |  | 
|  | "Author\nDate  \n  LastChangedRevision       URL" | 
|  |  | 
|  | Unrecognized words are ignored; absence of the property is the same as | 
|  | an empty value or a value with no valid keywords in it. | 
|  |  | 
|  | Keywords (long and short forms) are case-sensitive, as in CVS. |