| <!-- |
| |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| --> |
| <html> |
| <body> |
| <h2>GSF Incremental Updating</h2> |
| <h3>Introduction</h3> |
| <p> |
| GSF supports incremental updating. As the name implies, this means that |
| there is support to do cheaper incremental updates to data, insead |
| of computing everything from scratch, when something changes in the editor. |
| This shows up in several places: |
| |
| <ul> |
| <li> |
| <b>Parsing</b>. Your parser is told that edits are confined to |
| a certain region in the document. If you know that this is inside |
| a single method function, you can simply parse <i>just that function</i> |
| again, do some surgery on your abstract syntax tree, and you're done. |
| This makes parsing faster. |
| <br/><br/> |
| </li> |
| <li> |
| <b>Embedding</b>. Suppose the user is editing JavaScript in a |
| <code><script></code> block in an HTML file. The virtual |
| source provider for CSS knows that this doesn't affect CSS. Therefore, |
| several optimizations are possible: |
| <ul> |
| <li> |
| It doesn't have to regenerate the CSS virtual source, it can use |
| the existing one, and therefore |
| </li> |
| <li> |
| The CSS virtual source doesn't have to be parsed again. In fact, |
| it doesn't even have to have the parse tree offsets updated, since |
| parse tree offsets apply to the virtual source only, and the |
| position mapping is maintained by the virtual source which updated |
| the position mappings while checking the embedded scenario. |
| </li> |
| <li> |
| Features downstream, like semantic highlighting, can tell that |
| the parsing result for CSS was not updated, so it can more cheaply |
| update itself, just reusing the most recent highlighting result |
| from CSS and just updating the offsets. |
| </li> |
| </ul> |
| <br/><br/> |
| </li> |
| |
| <li> |
| <b>GSF Features</b>. When features are aware of incremental support, they |
| can do less work. As described above, in the embedding scenario, |
| the GSF feature implementations can tell when a whole language is |
| skipped because its virtual source wasn't affected. Thus, the navigator, |
| semantic highglighting etc. only have to do work based on which languages |
| were involved in the edits. |
| <br/><br/> |
| Note however that this isn't limited to the embedding scenarios. |
| An incremental update aware parser can mark a parser result as |
| unchanged. For example, if the user edited whitespace (outside of a string) |
| or a comment, the abstract syntax tree is unaffected, and if the |
| parser communicates this by marking the parser result unchanged, then |
| GSF features like navigation won't have to do any work. |
| <br/><br/> |
| </li> |
| |
| <li> |
| <b>Faster Implementations</b>. Usually an incremental parse won't be |
| semantically identical to the previous parser result. Yes, you parsed |
| a single function again, but the user probably typed something such |
| that the method body is now slightly different. However, the fact |
| that just the method has changed is something you can use to your advantage |
| when implementing the various GSF feature callbacks. For example, |
| in semantic highlighting, you only have to re-analyze the updated |
| method (unless there are language specific or feature specific reasons |
| you have to analyze more than just the method). |
| <br/><br/> |
| </li> |
| </ul> |
| </p> |
| <h3>Support</h3> |
| <p> |
| If you tell the infrastructure that you support incremental updating, |
| GSF will keep some extra data around. First, it will keep your most recent |
| parser result (and for embedded files, you more recent virtual source translation). |
| Second, it will keep track of all the edits that have occurred since the |
| last parser request. |
| </p> |
| <p> |
| When it's time to parse your file again, your incremental parser will |
| be handed the edit history, and your previous parser result, and you |
| can create a new parser result by analyzing your previous result and the |
| edit deltas. |
| </p> |
| <h3>Edit History</h3> |
| <p> |
| The most important concept for incremental updating is the |
| <a href="org/netbeans/modules/gsf/api/EditHistory.html">EditHistory</a> object. |
| GSF provides you with an <code>EditHistory</code> object when |
| you are asked to parse (for incremental parsers) or when you are |
| asked to generate a virtual source (for incremental embedding models). |
| The <code>EditHistory</code> object provides information about |
| all the edits that have occurred since your last parse job. |
| Along with the <code>EditHistory</code> object, you are passed your |
| most recent result, such that you can base your incremental computations |
| on it. |
| </p> |
| <p> |
| The EditHistory tracks edits accurately, so you can use the |
| <code>convertOldToNew()</code> method to translate a pre-edits offsets |
| to a post-edits offsets. However, the EditHistory maintains a |
| more interesting concept: the <b>affected region</b>. The affected |
| region is roughly the start and end offsets in the original document |
| and the corresponding start and end offsets in the edited document. |
| You can think of the affected region as the "damaged region". |
| Everything before and after the affected region was unchanged. |
| The following key attributes are provided by EditHistory: |
| <ol> |
| <li> The start offset</li> |
| <li> The original size</li> |
| <li> The new size</li> |
| </ol> |
| These three parameters indicate that in the old document, the text between |
| <code>offset</code> and <code>offset+originalSize</code> has been modified, |
| and after the edits, this region corresponds to |
| <code>offset</code> to <code>offset+editedSize</code>. Put another way, |
| all document positions below <code>offset</code> are unaffected by the edits, |
| and all document positions above <code>offset+originalSize</code> are uniformly |
| shifted up by a delta of <code>editedSize-originalSize</code> (which can be negative, |
| when more text was deleted than added). |
| </p> |
| <p> |
| Here's how this works. First, let's suppose we insert some text: |
| <br/> |
| <img src="history1.png" /> |
| <br/> |
| <br/> |
| Here, we've inserted <code>XY</code> between the characters <code>F</code> |
| and <code>G</code>. Here, the <code>offset</code> where the differences |
| begin is 6. In the old document, the differences also end at offset 6, |
| whereas in the edited document, the differences end at 8. |
| Therefore, our <code>offset</code> is 6, the <code>originalSize</code> is 0, |
| the <code>editedSize</code> is 2, and therefore the <code>delta</code> is +2. |
| </p> |
| <p> |
| Here's a more complicated example, where we've performed multiple edits. |
| The affected region is going to be the extent surrounding all edits |
| from the edits to the lowest offset in the document, to the edits at the |
| highest offset in the document: |
| <br/> |
| <img src="history2.png" /> |
| <br/> |
| <br/> |
| Here's we've deleted one character from two different locations in |
| the document - <code>D</code> and <code>H</code>. The blue regions |
| show the old region and the new corresponding region. Yes, some |
| of the contents within the old region are still there in the new region, |
| but the <b>key</b> point is that <b>before</b> the affected offset, |
| both documents are identical, and similarly, after the end of the |
| affected regions, both documents are identical. Typically, when |
| a user is editing, the changes will be local to a small region |
| (such as a function body), and an incremental parser can decide |
| that the entire affected region is in an incrementally parseable block. |
| It can also use the block delta to adjust offsets - add the delta |
| to all offsets outside the old region block. |
| </p> |
| <h3>Incremental Parsing</h3> |
| <p> |
| To implement incremental parsing, rather than extending the plain |
| <a href="org/netbeans/modules/gsf/api/Parser.html">Parser</a> |
| interface, implement the |
| <a href="org/netbeans/modules/gsf/api/IncrementalParser.html">IncrementalParser</a> |
| interface instead. |
| GSF will store the most recent parser result, and track editing history, |
| for source files that are parsed by an incremental parser. |
| The first time the file is opened, no previous parser result is available, |
| so the plain parsing method will be called. However, for subsequent edits, |
| the incremental parser will be called instead. If it fails, the regular |
| parse method will be called instead. |
| </p> |
| <p> |
| Your incremental parser will typically perform the following steps: |
| <ul> |
| <li> |
| Look up the <code>EditHistory</code>'s offset. |
| </li> |
| <li> |
| First, look in the token hierarchy and see if the edits are confined |
| to whitespace or comments only, or other lexical regions that won't |
| affect the parse tree. If so, just return the previous parse result, |
| or a clone of it, and mark its <code>setUpdateState</code> |
| with <code>ParserResult.UpdateState.NO_SEMANTIC_CHANGE</code> |
| and you're done. |
| </li> |
| <li> |
| Look in the previous parsing result's abstract syntax tree |
| for the node surrounding the given offset. For example, in JavaScript, |
| we want to incrementally compile function bodies, so we look for |
| the function node surrounding the offset. If there is no such |
| function node, we're editing outside of functions and incremental |
| parsing isn't available, so we just exit. (Normal parsing will be |
| invoked by the infrastructure instead.) |
| </li> |
| <li> |
| Look at the <code>EditHistory</code> and make sure the entire |
| affected region (<code>history.getStart()</code> and <code>history.getOldEnd</code>) |
| is completely inside the function body. If not, we've edited outside |
| of just a single function, so just exit. |
| </li> |
| <li> |
| We now know the function to be incrementally edited. We can also compute |
| its <b>NEW</b> dimensions; it starts at the same offset as before, |
| and it ends at <code>history.convertOldToNew(oldFunctionEnd)</code>. |
| That's right, the end of the function is outside the edited region, |
| so (as described in the EditHistory section above) we just have to take |
| the old offset and shift it by <code>history.getSizeDelta()</code>. |
| </li> |
| <li> |
| With the new function offsets, we just look up the source code, via |
| <code>document.getText(offset, newEnd-offset)</code>. |
| </li> |
| <li> |
| We parse the new function body. This might require to run the parser |
| in a special mode. For JavaScript, I modified the parser to have a special |
| method which lets me parse function bodies and return a function node. |
| </li> |
| <li> |
| Next we need to adjust all the source offsets in the AST. The offset |
| in the input we passed to the parser was 0, but this function is really |
| at <code>oldFunctionStart</code>, so add <code>oldFunctionStart</code> |
| to all AST node start and end offsets to shift the AST nodes to their |
| correct place in the edited buffer. |
| </li> |
| <li> |
| Similarly, adjust the error offsets of any parser errors that were |
| registered during parsing of the method body. |
| </li> |
| <li> |
| Next, we need to remove the old function from the abstract syntax tree. |
| </li> |
| <li> |
| Next, we need to update the offsets for all the AST nodes. None of the |
| nodes should be in the actual edited area (which was all inside the |
| now recompiled function). Thus, we just have to iterate over the nodes |
| and adjust the offsets; all offsets less than <code>history.getStart()</code> |
| can be left alone, and all offsets greater than or equal to |
| <code>history.getOldEnd()</code> should be incremented by |
| <code>history.getSizeDelta()</code>. There are methods on the <code>EditHistory</code> |
| object to do this. |
| </li> |
| <li> |
| Next, we insert our own function node into the AST in the exact same place |
| the previous function node was sitting. |
| </li> |
| <li> |
| Next, we filter through all the error messages in the previous parser |
| result. Any errors that were in the old function (NOTE - in the whole old function |
| range, not just in the <code>EditHistory</code> affected region since |
| we recompiled the entire function, not just the affected region) should |
| be removed, and all other errors passed on. Finally, add in our new |
| error messages from the compiled method. |
| </li> |
| <li> |
| Finally, we create a new <code>ParserResult</code> instance, initialize |
| it with our AST, our updated error list, and set its <code>setUpdateState</code> |
| to <code>ParserResult.UpdateState.UPDATED</code>. We also store in our |
| own ParserResult, references to the old and new function nodes we replaced. |
| This will be used by feature implementations like our <code>SemanticAnalyzer</code> |
| to perform incremental semantic analysis by only looking at the given |
| function node subtree. |
| </li> |
| </ul> |
| </p> |
| Note that the JavaScript module already performs incremental parsing, so |
| you can look at its implementation for inspiration. |
| |
| |
| <h3>Incremental Updating</h3> |
| <p> |
| The previous section described how to perform incremental parsing. From the |
| checklist you can see that implementing incremental parsing isn't nontrivial. |
| It may be even more difficult in cases where you don't own the parser yourself. |
| For example, for Ruby, Python and Groovy the parser will probably be |
| JRuby, Jython, and Groovyc respectively, so parsing method bodies for example |
| will require support in the corresponding projects. |
| </p> |
| <p> |
| That doesn't mean incremental updating isn't possible. Parsing itself will have |
| to process the full AST, but you can still analyze the edited region and |
| reduce the amount of work. |
| </p> |
| <p> |
| As before, you should implement the |
| <a href="org/netbeans/modules/gsf/api/IncrementalParser.html">IncrementalParser</a> |
| interface, such that GSF will pass your previous parser result and the <code>EditHistory</code> |
| to you. Then, you parse the request - and unfortunately, you'll be parsing the entire |
| document. |
| </p> |
| <p> |
| However, you should now use the <code>EditHistory</code> along with the previous |
| <code>ParserResult</code> to see whether the changes were local to a single block |
| such as a method. If they were, you can also compute exactly which method was just |
| updated, by looking up the corresponding offset in the <code>EditHistory</code> |
| and looking in your new parser result. |
| </p> |
| <p> |
| Now you know that only a single method in your new parser result is actually "new". |
| In your downstream feature implementations, such as the semantic analyzer, |
| you can use this information to combine your previous result (which you stashed |
| on your previous parser result), with local computations based only on the changed |
| method. |
| </p> |
| <p> |
| Therefore, you can drive the often more expensive computations (type analysis, |
| semantic analysis, navigator/outline computations, etc) to do simple, incremental |
| updates, even if the parse tree itself was fully regenerated by a non-incremental |
| parser! |
| </p> |
| |
| <h3>Incremental Embedding Models</h3> |
| <p> |
| GSF supports embedding through "embedding models", which produce a "virtual source", |
| one for each targeted parser language in an embedded language. For example, |
| in an RHTML file, there is a virtual source generated for JavaScript (which gives |
| a JavaScript view of the file, typically concatenating the various |
| <code><script></code> blocks and a few other things), as well as one for |
| CSS, and one for Ruby. |
| </p> |
| <p> |
| Each virtual source translator takes the original source language and computes |
| a virtual source. With the incremental update support, a virtual source |
| translator can tell the infrastructure that it supports incremental updates. |
| First, instead of implementing the |
| <a href="org/netbeans/modules/gsf/api/EmbeddingModel.html">EmbeddingModel</a> |
| interface as before, implement |
| <a href="org/netbeans/modules/gsf/api/InrementalEmbeddingModel.html">IncrementalEmbeddingModel</a>. |
| </p> |
| <p> |
| Once you do that, GSF will cache your virtual source for you, and when it's time |
| to update the virtual source parse trees, it will call your incremental |
| embedding model and pass it your previous virtual source and an |
| <code>EditHistory</code> object. You can use the <code>EditHistory</code> to |
| determine if the edits affects your virtual source. |
| For example, for <code>JavaScript</code>, if you've edited something inside |
| the <code><style></code> element (CSS code), the edit can't possibly affect |
| the JavaScript virtual source. Therefore, the virtual source doesn't change, |
| and therefore the previous JavaScript parse tree for the virtual source doesn't |
| have to be reparsed - it doesn't even have to be updated for new AST offsets, |
| since the AST itself hasn't changed. However, the embedding model itself |
| is responsible for mapping AST offsets to source document offsets. |
| Therefore, the incremental embedding model needs to go through its |
| position mapping tables and update them. Again, this typically just means |
| shifting positions above the affected region up by the edit history's |
| size delta. |
| </p> |
| <p> |
| If this succeeds, your embedding model can just return |
| <code>IncrementalEmbeddingModel.UpdateState.COMPLETED</code>. This tells the |
| infrastructure that the embedding model was updated, and there is nothing else |
| to do - it doesn't have to parse the result of the virtual source! |
| </p> |
| <p> |
| But what if the user edited something in an edit region that affects the |
| virtual source? In that case, it can simply return |
| <code>IncrementalEmbeddingModel.UpdateState.FAILED</code>. This tells the |
| infrastructure that an incremental update cannot be completed, and GSF |
| will perform a new (non-incremental) virtual source translation, and |
| parse the result. |
| </p> |
| <p> |
| Finally, if you're somewhere in between - you updated the virtual source |
| such that it now reflects the recent edits, but the model changed such that |
| it must be reparsed, return |
| <code>IncrementalEmbeddingModel.UpdateState.UPDATED</code>. This tells the |
| infrastructure that the virtual source is up to date and that a parse |
| result should be computed for it. |
| </p> |
| <p> |
| Here's a complete example of how this works for an embedding model; this is the CSS embedding |
| model's incremental update: |
| <pre> |
| IncrementalEmbeddingModel.UpdateState incrementalUpdate(EditHistory history) { |
| // Clear cache |
| // prevLexOffset = prevAstOffset = 0; |
| prevLexOffset = history.convertOldToNew(prevLexOffset); |
| |
| int offset = history.getStart(); |
| int limit = history.getOldEnd(); |
| int delta = history.getSizeDelta(); |
| |
| boolean codeOverlaps = false; |
| for (CodeBlockData codeBlock : codeBlocks) { |
| // Block not affected by move |
| if (codeBlock.sourceEnd <= offset) { |
| continue; |
| } |
| if (codeBlock.sourceStart >= limit) { |
| codeBlock.sourceStart += delta; |
| codeBlock.sourceEnd += delta; |
| continue; |
| } |
| if (codeBlock.sourceStart <= offset && codeBlock.sourceEnd >= limit) { |
| codeBlock.sourceEnd += delta; |
| codeOverlaps = true; |
| continue; |
| } |
| return IncrementalEmbeddingModel.UpdateState.FAILED; |
| } |
| |
| return codeOverlaps ? IncrementalEmbeddingModel.UpdateState.UPDATED : IncrementalEmbeddingModel.UpdateState.COMPLETED; |
| } |
| </pre> |
| </p> |
| |
| |
| <h3>Incremental Feature Updates</h3> |
| <p> |
| <code>ParserResult</code> stores an <code>UpdateState</code> enum value |
| indicating what kind of update was performed. For example, if it is |
| <code>NO_SEMANTIC_CHANGE</code> we know that in this parser result, |
| nothing in the AST changed (though offsets may have changed). |
| This lets the infrastructure know that it can take certain shortcuts. |
| For example, semantic highlighting won't recompute the data, it will just |
| update its own offsets based on the edit history. |
| </p> |
| <p> |
| Not all GSF feature implementations are using the incremental data yet; |
| this will change as soon as possible... |
| </p> |
| <p> |
| You should use the same approach for feature implementations in your language |
| plugin. Start with the most expensive computations (for example, type |
| analysis), and use the "what changed" data (either the <code>EditHistory</code>, |
| or specific parse tree nodes derived from the <code>EditHistory</code>) |
| to just filter your previously computed result. |
| </p> |
| |
| <br/> |
| <span style="color: #cccccc">Tor Norbye <tor@netbeans.org></span> |
| </body> |
| </html> |