
 <!--

     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 -->
 <html>
     <body>
         <h2>GSF Indexing and Querying</h2>
         <p>
             If you register an indexer with GSF, it will be called at the right times
             to extract relevant information from a <code>ParserResult</code> and store it
             in a persistent index. You can later quickly search this index. This lets
             you for example implement cross-file behavior such that a user can
             search the whole project for functions or classes (via the Open Type
             dialog in NetBeans), or cross-file go to declaration, code completion
             referencing symbols in other files, etc. And this can be achieved without
             having to go and parse other files on the fly when you're trying to
             collect project information. The index is always kept in sync by GSF.
             At startup, GSF looks at all the files in your project (as determined by
             the relevant source directories, see the section on <a href="#Classpath">Classpath</a>)
             and checks whether the timestamp is more recent than the previous timestamp
             that is recorded in the index. Files that have changed (or have never been
             indexed) will be indexed, in a background thread, at startup. Similarly,
             as soon as a file is deleted, or edited and then closed, the index will be
             updated. GSF also tracks whether a file has been edited, such that if you
             attempt to access its index when the index is considered dirty, the indexer
             will be immediately called such that all clients of the index can assume
             the index is always up to date.
         </p>
         <p>
             Indexing is simple. It basically lets you, for any particular source file,
             store a series of "Documents", where each document contains a multimap.
             Each multimap consists of keys, and one or more values for each key.
             Furthermore, keys can be designated as "searchable" or not.
             This just controls whether you can search the index by the given key. (Keys that
             are searchable require a bit more maintenance on the GSF side, which is
             why you should avoid it for keys you never plan on searching by.)
         </p>
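         <p>
             To make the data model concrete, here is a minimal in-memory sketch of one
             such document. <code>SketchDocument</code> and its methods are hypothetical
             names used only for illustration; this is not GSF code and it does not
             persist anything.
         </p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for one index document; not a GSF class. */
final class SketchDocument {
    // Insertion-ordered multimap: each key maps to one or more values.
    private final Map<String, List<String>> values = new LinkedHashMap<>();
    // Keys that were flagged as searchable when they were added.
    private final Set<String> searchableKeys = new LinkedHashSet<>();

    /** Appends a value under the key; repeated keys accumulate values. */
    void addPair(String key, String value, boolean searchable) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        if (searchable) {
            searchableKeys.add(key);
        }
    }

    /** Returns all values stored under the key, or an empty list if there are none. */
    List<String> getValues(String key) {
        return values.getOrDefault(key, Collections.<String>emptyList());
    }

    /** Reports whether the key was ever added with searchable set to true. */
    boolean isSearchable(String key) {
        return searchableKeys.contains(key);
    }
}
```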
         <p>
             It is up to <b>you</b> to decide what information to store in the index,
             how to divide it into documents, how to "encode" it into Strings (all
             [key,value] pairs are of type [String,String]), and so on.
             Typically, you will make this decision based on how you can later retrieve
             the information. So, you need to think up front about what you will need,
             and organize the information based on that.
             When you search, you can search for a given key. This will return
             you a set of all the <b>documents</b> where a match occurs. This lets
             you find other values associated with the [key,value] pair in the
             same document.
         </p>
         <h3>Example: Ruby</h3>
         <p>
             Here's how the information is currently stored for Ruby.
             In Ruby, a file can contain many classes. I decided to put each
             class in its own search document. This means that if I for example
             search for methods that start with "foo", I can look in the same
             document for the "class" attribute and I will have information
             about the class containing the method. Here's an example of
             a Ruby file and the search documents created by the Ruby language
             plugin's indexer:
             <br/>
             <img src="indexing.png"/>
             <br/>
             <br/>
             As you can see, we get two documents, one for each class in the file.
             In the second class, we have two methods. Each method
             is recorded with a "method" key. Later, if I call
             <code>searchDoc.getValues("method")</code> I will get out two strings,
             <code>{"mymethod","myother"}</code>.  If I now want to know what
             class these methods are coming from, I can call
             <code>searchDoc.getValue("class")</code> - or to get the fully qualified
             name (fqn), <code>searchDoc.getValue("fqn")</code>. I can also look
             up attributes for this class with <code>searchDoc.getValue("attrs")</code>
             which for example lets me see if this is a class or a module. I store
             a number of attributes - whether a class is documented, or if it
             is explicitly to be ignored, and so on. These are used by the code
             completion and go to declaration feature implementations to for example
             pick the best candidate among multiple possibilities.
             There's a more complete list of the data stored in the Ruby index
             later in this document.
         </p>
         <h3>Indexing</h3>
         <p>
             To index your code, implement the
             <a href="org/netbeans/modules/gsf/api/Parser.html">Indexer</a> interface.
             First, you must implement the <code>isIndexable</code> method:
             <pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
     boolean <b>isIndexable</b>(@NonNull ParserFile file);
             </pre>
             This should just be a quick check to see if the given file should
             be indexed by this indexer. Typically, you'll just look at the file
             name and make a decision.
             <div style="float:right; width: 300px; background: #ccffcc; color: black; border: solid 1px black; padding: 10px">
                 Indexers shouldn't have to know about potential languages they are embedded in.
                 GSF has this information already, via the embedding provider registrations.
                 This should be used somehow such that indexers don't have to know everything.
             </div>
             For JavaScript for example, we care about
             <code>.js</code> files, as well as <code>.html</code>, <code>.jsp</code>
             and <code>.erb</code>, since these embedded file types can contain
             JavaScript functions we want in the index.
             In some cases however, the logic can be slightly more complicated.
             In the case of JavaScript again, it's pretty common to have "minimized"
             versions of a file, where symbols have been replaced with single letter
             names, whitespace removed etc. to make the code as small and fast as
             possible. We typically don't want to index these versions, so the
             JavaScript indexer checks the file name and looks for name pattern
             overlaps. If it for example finds <code>foo-debug.js</code>, <code>foo-min.js</code>
             and <code>foo.js</code>, it will return <code>false</code> from <code>isIndexable()</code>
             for both <code>foo-debug.js</code> and <code>foo-min.js</code>, and <code>true</code>
             only for <code>foo.js</code>.

         </p>
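         <p>
             As a rough sketch of this kind of name-based check: the class name, the
             extension list, the <code>-min</code>/<code>-debug</code> suffixes and the
             rule "skip a variant only when the plain file sits next to it" are all
             assumptions made for illustration, not the actual JavaScript indexer logic.
             It works on plain file names so that it never needs a <code>FileObject</code>.
         </p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical file-name filter; not the real JavaScript indexer. */
final class NameBasedIndexableCheck {
    // Extensions this sketch treats as possibly containing JavaScript.
    private static final Set<String> EXTENSIONS =
            new HashSet<>(Arrays.asList("js", "html", "jsp", "erb"));
    // Assumed naming convention for generated variants of a file.
    private static final String[] VARIANT_SUFFIXES = {"-min", "-debug"};

    /**
     * Decides from the name alone whether a file should be indexed.
     * siblingNames holds the names of the files in the same directory.
     */
    static boolean isIndexable(String fileName, Set<String> siblingNames) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            return false; // no extension, or a dot-file
        }
        String base = fileName.substring(0, dot);
        String extWithDot = fileName.substring(dot);
        String ext = extWithDot.substring(1).toLowerCase(Locale.ROOT);
        if (!EXTENSIONS.contains(ext)) {
            return false;
        }
        for (String suffix : VARIANT_SUFFIXES) {
            if (base.endsWith(suffix)) {
                String plain = base.substring(0, base.length() - suffix.length()) + extWithDot;
                // Skip the variant only when the plain version exists next to it.
                if (siblingNames.contains(plain)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```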
         <p>
             <b>IMPORTANT</b>: You may be tempted to base your decision on the mimetype of
             a file, but be careful calling <code>file.getFileObject().getMimeType()</code>.
             First, <code>file.getFileObject()</code> can be a performance issue.
             During startup, the system asks your indexer whether each source file
             in the project is indexable, and each <code>getFileObject</code> call
             can end up doing a lot of work.<br/><br/>
             A bigger problem is that when source files are deleted, the index needs to
             be cleaned up. The IDE will ask all the indexers if they care about this file,
             and if they do, the index entries will be cleared for this file.
             (TODO: Perhaps I should just unconditionally try to delete the file in all
             indices?)
             In any case, when a file is deleted, its <code>FileObject</code> no longer
             exists, and calling <code>file.getFileObject()</code> will return null.
             You can end up getting an NPE in these cases if you're not careful.
         </p>
         <p>
             The second method you have to implement in your indexer is
             <pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
     List&lt;IndexDocument&gt; <b>index</b>(ParserResult result, IndexDocumentFactory factory)
             </pre>

             Here, all you have to do is look up your own AST from your <code>ParserResult</code>.
             Then, look through your AST, pick out the information you want to store, and store
             it in the index. You do that by creating one or more documents (by calling
             the <code>IndexDocumentFactory</code> which is passed to you above).
             The <code>IndexDocumentFactory</code> lets you create an <code>IndexDocument</code>.
             And each <code>IndexDocument</code> object has a single method:
             <pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
     void <b>addPair</b>(String key, String value, boolean searchable);
             </pre>
             So, all you have to do here is call <code>addPair</code> repeatedly, with key,value pairs.
             You also get to decide if your key is searchable. You <b>must</b> be consistent
             in how you call this method with respect to the <code>searchable</code> parameter
             for a given key in a given document.
         </p>
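         <p>
             The shape of an <code>index</code> implementation can be sketched like this.
             Only the <code>addPair</code> signature is taken from the text above; the
             <code>Doc</code>, <code>ListDoc</code> and <code>ClassInfo</code> types and
             the <code>Supplier</code> used as a factory are hypothetical stand-ins, since
             the real GSF types are not reproduced here. The sketch creates one document
             per class, as in the Ruby example, and uses the same <code>searchable</code>
             flag for a given key every time.
         </p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of an indexer body; none of these types are GSF classes. */
final class IndexingSketch {
    /** Mirrors only the addPair signature quoted in the text. */
    interface Doc {
        void addPair(String key, String value, boolean searchable);
    }

    /** Trivial Doc that records "key=value" strings, for demonstration only. */
    static final class ListDoc implements Doc {
        final List<String> pairs = new ArrayList<>();

        @Override
        public void addPair(String key, String value, boolean searchable) {
            pairs.add(key + "=" + value);
        }
    }

    /** Assumed summary of one class found while walking an AST. */
    static final class ClassInfo {
        final String fqn;
        final String name;
        final List<String> methods;

        ClassInfo(String fqn, String name, List<String> methods) {
            this.fqn = fqn;
            this.name = name;
            this.methods = methods;
        }
    }

    /** Creates one document per class and fills it with key/value pairs. */
    static <D extends Doc> List<D> index(List<ClassInfo> classes, Supplier<D> factory) {
        List<D> docs = new ArrayList<>();
        for (ClassInfo c : classes) {
            D doc = factory.get();
            doc.addPair("fqn", c.fqn, true);
            doc.addPair("class", c.name, true);
            for (String method : c.methods) {
                // Same key, same searchable flag on every call.
                doc.addPair("method", method, true);
            }
            docs.add(doc);
        }
        return docs;
    }
}
```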

         <h3>Searching</h3>
         <p>
             You can search your index. For the various feature implementations (code completion,
             go to declaration, and so on), you are passed a
             <a href="org/netbeans/modules/gsf/api/CompilationInfo.html">CompilationInfo</a> instance.
             The <code>CompilationInfo</code> holds a lot of vital state: your parser result,
             the file object and document corresponding to the current file, and so on.
             It also gives you access to the
             <a href="org/netbeans/modules/gsf/api/Index.html">Index</a> object.
             The <code>Index</code> class has one method:
             <div style="float:right; width: 300px; background: #ccffcc; color: black; border: solid 1px black; padding: 10px">
                 I plan to add a simple <code>SearchContext</code> class which holds some of the
                 search parameters, and the search method will change into a simple
                 <code>search(SearchContext)</code> method. I also plan to remove the result list
                 input parameter and just return it (created by the index) instead.
             </div>
             <pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
     public abstract void <b>search</b>(
             @NonNull final String <b>key</b>,
             @NonNull final String <b>value</b>,
             @NonNull final NameKind kind,
             @NonNull final Set&lt;SearchScope&gt; scope,
             @NonNull Set&lt;SearchResult&gt; <b>result</b>,
             @NonNull final Set&lt;String&gt; includeKeys);
             </pre>
             The way this works is that you create your own empty set of SearchResults,
             and then you call the search method on the index with a key, a value (which
             for example can be a prefix), and so on. This will fill your SearchResult
             set with matches.
         </p>
         <p>
             You can now iterate over the matching SearchResults (which will correspond to
             <code>IndexDocument</code>s your indexer has created earlier), and for each,
             call one of the getValue methods:
             <pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
     @NonNull String <b>getValue</b>(@NonNull String <b>key</b>);
     @NonNull String[] <b>getValues</b>(@NonNull String <b>key</b>);
             </pre>
             If you know that you're storing just one value for a given key in the index
             (such as the <code>class</code> or <code>fqn</code> keys in the Ruby index),
             you can call the <code>getValue</code> method to get it directly. If you're
             storing many values (such as the <code>method</code> key in the Ruby index),
             call <code>getValues</code> to get an array of all the results.
         </p>
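         <p>
             The search-then-read pattern can be illustrated with a small in-memory model.
             This is only a sketch: <code>PrefixSearchSketch</code>, its <code>doc</code>
             helper and the map-based documents are hypothetical, only prefix matching is
             modelled, and unlike the <code>@NonNull</code> method above this
             <code>getValue</code> returns <code>null</code> when a key is absent.
         </p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of prefix search over documents; not the GSF Index. */
final class PrefixSearchSketch {
    /** Builds a document from alternating key, value arguments. */
    static Map<String, List<String>> doc(String... pairs) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            result.computeIfAbsent(pairs[i], k -> new ArrayList<>()).add(pairs[i + 1]);
        }
        return result;
    }

    /** Returns every document with at least one value under key that starts with prefix. */
    static List<Map<String, List<String>>> search(
            List<Map<String, List<String>>> docs, String key, String prefix) {
        List<Map<String, List<String>>> hits = new ArrayList<>();
        for (Map<String, List<String>> d : docs) {
            for (String value : d.getOrDefault(key, Collections.<String>emptyList())) {
                if (value.startsWith(prefix)) {
                    hits.add(d); // the whole document is the result, not just the value
                    break;
                }
            }
        }
        return hits;
    }

    /** Convenience for single-valued keys: first value, or null if the key is absent. */
    static String getValue(Map<String, List<String>> d, String key) {
        List<String> values = d.getOrDefault(key, Collections.<String>emptyList());
        return values.isEmpty() ? null : values.get(0);
    }
}
```

         <p>
             Searching the <code>method</code> key for the prefix <code>my</code> returns the
             whole matching document, so the containing class can then be read from its
             <code>class</code> key.
         </p>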
         <h3>Classpath: Searching What?</h3>
         <p>
             In the search section above I glossed over some of the other parameters
             in the <code>search</code> method. In particular, the <code>scope</code>
             parameter, which lets you control whether to search just the current
             project, or just the libraries, or both. But how does GSF know what source
             files are relevant in the project? And what about the libraries?
         </p>
         <p>
             This is controlled by a different API in GSF, which you use to integrate
             into different project types and tell GSF which directories are relevant
             for this project, as well as register libraries. This is described in
             more detail in the <a href="classpath.html">Classpath</a> document.
         </p>

         <h3>Index Abstraction</h3>
         <p>
             I recommend that you build an abstraction on top of the index. Your various
             feature implementations shouldn't go into the index and look for specific keys
             and so forth. That will make changing the index (which I will discuss in the
             next section) very difficult.  Instead, you should create a class which
             wraps the index with logical methods. For example, in Ruby, I have a RubyIndex
             class which has dedicated methods like
             <ul>
                 <li><code>getClasses()</code></li>
                 <li><code>getMethods()</code></li>
                 <li><code>getSubclassesOf()</code></li>
                 <li><code>getRequires()</code></li>
                 <li><code>getTransitiveRequires()</code></li>
                 <li><code>getSuperClass()</code></li>
                 <li><code>getOverridingMethod()</code></li>
                 <li><code>getInheritedMethods()</code></li>
             </ul>
             and so on. This places all the logic of your index schema in two classes -
             your index abstraction and your indexer, and you can more easily change it
             later.
         </p>
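         <p>
             A minimal sketch of such a wrapper, assuming a hypothetical
             <code>RawIndex</code> interface that does a prefix search and returns
             documents as maps. The method names echo the list above, but the signatures
             and bodies are illustrative only and are not the real <code>RubyIndex</code>.
             The point is that the key names appear in exactly one place.
         </p>

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical index abstraction; feature code calls this instead of using raw keys. */
final class ClassIndexSketch {
    /** Assumed low-level lookup: documents having a value under key that starts with prefix. */
    interface RawIndex {
        List<Map<String, List<String>>> search(String key, String prefix);
    }

    // The schema lives here and nowhere else.
    private static final String KEY_CLASS = "class";
    private static final String KEY_FQN = "fqn";
    private static final String KEY_METHOD = "method";

    private final RawIndex raw;

    ClassIndexSketch(RawIndex raw) {
        this.raw = raw;
    }

    /** Fully qualified names of classes whose simple name starts with the prefix. */
    Set<String> getClasses(String namePrefix) {
        Set<String> result = new TreeSet<>();
        for (Map<String, List<String>> doc : raw.search(KEY_CLASS, namePrefix)) {
            result.addAll(doc.getOrDefault(KEY_FQN, Collections.<String>emptyList()));
        }
        return result;
    }

    /** Method entries that start with the prefix, across all matching documents. */
    Set<String> getMethods(String namePrefix) {
        Set<String> result = new TreeSet<>();
        for (Map<String, List<String>> doc : raw.search(KEY_METHOD, namePrefix)) {
            for (String method : doc.getOrDefault(KEY_METHOD, Collections.<String>emptyList())) {
                // A matching document can also hold methods that do not match.
                if (method.startsWith(namePrefix)) {
                    result.add(method);
                }
            }
        }
        return result;
    }
}
```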
         <h3>Making Changes</h3>
         <p>
             During development, including after your first release, you'll probably find
             that you want to make changes to the index, either because of performance
             reasons, or because you want to store more information to support new features.
             However, if you just make the change in your indexer and index query code,
             you might have a really hard time dealing with the fact that your users
             can have existing indices out there with the old data format.
         </p>
         <p>
             Dealing with this is easy. The <code>Indexer</code> class has these two
             methods:
             <pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
     @NonNull String <b>getIndexerName</b>();
     @NonNull String <b>getIndexVersion</b>();
             </pre>
             Any time you make an incompatible change to the way you are storing data
             in the index, just change the value of your <code>getIndexVersion()</code>
             method. This will force reindexing of all the code whenever your users
             run with your new version of the indexer.
         </p>
         <p>
             The way this works on the implementation side is that each index database
             is isolated, and in particular, they are stored in the user directory,
             under <code>var/cache/gsf-index/<i>your-index-name</i>/<i>your-index-version</i></code>.
             For example, the Ruby indexer names its indexer "ruby", and the JavaScript
             indexer names its indexer "javascript". This ensures that these two systems
             don't interfere with each other's databases. The index version is just a string,
             and you can put anything you want here (as long as it's a string that is
             filesystem safe, meaning it can be written on all file systems where NetBeans
             runs). A good convention is to just use a number - e.g. "42" or "6.5.2", etc.
         </p>
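         <p>
             To see why a version change forces reindexing, here is a small sketch that
             builds the directory described above. <code>IndexLocationSketch</code> is a
             hypothetical helper, and its "filesystem safe" pattern is a conservative guess
             of mine rather than a rule GSF enforces. A new version string yields a
             different, initially empty directory, so nothing from the old format is read.
         </p>

```java
import java.nio.file.Path;
import java.util.regex.Pattern;

/** Hypothetical helper that mirrors the cache layout described in the text. */
final class IndexLocationSketch {
    // Assumed notion of "filesystem safe": letters, digits, dot, underscore, hyphen,
    // starting with a letter or digit. The real constraint may differ.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9][A-Za-z0-9._-]*");

    /** Returns userDir/var/cache/gsf-index/indexerName/indexVersion. */
    static Path indexDir(Path userDir, String indexerName, String indexVersion) {
        if (!SAFE.matcher(indexerName).matches() || !SAFE.matcher(indexVersion).matches()) {
            throw new IllegalArgumentException("name and version must be filesystem safe");
        }
        return userDir.resolve("var").resolve("cache").resolve("gsf-index")
                .resolve(indexerName).resolve(indexVersion);
    }
}
```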
         <h3>Unit Testing</h3>
         <p>
             Unit testing your indexer is trivial. Create a unit test class for your
             indexer, and make sure it extends <code>GsfTestBase</code>. Then, just
             create new tests with the following single line call:
             <pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
     public class RubyIndexerTest extends RubyTestBase {
         public void testIndexData() throws Exception {
             checkIndexer("testfiles/date.rb");
         }

         public void testRails1() throws Exception {
             checkIndexer("testfiles/action_controller.rb");
         }
         ...etc
     }
             </pre>
             All you have to do is put the test files you want into <code>test/unit/data</code>,
             refer to them from the <code>checkIndexer</code> call, and the GSF test infrastructure
             will do the rest: it will load the file, parse it using your parser,
             call your indexer, pretty print it, see if a golden file (named the same as
             the input file name, plus the extra extension <code>.indexed</code>) exists,
             and if not create it, or otherwise diff the computed results with the golden
             file. The test fails if the computed results differ from the golden file.
         </p>
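         <p>
             The golden-file step can be sketched as follows. This is not the
             <code>checkIndexer</code> implementation: <code>GoldenFileSketch</code> is a
             hypothetical helper that takes the already pretty-printed index output as a
             string and only shows the create-or-compare logic described above.
         </p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical golden-file comparison; not the GSF test infrastructure. */
final class GoldenFileSketch {
    /**
     * Compares actual output with the golden file next to the input file
     * (input name plus ".indexed"). If no golden file exists yet, it is
     * created from the actual output and the check passes.
     */
    static boolean matchesGolden(Path input, String actual) {
        Path golden = input.resolveSibling(input.getFileName() + ".indexed");
        try {
            if (!Files.exists(golden)) {
                Files.write(golden, actual.getBytes(StandardCharsets.UTF_8));
                return true; // first run records the baseline
            }
            String expected = new String(Files.readAllBytes(golden), StandardCharsets.UTF_8);
            return expected.equals(actual);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```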
         <p>
             See the <a href="unit-testing.html">Unit Testing</a> document for more information.
         </p>

         <h3>Debugging Query Problems</h3>
         <p>
             Sometimes a feature fails -- for example, code completion doesn't return the
             results you expect. One way to debug this is to use the GSF diagnostics tools.
             The Lucene Index Browser is described in the
             <a href="gsf-tools.html#index-browser">GSF Diagnostics Tools</a> document.
         </p>
         <h3>Ruby Indexed Data</h3>
         <p>
             The following table shows the data currently stored in the Ruby index, which
             will hopefully give some ideas for how a complete indexer/query system
             can work:
             <table border="1" style="background: #ffffcc; border-collapse: collapse; border:solid 1px black">
                 <tr>
                     <th>Key</th><th>Value</th>
                 </tr>
                 <tr>
                     <td>fqn</td>
                     <td>Fully qualified name of a class or module, such as "Test::Unit" for the Unit class.</td>
                 </tr>
                 <tr>
                     <td>class</td>
                     <td>The "basename" of a class or module, such as "Unit" for the Test::Unit class.</td>
                 </tr>
                 <tr>
                     <td>class-ig</td>
                     <td>A lowercased version of the class name. Used for case insensitive queries. (This shouldn't be necessary).</td>
                 </tr>
                 <tr>
                     <td>extends</td>
                     <td>The fully qualified name of a superclass, if any. Used to compute inherited methods by iterating through class documents.</td>
                 </tr>
                 <tr>
                     <td>in</td>
                     <td>The name of the module this class is contained in.</td>
                 </tr>
                 <tr>
                     <td>require</td>
                     <td>The string required to include this class from other files, e.g. <code>io/nonblock</code> for the nonblock.rb file. Used for require-completion.</td>
                 </tr>
                 <tr>
                     <td>requires</td>
                     <td>A list of all the files <i>this</i> file requires. Used to compute file inclusion transitively.</td>
                 </tr>
                 <tr>
                     <td>includes</td>
                     <td>A list of modules included (not the same as requires) by this class or module.</td>
                 </tr>
                 <tr>
                     <td>extendWith</td>
                     <td>Used to simulate the way classes can dynamically be extended with a given module in Ruby.</td>
                 </tr>
                 <tr>
                     <td>method</td>
                     <td>A method in the class. This isn't just a method name; it's a pretty complicated encoding
                         of the method name, its parameter list, its (optional) return type, its (optional) parameter
                         hints or types, as well as a set of attributes for the method (is documented, is ignored,
                     etc.)</td>
                 </tr>
                 <tr>
                     <td>field</td>
                     <td>A field in the class.</td>
                 </tr>
                 <tr>
                     <td>attribute</td>
                     <td>An attribute in the class.</td>
                 </tr>
                 <tr>
                     <td>constant</td>
                     <td>A constant in the class.</td>
                 </tr>
                 <tr>
                     <td>attrs</td>
                     <td>A set of attributes for the class, written as a hex value string</td>
                 </tr>
                 <tr>
                     <td>dbtable</td>
                     <td>For active record database table completion, the table name this migration refers to</td>
                 </tr>
                 <tr>
                     <td>dbversion</td>
                     <td>The version number of this migration (used to apply the migrations in the correct order)</td>
                 </tr>
                 <tr>
                     <td>dbcolumn</td>
                     <td>A column name to be added, removed or renamed (indicated with + or - after the name)</td>
                 </tr>
             </table>
         </p>
         <br/>
         <span style="color: #cccccc">Tor Norbye &lt;tor@netbeans.org&gt;</span>
     </body>
 </html>
	<!--

	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.

	-->
	<html>
	<body>
	<h2>GSF Indexing and Querying</h2>
	<p>
	If you register an indexer with GSF, it will be called at the right times
	to extract relevant information from a ParseResult, and store it somewhere
	in persistent store. You can later quickly search this index. This lets
	you for example implement cross-file behavior such that a user can
	search the whole project for functions or classes (via the Open Type
	dialog in NetBeans), or cross-file go to declaration, code completion
	referencing symbols in other files, etc. And this can be achieved without
	having to go and parse other files on the fly when you're trying to
	collect project information. The index is always kept in sync by GSF.
	At startup, GSF looks at all the files in your project (as determined by
	the relevant source directories, see the section on <a href="#Classpath">Classpath</a>)
	and checks whether the timestamp is more recent than the previous timestamp
	that is recorded in the index. Files that have changed (or have never been
	indexed) will be indexed, in a background thread, at startup. Similarly,
	as soon as a file deleted, or edited and then closed, the index will be
	updated. GSF also tracks whether a file has been edited, such that if you
	attempt to access its index when the index is considered dirty, the indexer
	will be immediately called such that all clients of the index can assume
	the index is always up to date.
	</p>
	<p>
	Indexing is simple. It basically lets you, for any particular source file,
	store a series of "Documents", where each document contains a multimap.
	Each multimap consists of keys, and one or more values each key.
	Furthermore, keys can be designated as "searchable" or not.
	This just controls whether you can search the index by the given key. (Keys that
	are searchable require a bit more maintenance on the GSF side, which is
	why you avoid it for keys you never plan on searching by).
	</p>
	<p>
	It is up to <b>you</b> to decide what information to store in the index,
	how to divide it into documents, how to "encode" it into Strings (all
	[key,value] pairs are of type [String,String], and so on.
	Typically, you will make this decision based on how you can later retrieve
	the information. So, you need to think up front about what you will need,
	and organize the information based on that.
	When you search, you can search for a given key. This will return
	you a set of all the <b>documents</b> where the search occurs. This lets
	you find other values associated with the [key,value] pair in the
	same document.
	</p>
	<h3>Example: Ruby</h3>
	<p>
	Here's how the information is currently stored for Ruby.
	In Ruby, a file can contain many classes. I decided to put each
	class in its own search document. This means that if I for example
	search for methods that start with "foo", I can look in the same
	document for the "class" attribute and I will have information
	about the class containing the method. Here's an example of
	a Ruby file and the search documents created by the Ruby language
	plugin's indexer:
	<br/>
	<img src="indexing.png"/>
	<br/>
	<br/>
	As you can see, we get two documents, one for each class in the file.
	In the second class, we have two methods. Each method
	is recorded with a "method" key. Later, if I call
	<code>searchDoc.getValue("method")</code> I will get out two strings,
	<code>{"mymethod","myother"}</code>. If I now want to know what
	class these methods are coming from, I can call
	<code>searchDoc.getValue("class")</code> - or to get the fully qualified
	name (fqn), <code>searchDoc.getValue("fqn")</code>. I can also look
	up attributes for this class with <code>searchDoc.getValue("attrs")</code>
	which for example lets me see if this is a class or a module. I store
	a number of attributes - whether a class is documented, or if it
	is explicitly to be ignored, and so on. These are used by the code
	completion and go to declaration feature implementations to for example
	pick the best candidate among multiple possibilities.
	There's a more complete list of the data stored in the Ruby index
	later in this document.
	</p>
	<h3>Indexing</h3>
	<p>
	To index your code, implement the
	<a href="org/netbeans/modules/gsf/api/Parser.html">Indexer</a> interface.
	First, you must implement the <code>isIndexable</code> method:
	<pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
	boolean <b>isIndexable</b>(@NonNull ParserFile file);
	</pre>
	This should just be a quick check to see if the given file should
	be indexed by this indexer. Typically, you'll just look at the file
	name and make a decision.
	<div style="float:right; width: 300px; background: #ccffcc; color: black; border: solid 1px black; padding: 10px">
	Indexers shouldn't have to know about potential languages they are embedded in.
	GSF has this information already, via the embedding provider registrations.
	This should be used somehow such that indexers don't have to know everything.
	</div>
	For JavaScript for example, we care about
	<code>.js</code> files, as well as <code>.html</code>, <code>.jsp</code>
	and <code>.erb</code>, since these embedded file types can contain
	JavaScript functions we want in the index.
	In some cases however, the logic can be slightly more complicated.
	In the case of JavaScript again, it's pretty common to have "minimized"
	versions of a file, where symbols have been replaced with single letter
	names, whitespace removed etc. to make the code as small and fast as
	possible. We typically don't want to index these versions, so the
	JavaScript indexer checks the file name and looks for name pattern
	overlaps. If it for example finds <code>foo-debug.js</code>, <code>foo-min.js</code>
	and <code>foo.js</code>, it will return <code>false</code> from <code>isIndexable()</code>
	for both <code>foo-debug.js</code> and <code>foo-min.js</code>, and <code>true</code>
	only for <code>foo.js</code>.

	</p>
	<p>
	<b>IMPORTANT</b>: You may be tempted to base your decision on the mimetype of
	a file, but be careful calling <code>file.getFileObject().getMimeType()</code>.
	First, <code>file.getFileObject()</code> can be a performance issue.
	During startup, your indexer will be asked by the system for all the
	source files in the project if they are indexable. Each <code>getFileObject</code>
	can end up doing a lot of work.<br/><br/>
	A bigger problem is that when source files are deleted, the index needs to
	be cleaned up. The IDE will ask all the indexers if they care about this file,
	and if they do, the index entries will be cleared for this file.
	(TODO: Perhaps I should just unconditionally try to delete the file in all
	indices?)
	In any case, when a file is deleted, its <code>FileObject</code> no longer
	exists, and calling <code>file.getFileObject()</code> will return null.
	You can end up getting an NPE in these cases if you're not careful.
	</p>
	<p>
	The second method you have to implement in your indexer is
	<pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
	List<IndexDocument> <b>index</b>(ParserResult result, IndexDocumentFactory factory)
	</pre>

	Here, all you have to do is look up your own AST from your <code>ParserResult</code>.
	Then, walk through your AST, pick out the information you want to store, and store
	it in the index. You do that by creating one or more documents with
	the <code>IndexDocumentFactory</code> that is passed to you above.
	The <code>IndexDocumentFactory</code> lets you create an <code>IndexDocument</code>,
	and each <code>IndexDocument</code> object has a single method:
	<pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
	void <b>addPair</b>(String key, String value, boolean searchable);
	</pre>
	So, all you have to do here is call <code>addPair</code> repeatedly, with key/value pairs.
	You also get to decide whether each key is searchable. You <b>must</b> be consistent
	in how you call this method with respect to the <code>searchable</code> parameter:
	always pass the same value for a given key in a given document.
	</p>
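	<p>
	A simple way to keep the <code>searchable</code> flag consistent is to fix it
	once per key and route every <code>addPair</code> call through that definition.
	The sketch below is only an illustration: <code>IndexDocument</code> is a
	stand-in declaring just the method quoted above, <code>DocKey</code> and
	<code>ClassDocumentWriter</code> are hypothetical names, and which keys are
	marked searchable is an assumption rather than what the Ruby indexer does.
	</p>

```java
import java.util.List;

// Stand-in with the single method quoted in this document.
interface IndexDocument {
    void addPair(String key, String value, boolean searchable);
}

// Hypothetical key table: each key carries its own searchable flag,
// so the flag cannot vary between calls.
enum DocKey {
    FQN("fqn", true),
    CLASS("class", true),
    METHOD("method", true),
    ATTRS("attrs", false); // assumed to be read back only, never searched on

    private final String key;
    private final boolean searchable;

    DocKey(String key, boolean searchable) {
        this.key = key;
        this.searchable = searchable;
    }

    void addTo(IndexDocument doc, String value) {
        doc.addPair(key, value, searchable);
    }
}

class ClassDocumentWriter {
    // Stores the data picked out of the AST for one class in one document.
    static void write(IndexDocument doc, String fqn, String name,
                      List<String> methods, String attrsHex) {
        DocKey.FQN.addTo(doc, fqn);
        DocKey.CLASS.addTo(doc, name);
        for (String method : methods) {
            DocKey.METHOD.addTo(doc, method); // one pair per method, same key
        }
        DocKey.ATTRS.addTo(doc, attrsHex);
    }
}
```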

	<h3>Searching</h3>
	<p>
	You can search your index. For the various feature implementations (code completion,
	go to declaration, and so on), you are passed a
	<a href="org/netbeans/modules/gsf/api/CompilationInfo.html">CompilationInfo</a> instance.
	The <code>CompilationInfo</code> holds a lot of vital state: your parser result,
	the file object and document corresponding to the current file, and so on.
	It also gives you access to the
	<a href="org/netbeans/modules/gsf/api/Index.html">Index</a> object.
	The <code>Index</code> class has one method:
	<div style="float:right; width: 300px; background: #ccffcc; color: black; border: solid 1px black; padding: 10px">
	I plan to add a simple <code>SearchContext</code> class which holds some of the
	search parameters, and the search method will change into a simple
	<code>search(SearchContext)</code> method. I also plan to remove the result list
	input parameter and just return it (created by the index) instead.
	</div>
	<pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
	public abstract void <b>search</b>(
	        @NonNull final String <b>key</b>,
	        @NonNull final String <b>value</b>,
	        @NonNull final NameKind kind,
	        @NonNull final Set&lt;SearchScope&gt; scope,
	        @NonNull Set&lt;SearchResult&gt; <b>result</b>,
	        @NonNull final Set&lt;String&gt; includeKeys);
	</pre>
	The way this works is that you create your own empty set of SearchResults,
	and then you call the search method on the index with a key, a value (which
	for example can be a prefix), and so on. This will fill your SearchResult
	set with matches.
	</p>
	<p>
	You can now iterate over the matching SearchResults (which correspond to the
	<code>IndexDocument</code>s your indexer created earlier), and for each one,
	call one of the getValue methods:
	<pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
	@NonNull String <b>getValue</b>(@NonNull String <b>key</b>);
	@NonNull String[] <b>getValues</b>(@NonNull String <b>key</b>);
	</pre>
	If you know that you're storing just one value for a given key in the index
	(such as the <code>class</code> or <code>fqn</code> keys in the Ruby index),
	you can call the <code>getValue</code> method to get it directly. If you're
	storing many values (such as the <code>method</code> key in the Ruby index),
	call <code>getValues</code> to get an array of all the values.
	</p>
	<h3>Classpath: Searching What?</h3>
	<p>
	In the search section above I glossed over some of the other parameters
	in the <code>search</code> method, in particular the <code>scope</code>
	parameter, which lets you control whether to search just the current
	project, just the libraries, or both. But how does GSF know which source
	files are relevant in the project? And what about the libraries?
	</p>
	<p>
	This is controlled by a different API in GSF, which you use to integrate
	into different project types and tell GSF which directories are relevant
	for this project, as well as register libraries. This is described in
	more detail in the <a href="classpath.html">Classpath</a> document.
	</p>

	<h3>Index Abstraction</h3>
	<p>
	I recommend that you build an abstraction on top of the index. Your various
	feature implementations shouldn't go into the index and look for specific keys
	and so forth. That will make changing the index (which I will discuss in the
	next section) very difficult. Instead, you should create a class which
	wraps the index with logical methods. For example, in Ruby, I have a RubyIndex
	class which has dedicated methods like
	<ul>
	<li><code>getClasses()</code></li>
	<li><code>getMethods()</code></li>
	<li><code>getSubclassesOf()</code></li>
	<li><code>getRequires()</code></li>
	<li><code>getTransitiveRequires()</code></li>
	<li><code>getSuperClass()</code></li>
	<li><code>getOverridingMethod()</code></li>
	<li><code>getInheritedMethods()</code></li>
	</ul>
	and so on. This places all the logic of your index schema in two classes -
	your index abstraction and your indexer, and you can more easily change it
	later.
	</p>
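	<p>
	As a rough illustration of such a wrapper, the sketch below hides the key
	names and the <code>search</code> call behind two logical methods. It is not
	the real <code>RubyIndex</code>: the types are stand-ins modelled on the
	signatures quoted in this document, the enum constants are assumptions, and
	so is the reading of <code>includeKeys</code> as "the keys I want back".
	</p>

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-ins; the real GSF types differ in detail.
enum NameKind { EXACT_NAME, PREFIX }
enum SearchScope { SOURCE, DEPENDENCIES }

interface SearchResult {
    String getValue(String key);
    String[] getValues(String key);
}

abstract class Index {
    abstract void search(String key, String value, NameKind kind,
                         Set<SearchScope> scope, Set<SearchResult> result,
                         Set<String> includeKeys);
}

// The only query-side class that knows the key names of the schema.
class ClassIndex {
    private static final String KEY_FQN = "fqn";
    private static final String KEY_CLASS = "class";
    private static final String KEY_METHOD = "method";

    private final Index index;

    ClassIndex(Index index) {
        this.index = index;
    }

    /** Fully qualified names of all classes whose basename starts with prefix. */
    Set<String> getClasses(String prefix) {
        Set<SearchResult> results = new HashSet<>();
        index.search(KEY_CLASS, prefix, NameKind.PREFIX,
                EnumSet.allOf(SearchScope.class), results,
                Collections.singleton(KEY_FQN));
        Set<String> fqns = new TreeSet<>();
        for (SearchResult result : results) {
            String fqn = result.getValue(KEY_FQN); // single-valued key
            if (fqn != null) {
                fqns.add(fqn);
            }
        }
        return fqns;
    }

    /** Encoded method entries stored for the class with the given name. */
    Set<String> getMethods(String fqn) {
        Set<SearchResult> results = new HashSet<>();
        index.search(KEY_FQN, fqn, NameKind.EXACT_NAME,
                EnumSet.allOf(SearchScope.class), results,
                Collections.singleton(KEY_METHOD));
        Set<String> methods = new TreeSet<>();
        for (SearchResult result : results) {
            String[] values = result.getValues(KEY_METHOD); // multi-valued key
            if (values != null) {
                methods.addAll(Arrays.asList(values));
            }
        }
        return methods;
    }
}
```

	<p>
	Feature code then asks for <code>getClasses("Un")</code> and never sees a key
	name, so renaming or re-encoding a key touches only the indexer and this class.
	</p>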
	<h3>Making Changes</h3>
	<p>
	During development, including after your first release, you'll probably find
	that you want to make changes to the index, either because of performance
	reasons, or because you want to store more information to support new features.
	However, if you just make the change in your indexer and index query code,
	you might have a really hard time dealing with the fact that your users
	can have existing indices out there with the old data format.
	</p>
	<p>
	Dealing with this is easy. The <code>Indexer</code> class has these two
	methods:
	<pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
	@NonNull String <b>getIndexerName</b>();
	@NonNull String <b>getIndexVersion</b>();
	</pre>
	Any time you make an incompatible change to the way you are storing data
	in the index, just change the value of your <code>getIndexVersion()</code>
	method. This will force reindexing of all the code whenever your users
	run with your new version of the indexer.
	</p>
	<p>
	The way this works on the implementation side is that each index database
	is isolated, and in particular, they are stored in the user directory,
	under <code>var/cache/gsf-index/<i>your-index-name</i>/<i>your-index-version</i></code>.
	For example, the Ruby indexer names its indexer "ruby", and the JavaScript
	indexer names its indexer "javascript". This ensures that these two systems
	don't interfere with each other's databases. The index version is just a string,
	and you can put anything you want here (as long as it's a string that is
	filesystem safe, meaning it can be written on all file systems where NetBeans
	runs). A good convention is to just use a number - e.g. "42" or "6.5.2", etc.
	</p>
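	<p>
	To see why bumping the version is enough, here is a small sketch of the
	directory layout described above. The class and method names are
	hypothetical, and the "filesystem safe" test is a deliberately conservative
	assumption (letters, digits, dot, dash and underscore only), not a rule
	enforced by GSF.
	</p>

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that mirrors the cache layout described in the text.
class IndexCacheLayout {
    /** userdir/var/cache/gsf-index/&lt;name&gt;/&lt;version&gt; */
    static Path cacheDir(Path userDir, String indexerName, String indexVersion) {
        return userDir.resolve(Paths.get("var", "cache", "gsf-index",
                indexerName, indexVersion));
    }

    /** Conservative check that a version string is usable as a directory name. */
    static boolean isFileSystemSafe(String version) {
        return version.matches("[A-Za-z0-9._-]+")
                && !version.equals(".") && !version.equals("..");
    }
}
```

	<p>
	Two versions of the same indexer resolve to two different directories, so
	data written in the old format is never read by the new code.
	</p>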
	<h3>Unit Testing</h3>
	<p>
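	Unit testing your indexer is trivial. Create a unit test class for your
	indexer, and make sure it extends <code>GsfTestBase</code>, either directly
	or through a language-specific base class such as <code>RubyTestBase</code>
	in the example below. Then, just create new tests, each with a single-line call: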
	<pre style="background: #ffffcc; color: black; border: solid 1px black; padding: 5px">
	public class RubyIndexerTest extends RubyTestBase {
	    public void testIndexData() throws Exception {
	        checkIndexer("testfiles/date.rb");
	    }

	    public void testRails1() throws Exception {
	        checkIndexer("testfiles/action_controller.rb");
	    }
	    // ...etc
	}
	</pre>
	All you have to do is put the test files you want into <code>test/unit/data</code>
	and refer to them from the <code>checkIndexer</code> call. The GSF test infrastructure
	will do the rest: it loads the file, parses it using your parser,
	calls your indexer, and pretty prints the result. It then looks for a golden file
	(named the same as the input file, plus the extra extension <code>.indexed</code>).
	If none exists, it creates one; otherwise it diffs the computed results against
	the golden file. The test fails if there are any differences.
	</p>
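	<p>
	The create-or-compare step can be sketched as below. This is not the code
	behind <code>checkIndexer</code>; the class name, the <code>Outcome</code> enum
	and the choice to keep the golden file next to the input file are
	assumptions made so that the example is self-contained.
	</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of a golden-file check.
class GoldenFileCheck {
    enum Outcome { CREATED, MATCHED, DIFFERED }

    /**
     * Compares the pretty-printed index output with "<input>.indexed".
     * The first run records the output as the golden file.
     */
    static Outcome check(Path inputFile, String prettyPrinted) {
        Path golden = inputFile.resolveSibling(
                inputFile.getFileName().toString() + ".indexed");
        try {
            if (!Files.exists(golden)) {
                Files.write(golden, prettyPrinted.getBytes(StandardCharsets.UTF_8));
                return Outcome.CREATED;
            }
            String expected = new String(Files.readAllBytes(golden),
                    StandardCharsets.UTF_8);
            return expected.equals(prettyPrinted) ? Outcome.MATCHED : Outcome.DIFFERED;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

	<p>
	Because the first run records whatever the indexer produced, it is worth
	reading a newly created golden file by hand before checking it in.
	</p>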
	<p>
	See the <a href="unit-testing.html">Unit Testing</a> document for more information.
	</p>

	<h3>Debugging Query Problems</h3>
	<p>
	Sometimes a feature fails -- for example, code completion doesn't return the
	results you expect. One way to debug this is to use the GSF diagnostics tools.
	The Lucene Index Browser is described in the
	<a href="gsf-tools.html#index-browser">GSF Diagnostics Tools</a> document.
	</p>
	<h3>Ruby Indexed Data</h3>
	<p>
	The following table shows the data currently stored in the Ruby index, which
	will hopefully give some ideas for how a complete indexer/query system
	can work:
	<table border="1" style="background: #ffffcc; border-collapse: collapse; border:solid 1px black">
	<tr>
	<th>Key</th><th>Value</th>
	</tr>
	<tr>
	<td>fqn</td>
	<td>Fully qualified name of a class or module, such as "Test::Unit" for the Unit class.</td>
	</tr>
	<tr>
	<td>class</td>
	<td>The "basename" of a class or module, such as "Unit" for the Test::Unit class.</td>
	</tr>
	<tr>
	<td>class-ig</td>
	<td>A lowercased version of the class name. Used for case insensitive queries. (This shouldn't be necessary).</td>
	</tr>
	<tr>
	<td>extends</td>
	<td>The fully qualified name of a superclass, if any. Used to compute inherited methods by iterating through class documents.</td>
	</tr>
	<tr>
	<td>in</td>
	<td>The name of the module this class is contained in.</td>
	</tr>
	<tr>
	<td>require</td>
	<td>The string required to include this class from other files, e.g. <code>io/nonblock</code> for the nonblock.rb file. Used for require-completion.</td>
	</tr>
	<tr>
	<td>requires</td>
	<td>A list of all the files <i>this</i> file requires. Used to compute file inclusion transitively.</td>
	</tr>
	<tr>
	<td>includes</td>
	<td>A list of modules included (not the same as requires) by this class or module.</td>
	</tr>
	<tr>
	<td>extendWith</td>
	<td>Used to simulate the way classes can dynamically be extended with a given module in Ruby.</td>
	</tr>
	<tr>
	<td>method</td>
	<td>A method in the class. This isn't just a method name; it's a pretty complicated encoding
	of the method name, its parameter list, its (optional) return type, its (optional) parameter
	hints or types, as well as a set of attributes for the method (is documented, is ignored,
	etc.)</td>
	</tr>
	<tr>
	<td>field</td>
	<td>A field in the class.</td>
	</tr>
	<tr>
	<td>attribute</td>
	<td>An attribute in the class.</td>
	</tr>
	<tr>
	<td>constant</td>
	<td>A constant in the class.</td>
	</tr>
	<tr>
	<td>attrs</td>
	<td>A set of attributes for the class, written as a hex value string.</td>
	</tr>
	<tr>
	<td>dbtable</td>
	<td>For ActiveRecord database table completion, the table name this migration refers to.</td>
	</tr>
	<tr>
	<td>dbversion</td>
	<td>The version number of this migration (used to apply the migrations in the correct order).</td>
	</tr>
	<tr>
	<td>dbcolumn</td>
	<td>A column name to be added, removed or renamed (indicated with + or - after the name).</td>
	</tr>
	</table>
	</p>
	<br/>
	<span style="color: #cccccc">Tor Norbye &lt;tor@netbeans.org&gt;</span>
	</body>
	</html>