{
"Egothor.Stemmer.Cell.html": {
"href": "Egothor.Stemmer.Cell.html",
"title": "Class Cell | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Cell A Cell is a portion of a Trie . Inheritance System.Object Cell Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Cell Methods | Improve this Doc View Source ToString() Return a string containing this Cell 's attributes. Declaration public override string ToString() Returns Type Description System.String a string representation of this Cell Overrides System.Object.ToString()"
},
"Egothor.Stemmer.Compile.html": {
"href": "Egothor.Stemmer.Compile.html",
"title": "Class Compile | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Compile The Compile class is used to compile a stemmer table. Inheritance System.Object Compile Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Compile Methods | Improve this Doc View Source Main(String[]) Entry point to the Compile application. This program takes any number of arguments: the first is the name of the desired stemming algorithm to use (a list is available in the package description) , all of the rest should be the path or paths to a file or files containing a stemmer table to compile. Declaration public static void Main(string[] args) Parameters Type Name Description System.String [] args the command line arguments"
},
"Egothor.Stemmer.Diff.html": {
"href": "Egothor.Stemmer.Diff.html",
"title": "Class Diff | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Diff The Diff object generates a patch string. A patch string is actually a command to a stemmer telling it how to reduce a word to its root. For example, to reduce the word teacher to its root teach the patch string Db would be generated. This command tells the stemmer to delete the last 2 characters from the word teacher to reach the stem (the patch commands are applied starting from the last character in order to save Inheritance System.Object Diff Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Diff Constructors | Improve this Doc View Source Diff() Constructor for the Diff object. Declaration public Diff() | Improve this Doc View Source Diff(Int32, Int32, Int32, Int32) Constructor for the Diff object Declaration public Diff(int ins, int del, int rep, int noop) Parameters Type Name Description System.Int32 ins Description of the Parameter System.Int32 del Description of the Parameter System.Int32 rep Description of the Parameter System.Int32 noop Description of the Parameter Methods | Improve this Doc View Source Apply(StringBuilder, String) Apply the given patch string diff to the given string dest Declaration public static void Apply(StringBuilder dest, string diff) Parameters Type Name Description System.Text.StringBuilder dest Destination string System.String diff Patch string | Improve this Doc View Source Exec(String, String) Construct a patch string that transforms a to b. Declaration public string Exec(string a, string b) Parameters Type Name Description System.String a 1st string System.String b 2nd string Returns Type Description System.String"
},
"Egothor.Stemmer.DiffIt.html": {
"href": "Egothor.Stemmer.DiffIt.html",
"title": "Class DiffIt | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class DiffIt The DiffIt class is a means generate patch commands from an already prepared stemmer table. Inheritance System.Object DiffIt Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class DiffIt Methods | Improve this Doc View Source Main(String[]) Entry point to the DiffIt application. This application takes one argument, the path to a file containing a stemmer table. The program reads the file and generates the patch commands for the stems. Declaration public static void Main(string[] args) Parameters Type Name Description System.String [] args the path to a file containing a stemmer table"
},
"Egothor.Stemmer.Gener.html": {
"href": "Egothor.Stemmer.Gener.html",
"title": "Class Gener | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Gener The Gener object helps in the discarding of nodes which break the reduction effort and defend the structure against large reductions. Inheritance System.Object Reduce Gener Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Gener : Reduce Constructors | Improve this Doc View Source Gener() Constructor for the Gener object. Declaration public Gener() Methods | Improve this Doc View Source Eat(Row, Int32[]) Test whether the given Row of Cells in a Trie should be included in an optimized Trie. Declaration public bool Eat(Row in, int[] remap) Parameters Type Name Description Row in the Row to test System.Int32 [] remap Description of the Parameter Returns Type Description System.Boolean true if the Row should remain; otherwise, false | Improve this Doc View Source Optimize(Trie) Return a Trie with infrequent values occurring in the given Trie removed. Declaration public override Trie Optimize(Trie orig) Parameters Type Name Description Trie orig the Trie to optimize Returns Type Description Trie a new optimized Trie Overrides Reduce.Optimize(Trie)"
},
"Egothor.Stemmer.html": {
"href": "Egothor.Stemmer.html",
"title": "Namespace Egothor.Stemmer | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Namespace Egothor.Stemmer <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> Egothor stemmer API. Classes Cell A Cell is a portion of a Trie . Compile The Compile class is used to compile a stemmer table. Diff The Diff object generates a patch string. A patch string is actually a command to a stemmer telling it how to reduce a word to its root. For example, to reduce the word teacher to its root teach the patch string Db would be generated. This command tells the stemmer to delete the last 2 characters from the word teacher to reach the stem (the patch commands are applied starting from the last character in order to save DiffIt The DiffIt class is a means generate patch commands from an already prepared stemmer table. Gener The Gener object helps in the discarding of nodes which break the reduction effort and defend the structure against large reductions. Lift The Lift class is a data structure that is a variation of a Patricia trie. Lift's raison d'etre is to implement reduction of the trie via the Lift-Up method., which makes the data structure less liable to overstemming. MultiTrie The MultiTrie is a Trie of Trie s. It stores words and their associated patch commands. The MultiTrie handles patch commands individually (each command by itself). 
MultiTrie2 The MultiTrie is a Trie of Trie s. It stores words and their associated patch commands. The MultiTrie handles patch commands broken into their constituent parts, as a MultiTrie does, but the commands are delimited by the skip command. Optimizer The Optimizer class is a Trie that will be reduced (have empty rows removed). The reduction will be made by joining two rows where the first is a subset of the second. Optimizer2 The Optimizer class is a Trie that will be reduced (have empty rows removed). This is the result of allowing a joining of rows when there is no collision between non- null values in the rows. Information loss, resulting in the stemmer not being able to recognize words (as in Optimizer), is curtailed, allowing the stemmer to recognize words for which the original trie was built. Use of this class allows the stemmer to be self-teaching. Reduce The Reduce object is used to remove gaps in a Trie which stores a dictionary. Row The Row class represents a row in a matrix representation of a Trie . Trie A Trie is used to store a dictionary of words and their stems. Actually, what is stored are words with their respective patch commands. A trie can be termed forward (keys read from left to right) or backward (keys read from right to left). This property will vary depending on the language for which a Trie is constructed."
},
"Egothor.Stemmer.Lift.html": {
"href": "Egothor.Stemmer.Lift.html",
"title": "Class Lift | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Lift The Lift class is a data structure that is a variation of a Patricia trie. Lift's raison d'etre is to implement reduction of the trie via the Lift-Up method., which makes the data structure less liable to overstemming. Inheritance System.Object Reduce Lift Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Lift : Reduce Constructors | Improve this Doc View Source Lift(Boolean) Constructor for the Lift object. Declaration public Lift(bool changeSkip) Parameters Type Name Description System.Boolean changeSkip when set to true , comparison of two Cells takes a skip command into account Methods | Improve this Doc View Source LiftUp(Row, IList<Row>) Reduce the trie using Lift-Up reduction. The Lift-Up reduction propagates all leaf-values (patch commands), where possible, to higher levels which are closer to the root of the trie. Declaration public void LiftUp(Row in, IList<Row> nodes) Parameters Type Name Description Row in the Row to consider when optimizing System.Collections.Generic.IList < Row > nodes contains the patch commands | Improve this Doc View Source Optimize(Trie) Optimize (eliminate rows with no content) the given Trie and return the reduced Trie. Declaration public override Trie Optimize(Trie orig) Parameters Type Name Description Trie orig the Trie to optimized Returns Type Description Trie the reduced Trie Overrides Reduce.Optimize(Trie)"
},
"Egothor.Stemmer.MultiTrie.html": {
"href": "Egothor.Stemmer.MultiTrie.html",
"title": "Class MultiTrie | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class MultiTrie The MultiTrie is a Trie of Trie s. It stores words and their associated patch commands. The MultiTrie handles patch commands individually (each command by itself). Inheritance System.Object Trie MultiTrie MultiTrie2 Inherited Members Trie.GetAll(String) Trie.GetCells() Trie.GetCellsPnt() Trie.GetCellsVal() System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class MultiTrie : Trie Constructors | Improve this Doc View Source MultiTrie(IDataInput) Constructor for the MultiTrie object. Declaration public MultiTrie(IDataInput is) Parameters Type Name Description J2N.IO.IDataInput is the input stream Exceptions Type Condition System.IO.IOException if an I/O error occurs | Improve this Doc View Source MultiTrie(Boolean) Constructor for the MultiTrie object Declaration public MultiTrie(bool forward) Parameters Type Name Description System.Boolean forward set to true if the elements should be read left to right Fields | Improve this Doc View Source m_tries Declaration protected List<Trie> m_tries Field Value Type Description System.Collections.Generic.List < Trie > Methods | Improve this Doc View Source Add(String, String) Add an element to this structure consisting of the given key and patch command. This method will return without executing if the cmd parameter's length is 0. Declaration public override void Add(string key, string cmd) Parameters Type Name Description System.String key the key System.String cmd the patch command Overrides Trie.Add(String, String) | Improve this Doc View Source GetFully(String) Return the element that is stored in a cell associated with the given key. 
Declaration public override string GetFully(string key) Parameters Type Name Description System.String key the key to the cell holding the desired element Returns Type Description System.String the element Overrides Trie.GetFully(String) | Improve this Doc View Source GetLastOnPath(String) Return the element that is stored as last on a path belonging to the given key. Declaration public override string GetLastOnPath(string key) Parameters Type Name Description System.String key the key associated with the desired element Returns Type Description System.String the element that is stored as last on a path Overrides Trie.GetLastOnPath(String) | Improve this Doc View Source PrintInfo(TextWriter, String) Print the given prefix and the position(s) in the Trie where it appears. Declaration public override void PrintInfo(TextWriter out, string prefix) Parameters Type Name Description System.IO.TextWriter out System.String prefix the desired prefix Overrides Trie.PrintInfo(TextWriter, String) | Improve this Doc View Source Reduce(Reduce) Remove empty rows from the given Trie and return the newly reduced Trie . Declaration public override Trie Reduce(Reduce by) Parameters Type Name Description Reduce by the Trie to reduce Returns Type Description Trie the newly reduced Trie Overrides Trie.Reduce(Reduce) | Improve this Doc View Source Store(IDataOutput) Write this data structure to the given output stream. Declaration public override void Store(IDataOutput os) Parameters Type Name Description J2N.IO.IDataOutput os the output stream Overrides Trie.Store(IDataOutput) Exceptions Type Condition System.IO.IOException if an I/O error occurs"
},
"Egothor.Stemmer.MultiTrie2.html": {
"href": "Egothor.Stemmer.MultiTrie2.html",
"title": "Class MultiTrie2 | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class MultiTrie2 The MultiTrie is a Trie of Trie s. It stores words and their associated patch commands. The MultiTrie handles patch commands broken into their constituent parts, as a MultiTrie does, but the commands are delimited by the skip command. Inheritance System.Object Trie MultiTrie MultiTrie2 Inherited Members MultiTrie.m_tries MultiTrie.PrintInfo(TextWriter, String) Trie.GetAll(String) Trie.GetCells() Trie.GetCellsPnt() Trie.GetCellsVal() System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class MultiTrie2 : MultiTrie Constructors | Improve this Doc View Source MultiTrie2(IDataInput) Constructor for the MultiTrie object. Declaration public MultiTrie2(IDataInput is) Parameters Type Name Description J2N.IO.IDataInput is the input stream Exceptions Type Condition System.IO.IOException if an I/O error occurs | Improve this Doc View Source MultiTrie2(Boolean) Constructor for the MultiTrie2 object Declaration public MultiTrie2(bool forward) Parameters Type Name Description System.Boolean forward set to true if the elements should be read left to right Methods | Improve this Doc View Source Add(String, String) Add an element to this structure consisting of the given key and patch command. This method will return without executing if the cmd parameter's length is 0. Declaration public override void Add(string key, string cmd) Parameters Type Name Description System.String key the key System.String cmd the patch command Overrides MultiTrie.Add(String, String) | Improve this Doc View Source Decompose(String) Break the given patch command into its constituent pieces. The pieces are delimited by NOOP commands. 
Declaration public virtual string[] Decompose(string cmd) Parameters Type Name Description System.String cmd the patch command Returns Type Description System.String [] an array containing the pieces of the command | Improve this Doc View Source GetFully(String) Return the element that is stored in a cell associated with the given key. Declaration public override string GetFully(string key) Parameters Type Name Description System.String key the key to the cell holding the desired element Returns Type Description System.String the element Overrides MultiTrie.GetFully(String) | Improve this Doc View Source GetLastOnPath(String) Return the element that is stored as last on a path belonging to the given key. Declaration public override string GetLastOnPath(string key) Parameters Type Name Description System.String key the key associated with the desired element Returns Type Description System.String the element that is stored as last on a path Overrides MultiTrie.GetLastOnPath(String) | Improve this Doc View Source Reduce(Reduce) Remove empty rows from the given Trie and return the newly reduced Trie. Declaration public override Trie Reduce(Reduce by) Parameters Type Name Description Reduce by the Trie to reduce Returns Type Description Trie the newly reduced Trie Overrides MultiTrie.Reduce(Reduce) | Improve this Doc View Source Store(IDataOutput) Write this data structure to the given output stream. Declaration public override void Store(IDataOutput os) Parameters Type Name Description J2N.IO.IDataOutput os the output stream Overrides MultiTrie.Store(IDataOutput) Exceptions Type Condition System.IO.IOException if an I/O error occurs"
},
"Egothor.Stemmer.Optimizer.html": {
"href": "Egothor.Stemmer.Optimizer.html",
"title": "Class Optimizer | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Optimizer The Optimizer class is a Trie that will be reduced (have empty rows removed). The reduction will be made by joining two rows where the first is a subset of the second. Inheritance System.Object Reduce Optimizer Optimizer2 Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Optimizer : Reduce Constructors | Improve this Doc View Source Optimizer() Constructor for the Optimizer object. Declaration public Optimizer() Methods | Improve this Doc View Source Merge(Cell, Cell) Merge the given Cell s and return the resulting Cell . Declaration public virtual Cell Merge(Cell m, Cell e) Parameters Type Name Description Cell m the master Cell Cell e the existing Cell Returns Type Description Cell the resulting Cell , or null if the operation cannot be realized | Improve this Doc View Source Merge(Row, Row) Merge the given rows and return the resulting Row . Declaration public Row Merge(Row master, Row existing) Parameters Type Name Description Row master the master Row Row existing the existing Row Returns Type Description Row the resulting Row , or null if the operation cannot be realized | Improve this Doc View Source Optimize(Trie) Optimize (remove empty rows) from the given Trie and return the resulting Trie. Declaration public override Trie Optimize(Trie orig) Parameters Type Name Description Trie orig the Trie to consolidate Returns Type Description Trie the newly consolidated Trie Overrides Reduce.Optimize(Trie)"
},
"Egothor.Stemmer.Optimizer2.html": {
"href": "Egothor.Stemmer.Optimizer2.html",
"title": "Class Optimizer2 | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Optimizer2 The Optimizer class is a Trie that will be reduced (have empty rows removed). This is the result of allowing a joining of rows when there is no collision between non- null values in the rows. Information loss, resulting in the stemmer not being able to recognize words (as in Optimizer), is curtailed, allowing the stemmer to recognize words for which the original trie was built. Use of this class allows the stemmer to be self-teaching. Inheritance System.Object Reduce Optimizer Optimizer2 Inherited Members Optimizer.Optimize(Trie) Optimizer.Merge(Row, Row) System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Optimizer2 : Optimizer Constructors | Improve this Doc View Source Optimizer2() Constructor for the Optimizer2 object. Declaration public Optimizer2() Methods | Improve this Doc View Source Merge(Cell, Cell) Merge the given Cell s and return the resulting Cell . Declaration public override Cell Merge(Cell m, Cell e) Parameters Type Name Description Cell m the master Cell Cell e the existing Cell Returns Type Description Cell the resulting Cell , or null if the operation cannot be realized Overrides Optimizer.Merge(Cell, Cell)"
},
"Egothor.Stemmer.Reduce.html": {
"href": "Egothor.Stemmer.Reduce.html",
"title": "Class Reduce | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Reduce The Reduce object is used to remove gaps in a Trie which stores a dictionary. Inheritance System.Object Reduce Gener Lift Optimizer Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Reduce Constructors | Improve this Doc View Source Reduce() Constructor for the Reduce object. Declaration public Reduce() Methods | Improve this Doc View Source Optimize(Trie) Optimize (remove holes in the rows) the given Trie and return the restructured Trie . Declaration public virtual Trie Optimize(Trie orig) Parameters Type Name Description Trie orig the Trie to optimize Returns Type Description Trie the restructured Trie"
},
"Egothor.Stemmer.Row.html": {
"href": "Egothor.Stemmer.Row.html",
"title": "Class Row | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Row The Row class represents a row in a matrix representation of a Trie . Inheritance System.Object Row Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Row Constructors | Improve this Doc View Source Row() The default constructor for the Row object. Declaration public Row() | Improve this Doc View Source Row(Row) Construct a Row using the cells of the given Row . Declaration public Row(Row old) Parameters Type Name Description Row old the Row to copy | Improve this Doc View Source Row(IDataInput) Construct a Row object from input carried in via the given input stream. Declaration public Row(IDataInput is) Parameters Type Name Description J2N.IO.IDataInput is the input stream Exceptions Type Condition System.IO.IOException if an I/O error occurs Methods | Improve this Doc View Source GetCells() Return the number of cells in use. Declaration public int GetCells() Returns Type Description System.Int32 the number of cells in use | Improve this Doc View Source GetCellsPnt() Return the number of references (how many transitions) to other rows. Declaration public int GetCellsPnt() Returns Type Description System.Int32 the number of references | Improve this Doc View Source GetCellsVal() Return the number of patch commands saved in this Row. Declaration public int GetCellsVal() Returns Type Description System.Int32 the number of patch commands | Improve this Doc View Source GetCmd(Char) Return the command in the Cell associated with the given System.Char . 
Declaration public int GetCmd(char way) Parameters Type Name Description System.Char way the System.Char associated with the Cell holding the desired command Returns Type Description System.Int32 the command | Improve this Doc View Source GetCnt(Char) Return the number of patch commands that were in the Cell associated with the given System.Char before the Trie containing this Row was reduced. Declaration public int GetCnt(char way) Parameters Type Name Description System.Char way the System.Char associated with the desired Cell Returns Type Description System.Int32 the number of patch commands before reduction | Improve this Doc View Source GetRef(Char) Return the reference to the next Row in the Cell associated with the given System.Char . Declaration public int GetRef(char way) Parameters Type Name Description System.Char way the System.Char associated with the desired Cell Returns Type Description System.Int32 the reference, or -1 if the Cell is null | Improve this Doc View Source Print(TextWriter) Write the contents of this Row to the System.IO.TextWriter . Declaration public virtual void Print(TextWriter out) Parameters Type Name Description System.IO.TextWriter out | Improve this Doc View Source SetCmd(Char, Int32) Set the command in the Cell of the given System.Char to the given System.Int32 . Declaration public void SetCmd(char way, int cmd) Parameters Type Name Description System.Char way the System.Char defining the Cell System.Int32 cmd the new command | Improve this Doc View Source SetRef(Char, Int32) Set the reference to the next row in the Cell of the given System.Char to the given System.Int32 . Declaration public void SetRef(char way, int ref) Parameters Type Name Description System.Char way the System.Char defining the Cell System.Int32 ref The new ref value | Improve this Doc View Source Store(IDataOutput) Write the contents of this Row to the given output stream. 
Declaration public virtual void Store(IDataOutput os) Parameters Type Name Description J2N.IO.IDataOutput os the output stream Exceptions Type Condition System.IO.IOException if an I/O error occurs | Improve this Doc View Source UniformCmd(Boolean) Return the number of identical Cell s (containing patch commands) in this Row. Declaration public int UniformCmd(bool eqSkip) Parameters Type Name Description System.Boolean eqSkip when set to false the removed patch commands are considered Returns Type Description System.Int32 the number of identical Cell s, or -1 if there are (at least) two different Cell s"
},
"Egothor.Stemmer.Trie.html": {
"href": "Egothor.Stemmer.Trie.html",
"title": "Class Trie | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class Trie A Trie is used to store a dictionary of words and their stems. Actually, what is stored are words with their respective patch commands. A trie can be termed forward (keys read from left to right) or backward (keys read from right to left). This property will vary depending on the language for which a Trie is constructed. Inheritance System.Object Trie MultiTrie Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Egothor.Stemmer Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class Trie Constructors | Improve this Doc View Source Trie(IDataInput) Constructor for the Trie object. Declaration public Trie(IDataInput is) Parameters Type Name Description J2N.IO.IDataInput is the input stream Exceptions Type Condition System.IO.IOException if an I/O error occurs | Improve this Doc View Source Trie(Boolean) Constructor for the Trie object. Declaration public Trie(bool forward) Parameters Type Name Description System.Boolean forward set to true | Improve this Doc View Source Trie(Boolean, Int32, IList<String>, IList<Row>) Constructor for the Trie object. Declaration public Trie(bool forward, int root, IList<string> cmds, IList<Row> rows) Parameters Type Name Description System.Boolean forward true if read left to right, false if read right to left System.Int32 root index of the row that is the root node System.Collections.Generic.IList < System.String > cmds the patch commands to store System.Collections.Generic.IList < Row > rows a Vector of Vectors. Each inner Vector is a node of this Trie Methods | Improve this Doc View Source Add(String, String) Add the given key associated with the given patch command. If either parameter is null this method will return without executing. 
Declaration public virtual void Add(string key, string cmd) Parameters Type Name Description System.String key the key System.String cmd the patch command | Improve this Doc View Source GetAll(String) Gets the all attribute of the Trie object Declaration public virtual string[] GetAll(string key) Parameters Type Name Description System.String key Description of the Parameter Returns Type Description System.String [] The all value | Improve this Doc View Source GetCells() Return the number of cells in this Trie object. Declaration public virtual int GetCells() Returns Type Description System.Int32 the number of cells | Improve this Doc View Source GetCellsPnt() Gets the cellsPnt attribute of the Trie object Declaration public virtual int GetCellsPnt() Returns Type Description System.Int32 The cellsPnt value | Improve this Doc View Source GetCellsVal() Gets the cellsVal attribute of the Trie object Declaration public virtual int GetCellsVal() Returns Type Description System.Int32 The cellsVal value | Improve this Doc View Source GetFully(String) Return the element that is stored in a cell associated with the given key. Declaration public virtual string GetFully(string key) Parameters Type Name Description System.String key the key Returns Type Description System.String the associated element | Improve this Doc View Source GetLastOnPath(String) Return the element that is stored as last on a path associated with the given key. 
Declaration public virtual string GetLastOnPath(string key) Parameters Type Name Description System.String key the key associated with the desired element Returns Type Description System.String the last on path element | Improve this Doc View Source PrintInfo(TextWriter, String) writes debugging info to the printstream Declaration public virtual void PrintInfo(TextWriter out, string prefix) Parameters Type Name Description System.IO.TextWriter out System.String prefix | Improve this Doc View Source Reduce(Reduce) Remove empty rows from the given Trie and return the newly reduced Trie . Declaration public virtual Trie Reduce(Reduce by) Parameters Type Name Description Reduce by the Trie to reduce Returns Type Description Trie newly reduced Trie | Improve this Doc View Source Store(IDataOutput) Write this Trie to the given output stream. Declaration public virtual void Store(IDataOutput os) Parameters Type Name Description J2N.IO.IDataOutput os the output stream Exceptions Type Condition System.IO.IOException if an I/O error occurs"
},
"Lucene.Net.Analysis.Pl.html": {
"href": "Lucene.Net.Analysis.Pl.html",
"title": "Namespace Lucene.Net.Analysis.Pl | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Namespace Lucene.Net.Analysis.Pl <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> Analyzer for Polish. Classes PolishAnalyzer Lucene.Net.Analysis.Analyzer for Polish."
},
"Lucene.Net.Analysis.Pl.PolishAnalyzer.html": {
"href": "Lucene.Net.Analysis.Pl.PolishAnalyzer.html",
"title": "Class PolishAnalyzer | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class PolishAnalyzer Lucene.Net.Analysis.Analyzer for Polish. Inheritance System.Object Lucene.Net.Analysis.Analyzer Lucene.Net.Analysis.Util.StopwordAnalyzerBase PolishAnalyzer Implements System.IDisposable Inherited Members Lucene.Net.Analysis.Util.StopwordAnalyzerBase.m_stopwords Lucene.Net.Analysis.Util.StopwordAnalyzerBase.m_matchVersion Lucene.Net.Analysis.Util.StopwordAnalyzerBase.StopwordSet Lucene.Net.Analysis.Util.StopwordAnalyzerBase.LoadStopwordSet(System.Boolean, System.Type, System.String, System.String) Lucene.Net.Analysis.Util.StopwordAnalyzerBase.LoadStopwordSet(System.IO.FileInfo, Lucene.Net.Util.LuceneVersion) Lucene.Net.Analysis.Util.StopwordAnalyzerBase.LoadStopwordSet(System.IO.TextReader, Lucene.Net.Util.LuceneVersion) Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>) Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, ReuseStrategy) Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>) Analyzer.NewAnonymous(Func<String, TextReader, TokenStreamComponents>, Func<String, TextReader, TextReader>, ReuseStrategy) Analyzer.GetTokenStream(String, TextReader) Analyzer.GetTokenStream(String, String) Analyzer.InitReader(String, TextReader) Analyzer.GetPositionIncrementGap(String) Analyzer.GetOffsetGap(String) Lucene.Net.Analysis.Analyzer.Strategy Lucene.Net.Analysis.Analyzer.Dispose() Analyzer.Dispose(Boolean) Lucene.Net.Analysis.Analyzer.GLOBAL_REUSE_STRATEGY Lucene.Net.Analysis.Analyzer.PER_FIELD_REUSE_STRATEGY System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Lucene.Net.Analysis.Pl Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public sealed class PolishAnalyzer : StopwordAnalyzerBase, IDisposable Constructors | Improve this 
Doc View Source PolishAnalyzer(LuceneVersion) Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE . Declaration public PolishAnalyzer(LuceneVersion matchVersion) Parameters Type Name Description Lucene.Net.Util.LuceneVersion matchVersion lucene compatibility version | Improve this Doc View Source PolishAnalyzer(LuceneVersion, CharArraySet) Builds an analyzer with the given stop words. Declaration public PolishAnalyzer(LuceneVersion matchVersion, CharArraySet stopwords) Parameters Type Name Description Lucene.Net.Util.LuceneVersion matchVersion lucene compatibility version Lucene.Net.Analysis.Util.CharArraySet stopwords a stopword set | Improve this Doc View Source PolishAnalyzer(LuceneVersion, CharArraySet, CharArraySet) Builds an analyzer with the given stop words. If a non-empty stem exclusion set is provided this analyzer will add a Lucene.Net.Analysis.Miscellaneous.SetKeywordMarkerFilter before stemming. Declaration public PolishAnalyzer(LuceneVersion matchVersion, CharArraySet stopwords, CharArraySet stemExclusionSet) Parameters Type Name Description Lucene.Net.Util.LuceneVersion matchVersion lucene compatibility version Lucene.Net.Analysis.Util.CharArraySet stopwords a stopword set Lucene.Net.Analysis.Util.CharArraySet stemExclusionSet a set of terms not to be stemmed Fields | Improve this Doc View Source DEFAULT_STEMMER_FILE File containing default Polish stemmer table. Declaration public static readonly string DEFAULT_STEMMER_FILE Field Value Type Description System.String | Improve this Doc View Source DEFAULT_STOPWORD_FILE File containing default Polish stopwords. Declaration public static readonly string DEFAULT_STOPWORD_FILE Field Value Type Description System.String Properties | Improve this Doc View Source DefaultStopSet Returns an unmodifiable instance of the default stop words set. 
Declaration public static CharArraySet DefaultStopSet { get; } Property Value Type Description Lucene.Net.Analysis.Util.CharArraySet default stop words set. | Improve this Doc View Source DefaultTable Returns an unmodifiable instance of the default stemmer table. Declaration public static Trie DefaultTable { get; } Property Value Type Description Trie Methods | Improve this Doc View Source CreateComponents(String, TextReader) Creates a Lucene.Net.Analysis.TokenStreamComponents which tokenizes all the text in the provided System.IO.TextReader . Declaration protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader) Parameters Type Name Description System.String fieldName System.IO.TextReader reader Returns Type Description Lucene.Net.Analysis.TokenStreamComponents A Lucene.Net.Analysis.TokenStreamComponents built from a Lucene.Net.Analysis.Standard.StandardTokenizer filtered with Lucene.Net.Analysis.Standard.StandardFilter , Lucene.Net.Analysis.Core.LowerCaseFilter , Lucene.Net.Analysis.Core.StopFilter , Lucene.Net.Analysis.Miscellaneous.SetKeywordMarkerFilter if a stem exclusion set is provided, and StempelFilter . Overrides Analyzer.CreateComponents(String, TextReader) Implements System.IDisposable"
},
"Lucene.Net.Analysis.Stempel.html": {
"href": "Lucene.Net.Analysis.Stempel.html",
"title": "Namespace Lucene.Net.Analysis.Stempel | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Namespace Lucene.Net.Analysis.Stempel <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> Stempel - Algorithmic Stemmer for Polish Language Introduction A method for conflation of different inflected word forms is an important component of many Information Retrieval systems. It helps to improve the system's recall and can significantly reduce the index size. This is especially true for highly-inflectional languages like those from the Slavic language family (Czech, Slovak, Polish, Russian, Bulgarian, etc). This page describes a software package consisting of high-quality stemming tables for Polish, and a universal algorithmic stemmer, which operates using these tables. The stemmer code is taken virtually unchanged from the Egothor project . The software distribution includes stemmer tables prepared using an extensive corpus of Polish language (see details below). This work is available under Apache-style Open Source license - the stemmer code is covered by Egothor License, the tables and other additions are covered by Apache License 2.0. Both licenses allow to use the code in Open Source as well as commercial (closed source) projects. Terminology A short explanation is in order about the terminology used in this text. 
In the following sections I make a distinction between stem and lemma . A lemma is the base grammatical form (dictionary form, headword) of a word. A lemma is an existing, grammatically correct word in some human language. A stem, on the other hand, is just a unique token, not necessarily making any sense in any human language, but which can serve as a unique label instead of the lemma for the same set of inflected forms. Quite often a stem is referred to as the \"root\" of the word - which is incorrect and misleading (stems sometimes have very little to do with the linguistic root of a word, i.e. a pattern found in a word which is common to all inflected forms or within a family of languages). For an IR system stems are usually sufficient; for a morphological analysis system lemmas are obviously a must. In practice, various stemmers produce a mix of stems and lemmas, as is the case with the stemmer described here. Additionally, for languages that use suffix-based inflection rules, many stemmers based on suffix-stripping will produce a large percentage of stems equivalent to lemmas. This is, however, not the case for languages with complex, irregular inflection rules (such as Slavic languages) - here simplistic suffix-stripping stemmers produce very poor results. Background Lemmatization is a process of finding the base, non-inflected form of a word. The result of lemmatization is a correct existing word, often in nominative case for nouns and infinitive form for verbs. A given inflected form may correspond to several lemmas (e.g. \"found\" -> find, found) - the correct choice depends on the context. Stemming is concerned mostly with finding a unique \"root\" of a word, which does not necessarily result in any existing word or lemma. 
The quality of stemming is measured by the rate of collisions (overstemming - which causes words with different lemmas to be incorrectly conflated into one \"root\"), and the rate of superfluous word \"roots\" (understemming - which assigns several \"roots\" to words with the same lemma). Both stemmer and lemmatizer can be implemented in various ways. The two most common approaches are: dictionary-based: where the stemmer uses an extensive dictionary of morphological forms in order to find the corresponding stem or lemma algorithmic: where the stemmer uses an algorithm, based on general morphological properties of a given language plus a set of heuristic rules There are many existing and well-known implementations of stemmers for English (Porter, Lovins, Krovetz) and other European languages ( Snowball ). There are also good quality commercial lemmatizers for Polish. However, there is only one freely available Polish stemmer, implemented by Dawid Weiss , based on the \"ispell\" dictionary and Jan Daciuk's FSA package . That stemmer is dictionary-based. This means that even though it can achieve perfect accuracy for previously known word forms found in its dictionary, it completely fails in case of all other word forms. This deficiency is somewhat mitigated by the comprehensive dictionary distributed with this stemmer (so there is a high probability that most of the words in the input text will be found in the dictionary), however the problem still remains (please see the page above for more detailed description). The implementation described here uses an algorithmic method. This method and particular algorithm implementation are described in detail in [1][2]. The main advantage of algorithmic stemmers is their ability to process previously unseen word forms with high accuracy. This particular algorithm uses a set of transformation rules (patch commands), which describe how a word with a given pattern should be transformed to its stem. 
These rules are first learned from a training corpus. They don't cover all possible cases, so there is always some loss of precision/recall (which means that even the words from the training corpus are sometimes incorrectly stemmed). Algorithm and implementation The algorithm and its Java implementation are described in detail in the publications cited below. Here's just a short excerpt from [2]: \"The aim is separation of the stemmer execution code from the data structures [...]. In other words, a static algorithm configurable by data must be developed. The word transformations that happen in the stemmer must be then encoded to the data tables. The tacit input of our method is a sample set (a so-called dictionary) of words (as keys) and their stems. Each record can be equivalently stored as a key and the record of key's transformation to its respective stem. The transformation record is termed a patch command (P-command). It must be ensured that P-commands are universal, and that P-commands can transform any word to its stem. Our solution[6,8] is based on the Levenstein metric [10], which produces P-command as the minimum cost path in a directed graph. One can imagine the P-command as an algorithm for an operator (editor) that rewrites a string to another string. The operator can use these instructions (PP-command's): removal - deletes a sequence of characters starting at the current cursor position and moves the cursor to the next character. The length of this sequence is the parameter; insertion - inserts a character ch, without moving the cursor. The character ch is a parameter; substitution - rewrites a character at the current cursor position to the character ch and moves the cursor to the next character. The character ch is a parameter; no operation (NOOP) - skip a sequence of characters starting at the current cursor position. The length of this sequence is the parameter. The P-commands are applied from the end of a word (right to left). 
This assumption can reduce the set of P-command's, because the last NOOP, moving the cursor to the end of a string without any changes, need not be stored.\" The data structure used to keep the dictionary (words and their P-commands) is a trie. Several optimization steps are applied in turn to reduce and optimize the initial trie, by eliminating useless information and shortening the paths in the trie. Finally, in order to obtain a stem from the input word, the word is passed once through a matching path in the trie (applying at each node the P-commands stored there). The result is a word stem. Corpus (to be completed...) The following Polish corpora have been used: Polish dictionary from ispell distribution Wzbogacony korpus słownika frekwencyjnego The Bible (so called \"Tysiąclecia\") - unauthorized electronic version Analizator morfologiczny SAM v. 3.4 - this was used to recover lemmas missing from other texts This step was the most time-consuming - and it would probably be even more tedious and difficult if not for the help of Python . The source texts had to be brought to a common encoding (UTF-8) - some of them used quite ancient encodings like Mazovia or DHN - and then scripts were written to collect all lemmas and inflected forms from the source texts. In cases when the source text was not tagged, I used the SAM analyzer to produce lemmas. In cases of ambiguous lemmatization I decided to put references to inflected forms from all base forms. All grammatical categories were allowed to appear in the corpus, i.e. nouns, verbs, adjectives, numerals, and pronouns. The resulting corpus consisted of over 87,000 inflection sets, i.e. each set consisted of one base form (lemma) and many inflected forms. However, because of the nature of the training method I restricted these sets to include only those where there were at least 4 inflected forms. 
Sets with 3 or fewer inflected forms were removed, so that the final corpus consisted of ~69,000 unique sets, which in turn contained ~1.5 million inflected forms. Testing I tested the stemmer tables produced using the implementation described above. The following sections give some details about the testing setup. Testing procedure The testing procedure was as follows: the whole corpus of ~69,000 unique sets was shuffled, so that the input sets were in random order. the corpus was split into two parts - one with 30,000 sets (Part 1), the other with ~39,000 sets (Part 2). Training samples were drawn in sequential order from Part 1. Since the sets were already randomized, the training samples were also randomized, but this procedure ensured that each larger training sample contained all smaller samples. Part 2 was used for testing. Note: this means that the testing run used only words previously unseen during the training phase. This is the worst scenario, because it means that the stemmer must extrapolate the learned rules to unknown cases. This also means that in a real-life case (where the input is a mix of known and unknown words) the F-measure of the stemmer will be even higher than in the table below. Test results The following table summarizes test results for varying sizes of training samples. The meaning of the table columns is described below: training sets: the number of training sets. One set consists of one lemma and at least 4 and up to ~80 inflected forms (including pre- and suffixed forms). testing forms: the number of testing forms. Only inflected forms were used in testing. stem OK: the number of cases when the produced output was a correct (unique) stem. Note: quite often correct stems were also correct lemmas. lemma OK: the number of cases when the produced output was a correct lemma. missing: the number of cases when the stemmer was unable to provide any output. 
stem bad: the number of cases when produced output was a stem, but already in use identifying a different set. lemma bad: the number of cases when produced output was an incorrect lemma. Note: quite often in such case the output was a correct stem. table size: the size in bytes of the stemmer table. Training sets Testing forms Stem OK Lemma OK Missing Stem Bad Lemma Bad Table size [B] 100 1022985 842209 593632 172711 22331 256642 28438 200 1022985 862789 646488 153288 16306 223209 48660 500 1022985 885786 685009 130772 14856 207204 108798 700 1022985 909031 704609 107084 15442 211292 139291 1000 1022985 926079 725720 90117 14941 207148 183677 2000 1022985 942886 746641 73429 14903 202915 313516 5000 1022985 954721 759930 61476 14817 201579 640969 7000 1022985 956165 764033 60364 14620 198588 839347 10000 1022985 965427 775507 50797 14662 196681 1144537 12000 1022985 967664 782143 48722 14284 192120 1313508 15000 1022985 973188 788867 43247 14349 190871 1567902 17000 1022985 974203 791804 42319 14333 188862 1733957 20000 1022985 976234 791554 40058 14601 191373 1977615 I also measured the time to produce a stem (which involves traversing a trie, retrieving a patch command and applying the patch command to the input string). On a machine running Windows XP (Pentium 4, 1.7 GHz, JDK 1.4.2_03 HotSpot), for tables ranging in size from 1,000 to 20,000 cells, the time to produce a single stem varies between 5-10 microseconds. This means that the stemmer can process up to 200,000 words per second , an outstanding result when compared to other stemmers (Morfeusz - ~2,000 w/s, FormAN (MS Word analyzer) - ~1,000 w/s). 
The package contains a class org.getopt.stempel.Benchmark , which you can use to produce reports like the one below: --------- Stemmer benchmark report: ----------- Stemmer table: /res/tables/stemmer_2000.out Input file: ../test3.txt Number of runs: 3 RUN NUMBER: 1 2 3 Total input words 1378176 1378176 1378176 Missed output words 112 112 112 Time elapsed [ms] 6989 6940 6640 Hit rate percent 99.99% 99.99% 99.99% Miss rate percent 00.01% 00.01% 00.01% Words per second 197192 198584 207557 Time per word [us] 5.07 5.04 4.82 Summary The results of these tests are very encouraging. It seems that using the training corpus and the stemming algorithm described above results in a high-quality stemmer useful for most applications. Moreover, it can also be used as a better than average lemmatizer. Both the author of the implementation (Leo Galambos, <leo.galambos AT egothor DOT org>) and the author of this compilation (Andrzej Bialecki ) would appreciate any feedback and suggestions for further improvements. Bibliography Galambos, L.: Multilingual Stemmer in Web Environment, PhD Thesis, Faculty of Mathematics and Physics, Charles University in Prague, in press. Galambos, L.: Semi-automatic Stemmer Evaluation. International Intelligent Information Processing and Web Mining Conference, 2004, Zakopane, Poland. Galambos, L.: Lemmatizer for Document Information Retrieval Systems in JAVA. <http://www.informatik.uni-trier.de/%7Eley/db/conf/sofsem/sofsem2001.html#Galambos01> SOFSEM 2001, Piestany, Slovakia. Classes StempelFilter Transforms the token stream as per the stemming algorithm. Note: the input to the stemming filter must already be in lower case, so you will need to use Lucene.Net.Analysis.Core.LowerCaseFilter or Lucene.Net.Analysis.Core.LowerCaseTokenizer farther down the Tokenizer chain in order for this to work properly! StempelPolishStemFilterFactory Factory for StempelFilter using a Polish stemming table. 
StempelStemmer Stemmer class is a convenient facade for other stemmer-related classes. The core stemming algorithm and its implementation are taken verbatim from the Egothor project ( www.egothor.org ). Even though the stemmer tables supplied in the distribution package are built for the Polish language, there is nothing language-specific here."
},
"Lucene.Net.Analysis.Stempel.StempelFilter.html": {
"href": "Lucene.Net.Analysis.Stempel.StempelFilter.html",
"title": "Class StempelFilter | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class StempelFilter Transforms the token stream as per the stemming algorithm. Note: the input to the stemming filter must already be in lower case, so you will need to use Lucene.Net.Analysis.Core.LowerCaseFilter or Lucene.Net.Analysis.Core.LowerCaseTokenizer farther down the Tokenizer chain in order for this to work properly! Inheritance System.Object Lucene.Net.Util.AttributeSource Lucene.Net.Analysis.TokenStream Lucene.Net.Analysis.TokenFilter StempelFilter Implements System.IDisposable Inherited Members Lucene.Net.Analysis.TokenFilter.m_input Lucene.Net.Analysis.TokenFilter.End() TokenFilter.Dispose(Boolean) Lucene.Net.Analysis.TokenFilter.Reset() Lucene.Net.Analysis.TokenStream.Dispose() Lucene.Net.Util.AttributeSource.GetAttributeFactory() Lucene.Net.Util.AttributeSource.GetAttributeClassesEnumerator() Lucene.Net.Util.AttributeSource.GetAttributeImplsEnumerator() Lucene.Net.Util.AttributeSource.AddAttributeImpl(Lucene.Net.Util.Attribute) Lucene.Net.Util.AttributeSource.AddAttribute<T>() Lucene.Net.Util.AttributeSource.HasAttributes Lucene.Net.Util.AttributeSource.HasAttribute<T>() Lucene.Net.Util.AttributeSource.GetAttribute<T>() Lucene.Net.Util.AttributeSource.ClearAttributes() Lucene.Net.Util.AttributeSource.CaptureState() Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State) Lucene.Net.Util.AttributeSource.GetHashCode() AttributeSource.Equals(Object) AttributeSource.ReflectAsString(Boolean) Lucene.Net.Util.AttributeSource.ReflectWith(Lucene.Net.Util.IAttributeReflector) Lucene.Net.Util.AttributeSource.CloneAttributes() Lucene.Net.Util.AttributeSource.CopyTo(Lucene.Net.Util.AttributeSource) Lucene.Net.Util.AttributeSource.ToString() System.Object.Equals(System.Object, System.Object) System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) Namespace : Lucene.Net.Analysis.Stempel Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public sealed class 
StempelFilter : TokenFilter, IDisposable Constructors | Improve this Doc View Source StempelFilter(TokenStream, StempelStemmer) Create filter using the supplied stemming table. Declaration public StempelFilter(TokenStream in, StempelStemmer stemmer) Parameters Type Name Description Lucene.Net.Analysis.TokenStream in input token stream StempelStemmer stemmer stemmer | Improve this Doc View Source StempelFilter(TokenStream, StempelStemmer, Int32) Create filter using the supplied stemming table. Declaration public StempelFilter(TokenStream in, StempelStemmer stemmer, int minLength) Parameters Type Name Description Lucene.Net.Analysis.TokenStream in input token stream StempelStemmer stemmer stemmer System.Int32 minLength For performance reasons words shorter than minLength characters are not processed, but simply returned. Fields | Improve this Doc View Source DEFAULT_MIN_LENGTH Minimum length of input words to be processed. Shorter words are returned unchanged. Declaration public static readonly int DEFAULT_MIN_LENGTH Field Value Type Description System.Int32 Methods | Improve this Doc View Source IncrementToken() Returns the next input Token , after being stemmed Declaration public override bool IncrementToken() Returns Type Description System.Boolean Overrides Lucene.Net.Analysis.TokenStream.IncrementToken() Implements System.IDisposable"
},
"Lucene.Net.Analysis.Stempel.StempelPolishStemFilterFactory.html": {
"href": "Lucene.Net.Analysis.Stempel.StempelPolishStemFilterFactory.html",
"title": "Class StempelPolishStemFilterFactory | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class StempelPolishStemFilterFactory Factory for StempelFilter using a Polish stemming table. Inheritance System.Object Lucene.Net.Analysis.Util.AbstractAnalysisFactory Lucene.Net.Analysis.Util.TokenFilterFactory StempelPolishStemFilterFactory Inherited Members Lucene.Net.Analysis.Util.TokenFilterFactory.ForName(System.String, System.Collections.Generic.IDictionary<System.String, System.String>) Lucene.Net.Analysis.Util.TokenFilterFactory.LookupClass(System.String) Lucene.Net.Analysis.Util.TokenFilterFactory.AvailableTokenFilters Lucene.Net.Analysis.Util.TokenFilterFactory.ReloadTokenFilters() Lucene.Net.Analysis.Util.AbstractAnalysisFactory.LUCENE_MATCH_VERSION_PARAM Lucene.Net.Analysis.Util.AbstractAnalysisFactory.m_luceneMatchVersion Lucene.Net.Analysis.Util.AbstractAnalysisFactory.OriginalArgs Lucene.Net.Analysis.Util.AbstractAnalysisFactory.AssureMatchVersion() Lucene.Net.Analysis.Util.AbstractAnalysisFactory.LuceneMatchVersion Lucene.Net.Analysis.Util.AbstractAnalysisFactory.Require(System.Collections.Generic.IDictionary<System.String, System.String>, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.Require(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.Collections.Generic.ICollection<System.String>) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.Require(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.Collections.Generic.ICollection<System.String>, System.Boolean) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.Get(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.Get(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.Collections.Generic.ICollection<System.String>) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.Get(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, 
System.Collections.Generic.ICollection<System.String>, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.Get(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.Collections.Generic.ICollection<System.String>, System.String, System.Boolean) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.RequireInt32(System.Collections.Generic.IDictionary<System.String, System.String>, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetInt32(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.Int32) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.RequireBoolean(System.Collections.Generic.IDictionary<System.String, System.String>, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetBoolean(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.Boolean) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.RequireSingle(System.Collections.Generic.IDictionary<System.String, System.String>, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetSingle(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.Single) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.RequireChar(System.Collections.Generic.IDictionary<System.String, System.String>, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetChar(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.Char) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetSet(System.Collections.Generic.IDictionary<System.String, System.String>, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetPattern(System.Collections.Generic.IDictionary<System.String, System.String>, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetCulture(System.Collections.Generic.IDictionary<System.String, System.String>, System.String, System.Globalization.CultureInfo) 
Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetWordSet(Lucene.Net.Analysis.Util.IResourceLoader, System.String, System.Boolean) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetLines(Lucene.Net.Analysis.Util.IResourceLoader, System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetSnowballWordSet(Lucene.Net.Analysis.Util.IResourceLoader, System.String, System.Boolean) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.SplitFileNames(System.String) Lucene.Net.Analysis.Util.AbstractAnalysisFactory.GetClassArg() Lucene.Net.Analysis.Util.AbstractAnalysisFactory.IsExplicitLuceneMatchVersion System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Lucene.Net.Analysis.Stempel Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class StempelPolishStemFilterFactory : TokenFilterFactory Constructors | Improve this Doc View Source StempelPolishStemFilterFactory(IDictionary<String, String>) Creates a new StempelPolishStemFilterFactory Declaration public StempelPolishStemFilterFactory(IDictionary<string, string> args) Parameters Type Name Description System.Collections.Generic.IDictionary < System.String , System.String > args Methods | Improve this Doc View Source Create(TokenStream) Declaration public override TokenStream Create(TokenStream input) Parameters Type Name Description Lucene.Net.Analysis.TokenStream input Returns Type Description Lucene.Net.Analysis.TokenStream Overrides Lucene.Net.Analysis.Util.TokenFilterFactory.Create(Lucene.Net.Analysis.TokenStream)"
},
"Lucene.Net.Analysis.Stempel.StempelStemmer.html": {
"href": "Lucene.Net.Analysis.Stempel.StempelStemmer.html",
"title": "Class StempelStemmer | Apache Lucene.NET 4.8.0-beta00010 Documentation",
"keywords": "Class StempelStemmer The Stemmer class is a convenient facade for other stemmer-related classes. The core stemming algorithm and its implementation are taken verbatim from the Egothor project ( www.egothor.org ). Even though the stemmer tables supplied in the distribution package are built for the Polish language, there is nothing language-specific here. Inheritance System.Object StempelStemmer Inherited Members System.Object.Equals(System.Object) System.Object.Equals(System.Object, System.Object) System.Object.GetHashCode() System.Object.GetType() System.Object.MemberwiseClone() System.Object.ReferenceEquals(System.Object, System.Object) System.Object.ToString() Namespace : Lucene.Net.Analysis.Stempel Assembly : Lucene.Net.Analysis.Stempel.dll Syntax public class StempelStemmer Constructors | Improve this Doc View Source StempelStemmer(Trie) Create a Stemmer using a pre-loaded stemmer table. Declaration public StempelStemmer(Trie stemmer) Parameters Type Name Description Trie stemmer pre-loaded stemmer table | Improve this Doc View Source StempelStemmer(Stream) Create a Stemmer using the selected stemmer table. Declaration public StempelStemmer(Stream stemmerTable) Parameters Type Name Description System.IO.Stream stemmerTable stemmer table. Methods | Improve this Doc View Source Load(Stream) Load a stemmer table from an input stream. Declaration public static Trie Load(Stream stemmerTable) Parameters Type Name Description System.IO.Stream stemmerTable Returns Type Description Trie | Improve this Doc View Source Stem(String) Stem a word. Declaration public StringBuilder Stem(string word) Parameters Type Name Description System.String word input word to be stemmed. Returns Type Description System.Text.StringBuilder stemmed word, or null if the stem could not be generated."
}
}