<!DOCTYPE html>
<!--[if IE]><![endif]-->
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Introduction | Apache Lucene.NET 4.8.0 </title>
<meta name="viewport" content="width=device-width">
<meta name="title" content="Introduction | Apache Lucene.NET 4.8.0 ">
<meta name="generator" content="docfx 2.58.0.0">
<link rel="shortcut icon" href="../logo/favicon.ico">
<link rel="stylesheet" href="../styles/docfx.vendor.css">
<link rel="stylesheet" href="../styles/docfx.css">
<link rel="stylesheet" href="../styles/main.css">
<meta property="docfx:navrel" content="../toc.html">
<meta property="docfx:tocrel" content="toc.html">
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:400,700%7CMerriweather%7CRoboto+Mono">
<link rel="stylesheet" href="/styles/site.css">
</head>
<body data-spy="scroll" data-target="#affix" data-offset="120">
<span id="forkongithub"><a href="https://github.com/apache/lucenenet" target="_blank">Fork me on GitHub</a></span>
<div id="wrapper">
<header>
<nav id="autocollapse" class="navbar ng-scope" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="../index.html">
<img id="logo" class="svg" src="../logo/lucene-net-color.png" alt="">
</a>
</div>
<div class="collapse navbar-collapse" id="navbar">
<form class="navbar-form navbar-right" role="search" id="search">
<div class="form-group">
<input type="text" class="form-control" id="search-query" placeholder="Search" autocomplete="off">
</div>
</form>
</div>
</div>
</nav>
<div class="subnav navbar navbar-default">
<div class="container hide-when-search" id="breadcrumb">
<ul class="breadcrumb">
<li></li>
</ul>
</div>
</div>
</header>
<div role="main" class="container body-content hide-when-search">
<div class="sidenav hide-when-search">
<a class="btn toc-toggle collapse" data-toggle="collapse" href="#sidetoggle" aria-expanded="false" aria-controls="sidetoggle">Show / Hide Table of Contents</a>
<div class="sidetoggle collapse" id="sidetoggle">
<div id="sidetoc"></div>
</div>
</div>
<div class="article row grid-right">
<div class="col-md-10">
<article class="content wrap" id="_content" data-uid="quick-start/introduction">
<h1 id="introduction">Introduction</h1>
<hr>
<h2 id="background">Background</h2>
<p>Apache Lucene.NET is a C# port of the Java-based Apache Lucene. Apache Lucene has a huge following and is used directly or indirectly to power search by many companies you probably know, including Amazon, Twitter, LinkedIn, Netflix, Salesforce, SAS, and Microsoft Power BI.</p>
<p>Apache Lucene is the core search library used by popular open source search servers like Apache Solr, Elasticsearch and OpenSearch. Apache Lucene is so widely used because it's extremely powerful and can index large amounts of data quickly (think hundreds of GB per hour) and then perform full text search on that data in sub-second time. And unlike traditional SQL databases, its data engine is optimized for full text search.</p>
<p>The codebase for Apache Lucene is very mature. In March 2020, the open source project celebrated its 20th birthday. You can scroll through the years and see the major <a href="https://www.elastic.co/celebrating-lucene">Apache Lucene milestones</a>.</p>
<p>Apache Lucene.NET 4.8 is an open source project whose aim is to be a line-by-line C# port of the Java-based Apache Lucene 4.8. This port makes the power of Lucene available to all .NET developers, and because it's pure C#, it's easy for them to contribute to the project or customize it.</p>
<p>Currently Lucene.NET 4.8 is in Beta, but it is extremely stable and many developers already use it in production. It has far more features than Lucene.NET 3.0.3 and much better unit test coverage than that older version. Lucene.NET has more than 7,800 passing unit tests. This test coverage is what makes Lucene.NET so stable.</p>
<h2 id="evolution-of-lucene">Evolution of Lucene</h2>
<p>Porting Lucene from Java to C# is a huge undertaking. There are over <a href="https://lucenenet.apache.org/images/contributing/source/lucenenet-repo-lines-of-code--jan-2022.png">644K lines of code</a>, not counting outside dependencies. This is why only a few specific versions have been ported. The prior Lucene.NET release was version 3.0.3 and the current release (which receives all the focus) is Lucene.NET 4.8. Version 4.8 is now in late Beta and, as I already mentioned, is used in production by many developers.</p>
<p>You might be aware that Java Lucene is at version 9.x. But don't be misled by the number. The step up in features between 3.x and 4.x was the biggest in Lucene's history, and it was followed by many smaller releases. <strong>So the reality is that Lucene.NET 4.8 contains the vast majority of features found in Java Lucene 9.x and in fact Lucene 4.x is more similar to Lucene 9.x than to Lucene 3.x.</strong> If you'd like to dive deeper into this topic, <a href="https://www.giftoasis.com/blog/lucene-net/lucene-net--4-8--vs--java-lucene--9-x">Lucene.NET 4.8 vs Java Lucene 9.x</a> is a community-written article that covers it in more detail.</p>
<h2 id="lucenenet-is-multi-platform">Lucene.NET is Multi-Platform</h2>
<p>Lucene.NET 4.8 runs everywhere .NET runs: Windows, Unix or Mac. As a library, it can be used to power search in desktop applications, websites, mobile apps (iOS or Android) or even on IoT devices like the Raspberry Pi. And because it's licensed under the permissive Apache 2.0 license, it's typically considered suitable for both commercial and non-commercial use.</p>
<h2 id="lucenes-lsm-inspired-architecture">Lucene's LSM Inspired Architecture</h2>
<p>At this early stage of your journey it's probably good to cover a few things about how Lucene stores data. We are just going to hit the highlights here because it's a deep topic.</p>
<p>Lucene, and hence Lucene.NET, stores data in immutable &quot;segments.&quot; Each segment is made up of multiple files. Segments automatically get merged together to form new, bigger segments, and the old segments are then typically deleted by the merge process. This approach is based on what is called a Log Structured Merge (LSM) design.</p>
<p>LSM has become the de facto standard for NoSQL databases and is used not only by Lucene but also by Google Bigtable, Apache HBase, Apache Cassandra and many others. The details of each implementation vary, as do the number and types of files used. So let's take a look at what those files might look like for a Lucene.NET index.</p>
<p>Here is an example of Lucene.NET's files for a brand new index with one segment:</p>
<p><img src="https://lucenenet.apache.org/images/quick-start/introduction/one-segment-example.gif" alt="Example files for single segment"></p>
<p>Here is a two-segment example from an index that has gone through many merges:</p>
<p><img src="https://lucenenet.apache.org/images/quick-start/introduction/two-segment-example.gif" alt="Example files for two segments"></p>
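<p>If you'd like to watch segments appear and get merged yourself, the following is a minimal sketch rather than a definitive recipe. It assumes a project that references the Lucene.Net and Lucene.Net.Analysis.Common 4.8.0 beta NuGet packages and a compiler that supports C# 9 top-level statements. The field name and text are made up for illustration. It commits twice so that, with default settings, the index typically ends up with more than one segment, then forces a merge and lists the files left in the directory.</p>
<pre><code class="lang-csharp">using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion version = LuceneVersion.LUCENE_48;

// An in-memory directory keeps the experiment off the disk.
using var dir = new RAMDirectory();
using var writer = new IndexWriter(dir,
    new IndexWriterConfig(version, new StandardAnalyzer(version)));

// Each commit flushes the buffered documents into a new immutable segment.
writer.AddDocument(new Document { new TextField("body", "first document", Field.Store.YES) });
writer.Commit();
writer.AddDocument(new Document { new TextField("body", "second document", Field.Store.YES) });
writer.Commit();

using (var reader = DirectoryReader.Open(dir))
{
    // Leaves has one entry per segment that the reader can see.
    Console.WriteLine($"Segments before merge: {reader.Leaves.Count}");
}

// Normally the merge policy decides when to merge; here we force it for the demo.
writer.ForceMerge(1);
writer.Commit();

using (var reader = DirectoryReader.Open(dir))
{
    Console.WriteLine($"Segments after merge: {reader.Leaves.Count}");
}

// List the files that make up the index after the merge.
foreach (string file in dir.ListAll())
{
    Console.WriteLine(file);
}
</code></pre>
<p>Forcing a merge is shown here only to make the behavior visible. In normal use the automatic merging described above is usually left to do its job.</p>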
<h2 id="important-lucene-concepts">Important Lucene Concepts</h2>
<h3 id="documents-and-fields">Documents and Fields</h3>
<p>Lucene stores documents, and documents are composed of fields. The fields can be a variety of types like Text, string or Int32.</p>
<p>Documents may have as many fields as you like. There is no concept of schema and documents don't need to all have the same fields. When searching you can search any field and it will only return documents which have that field and where the data in that field matches the specified search criteria.</p>
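<p>As a rough illustration of the paragraphs above, here is a small sketch that builds two documents with different sets of fields. It assumes the Lucene.Net 4.8.0 beta NuGet package and C# 9 top-level statements. The field names and values are invented for the example and are not part of any required schema.</p>
<pre><code class="lang-csharp">using System;
using Lucene.Net.Documents;

// A document with three fields of different types (names and values are hypothetical).
var book = new Document
{
    // TextField: analyzed into individual terms for full text search.
    new TextField("title", "Introduction to Lucene.NET", Field.Store.YES),
    // StringField: indexed as a single, untokenized value.
    new StringField("isbn", "000-0000000000", Field.Store.YES),
    // Int32Field: indexed as a numeric value.
    new Int32Field("year", 2022, Field.Store.YES)
};

// A second document with a completely different field. No schema is declared anywhere.
var note = new Document
{
    new TextField("body", "Remember to rebuild the index", Field.Store.YES)
};

Console.WriteLine(book.Get("title"));

// The note has no "title" field, so Get returns null for it.
Console.WriteLine(note.Get("title") == null);
</code></pre>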
<h3 id="writing-and-reading-documents">Writing and Reading Documents</h3>
<p>Documents are written via an <code>IndexWriter</code> and read via an <code>IndexReader</code>. In practice, though, we often use an <code>IndexSearcher</code> (which wraps an <code>IndexReader</code>) for searching and reading documents.</p>
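<p>A minimal sketch of that write-then-read round trip might look like the following. It assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8.0 beta NuGet packages and C# 9 top-level statements. The field name, text and query term are made up, and an in-memory directory is used only to keep the example self-contained.</p>
<pre><code class="lang-csharp">using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion version = LuceneVersion.LUCENE_48;

using var dir = new RAMDirectory();

// Write: the IndexWriter adds documents and commits them to the directory.
using (var writer = new IndexWriter(dir,
    new IndexWriterConfig(version, new StandardAnalyzer(version))))
{
    writer.AddDocument(new Document
    {
        new TextField("title", "Hello Lucene", Field.Store.YES)
    });
    writer.Commit();
}

// Read: an IndexSearcher wraps an IndexReader opened on the same directory.
using var reader = DirectoryReader.Open(dir);
var searcher = new IndexSearcher(reader);

// StandardAnalyzer lowercases terms at index time, so we query for "lucene".
TopDocs hits = searcher.Search(new TermQuery(new Term("title", "lucene")), 10);

foreach (ScoreDoc hit in hits.ScoreDocs)
{
    // Load the stored fields of each matching document.
    Document found = searcher.Doc(hit.Doc);
    Console.WriteLine(found.Get("title"));
}
</code></pre>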
<h3 id="lucene-directories">Lucene Directories</h3>
<p>We already mentioned that the data is stored in segments. Those segments can be stored via different classes that inherit from <code>Lucene.Net.Store.Directory</code>. Some of those classes, like <code>FSDirectory</code>, store to your local file system; others can store elsewhere. For example, a <code>RAMDirectory</code> can be useful for unit tests as it stores the segments in RAM. So one of the things that we must provide to an <code>IndexWriter</code> is an instance of a <code>Lucene.Net.Store.Directory</code> of the type we want to work with.</p>
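<p>To make the choice of directory concrete, here is a short sketch that opens one directory of each kind. It assumes the Lucene.Net 4.8.0 beta NuGet package and C# 9 top-level statements. The index path is a hypothetical location under the temp folder, and the <code>LuceneDirectory</code> alias is just one way to avoid a name clash with <code>System.IO.Directory</code>.</p>
<pre><code class="lang-csharp">using System;
using System.IO;
using Lucene.Net.Store;
// Alias so that "Directory" is not ambiguous with System.IO.Directory.
using LuceneDirectory = Lucene.Net.Store.Directory;

// Hypothetical location for an on-disk index.
string indexPath = Path.Combine(Path.GetTempPath(), "example_index");

// FSDirectory.Open picks a file system based implementation for the current platform.
using LuceneDirectory onDisk = FSDirectory.Open(indexPath);

// RAMDirectory keeps everything in memory, which is handy for unit tests.
using LuceneDirectory inMemory = new RAMDirectory();

// Either instance can be handed to an IndexWriter; here we only show the concrete types.
Console.WriteLine(onDisk.GetType().Name);
Console.WriteLine(inMemory.GetType().Name);
</code></pre>
<p>Because both variables are typed as the common base class, code that receives a directory does not need to know where the segments actually live.</p>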
<h3 id="how-the-pieces-fit-together">How the Pieces Fit Together</h3>
<p>The diagram below provides a bird's-eye view of how the various parts of the system work together. It makes it easier to conceptualize the parts of Lucene that we have been talking about. It might be a good idea to keep this diagram in mind when you work through the <a class="xref" href="tutorial.html">tutorial examples</a> and review the code provided there.</p>
<div class="diagram">
<p><img src="https://lucenenet.apache.org/images/quick-start/introduction/lucene-high-level-diagram.svg" alt="Lucene High Level Diagram"></p>
</div>
<p>* In the diagram above, &quot;Files&quot; is in quotes because if you are using a <code>RAMDirectory</code>, say for testing, there are no physical files; the segment data exists in memory only.</p>
<h3 id="wrapping-up">Wrapping Up</h3>
<p>I hope this introduction has helped you to understand a bit about Lucene.NET. The information we have covered so far should give you a foundation to build on as you work through the <a class="xref" href="tutorial.html">tutorial examples</a> and then dig deeper into the <a class="xref" href="../docs.html">Lucene.NET Documentation</a> and <a class="xref" href="learning-resources.html">Learning Resources</a>.</p>
</article>
</div>
<div class="hidden-sm col-md-2" role="complementary">
<div class="sideaffix">
<div class="contribution">
<ul class="nav">
<li>
<a href="https://github.com/apache/lucenenet/blob/master/websites/site/quick-start/introduction.md/#L1" class="contribution-link">Improve this Doc</a>
</li>
</ul>
</div>
<nav class="bs-docs-sidebar hidden-print hidden-xs hidden-sm affix" id="affix">
<h5>In This Article</h5>
<div></div>
</nav>
</div>
</div>
</div>
</div>
<footer>
<div class="grad-bottom"></div>
<div class="footer">
<div class="container">
<span class="pull-right">
<a href="#top">Back to top</a>
</span>
Copyright &copy; 2022 The Apache Software Foundation, Licensed under the <a href='http://www.apache.org/licenses/LICENSE-2.0' target='_blank'>Apache License, Version 2.0</a><br> <small>Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation. <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</small>
</div>
</div>
</footer>
</div>
<script type="text/javascript" src="../styles/docfx.vendor.js"></script>
<script type="text/javascript" src="../styles/docfx.js"></script>
<script type="text/javascript" src="../styles/main.js"></script>
</body>
</html>