eZ Component: Search, Design
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Author: Derick Rethans
:Revision: $Rev$
:Date: $Date$
.. contents::
Design description
==================
The Search component provides a single interface to multiple search
backends. For this to work, two things need to be abstracted: first, the
definition of document fields; and second, the search query syntax.
The logic is very similar to that of PersistentObject, where a mapping is made
between class properties and database fields. For search, a mapping is needed
between class properties and search index fields. Finding persistent objects is
done through the Database component's SQL abstraction, which allows for
multiple SQL dialects. The Search component needs a comparable abstraction to
allow for different search query dialects. The use of the Search component
will therefore mostly be modeled after the design of the Database and
PersistentObject components.
Classes
=======
ezcSearchSession
----------------
ezcSearchSession is the main runtime interface for indexing and searching
documents. Documents are indexed by calling index(), and searched for
through find(). Unlike in the PersistentObject component,
find() does not simply return an array with one object for each of the found
documents. Instead it returns an ezcSearchResult object containing information
about the search result. The find() method accepts as its parameter an object
of class ezcSearchFindQuery (or one of its children). This query object is
created by calling the createFindQuery() method on the session. Besides
createFindQuery(), a method to create a query for deleting indexed documents
will be provided too. The classes representing documents need to implement an
interface that specifies getState() and setState(), something we forgot for
PersistentObject.
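To illustrate the document interface described above, the following is a
minimal sketch of a class that exposes its state through getState() and
setState(). The interface name SketchSearchDocument and the class
SketchArticle are hypothetical; this design does not fix the final names or
signatures.

```php
<?php
// Hypothetical interface: the name and signatures are assumptions made for
// this sketch, not part of the design.
interface SketchSearchDocument
{
    public function getState();
    public function setState( array $state );
}

class SketchArticle implements SketchSearchDocument
{
    public $id;
    public $title;

    // Returns all indexable properties as a property => value array.
    public function getState()
    {
        return array( 'id' => $this->id, 'title' => $this->title );
    }

    // Restores the object from a property => value array and ignores keys
    // that do not correspond to a declared property.
    public function setState( array $state )
    {
        foreach ( $state as $property => $value )
        {
            if ( property_exists( $this, $property ) )
            {
                $this->$property = $value;
            }
        }
    }
}

$article = new SketchArticle();
$article->setState( array( 'id' => 7, 'title' => 'Hello', 'unknown' => 'x' ) );
$state = $article->getState();
```

With such an interface, a session can read the state when indexing a document
and write it back when building objects from search results, without knowing
anything about the class itself.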
ezcSearchSessionInstance
------------------------
Holds search session instances for global access throughout an application.
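A minimal sketch of such a registry, assuming it behaves like a static map
from an identifier to a session. The class name SketchSessionInstance, the
set() and get() method names and the 'default' identifier are assumptions for
illustration, and a plain stdClass object stands in for a real session.

```php
<?php
class SketchSessionInstance
{
    // Registered sessions, keyed by identifier.
    private static $instances = array();

    public static function set( $session, $identifier = 'default' )
    {
        self::$instances[$identifier] = $session;
    }

    public static function get( $identifier = 'default' )
    {
        if ( !isset( self::$instances[$identifier] ) )
        {
            throw new RuntimeException(
                "No search session registered as '$identifier'."
            );
        }
        return self::$instances[$identifier];
    }
}

// A stdClass object stands in for an ezcSearchSession in this sketch.
$session = new stdClass();
SketchSessionInstance::set( $session );

// Elsewhere in the application:
$same = SketchSessionInstance::get();
```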
ezcSearchDefinitionManager
--------------------------
Loads definition files that describe document types with all their fields. How
those definitions are mapped to search-engine-specific fields and options
depends on the backend.
ezcSearchDocumentDefinition
---------------------------
Describes all the fields of one document type. It is loaded by the
ezcSearchDefinitionManager and used by the backends both to index documents
and to find them. For each document field it stores an
ezcSearchObjectProperty. It also defines the field by which a document
can be uniquely identified, as well as a default search field. In future
versions it could also group fields to make searching across multiple fields
easier.
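The following sketch shows one way a document definition and its properties
could be represented in memory. All class, property and method names
(SketchDocumentDefinition, SketchObjectProperty, addProperty(),
indexFieldHint) as well as the type names and the 'body_t' hint are
hypothetical and only serve to make the description above concrete.

```php
<?php
// One document field: its name, its type and a hint for the field name in
// the search index. All names here are assumptions made for this sketch.
class SketchObjectProperty
{
    public $name;
    public $type;
    public $indexFieldHint;

    public function __construct( $name, $type, $indexFieldHint = null )
    {
        $this->name = $name;
        $this->type = $type;
        // Without an explicit hint, fall back to the property name.
        $this->indexFieldHint = $indexFieldHint === null ? $name : $indexFieldHint;
    }
}

// All fields of one document type, plus the identifying field and the
// default search field.
class SketchDocumentDefinition
{
    public $documentType;
    public $idProperty;
    public $defaultSearchField;
    public $properties = array();

    public function __construct( $documentType, $idProperty, $defaultSearchField )
    {
        $this->documentType = $documentType;
        $this->idProperty = $idProperty;
        $this->defaultSearchField = $defaultSearchField;
    }

    public function addProperty( SketchObjectProperty $property )
    {
        $this->properties[$property->name] = $property;
        return $this;
    }
}

$definition = new SketchDocumentDefinition( 'article', 'id', 'body' );
$definition
    ->addProperty( new SketchObjectProperty( 'id', 'id' ) )
    ->addProperty( new SketchObjectProperty( 'title', 'string' ) )
    ->addProperty( new SketchObjectProperty( 'body', 'text', 'body_t' ) );
```

A backend would read such a definition to decide how each property is named
and typed in its own index.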
ezcSearchHandler
----------------
The base class that all search backends extend. The handlers know how to
communicate with the backends, how to generate correct search query strings,
and how to present results. Handlers can also accept search-backend-specific
options. For the first version only ezcSearchSolrHandler is planned, while
later versions might also have backends for Google, Yahoo! etc. A backend does
not have to implement the index(), createDeleteQuery() and delete() methods,
as not every backend supports them. These methods are therefore specified by a
separate interface, ezcSearchIndexHandler, which search handlers can
optionally implement.
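The split between a mandatory base class and an optional indexing interface
can be sketched as follows. Everything here is hypothetical: the names
SketchHandler, SketchIndexHandler, SketchRemoteHandler and
SketchMemoryHandler, the simplified method signatures, and the in-memory
substring search that stands in for a real backend.

```php
<?php
// Every backend can search.
abstract class SketchHandler
{
    abstract public function find( $text );
}

// Only some backends can modify the index.
interface SketchIndexHandler
{
    public function index( array $document );
    public function delete( $id );
}

// A read-only backend, for example a remote web search service.
class SketchRemoteHandler extends SketchHandler
{
    public function find( $text )
    {
        // A real handler would query the remote service here.
        return array();
    }
}

// A backend that also supports indexing and deleting.
class SketchMemoryHandler extends SketchHandler implements SketchIndexHandler
{
    private $documents = array();

    public function index( array $document )
    {
        $this->documents[$document['id']] = $document;
    }

    public function delete( $id )
    {
        unset( $this->documents[$id] );
    }

    public function find( $text )
    {
        $found = array();
        foreach ( $this->documents as $document )
        {
            // Case-insensitive substring match on the 'text' field.
            if ( stripos( $document['text'], $text ) !== false )
            {
                $found[] = $document;
            }
        }
        return $found;
    }
}

// Callers can check for indexing support before using it.
function sketchSupportsIndexing( SketchHandler $handler )
{
    return $handler instanceof SketchIndexHandler;
}

$memory = new SketchMemoryHandler();
$memory->index( array( 'id' => 1, 'text' => 'A plane landed' ) );
$memory->index( array( 'id' => 2, 'text' => 'A train arrived' ) );
$hits = $memory->find( 'plane' );
```

A session could use a check like sketchSupportsIndexing() to reject index()
and delete() calls on read-only backends with a clear error.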
ezcSearchSolrHandler
--------------------
An implementation of ezcSearchHandler that communicates with Apache
Lucene/Solr. This will be the reference implementation.
ezcSearchQuery
--------------
Implements a fluent interface for querying the search index. Its methods
closely mirror those of ezcDbQuery. This class is extended by
ezcSearchFindQuery and ezcSearchDeleteQuery, for searching in and deleting
from the search index respectively.
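To show what a fluent query object does internally, here is a minimal sketch
that builds a Lucene-style query string and returns it together with paging
values. The class SketchFindQuery and the getParameters() method are
hypothetical. The conjunction method is called lAnd() here because ``and`` is
a reserved word in PHP. The returned keys follow Solr's q, start and rows
request parameters, and the default of 10 rows is an arbitrary choice for this
sketch.

```php
<?php
class SketchFindQuery
{
    private $criteria = '*:*'; // Lucene syntax for "match everything"
    private $limit = 10;
    private $offset = 0;

    // Builds a field:"value" term, escaping backslashes and double quotes.
    public function eq( $field, $value )
    {
        $escaped = str_replace(
            array( '\\', '"' ),
            array( '\\\\', '\\"' ),
            $value
        );
        return $field . ':"' . $escaped . '"';
    }

    // Combines any number of terms with AND.
    public function lAnd()
    {
        return '(' . implode( ' AND ', func_get_args() ) . ')';
    }

    public function find( $criteria )
    {
        $this->criteria = $criteria;
        return $this; // returning $this makes the interface fluent
    }

    public function limit( $limit, $offset = 0 )
    {
        $this->limit = $limit;
        $this->offset = $offset;
        return $this;
    }

    // Returns what a backend would need to execute the query.
    public function getParameters()
    {
        return array(
            'q'     => $this->criteria,
            'start' => $this->offset,
            'rows'  => $this->limit,
        );
    }
}

$q = new SketchFindQuery();
$q->find( $q->lAnd( $q->eq( 'text', 'Derick' ), $q->eq( 'text', 'Tiger' ) ) )
  ->limit( 7, 10 );
$parameters = $q->getParameters();
```

Because the query object only collects criteria, each backend remains free to
translate them into its own query dialect.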
Data structures
===============
ezcSearchObjectProperty
  Defines the name of the document field, its type and a hint for the field
  name in the search index.
ezcSearchResult
  Provides metadata about the search (time, number of results, etc.) as well
  as an array of the found results. Depending on the search backend, the
  found documents can be instances of different classes, as the document types
  could be different.
Example Usage
=============
::

    <?php
    $backend = new ezcSearchSolrHandler( 'localhost', 6983 );
    $session = new ezcSearchSession(
        $backend,
        new ezcSearchDefinitionManager( 'path/to/definitions' )
    );

    // indexing a document
    $session->index( $document );

    // finding documents where name = Derick
    $q = $session->createFindQuery();
    $q->find( $q->eq( 'name', "Derick" ) );
    $ret = $session->find( $q );

    // finding documents where any field contains Derick, starting at row 10
    // and 7 rows long; the limit is set on the query, before it is executed
    $q = $session->createFindQuery();
    $q->find( $q->eq( '*', "Derick" ) )
      ->limit( 7, 10 );
    $ret = $session->find( $q );

    // finding documents where text contains Derick and Tiger, returning only
    // the name field, ordered by published date
    $q = $session->createFindQuery();
    $q->select( 'name' )
      ->find( $q->and(
            $q->eq( 'text', "Derick" ),
            $q->eq( 'text', 'Tiger' )
        )
      )
      ->orderBy( 'published' );
    $ret = $session->find( $q );

    // finding documents where text contains Derick or Tiger
    $q = $session->createFindQuery();
    $q->find( $q->in( 'text', array( 'Derick', 'Tiger' ) ) );
    $ret = $session->find( $q );

    // finding documents containing 'Ramius' published between 2007-01-01 and
    // 2007-12-31
    $q = $session->createFindQuery();
    $q->find( $q->and(
            $q->eq( 'text', 'Ramius' ),
            $q->between( 'published',
                new DateTime( '2007-01-01' ), // DateTime object
                strtotime( "2007-12-31" )     // timestamp
            )
        )
    );
    $ret = $session->find( $q );

    // finding documents containing 'plane' and putting facets on the
    // categories, limiting result set to 8 and facets to 4
    $q = $session->createFindQuery();
    $q->find( $q->eq( 'description', 'plane' ) )
      ->limit( 8 )
      ->facet( 'category' )->limit( 4 );
    $ret = $session->find( $q );
    ?>

..
   Local Variables:
   mode: rst
   fill-column: 78
   End:
   vim: et syn=rst tw=79