predictionio Package Documentation
====================================
.. automodule:: predictionio
The SDK comprises two clients:

1. EventClient, for importing data into the PredictionIO platform.
2. EngineClient, for querying a PredictionIO engine instance: submitting
   queries and extracting prediction results.
The SDK also provides a FileExporter for you to write events to a JSON file
in the same way as EventClient. The JSON file can be used by "pio import"
for batch data import.
Please read `PredictionIO Event API <http://docs.prediction.io/datacollection/eventapi/>`_ for an explanation of
how the SDK can be used to import events.
predictionio.EventClient Class
------------------------------
.. autoclass:: EventClient
   :members:

.. note::

   The "threads" parameter specifies the number of connection threads to
   the PredictionIO server. The minimum is 1. The client object will spawn
   the specified number of threads; each of them establishes a
   connection with the PredictionIO server and handles requests
   concurrently.

.. note::

   If you ONLY use `blocking request methods`,
   setting "threads" to 1 is enough (a higher number will not improve
   anything, since every request blocks). However, if you want
   to take full advantage of
   `asynchronous request methods`, you should
   specify a larger number for "threads" to increase the performance of
   handling concurrent requests (although setting "threads" to 1 will still
   work). The optimal setting depends on your system and application
   requirements.
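To illustrate the idea behind the "threads" parameter, here is a minimal
pure-Python sketch of a connection-thread pool: N worker threads consume a
shared request queue concurrently. This is a hypothetical stand-in, not the
SDK's actual implementation; ``MiniPool``, ``submit``, and the doubling
"work" are made up for illustration::

```python
import queue
import threading

# Hypothetical stand-in for the client's connection-thread pool (not the
# SDK's actual implementation): "threads" workers consume a shared
# request queue, each simulating one connection to the server.
class MiniPool:
    def __init__(self, threads=1):
        self.q = queue.Queue()
        self.results = []
        self.lock = threading.Lock()
        self.workers = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(max(1, threads))
        ]
        for w in self.workers:
            w.start()

    def _work(self):
        while True:
            job = self.q.get()
            if job is None:          # sentinel: shut this worker down
                self.q.task_done()
                return
            with self.lock:
                self.results.append(job * 2)  # pretend to "send" a request
            self.q.task_done()

    def submit(self, job):
        # enqueue without blocking the caller; a worker picks it up
        self.q.put(job)

    def close(self):
        # one sentinel per worker, then wait for all of them to finish
        for _ in self.workers:
            self.q.put(None)
        for w in self.workers:
            w.join()

pool = MiniPool(threads=3)
for i in range(10):
    pool.submit(i)
pool.close()
print(sorted(pool.results))  # all 10 jobs handled by 3 workers
```

With a single worker the same code still completes; extra workers only help
when many requests are in flight at once, which is why "threads" matters
mainly for the asynchronous methods.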
predictionio.EngineClient Class
-------------------------------
.. autoclass:: EngineClient
   :members:

predictionio.AsyncRequest Class
-------------------------------
.. autoclass:: AsyncRequest
   :members:
predictionio.FileExporter Class
-------------------------------
.. autoclass:: FileExporter
   :members:
predictionio SDK Usage Notes
----------------------------
Asynchronous Requests
^^^^^^^^^^^^^^^^^^^^^
In addition to the normal `blocking (synchronous) request methods`,
this SDK also provides `non-blocking (asynchronous) request methods`.
All methods
prefixed with 'a' are asynchronous (e.g., :meth:`~EventClient.aset_user`,
:meth:`~EventClient.aset_item`). Asynchronous requests are handled by separate
threads in the background, so you can generate multiple requests at the same
time without waiting for any of them to finish. These methods return
immediately without waiting for results, allowing your code to proceed with
other work. The idea is to break a normal blocking request (such as
:meth:`~EventClient.set_user`) into two steps:

1. generate the request (e.g., by calling :meth:`~EngineClient.asend_query`);
2. get the request's response by calling :meth:`~AsyncRequest.get_response`.

This allows you to do other work between these two steps.
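The two-step pattern can be sketched in plain Python. This is a hypothetical,
minimal stand-in for the SDK's AsyncRequest behavior, not its actual
implementation; ``MiniAsyncRequest`` and ``fake_query`` are invented names::

```python
import threading

# Hypothetical stand-in for the AsyncRequest pattern: step 1 starts the
# work in a background thread; step 2 blocks in get_response() only when
# the result is actually needed.
class MiniAsyncRequest:
    def __init__(self, func, *args):
        self._result = None
        self._error = None
        self._thread = threading.Thread(target=self._run, args=(func, args))
        self._thread.start()

    def _run(self, func, args):
        try:
            self._result = func(*args)
        except Exception as e:
            self._error = e

    def get_response(self):
        self._thread.join()              # wait only now, at step 2
        if self._error is not None:
            raise self._error            # surface any failure to the caller
        return self._result

def fake_query(data):
    # stand-in for sending a query to an engine instance
    return {"itemScores": [{"item": str(i), "score": 1.0}
                           for i in range(data["n"])]}

request = MiniAsyncRequest(fake_query, {"uid": "1", "n": 3})  # step 1
# ...do other work here while the request runs in the background...
result = request.get_response()                               # step 2
print(len(result["itemScores"]))  # prints 3
```

Note how the caller pays the waiting cost only at ``get_response()``, which
is what makes it possible to overlap many requests.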
.. note::

   If, for performance or application-specific reasons, you do not care
   whether a request succeeds, you can simply skip step 2.

.. note::

   If you do care about the request status or need the returned data,
   call :meth:`~AsyncRequest.get_response` at a later time with the
   AsyncRequest object returned in step 1.
Please refer to the documentation of :ref:`asynchronous request methods <async-methods-label>` for more details.
For example, the following code first generates an asynchronous request to
retrieve recommendations, then gets the result at a later time::

    >>> # Generate an asynchronous request and return an AsyncRequest object
    >>> engine_client = EngineClient()
    >>> request = engine_client.asend_query(data={"uid": "1", "n": 3})
    >>> # <...you can do other things here...>
    >>> try:
    ...     result = request.get_response()  # check the request status and get the returned data
    ... except Exception:
    ...     pass  # <log the error>
Batch Import Data with EventClient
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When you import a large amount of data at once, you can also use the
asynchronous request methods to generate many requests up front and then
check their status at a later time, minimizing total run time.
For example, to import 100,000 user records::

    >>> # generate 100,000 asynchronous requests and store the AsyncRequest objects
    >>> event_client = EventClient(access_key=<YOUR_ACCESS_KEY>)
    >>> for i in range(100000):
    ...     event_client.aset_user(user_record[i].uid)
    >>>
    >>> # <...you can do other things here...>
    >>>
    >>> # calling close() will block until all requests are processed
    >>> event_client.close()
Alternatively, you can use blocking requests to import a large amount of data, but this has significantly lower performance::

    >>> for i in range(100000):
    ...     try:
    ...         client.set_user(user_record[i].uid)
    ...     except Exception:
    ...         pass  # <log the error>
Batch Import Data with FileExporter and "pio import"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can use FileExporter to create events and write them to a JSON file which
can be used by "pio import". Please see `Importing Data in Batch <http://docs.prediction.io/datacollection/batchimport/>`_ for more details.
Note that this method is much faster than batch import with EventClient.
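To show the kind of file "pio import" consumes, here is a sketch that writes
newline-delimited event JSON by hand. The field names ("event", "entityType",
"entityId", "properties", "eventTime") follow PredictionIO's documented event
format, but the exact output produced by FileExporter may differ in detail;
the sample events and ``age`` property are invented for illustration::

```python
import json
import tempfile

# Sample events in PredictionIO's event JSON shape (values are made up).
events = [
    {"event": "$set", "entityType": "user", "entityId": str(i),
     "properties": {"age": 20 + i},
     "eventTime": "2014-09-09T16:17:42.937-08:00"}
    for i in range(3)
]

# Write one JSON object per line (newline-delimited JSON), the layout
# expected for batch import files.
with tempfile.NamedTemporaryFile("w", suffix=".json",
                                 delete=False) as f:
    path = f.name
    for e in events:
        f.write(json.dumps(e) + "\n")

# Read the file back to verify every line parses as JSON.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded))  # prints 3
```

Each line being an independent JSON object is what lets the import tool
stream large files without loading them whole.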