========================
Creating Segment Objects
========================
``Segment`` objects can be created with five functions in Distill's Segmentation package: ``create_segment``, ``generate_segments``,
``generate_fixed_time_segments``, ``generate_collapsing_window_segments``, and ``detect_deadspace``. Each function creates
``Segment`` objects and returns them in a ``Segments`` object. These functions fall into three categories, described below:
basic ``Segment`` creation, automatic ``Segment`` generation, and deadspace detection.
UserALE Log Preprocessing
-------------------------
Before ``Segment`` objects can be created, the UserALE logs must be put in the format expected by the ``Segment`` creation
functions. Each function expects the logs to be structured as a dictionary sorted by ``clientTime``. The keys of the dictionary
are universally unique identifiers (UUIDs), one per log, and the value for each UUID key is the log itself. Distill provides
a function that generates these IDs: ``getUUID``. This function can be used as follows:
.. code:: python

    # A UserALE log
    log

    # Generate UUID
    uuid = distill.getUUID(log)
Throughout the rest of this documentation, UUID and UID are used interchangeably to describe these unique identifiers.
In addition, note that the functions within Distill's Segmentation package expect the UserALE log ``clientTime`` field to be
either an integer or a Python ``datetime`` object. Converting ``clientTime`` is therefore another preprocessing step that must be
completed before using the Segmentation functions.
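As a rough illustration of these preprocessing steps, the sketch below converts ``clientTime`` values, keys each log by a UID, and sorts the dictionary by ``clientTime``. The ``make_uid`` helper is a hypothetical stand-in for ``distill.getUUID`` so that the example runs without Distill installed. The sample logs, and the assumption that ``clientTime`` arrives as an epoch value in milliseconds, are illustrative only.

```python
from datetime import datetime, timezone


def make_uid(log):
    """Hypothetical stand-in for distill.getUUID (not the library's implementation)."""
    return f"{log['sessionID']}-{log['clientTime']}-{log['type']}"


def to_datetime(client_time):
    """Convert epoch milliseconds (an assumption) to a datetime; pass datetimes through."""
    if isinstance(client_time, datetime):
        return client_time
    return datetime.fromtimestamp(client_time / 1000, tz=timezone.utc)


# Illustrative logs, deliberately out of order
raw_logs = [
    {"sessionID": "s1", "clientTime": 1623691904000, "type": "load"},
    {"sessionID": "s1", "clientTime": 1623691890000, "type": "click"},
]

# Normalize clientTime before keying and sorting
for log in raw_logs:
    log["clientTime"] = to_datetime(log["clientTime"])

# Key each log by its UID, then sort the dictionary by clientTime
keyed = {make_uid(log): log for log in raw_logs}
sorted_dict = dict(sorted(keyed.items(), key=lambda kv: kv[1]["clientTime"]))
```

Because Python dictionaries preserve insertion order, rebuilding the dictionary from the sorted items is enough to keep the logs ordered by ``clientTime``.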
Basic Segment Creation
----------------------
The most direct way to create ``Segment`` objects is the ``create_segment`` function. This function
takes three parameters: a sorted dictionary of UserALE logs, a list of segment
names, and a list of tuples that give the start ``clientTime`` and end ``clientTime`` of each segment. Given this
information, ``Segment`` objects can be created as follows:
.. code:: python

    # Sorted dictionary of UserALE logs
    sorted_dict

    # List of segment names
    segment_names = ["segment1", "segment2"]

    # Time tuples
    start_end_vals = [(start_time_1, end_time_1), (start_time_2, end_time_2)]

    # Create Segments
    segments = distill.create_segment(sorted_dict, segment_names, start_end_vals)
The above code will output a ``Segments`` object that contains each ``Segment`` object indicated.
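To make the pairing of names and time tuples concrete, the following conceptual sketch selects the UIDs of the logs that fall inside each start and end pair. It does not call Distill: ``logs_in_window`` is a hypothetical helper, the sample logs use integer ``clientTime`` values, and treating both bounds as inclusive is an assumption rather than documented library behavior.

```python
def logs_in_window(sorted_dict, start, end):
    """Return UIDs of logs whose clientTime lies in [start, end] (inclusive bounds assumed)."""
    return [uid for uid, log in sorted_dict.items()
            if start <= log["clientTime"] <= end]


# Illustrative sorted dictionary with integer clientTime values
sorted_dict = {
    "uid-1": {"clientTime": 100, "type": "load"},
    "uid-2": {"clientTime": 250, "type": "click"},
    "uid-3": {"clientTime": 400, "type": "click"},
}
segment_names = ["segment1", "segment2"]
start_end_vals = [(0, 300), (300, 500)]

# The i-th name is paired with the i-th (start, end) tuple
members = {name: logs_in_window(sorted_dict, start, end)
           for name, (start, end) in zip(segment_names, start_end_vals)}
```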
Automatic Segment Generation
----------------------------
If an analyst does not know the start and end times of interest within the UserALE logs, the ``Segment`` generation functions
provide a more automatic way to create ``Segment`` objects. Three functions support the automatic creation
of ``Segment`` objects: ``generate_segments``, ``generate_fixed_time_segments``, and ``generate_collapsing_window_segments``.
Each of these functions provides an optional parameter named ``label`` that sets the prefix used to name
each generated ``Segment`` object.
Generate Segments
*****************
The ``generate_segments`` function creates ``Segment`` objects automatically by matching a particular UserALE log field
against a list of possible values. For each match, the function generates a ``Segment`` object covering a window of time
that starts a given number of seconds before the matching log and ends a given number of seconds after it; both values
are function parameters. The code below illustrates the basic use of this function:
.. code:: python

    # Sorted dictionary of UserALE logs
    sorted_dict

    # Generate segment objects based on user clicks
    segments = distill.generate_segments(sorted_dict, 'type', ['click'], 1, 2)
The above code will return a ``Segments`` object containing ``Segment`` objects that each represent a window of time from
1 second before a log of type 'click' to 2 seconds after it. To generate ``Segment`` objects that match both
'click' and 'load' types, we could use the following code:
.. code:: python

    # Sorted dictionary of UserALE logs
    sorted_dict

    # Generate segment objects based on user clicks and loads
    segments = distill.generate_segments(sorted_dict, 'type', ['click', 'load'], 1, 2)
Note that ``generate_segments`` does not create overlapping ``Segment`` objects. If two matching events happen back-to-back
within the logs and the second log already falls within the ``Segment`` generated by the first, the second log will not have its
own ``Segment`` created. This non-overlapping behavior may also create ``Segment`` objects that are shorter in time than
expected. For instance, if a ``Segment`` ends after the point where a new ``Segment`` would start,
the new ``Segment`` object's start time will default to the end time of the previous ``Segment``.
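The non-overlapping rules described above can be sketched in plain Python. This is only a conceptual model, not Distill's implementation: ``non_overlapping_windows`` is a hypothetical helper, event times are plain numbers in seconds, and treating a log that lands exactly on the previous end time as already covered is an assumption.

```python
def non_overlapping_windows(event_times, before, after):
    """Build (start, end) windows around matching events without overlaps."""
    windows = []
    for t in sorted(event_times):
        # A match already covered by the previous window gets no window of its own
        if windows and t <= windows[-1][1]:
            continue
        start = t - before
        # Clamp the start so it never precedes the end of the previous window
        if windows and start < windows[-1][1]:
            start = windows[-1][1]
        windows.append((start, t + after))
    return windows


# Matches at 10, 11, and 13 seconds with a 2-second window on each side
windows = non_overlapping_windows([10, 11, 13], before=2, after=2)
# windows == [(8, 12), (12, 15)]
```

The match at 11 seconds is skipped because it lies inside the first window, and the window for the match at 13 seconds starts at 12 instead of 11, which makes it shorter than the nominal 4 seconds.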
Generate Fixed Time Segments
****************************
The ``generate_fixed_time_segments`` function generates ``Segment`` objects based on fixed time intervals. An example
usage of this function is shown below:
.. code:: python

    # Sorted dictionary of UserALE logs
    sorted_dict

    # Generate segment objects based on 5 second intervals
    segments = distill.generate_fixed_time_segments(sorted_dict, 5, label="generated")
The above code will create a ``Segments`` object that contains ``Segment`` objects based on 5 second intervals.
This example also demonstrates the usage of the optional ``label`` parameter.
Note that when the time between the first log and the last log is not evenly divisible by the indicated interval, some
trailing logs do not fit into a full fixed time window. By default, this function does not trim those logs. To change this,
``generate_fixed_time_segments`` has an optional argument named ``trim``. If ``trim`` is true, the logs that do not fit
into a full fixed time window are trimmed.
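The effect of ``trim`` on window boundaries can be sketched as follows. This is a conceptual model under stated assumptions, not Distill's code: ``fixed_windows`` is a hypothetical helper, times are integers in seconds, and the untrimmed remainder is assumed to receive one extra full-length window.

```python
def fixed_windows(first_time, last_time, interval, trim=False):
    """Return (start, end) boundaries for fixed-length windows over a time span."""
    span = last_time - first_time
    count = span // interval
    # Without trimming, leftover time gets one additional window (assumed behavior)
    if not trim and span % interval:
        count += 1
    return [(first_time + i * interval, first_time + (i + 1) * interval)
            for i in range(count)]


# A 12-second span split into 5-second windows
untrimmed = fixed_windows(0, 12, 5)           # [(0, 5), (5, 10), (10, 15)]
trimmed = fixed_windows(0, 12, 5, trim=True)  # [(0, 5), (5, 10)]
```

When the span divides evenly by the interval, both settings produce the same windows.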
Generate Collapsing Window Segments
***********************************
The ``generate_collapsing_window_segments`` function generates ``Segment`` objects based on windows of time in which the
given field name has a value matching one of the values in the ``field_values_of_interest`` list parameter.
An example usage of this function is shown below:
.. code:: python

    # Sorted dictionary of UserALE logs
    sorted_dict

    # Generate segment objects based on a collapsing window
    segments = distill.generate_collapsing_window_segments(sorted_dict, "path", ["Window"])
The above code creates a ``Segments`` object that contains ``Segment`` objects that begin when the ``path`` field contains the
string "Window" and end when the ``path`` field no longer contains "Window".
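A collapsing window can be pictured as a run of consecutive logs that all match. The sketch below is a conceptual model rather than Distill's implementation: ``collapsing_windows`` is a hypothetical helper, the sample logs are invented, and it assumes that ``path`` is a list of strings and that a window spans from the first matching log to the last matching log in a run.

```python
def collapsing_windows(logs, field, values):
    """Return (start, end) times for runs of consecutive logs whose field matches."""
    windows = []
    start = last = None
    for log in logs:
        matched = any(v in log.get(field, []) for v in values)
        if matched:
            if start is None:
                start = log["clientTime"]  # open a new window
            last = log["clientTime"]       # extend the open window
        elif start is not None:
            windows.append((start, last))  # the run ended, close the window
            start = last = None
    if start is not None:
        windows.append((start, last))      # close a window still open at the end
    return windows


# Illustrative logs already sorted by clientTime
logs = [
    {"clientTime": 1, "path": ["button", "Window"]},
    {"clientTime": 2, "path": ["div", "Window"]},
    {"clientTime": 3, "path": ["body"]},
    {"clientTime": 4, "path": ["Window"]},
]
windows = collapsing_windows(logs, "path", ["Window"])
# windows == [(1, 2), (4, 4)]
```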
Detecting Deadspace
-------------------
The final ``Segment`` creation function automatically detects deadspace within the sorted UserALE log dictionary.
Deadspace is time in which the user is idle. The ``detect_deadspace`` function creates ``Segment`` objects based on deadspace
in the logs, given a threshold for what is considered to be 'deadspace'. An example is shown below:
.. code:: python

    # Sorted dictionary of UserALE logs
    sorted_dict

    # Create segment objects based on detected deadspace
    segments = distill.detect_deadspace(sorted_dict, 20, 1, 2)
The above code will output a ``Segments`` object holding ``Segment`` objects that represent deadspace. In this case, we
have set the 'deadspace' threshold to 20 seconds of idle time. Each time deadspace is detected, the logs that occurred 1
second before and 2 seconds after that idle time are recorded in the ``Segment``. Note that the optional ``label`` parameter
is also available for the ``detect_deadspace`` function.
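The idea behind deadspace detection can be sketched as a scan over the gaps between consecutive logs. This is a conceptual model, not Distill's implementation: ``deadspace_windows`` is a hypothetical helper, times are plain numbers in seconds, and both the ``>=`` comparison and the choice to extend the window before the start and after the end of the idle period are assumptions.

```python
def deadspace_windows(times, threshold, before, after):
    """Return (start, end) windows around idle gaps of at least `threshold` seconds."""
    windows = []
    # Compare each log time with the next one to measure the idle gap
    for earlier, later in zip(times, times[1:]):
        if later - earlier >= threshold:
            windows.append((earlier - before, later + after))
    return windows


# Sorted log times in seconds; only the gap from 5 to 30 reaches the threshold
windows = deadspace_windows([0, 5, 30, 31, 40], threshold=20, before=1, after=2)
# windows == [(4, 32)]
```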