blob: 814f6b26f184edc3db74276c3dae14e9426af310 [file] [log] [blame]
---
title: Events Modeling
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
This section explains how to model your application data as events.
**Entity**: it's the real world object involved in the events. The entity may perform the events, or interact with other entity (which became `targetEntity` in an event).
For example, your application may have users and some items which the user can interact with. Then you can model them as two entity types: **user** and **item** and the entityId can uniquely identify the entity within each entityType (e.g. user with ID 1, item with ID 1).
An entity may peform some events (e.g user 1 does something), and entity may have properties associated with it (e.g. user may have gender, age, email etc). Hence, **events** involve **entities** and there are three types of events, respectively:
1. Generic events performed by an entity.
2. Special events for recording changes of an entity's properties
3. Batch events
They are explained in details below.
## 1. Generic events performed by an entity
Whenever the entity performs an action, you can describe such event as `entity "verb" targetEntity with "some extra information"`. The *"targetEntity"* and *"some extra information"* can be optional. The *"verb"* can be used as the name of the *"event"*. The *"some extra information"* can be recorded as `properties` of the event.
The following are some simple examples:
* user-1 signs-up
```json
{
"event" : "sign-up",
"entityType" : "user",
"entityId" : "1"
}
```
* user-1 views item-1 *(with targetEntity)*
```json
{
"event" : "view",
"entityType" : "user",
"entityId" : "1",
"targetEntityType" : "item",
"targetEntityId" : "1"
}
```
* user-1 rates item-1 with rating of 4 stars *(with targetEntity and properties)*
```json
{
"event" : "rate",
"entityType" : "user",
"entityId" : "1",
"targetEntityType" : "item",
"targetEntityId" : "1",
"properties" : {
"rating" : 4
}
}
```
## 2. Special events for recording changes of an entity's properties
The generic events described above are used to record general actions performed by the entity. However, an entity may have properties (or attributes) associated with it. Morever, the properties of the entity may change over time (for example, user may have new address, item may have new categories). In order to record such changes of an entity's properties. Special events `$set` , `$unset` and `$delete` are introduced.
The following special events are reserved for updating entities and their properties:
- `"$set"` event: Set properties of an entity (also implicitly create the entity). To change properties of entity, you simply set the corresponding properties with value again. The `$set` events should be created only when:
* The entity is *first* created (or re-create after `$delete` event), or
* Set the entity's existing or new properties to new values (For example, user updates his email, user adds a phone number, item has a updated categories)
- `"$unset"` event: Unset properties of an entity. It means treating the specified properties as not existing anymore. Note that the field `properties` cannot be empty for `$unset` event.
- `"$delete"` event: delete the entity.
There is no `targetEntityId` for these special events.
For example, setting entity `user-1`'s properties `birthday` and `address`:
```json
{
"event" : "$set",
"entityType" : "user",
"entityId" : "1",
"properties" : {
"birthday" : "1984-10-11",
"address" : "1234 Street, San Francisco, CA 94107"
}
}
```
**Note** that the properties values of the entity will be aggregated based on these special events and the eventTime. The state of the entity is different depending on the time you are looking at the data. In engine's DataSource, you can use [PEventStore.aggregateProperties() API](https://predictionio.apache.org/api/current/#org.apache.predictionio.data.store.PEventStore$) to retrieve the state of entity's properties (based on time).
NOTE: Although it doesn't hurt to import duplicated special events for an entity (exactly same properties) into event server (it just means that the entity changes to the same state as before and new duplicated event provides no new information about the user), it could waste storage space.
To demonstrate the concept of these special events, we are going to import a sequence of events and see how it affects the retrieved entitiy's properties.
Assuming you have created the App (named "MyTestApp") for testing and Event Server is started.
#### Event 1
For example, on `2014-09-09T...`, a user with ID "2" is newly added in your application. Also, this user has properties a = 3 and b = 4. To record such event, we can create a `$set` event for the user.
for convenience, assign the ACCESS_KEY of your test app to the shell variable `ACCESS_KEY` and run following curl command to import the event:
```bash
$ ACCESS_KEY="<YOUR_ACCESS_KEY>"
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
"event" : "$set",
"entityType" : "user",
"entityId" : "2",
"properties" : {
"a" : 3,
"b" : 4
},
"eventTime" : "2014-09-09T16:17:42.937-08:00"
}'
```
You should see something like the following, meaning the events are imported successfully.
```
HTTP/1.1 201 Created
Server: spray-can/1.3.2
Date: Tue, 02 Jun 2015 23:13:58 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 57
{"eventId":"PVjOIP6AJ5PgsiGQW6pgswAAAUhc7EwZpCfSj5bS5yg"}
```
After this eventTime, user-2 is created and has properties of a = 3 and b = 4.
#### Event 2
Then, on `2014-09-10T...`, let's say the user has updated the properties b = 5 and c = 6. To record such propertiy change, create another `$set` event. Run the following command:
```bash
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
"event" : "$set",
"entityType" : "user",
"entityId" : "2",
"properties" : {
"b" : 5,
"c" : 6
},
"eventTime" : "2014-09-10T13:12:04.937-08:00"
}'
```
After this eventTime, user-2 has properties of a = 3, b = 5 and c = 6. Note that property `b` is updated with latest value.
#### Event 3
Then, let's say on `2014-09-11T...`, the user's properties 'b' is removed for some reasons. To record such event, create `$unset` event for user-2 with properties b:
```bash
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
"event" : "$unset",
"entityType" : "user",
"entityId" : "2",
"properties" : {
"b" : null
},
"eventTime" : "2014-09-11T14:17:42.456-08:00"
}'
```
After this eventTime, user-2 has properties of a = 3, and c = 6. Note that property `b` is removed.
#### Event 4
Then, on `2014-09-12T...`, the user is removed from the application data. To record such event, create `$delete` event:
```bash
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
"event" : "$delete",
"entityType" : "user",
"entityId" : "2",
"eventTime" : "2014-09-12T16:13:41.452-08:00"
}'
```
After this eventTime, user-2 is removed.
#### Event 5
Then, on `2014-09-13T...`, let's say we want to add back the user-2 into the application again for some reasons. To record such event, create `$set` event for user-2 with empty properties:
```bash
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
"event" : "$set",
"entityType" : "user",
"entityId" : "2",
"eventTime" : "2014-09-13T16:17:42.143-08:00"
}'
```
After this eventTime, user-2 is created again with empty properties.
Note that all above events are recorded in Event Store. Let's query Event Server and see if these events are imported.
Go to following URL with your browser:
`http://localhost:7070/events.json?accessKey=<YOUR_ACCESS_KEY>`
or run the following command in terminal:
```
$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
```
NOTE: Note that you should quote the entire URL by using single or double quotes when you run the curl command.
You should see all events being created for this user-2.
Now, let's retrieve the user-2's properties using the [PEventStore API](https://predictionio.apache.org/api/current/#org.apache.predictionio.data.store.PEventStore$).
First, start `pio-shell` by running:
```
$ pio-shell --with-spark
```
You should see the following output and shell prompt:
```
15/06/02 16:01:54 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/06/02 16:01:54 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala>
```
Run the following code in PIO shell (Replace `"MyTestApp"` with your app name):
```scala
scala> val appName="MyTestApp"
scala> import org.apache.predictionio.data.store.PEventStore
scala> PEventStore.aggregateProperties(appName=appName, entityType="user")(sc).collect()
```
This command is using PEventStore to aggregate the user properties as a Map of user Id and the PropertyMap. `collect()` will return the data as array. You should see the following output at the end, which indicates there is user id 2 with empty properties because that's the state of user 2 with all imported events taken into account.
```
res0: Array[(String, org.apache.predictionio.data.storage.PropertyMap)] =
Array((2,PropertyMap(Map(), 2014-09-09T16:17:42.937-08:00, 2014-09-13T16:17:42.143-08:00)))
```
Let's say we want to retrieve the state of user 2 properties with only events 1 and event 2 imported. To do that, we can specify the untilTime (aggregate the user properties with events up to the specified time) in the API.
Run the following in the pio-shell. the untilTime is set to DateTime(2014, 9, 11, 0, 0) which is the time right before event 3.
```
scala> import org.joda.time.DateTime
scala> PEventStore.aggregateProperties(appName=appName, entityType="user", untilTime=Some(new DateTime(2014, 9, 11, 0, 0)))(sc).collect()
```
You should see the following ouptut and the aggregated properties matches what we expected as described earlier (right befor event 3): user-2 has properties of a = 3, b = 5 and c = 6.
```
res2: Array[(String, org.apache.predictionio.data.storage.PropertyMap)] =
Array((2,PropertyMap(Map(b -> JInt(5), a -> JInt(3), c -> JInt(6)), 2014-09-09T16:17:42.937-08:00, 2014-09-10T13:12:04.937-08:00))
```
As you have seen in the example above, the state of user-2 is different depending on the available events or the time you are looking at the data. Recording events in logging fashioned allows us to re-construct the state the entity according to the time.
## 3. Batch Events to the EventServer
Using a different REST address on the usual EventServer port, as of PredictionIO 0.9.5 you can send batches of up to 50 events as a time. The format is as described above but the JSON payload is packaged as an array of Event objects.
**Response:**
* Status:
* 200 on success if we can return an array data in the response even when some events fail (e.g. because of ill-format). Client needs to check individual dictionary to verify all events were successfully created.
* 400 otherwise. Perhaps exceeded 50 events?
* Data: an array of dictionaries each of which contains either following keys
* “status”: 201 if the event was successfully created; otherwise, 400.
* "eventID": the value is the eventID if the event is successfully created and
* "message": the error message string if any error occurs during creation
The order in the response array is corresponding to the order of the request array. However, the events might be imported in any order.
###Sample Request:
curl -i -X POST http://localhost:7070/batch/events.json?accessKey=...
-H "Content-Type: application/json" -d ‘ \
[
{
"event": "$create",
"entityType": "user",
"entityId": "uid",
"properties": {
...
}
},
{
"event": "like",
"entityType": "user",
"entityId": "uid",
"targetEntityType": "item",
"targetEntityId": "iid",
"properties": {
...
}
"eventTime": "2004-12-13T21:39:45.618-07:00"
},
...
]‘
###Sample Response:
HTTP/1.1 200 Successful
Server: spray-can/1.2.1
Date: Wed, 10 Sep 2014 22:51:33 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 41
[
{"eventId":"AAAABAAAAQDP3-jSlTMGVu0waj8"},
{
"status": 201,
"eventId": "AAAABAAAAQDP3-jSlTMGVu0waj8"
},
{
"status": 201,
"eventId":"AAAABAAAAQDP3-jSlTMGVu0waj9"
},
{
"status": 400,
"message":"Required entityType is missing”
},
]
Notice that each subrequest receives a status response. The limit of 50 events per batch requests is in line with Facebook, Mixpanel, SegmentIO and other event syncs that accept batches.