{
"cells": [
{
"cell_type": "markdown",
"id": "83034970",
"metadata": {},
"source": [
"# Distill Example"
]
},
{
"cell_type": "markdown",
"id": "0809cd43",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "49900c00",
"metadata": {},
"source": [
"### License"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a54331ad",
"metadata": {},
"outputs": [],
"source": [
"#\n",
"# Copyright 2022 The Applied Research Laboratory for Intelligence and Security (ARLIS)\n",
"#\n",
"# Licensed to the Apache Software Foundation (ASF) under one or more\n",
"# contributor license agreements. See the NOTICE file distributed with\n",
"# this work for additional information regarding copyright ownership.\n",
"# The ASF licenses this file to You under the Apache License, Version 2.0\n",
"# (the \"License\"); you may not use this file except in compliance with\n",
"# the License. You may obtain a copy of the License at\n",
"#\n",
"# http://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"id": "42d170c5",
"metadata": {},
"source": [
"### Imports Used in this Example"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9d2e506b",
"metadata": {},
"outputs": [],
"source": [
"from elasticsearch import Elasticsearch\n",
"from elasticsearch_dsl import connections\n",
"from elasticsearch_dsl import Search\n",
"from elasticsearch_dsl import Q\n",
"from elasticsearch_dsl.query import MultiMatch, Match\n",
"from collections import Counter, deque\n",
"from itertools import count\n",
"from uuid import uuid4\n",
"\n",
"import sys\n",
"sys.path.append('../')\n",
"\n",
"import distill\n",
"import numpy as np\n",
"import pandas as pd\n",
"from pandas.io.json import json_normalize\n",
"import json\n",
"import itertools\n",
"import networkx as nx\n",
"import hashlib, base64\n",
"import plotly.graph_objects as go"
]
},
{
"cell_type": "markdown",
"id": "b55f549a",
"metadata": {},
"source": [
"## Define Search into Logging Database"
]
},
{
"cell_type": "markdown",
"id": "e5359a48",
"metadata": {},
"source": [
"Using Elasticsearch as a backend, we can create new connection to a test instance and define a search object based on that instance and a specific index to search."
]
},
{
"cell_type": "code",
"execution_count": 109,
"id": "f8eed107",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<Elasticsearch([{'host': 'localhost', 'port': 9200}])>\n"
]
}
],
"source": [
"flagonClient = connections.create_connection('flagonTest', hosts=['localhost:9200'], timeout=60)\n",
"\n",
"#TODO describeabs connections\n",
"\n",
"#hello world test\n",
"print(flagonClient)"
]
},
{
"cell_type": "code",
"execution_count": 110,
"id": "6caa6227",
"metadata": {},
"outputs": [],
"source": [
"AleS = Search(using='flagonTest', index=\"userale\")"
]
},
{
"cell_type": "markdown",
"id": "63f4b599",
"metadata": {},
"source": [
"## Define Queries against Log Data"
]
},
{
"cell_type": "markdown",
"id": "88c2b4ea",
"metadata": {},
"source": [
"### Simple Queries"
]
},
{
"cell_type": "code",
"execution_count": 111,
"id": "85cbdaf0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Bool(should=[Match(logType='raw'), Match(logType='custom')])\n"
]
}
],
"source": [
"qLogType = Q(\"match\", logType=\"raw\") | Q(\"match\", logType=\"custom\")\n",
"print(qLogType)"
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "f3db4b61",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Match(userId='superset-user')\n"
]
}
],
"source": [
"qUserId = Q(\"match\", userId=\"superset-user\")\n",
"print(qUserId)"
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "a9fa6510",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Bool(must=[Match(sessionID=''), Match(sessionID='')])\n"
]
}
],
"source": [
"qExcludeSession = Q(\"match\", sessionID=\"\") & Q(\"match\", sessionID=\"\")\n",
"print(qExcludeSession)"
]
},
{
"cell_type": "markdown",
"id": "d8f18d58",
"metadata": {},
"source": [
"### Not-As-Simple Queries"
]
},
{
"cell_type": "code",
"execution_count": 114,
"id": "39f94c13",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Wildcard(pageUrl={'value': '*/superset/dashboard*'})\n"
]
}
],
"source": [
"qUrl = Q({\"wildcard\": {\n",
" \"pageUrl\": {\n",
" \"value\": \"*/superset/dashboard*\"\n",
" }\n",
"}})\n",
"print(qUrl)"
]
},
{
"cell_type": "markdown",
"id": "e3806738",
"metadata": {},
"source": [
"### Define Filters"
]
},
{
"cell_type": "code",
"execution_count": 115,
"id": "1ae3df5f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Bool(filter=[Bool(must_not=[Terms(type=['keydown', 'mousedown', 'mouseup'])])])\n"
]
}
],
"source": [
"filterEvents = Q('bool', filter=[~Q('terms', type=['keydown', 'mousedown', 'mouseup'])])\n",
"print(filterEvents)"
]
},
{
"cell_type": "markdown",
"id": "218dcf93",
"metadata": {},
"source": [
"## Chained Searches"
]
},
{
"cell_type": "markdown",
"id": "94521da8",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 116,
"id": "0afab6a0",
"metadata": {},
"outputs": [],
"source": [
"elk_search = AleS \\\n",
" .query(qUrl) \\\n",
" .query(qLogType) \\\n",
" .query(qUserId) \\\n",
" .query(filterEvents) \\\n",
" .extra(track_total_hits=True) #breaks return limit of 10000 hits"
]
},
{
"cell_type": "markdown",
"id": "cfe9411a",
"metadata": {},
"source": [
"NOTE: `.execute()` will only retreive the first 10 hits with additional terms embedded in queries. Use `.scan()` instead if you want to retreive all the hits. We use `.execute()` below for brevity."
]
},
{
"cell_type": "code",
"execution_count": 163,
"id": "7401817d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"12341\n"
]
}
],
"source": [
"ale_dict = {}\n",
"elk_response = elk_search.scan()\n",
"for hit in elk_response:\n",
" logEntry = (hit.to_dict())\n",
" logEntry['uid'] = distill.getUUID(logEntry)\n",
" logEntry['clientTime'] = distill.epoch_to_datetime(logEntry['clientTime'])\n",
" ctr = len(ale_dict)\n",
" ctr += 1\n",
" ale_dict[ctr] = logEntry\n",
"\n",
"print(len(ale_dict))\n",
"#print(ale_dict)"
]
},
{
"cell_type": "markdown",
"id": "b761d6ca",
"metadata": {},
"source": [
"## Data Forensics\n",
"Data Forensics refers to ascertaining what is in our data. We may decide that we filtered to much or too little, and want to re-run our scan through ELK. Or, we may decide just how to apply filters as we go and \"carve\" out new dictionaries with less data, but more of the data we want. The following examples illustrate how to work with UserALE data in a dictionary format to perform data forensics."
]
},
{
"cell_type": "markdown",
"id": "82cd296d",
"metadata": {},
"source": [
"### Sorting\n",
"Getting User logs into a logical sequence can aid in a number of operations down the line."
]
},
{
"cell_type": "markdown",
"id": "74ba205d",
"metadata": {},
"source": [
"A simple lambda function helps in sorting our user log dict by `clientTime` (when logs were written by the client)."
]
},
{
"cell_type": "code",
"execution_count": 118,
"id": "be808aef",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"12305"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted_data = dict(sorted(ale_dict.items(), key = lambda kv: kv[1]['clientTime']))\n",
"len(sorted_data)"
]
},
{
"cell_type": "markdown",
"id": "8d381225",
"metadata": {},
"source": [
"### Searching\n",
"Before we can filter out what we don't want in our data, we have to be able to be able to describe what we do and don't want. Dictionaries are a fast and efficient way to search through data and Distill provides some supporting libraries for finding the information you want from your user logs."
]
},
{
"cell_type": "markdown",
"id": "89e91341",
"metadata": {},
"source": [
"Distill's `find_meta_values` function uses list comprehensions to quickly provide all the unique values for specific key (e.g., `sessionID`, `userId`)."
]
},
{
"cell_type": "code",
"execution_count": 119,
"id": "bae89caf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['session_1642013755036',\n",
" 'session_1642561069785',\n",
" 'session_1642012917325',\n",
" 'session_1641584276813',\n",
" 'session_1640200820004',\n",
" 'session_1642626473013',\n",
" 'session_1641502434428',\n",
" 'session_1640029398947',\n",
" 'session_1642562635205',\n",
" 'session_1640118177195',\n",
" 'session_1641844965430',\n",
" 'session_1642004982781']"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sessions = distill.find_meta_values('sessionID', sorted_data)\n",
"sessions"
]
},
{
"cell_type": "code",
"execution_count": 120,
"id": "ed519ff9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['superset-user']"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"users = distill.find_meta_values('userId', sorted_data)\n",
"users "
]
},
{
"cell_type": "markdown",
"id": "cc3840bf",
"metadata": {},
"source": [
"Relying on the dictionary format, we can quickly create new dictionaries with certain characteristics using simple dictionary comprehensions (e.g., a dictionary with all logs that contain the key: `path`; a dictionarey with all logs where `type`== `click`)."
]
},
{
"cell_type": "code",
"execution_count": 121,
"id": "76139d82",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"12302"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values = ['path']\n",
"sorted_data_paths = {k:v for k, v in sorted_data.items() if any(item in values for item in v.keys())}\n",
"len(sorted_data_paths)"
]
},
{
"cell_type": "code",
"execution_count": 122,
"id": "6bf84749",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"392"
]
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values = ['click']\n",
"sorted_data_paths_clicks = {k:v for k, v in sorted_data_paths.items() if any(item in values for item in v.values())}\n",
"len(sorted_data_paths_clicks)"
]
},
{
"cell_type": "markdown",
"id": "53a86d61",
"metadata": {},
"source": [
"Using the same methods, we can find all logs that refer to a specific DOM element in the field `path`. "
]
},
{
"cell_type": "code",
"execution_count": 123,
"id": "6cbaf33e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1889"
]
},
"execution_count": 123,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ele = ['div.superset-legacy-chart-world-map']\n",
"sorted_data_pathele = {k:v for k, v in sorted_data_paths.items() if any(item in ele for item in v['path'])}\n",
"len(sorted_data_pathele.keys())"
]
},
{
"cell_type": "code",
"execution_count": 124,
"id": "b2b4deff",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toggleEle = ['button.ant-btn superset-button css-1mljg09', 'div#chart-id-515.filter_box']\n",
"sorted_data_pathele = {k:v for k, v in sorted_data_paths_clicks.items() if all(item in v['path'] for item in toggleEle)}\n",
"len(sorted_data_pathele.keys()) "
]
},
{
"cell_type": "markdown",
"id": "ecac79e3",
"metadata": {},
"source": [
"## Segmentation\n",
"User data is always nested in time--the things they do, explore, and select are bound to time. Segmentation is the practice of slicing a series of data into a set of epochs (time-bound bins of logs) defined by some characteristic. They can be very general (e.g., 30 second, non-overlapping intervals starting from the beginning of a user session), or they can be very specific (e.g., an epoch when users were interacting with a specific UI element with filters set). Segmentation is generally very challenging and in the realm of 'advanced user analytics'. Distill makes segmentation much easier by supporting data scientists in creating and curating segments as mutable object. See the examples below: "
]
},
{
"cell_type": "markdown",
"id": "8f867369",
"metadata": {},
"source": [
"We want to to be able to create and curate segments without having to rewrite new datasets every time. We're going to start by creating a Master Dictionary for all our interesting segments:"
]
},
{
"cell_type": "code",
"execution_count": 125,
"id": "10e34203",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{}"
]
},
"execution_count": 125,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"superSegments = {}\n",
"superSegments"
]
},
{
"cell_type": "markdown",
"id": "0d991926",
"metadata": {},
"source": [
"### Finding Deadspace\n",
"In one case we might want to readily identify epochs within user sessions wherein users aren't doing anything. We call this \"deadspace\"--a user might have started doing something else AFK, but we have no user behavior to indicate they've switch tasks (e.g., `blur` event). Deadspace can be useful to identify; we can omit it from other segments if we need to. Distill's `detect_deadspace` function is helpful for finding deadspace:"
]
},
{
"cell_type": "code",
"execution_count": 126,
"id": "47553657",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"deadSpace1 Segment_Type.DEADSPACE (1640029869288, 1640030063271) 193.983 4 [350, 301, 346, 347]\n",
"deadSpace2 Segment_Type.DEADSPACE (1640030219069, 1640098958897) 68739.828 2 [840, 924]\n",
"deadSpace3 Segment_Type.DEADSPACE (1640098969598, 1640099715182) 745.584 2 [991, 990]\n",
"deadSpace4 Segment_Type.DEADSPACE (1640099716927, 1640118186776) 18469.849 3 [1007, 674, 1126]\n",
"deadSpace5 Segment_Type.DEADSPACE (1640118296632, 1640182567282) 64270.65 6 [782, 981, 1121, 1222, 1227, 1241]\n",
"deadSpace6 Segment_Type.DEADSPACE (1640182573419, 1640201030250) 18456.831 3 [1231, 1426, 1589]\n",
"deadSpace7 Segment_Type.DEADSPACE (1640201075969, 1640201157009) 81.04 2 [1651, 1819]\n",
"deadSpace8 Segment_Type.DEADSPACE (1640201172007, 1640201332691) 160.684 2 [1812, 1869]\n",
"deadSpace9 Segment_Type.DEADSPACE (1640201346787, 1640201425011) 78.224 4 [1974, 2102, 2118, 2127]\n",
"deadSpace10 Segment_Type.DEADSPACE (1640201494535, 1640201568436) 73.901 4 [2183, 2184, 2185, 2186]\n",
"deadSpace11 Segment_Type.DEADSPACE (1640201578551, 1640201708691) 130.14 6 [2081, 2083, 2176, 2045, 2046, 2047]\n",
"deadSpace12 Segment_Type.DEADSPACE (1640201899707, 1640201968335) 68.628 3 [2859, 2876, 2877]\n",
"deadSpace13 Segment_Type.DEADSPACE (1640202073587, 1641502449931) 1300376.344 3 [3040, 3049, 3061]\n",
"deadSpace14 Segment_Type.DEADSPACE (1641502510070, 1641584293776) 81783.706 3 [3018, 4019, 4020]\n",
"deadSpace15 Segment_Type.DEADSPACE (1641584454611, 1641584584359) 129.748 2 [3969, 3519]\n",
"deadSpace16 Segment_Type.DEADSPACE (1641584657254, 1641584742169) 84.915 2 [4055, 4072]\n",
"deadSpace17 Segment_Type.DEADSPACE (1641584821749, 1641585078332) 256.583 2 [3315, 3316]\n",
"deadSpace18 Segment_Type.DEADSPACE (1641585114110, 1641585228031) 113.921 2 [3739, 3745]\n",
"deadSpace19 Segment_Type.DEADSPACE (1641585237896, 1641585458459) 220.563 2 [3580, 3408]\n",
"deadSpace20 Segment_Type.DEADSPACE (1641585467013, 1641846627837) 261160.824 3 [3582, 7240, 7242]\n",
"deadSpace21 Segment_Type.DEADSPACE (1641846637675, 1641849234850) 2597.175 2 [5053, 5054]\n",
"deadSpace22 Segment_Type.DEADSPACE (1641849676536, 1641910810485) 61133.949 2 [7238, 7263]\n",
"deadSpace23 Segment_Type.DEADSPACE (1641910811783, 1641961009794) 50198.011 2 [7262, 7282]\n",
"deadSpace24 Segment_Type.DEADSPACE (1641961062271, 1641963232792) 2170.521 4 [7281, 7284, 7285, 7286]\n",
"deadSpace25 Segment_Type.DEADSPACE (1641963235873, 1642004991756) 41755.883 4 [7279, 7287, 7906, 7907]\n",
"deadSpace26 Segment_Type.DEADSPACE (1642005017893, 1642005128878) 110.985 2 [7680, 7709]\n",
"deadSpace27 Segment_Type.DEADSPACE (1642005129089, 1642005196721) 67.632 2 [7708, 7691]\n",
"deadSpace28 Segment_Type.DEADSPACE (1642005199678, 1642013738275) 8538.597 3 [7747, 7912, 7913]\n",
"deadSpace29 Segment_Type.DEADSPACE (1642013783536, 1642013854497) 70.961 2 [7335, 7336]\n",
"deadSpace30 Segment_Type.DEADSPACE (1642013961974, 1642014461416) 499.442 2 [8147, 8432]\n",
"deadSpace31 Segment_Type.DEADSPACE (1642014578838, 1642014670475) 91.637 2 [8738, 8379]\n",
"deadSpace32 Segment_Type.DEADSPACE (1642014723082, 1642015133292) 410.21 2 [8794, 8731]\n",
"deadSpace33 Segment_Type.DEADSPACE (1642015161819, 1642015236013) 74.194 2 [8577, 8618]\n",
"deadSpace34 Segment_Type.DEADSPACE (1642015271091, 1642015659247) 388.156 2 [8682, 8678]\n",
"deadSpace35 Segment_Type.DEADSPACE (1642015701502, 1642015925856) 224.354 2 [8690, 8712]\n",
"deadSpace36 Segment_Type.DEADSPACE (1642015928230, 1642016248803) 320.573 2 [8730, 8802]\n",
"deadSpace37 Segment_Type.DEADSPACE (1642016278737, 1642018691736) 2412.999 2 [8956, 8909]\n",
"deadSpace38 Segment_Type.DEADSPACE (1642018770931, 1642561099479) 542328.548 2 [8938, 9020]\n",
"deadSpace39 Segment_Type.DEADSPACE (1642561100398, 1642561241813) 141.415 2 [9072, 9108]\n",
"deadSpace40 Segment_Type.DEADSPACE (1642561252257, 1642561320627) 68.37 2 [9070, 9124]\n",
"deadSpace41 Segment_Type.DEADSPACE (1642561588507, 1642561806420) 217.913 2 [9469, 9470]\n",
"deadSpace42 Segment_Type.DEADSPACE (1642561840009, 1642561903885) 63.876 2 [9691, 9755]\n",
"deadSpace43 Segment_Type.DEADSPACE (1642561989927, 1642562053670) 63.743 2 [10005, 10016]\n",
"deadSpace44 Segment_Type.DEADSPACE (1642562185295, 1642562247581) 62.286 2 [10172, 10272]\n",
"deadSpace45 Segment_Type.DEADSPACE (1642562253855, 1642562316596) 62.741 2 [10229, 10230]\n",
"deadSpace46 Segment_Type.DEADSPACE (1642562433438, 1642562501117) 67.679 2 [10435, 10444]\n",
"deadSpace47 Segment_Type.DEADSPACE (1642562644232, 1642563245456) 601.224 2 [10647, 10660]\n",
"deadSpace48 Segment_Type.DEADSPACE (1642563558577, 1642568294888) 4736.311 4 [11876, 11839, 11840, 11841]\n",
"deadSpace49 Segment_Type.DEADSPACE (1642568310491, 1642626477474) 58166.983 3 [11783, 11637, 11638]\n",
"deadSpace50 Segment_Type.DEADSPACE (1642626559946, 1642642862736) 16302.79 2 [11838, 12288]\n",
"deadSpace51 Segment_Type.DEADSPACE (1642642866908, 1642646440472) 3573.564 2 [12294, 12295]\n",
"deadSpace52 Segment_Type.DEADSPACE (1642646442067, 1642648050144) 1608.077 2 [12292, 12293]\n",
"deadSpace53 Segment_Type.DEADSPACE (1642648052957, 1642654564752) 6511.795 2 [12302, 12299]\n"
]
}
],
"source": [
"deadSpaceSegments = distill.detect_deadspace(sorted_data_paths, 60, 0, 0)\n",
"for counter, d in enumerate(deadSpaceSegments.values(), start=1):\n",
" d.segment_name = str(\"deadSpace\" + str(counter)) #renaming segment names on the fly\n",
" d.segment_length_sec = (d.start_end_val[1] - d.start_end_val[0])/1000 #adding custom segment-object attributes\n",
" print(d.segment_name, d.segment_type, d.start_end_val, d.segment_length_sec, d.num_logs, d.uids)"
]
},
{
"cell_type": "markdown",
"id": "c946342d",
"metadata": {},
"source": [
"Distill is designed to supplement forensic workflows. We might learn that segmenting one way doesn't work. We can easily modify parameters with review and modify quickly. In this case we may need to assume that 1 minute of 'deadspace' is normal use, we can revise our timing parameters and return a more reasonable solution:"
]
},
{
"cell_type": "code",
"execution_count": 191,
"id": "dc130a1f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"deadSpace1 Segment_Type.DEADSPACE (1640030219069, 1640098958897) 68739.828 2 [840, 924]\n",
"deadSpace2 Segment_Type.DEADSPACE (1640098969598, 1640099715182) 745.584 2 [991, 990]\n",
"deadSpace3 Segment_Type.DEADSPACE (1640099716927, 1640118186776) 18469.849 3 [1007, 674, 1126]\n",
"deadSpace4 Segment_Type.DEADSPACE (1640118296632, 1640182567282) 64270.65 6 [782, 981, 1121, 1222, 1227, 1241]\n",
"deadSpace5 Segment_Type.DEADSPACE (1640182573419, 1640201030250) 18456.831 3 [1231, 1426, 1589]\n",
"deadSpace6 Segment_Type.DEADSPACE (1640202073587, 1641502449931) 1300376.344 3 [3040, 3049, 3061]\n",
"deadSpace7 Segment_Type.DEADSPACE (1641502510070, 1641584293776) 81783.706 3 [3018, 4019, 4020]\n",
"deadSpace8 Segment_Type.DEADSPACE (1641585467013, 1641846627837) 261160.824 3 [3582, 7240, 7242]\n",
"deadSpace9 Segment_Type.DEADSPACE (1641846637675, 1641849234850) 2597.175 2 [5053, 5054]\n",
"deadSpace10 Segment_Type.DEADSPACE (1641849676536, 1641910810485) 61133.949 2 [7238, 7263]\n",
"deadSpace11 Segment_Type.DEADSPACE (1641910811783, 1641961009794) 50198.011 2 [7262, 7282]\n",
"deadSpace12 Segment_Type.DEADSPACE (1641961062271, 1641963232792) 2170.521 4 [7281, 7284, 7285, 7286]\n",
"deadSpace13 Segment_Type.DEADSPACE (1641963235873, 1642004991756) 41755.883 4 [7279, 7287, 7906, 7907]\n",
"deadSpace14 Segment_Type.DEADSPACE (1642005199678, 1642013738275) 8538.597 3 [7747, 7912, 7913]\n",
"deadSpace15 Segment_Type.DEADSPACE (1642013961974, 1642014461416) 499.442 2 [8147, 8432]\n",
"deadSpace16 Segment_Type.DEADSPACE (1642014723082, 1642015133292) 410.21 2 [8794, 8731]\n",
"deadSpace17 Segment_Type.DEADSPACE (1642015271091, 1642015659247) 388.156 2 [8682, 8678]\n",
"deadSpace18 Segment_Type.DEADSPACE (1642016278737, 1642018691736) 2412.999 2 [8956, 8909]\n",
"deadSpace19 Segment_Type.DEADSPACE (1642018770931, 1642561099479) 542328.548 2 [8938, 9020]\n",
"deadSpace20 Segment_Type.DEADSPACE (1642562644232, 1642563245456) 601.224 2 [10647, 10660]\n",
"deadSpace21 Segment_Type.DEADSPACE (1642563558577, 1642568294888) 4736.311 4 [11876, 11839, 11840, 11841]\n",
"deadSpace22 Segment_Type.DEADSPACE (1642568310491, 1642626477474) 58166.983 3 [11783, 11637, 11638]\n",
"deadSpace23 Segment_Type.DEADSPACE (1642626559946, 1642642862736) 16302.79 2 [11838, 12288]\n",
"deadSpace24 Segment_Type.DEADSPACE (1642642866908, 1642646440472) 3573.564 2 [12294, 12295]\n",
"deadSpace25 Segment_Type.DEADSPACE (1642646442067, 1642648050144) 1608.077 2 [12292, 12293]\n",
"deadSpace26 Segment_Type.DEADSPACE (1642648052957, 1642654564752) 6511.795 2 [12302, 12299]\n"
]
}
],
"source": [
"deadSpaceSegments = distill.detect_deadspace(sorted_data_paths, 360, 0, 0)\n",
"for counter, d in enumerate(deadSpaceSegments.values(), start=1):\n",
" d.segment_name = str(\"deadSpace\" + str(counter)) #renaming segment names on the fly\n",
" d.segment_length_sec = (d.start_end_val[1] - d.start_end_val[0])/1000 #adding custom segment-object attributes\n",
" print(d.segment_name, d.segment_type, d.start_end_val, d.segment_length_sec, d.num_logs, d.uids)"
]
},
{
"cell_type": "code",
"execution_count": 189,
"id": "2efcd5ed",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys([840, 991, 1007, 1121, 1231, 3040, 3018, 3582, 5053, 7238, 7262, 7281, 7287, 7747, 8147, 8794, 8682, 8956, 8938, 10647, 11876, 11783, 11838, 12294, 12292, 12302])"
]
},
"execution_count": 189,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"deadSpaceSegments.keys()"
]
},
{
"cell_type": "markdown",
"id": "f593d800",
"metadata": {},
"source": [
"Add the revised deadspace segments to the Master Dictionary of segments:"
]
},
{
"cell_type": "code",
"execution_count": 128,
"id": "9cd5dffc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"deadSpace1 Segment_Type.DEADSPACE\n",
"deadSpace2 Segment_Type.DEADSPACE\n",
"deadSpace3 Segment_Type.DEADSPACE\n",
"deadSpace4 Segment_Type.DEADSPACE\n",
"deadSpace5 Segment_Type.DEADSPACE\n",
"deadSpace6 Segment_Type.DEADSPACE\n",
"deadSpace7 Segment_Type.DEADSPACE\n",
"deadSpace8 Segment_Type.DEADSPACE\n",
"deadSpace9 Segment_Type.DEADSPACE\n",
"deadSpace10 Segment_Type.DEADSPACE\n",
"deadSpace11 Segment_Type.DEADSPACE\n",
"deadSpace12 Segment_Type.DEADSPACE\n",
"deadSpace13 Segment_Type.DEADSPACE\n",
"deadSpace14 Segment_Type.DEADSPACE\n",
"deadSpace15 Segment_Type.DEADSPACE\n",
"deadSpace16 Segment_Type.DEADSPACE\n",
"deadSpace17 Segment_Type.DEADSPACE\n",
"deadSpace18 Segment_Type.DEADSPACE\n",
"deadSpace19 Segment_Type.DEADSPACE\n",
"deadSpace20 Segment_Type.DEADSPACE\n",
"deadSpace21 Segment_Type.DEADSPACE\n",
"deadSpace22 Segment_Type.DEADSPACE\n",
"deadSpace23 Segment_Type.DEADSPACE\n",
"deadSpace24 Segment_Type.DEADSPACE\n",
"deadSpace25 Segment_Type.DEADSPACE\n",
"deadSpace26 Segment_Type.DEADSPACE\n"
]
}
],
"source": [
"superSegments.update(deadSpaceSegments)\n",
"for d in superSegments.values():\n",
" print(d.segment_name, d.segment_type)"
]
},
{
"cell_type": "code",
"execution_count": 165,
"id": "3114cf69",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys([840, 991, 1007, 1121, 1231, 3040, 3018, 3582, 5053, 7238, 7262, 7281, 7287, 7747, 8147, 8794, 8682, 8956, 8938, 10647, 11876, 11783, 11838, 12294, 12292, 12302, 'toggle1', 'toggle2', 'toggle3', 'toggle4', 'toggle1_2', 75, 206, 312, 415, 1277, 1312, 925, 1002, 762, 1068, 653, 1235, 1665, 1711, 1833, 1872, 2108, 2157, 2002, 2048, 2332, 2663, 2795, 2888, 2943, 2981, 2991, 3082, 3628, 3925, 3499, 3519, 3710, 4072, 3315, 3326, 4004, 3745, 3848, 5060, 6676, 4541, 5076, 4624, 6835, 5864, 5966, 6000, 5553, 7269, 7277, 7774, 7692, 8044, 8055, 7427, 7574, 8143, 8346, 8464, 8373, 8221, 8627, 8598, 8713, 8817, 8911, 8941, 9085, 9094, 9185, 9304, 9329, 9418, 9432, 9466, 9627, 9757, 10140, 10530, 10633, 10663, 10766, 11225, 11461, 11639, 12303])"
]
},
"execution_count": 165,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"superSegments.keys()"
]
},
{
"cell_type": "code",
"execution_count": 183,
"id": "9e002514",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<distill.segmentation.segment.Segment at 0x21d0f1d09a0>"
]
},
"execution_count": 183,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"superSegments['toggle4']"
]
},
{
"cell_type": "markdown",
"id": "1bf72eb2",
"metadata": {},
"source": [
"### Simple Segments - Toggles\n",
"In another case, we might know exactly which events bound segment, e.g., a toggle feature or the application of a filter. If the corresponding element(s) are known we can search our dictionary for these events and create new segments readily with Distill's `create_segment` function. "
]
},
{
"cell_type": "markdown",
"id": "8b389422",
"metadata": {},
"source": [
"First, find all logs that contain the key with corresponding values that relate to those elements. Using UserALE.js data, we're looking for the \"path\" key, and within that set, only logs that correspond to 'click' events:"
]
},
{
"cell_type": "code",
"execution_count": 129,
"id": "c1aeb62c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"12302"
]
},
"execution_count": 129,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values = ['path']\n",
"sorted_data_paths = {k:v for k, v in sorted_data.items() if any(item in values for item in v.keys())}\n",
"len(sorted_data_paths)"
]
},
{
"cell_type": "code",
"execution_count": 130,
"id": "ab609416",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"392"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"values = ['click']\n",
"sorted_data_paths_clicks = {k:v for k, v in sorted_data_paths.items() if any(item in values for item in v.values())}\n",
"len(sorted_data_paths_clicks)"
]
},
{
"cell_type": "markdown",
"id": "4e1bdae5",
"metadata": {},
"source": [
"Next, find all logs where \"path\" contains the elements that correspond to the exact 'toggle' element and extrac the times that indicate each time it was 'clicked'."
]
},
{
"cell_type": "code",
"execution_count": 131,
"id": "6c32d7e2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(1642561335946, 1642561421205),\n",
" (1642561421205, 1642561581309),\n",
" (1642561581309, 1642563313727),\n",
" (1642563313727, 1642563487560)]"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"toggleEle = ['button.ant-btn superset-button css-1mljg09', 'div#chart-id-515.filter_box']\n",
"toggle_times = distill.pairwiseSeq([log['clientTime'] for log in sorted_data_paths_clicks.values() if all(item in log['path'] for item in toggleEle)])\n",
"toggle_times"
]
},
{
"cell_type": "markdown",
"id": "0e9247ce",
"metadata": {},
"source": [
"Now, we can submit these times to Distill's `create_segment` function:"
]
},
{
"cell_type": "code",
"execution_count": 132,
"id": "d94c4010",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"toggle1 Segment_Type.CREATE (1642561335946, 1642561421205) 85.259 4\n",
"toggle2 Segment_Type.CREATE (1642561421205, 1642561581309) 160.104 14\n",
"toggle3 Segment_Type.CREATE (1642561581309, 1642563313727) 1732.418 127\n",
"toggle4 Segment_Type.CREATE (1642563313727, 1642563487560) 173.833 32\n"
]
}
],
"source": [
"segment_names = []\n",
"for i in range(0,len(toggle_times),1):\n",
" segment_names.append(str(\"toggle\" + str(i+1)))\n",
"toggleSegments = distill.create_segment(sorted_data_paths_clicks, segment_names, toggle_times)\n",
"for d in toggleSegments.values():\n",
" d.segment_length_sec = (d.start_end_val[1] - d.start_end_val[0])/1000 #adding custom segment-object attributes\n",
" print(d.segment_name, d.segment_type, d.start_end_val, d.segment_length_sec, d.num_logs)"
]
},
{
"cell_type": "markdown",
"id": "86a1e3be",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 149,
"id": "2c5ab730",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"toggle1 Segment_Type.CREATE (1642561335946, 1642561421205) 85.259 4\n",
"toggle2 Segment_Type.CREATE (1642561421205, 1642561581309) 160.104 14\n",
"toggle3 Segment_Type.CREATE (1642561581309, 1642563313727) 1732.418 127\n",
"toggle4 Segment_Type.CREATE (1642563313727, 1642563487560) 173.833 32\n",
"toggle1_2 Segment_Type.UNION (1642561335946, 1642561581309) 245.363 17\n"
]
}
],
"source": [
"toggleSegments['toggle1_2'] = distill.union('toggle1_2', toggleSegments['toggle2'], toggleSegments['toggle1'])\n",
"for d in toggleSegments.values():\n",
" d.segment_length_sec = (d.start_end_val[1] - d.start_end_val[0])/1000 #adding custom segment-object attributes\n",
" print(d.segment_name, d.segment_type, d.start_end_val, d.segment_length_sec, d.num_logs)"
]
},
{
"cell_type": "markdown",
"id": "55dc664e",
"metadata": {},
"source": [
"Finally, add revised deadspace segments to the Master Dictionary of segments:"
]
},
{
"cell_type": "code",
"execution_count": 150,
"id": "2471c232",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"deadSpace1 Segment_Type.DEADSPACE\n",
"deadSpace2 Segment_Type.DEADSPACE\n",
"deadSpace3 Segment_Type.DEADSPACE\n",
"deadSpace4 Segment_Type.DEADSPACE\n",
"deadSpace5 Segment_Type.DEADSPACE\n",
"deadSpace6 Segment_Type.DEADSPACE\n",
"deadSpace7 Segment_Type.DEADSPACE\n",
"deadSpace8 Segment_Type.DEADSPACE\n",
"deadSpace9 Segment_Type.DEADSPACE\n",
"deadSpace10 Segment_Type.DEADSPACE\n",
"deadSpace11 Segment_Type.DEADSPACE\n",
"deadSpace12 Segment_Type.DEADSPACE\n",
"deadSpace13 Segment_Type.DEADSPACE\n",
"deadSpace14 Segment_Type.DEADSPACE\n",
"deadSpace15 Segment_Type.DEADSPACE\n",
"deadSpace16 Segment_Type.DEADSPACE\n",
"deadSpace17 Segment_Type.DEADSPACE\n",
"deadSpace18 Segment_Type.DEADSPACE\n",
"deadSpace19 Segment_Type.DEADSPACE\n",
"deadSpace20 Segment_Type.DEADSPACE\n",
"deadSpace21 Segment_Type.DEADSPACE\n",
"deadSpace22 Segment_Type.DEADSPACE\n",
"deadSpace23 Segment_Type.DEADSPACE\n",
"deadSpace24 Segment_Type.DEADSPACE\n",
"deadSpace25 Segment_Type.DEADSPACE\n",
"deadSpace26 Segment_Type.DEADSPACE\n",
"toggle1 Segment_Type.CREATE\n",
"toggle2 Segment_Type.CREATE\n",
"toggle3 Segment_Type.CREATE\n",
"toggle4 Segment_Type.CREATE\n",
"toggle1_2 Segment_Type.UNION\n"
]
}
],
"source": [
"superSegments.update(toggleSegments)\n",
"for d in superSegments.values():\n",
" print(d.segment_name, d.segment_type)"
]
},
{
"cell_type": "markdown",
"id": "05eccefd",
"metadata": {},
"source": [
"### Ambiguous Segments"
]
},
{
"cell_type": "code",
"execution_count": 151,
"id": "42f0102e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"map_1 (1640029804278, 1640029834278) 30.0 183\n",
"map_2 (1640029840106, 1640029870106) 30.0 67\n",
"map_3 (1640030065505, 1640030095505) 30.0 140\n",
"map_4 (1640030113456, 1640030143456) 30.0 84\n",
"map_5 (1640030159682, 1640030189682) 30.0 19\n",
"map_6 (1640030215637, 1640030245637) 30.0 7\n",
"map_7 (1640098960393, 1640098990393) 30.0 96\n",
"map_8 (1640099715638, 1640099745638) 30.0 14\n",
"map_9 (1640118193428, 1640118223428) 30.0 185\n",
"map_10 (1640118242304, 1640118272304) 30.0 135\n",
"map_11 (1640118276485, 1640118306485) 30.0 145\n",
"map_12 (1640182568142, 1640182598142) 30.0 5\n",
"map_13 (1640201041753, 1640201071753) 30.0 183\n",
"map_14 (1640201071761, 1640201101761) 30.0 55\n",
"map_15 (1640201157250, 1640201187250) 30.0 146\n",
"map_16 (1640201333004, 1640201363004) 30.0 116\n",
"map_17 (1640201425485, 1640201455485) 30.0 54\n",
"map_18 (1640201493781, 1640201523781) 30.0 8\n",
"map_19 (1640201571360, 1640201601360) 30.0 92\n",
"map_20 (1640201709131, 1640201739131) 30.0 107\n",
"map_21 (1640201770496, 1640201800496) 30.0 113\n",
"map_22 (1640201846176, 1640201876176) 30.0 107\n",
"map_23 (1640201879303, 1640201909303) 30.0 98\n",
"map_24 (1640201971359, 1640202001359) 30.0 37\n",
"map_25 (1640202031447, 1640202061447) 30.0 64\n",
"map_26 (1641502458602, 1641502488602) 30.0 16\n",
"map_27 (1641502506921, 1641502536921) 30.0 45\n",
"map_28 (1641584316907, 1641584346907) 30.0 170\n",
"map_29 (1641584351264, 1641584381264) 30.0 109\n",
"map_30 (1641584412220, 1641584442220) 30.0 78\n",
"map_31 (1641584443149, 1641584473149) 30.0 133\n",
"map_32 (1641584584359, 1641584614359) 30.0 42\n",
"map_33 (1641584656995, 1641584686995) 30.0 2\n",
"map_34 (1641584742169, 1641584772169) 30.0 43\n",
"map_35 (1641584821749, 1641584851749) 30.0 1\n",
"map_36 (1641585078613, 1641585108613) 30.0 280\n",
"map_37 (1641585110991, 1641585140991) 30.0 55\n",
"map_38 (1641585228031, 1641585258031) 30.0 28\n",
"map_39 (1641585458977, 1641585488977) 30.0 14\n",
"map_40 (1641849235130, 1641849265130) 30.0 128\n",
"map_41 (1641849271492, 1641849301492) 30.0 284\n",
"map_42 (1641849306285, 1641849336285) 30.0 180\n",
"map_43 (1641849336995, 1641849366995) 30.0 216\n",
"map_44 (1641849367149, 1641849397149) 30.0 168\n",
"map_45 (1641849436992, 1641849466992) 30.0 191\n",
"map_46 (1641849467312, 1641849497312) 30.0 258\n",
"map_47 (1641849526231, 1641849556231) 30.0 105\n",
"map_48 (1641849560480, 1641849590480) 30.0 121\n",
"map_49 (1641849671655, 1641849701655) 30.0 32\n",
"map_50 (1641910810650, 1641910840650) 30.0 9\n",
"map_51 (1641963233861, 1641963263861) 30.0 10\n",
"map_52 (1642004997774, 1642005027774) 30.0 56\n",
"map_53 (1642005196917, 1642005226917) 30.0 35\n",
"map_54 (1642013742966, 1642013772966) 30.0 27\n",
"map_55 (1642013773427, 1642013803427) 30.0 52\n",
"map_56 (1642013854863, 1642013884863) 30.0 219\n",
"map_57 (1642013911632, 1642013941632) 30.0 167\n",
"map_58 (1642013961109, 1642013991109) 30.0 3\n",
"map_59 (1642014463528, 1642014493528) 30.0 31\n",
"map_60 (1642014516874, 1642014546874) 30.0 28\n",
"map_61 (1642014670794, 1642014700794) 30.0 291\n",
"map_62 (1642014716890, 1642014746890) 30.0 11\n",
"map_63 (1642015133602, 1642015163602) 30.0 104\n",
"map_64 (1642015237595, 1642015267595) 30.0 34\n",
"map_65 (1642015925873, 1642015955873) 30.0 39\n",
"map_66 (1642016249603, 1642016279603) 30.0 145\n",
"map_67 (1642018692112, 1642018722112) 30.0 38\n",
"map_68 (1642018769090, 1642018799090) 30.0 14\n",
"map_69 (1642561250405, 1642561280405) 30.0 7\n",
"map_70 (1642561321319, 1642561351319) 30.0 51\n",
"map_71 (1642561381832, 1642561411832) 30.0 77\n",
"map_72 (1642561418573, 1642561448573) 30.0 26\n",
"map_73 (1642561489022, 1642561519022) 30.0 129\n",
"map_74 (1642561525426, 1642561555426) 30.0 13\n",
"map_75 (1642561582104, 1642561612104) 30.0 11\n",
"map_76 (1642561806544, 1642561836544) 30.0 107\n",
"map_77 (1642561836797, 1642561866797) 30.0 35\n",
"map_78 (1642561904243, 1642561934243) 30.0 49\n",
"map_79 (1642562156535, 1642562186535) 30.0 110\n",
"map_80 (1642562613217, 1642562643217) 30.0 34\n",
"map_81 (1642562643454, 1642562673454) 30.0 10\n",
"map_82 (1642563261459, 1642563291459) 30.0 102\n",
"map_83 (1642563311454, 1642563341454) 30.0 183\n",
"map_84 (1642563425528, 1642563455528) 30.0 86\n",
"map_85 (1642563483058, 1642563513058) 30.0 91\n",
"map_86 (1642626481298, 1642626511298) 30.0 362\n",
"map_87 (1642654564769, 1642654594769) 30.0 3\n"
]
}
],
"source": [
"mapSegments = distill.generate_segments(sorted_data_paths,'path',['div.superset-legacy-chart-world-map','window'],0,30)\n",
"for counter, d in enumerate(mapSegments.values(), start=1):\n",
"    d.segment_name = f\"map_{counter}\"\n",
"    d.segment_length_sec = (d.start_end_val[1] - d.start_end_val[0]) / 1000  # custom attribute: segment length in seconds\n",
"    print(d.segment_name, d.start_end_val, d.segment_length_sec, d.num_logs)"
]
},
{
"cell_type": "code",
"execution_count": 164,
"id": "6e5a5738",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"deadSpace1 Segment_Type.DEADSPACE\n",
"deadSpace2 Segment_Type.DEADSPACE\n",
"deadSpace3 Segment_Type.DEADSPACE\n",
"deadSpace4 Segment_Type.DEADSPACE\n",
"deadSpace5 Segment_Type.DEADSPACE\n",
"deadSpace6 Segment_Type.DEADSPACE\n",
"deadSpace7 Segment_Type.DEADSPACE\n",
"deadSpace8 Segment_Type.DEADSPACE\n",
"deadSpace9 Segment_Type.DEADSPACE\n",
"deadSpace10 Segment_Type.DEADSPACE\n",
"deadSpace11 Segment_Type.DEADSPACE\n",
"deadSpace12 Segment_Type.DEADSPACE\n",
"deadSpace13 Segment_Type.DEADSPACE\n",
"deadSpace14 Segment_Type.DEADSPACE\n",
"deadSpace15 Segment_Type.DEADSPACE\n",
"deadSpace16 Segment_Type.DEADSPACE\n",
"deadSpace17 Segment_Type.DEADSPACE\n",
"deadSpace18 Segment_Type.DEADSPACE\n",
"deadSpace19 Segment_Type.DEADSPACE\n",
"deadSpace20 Segment_Type.DEADSPACE\n",
"deadSpace21 Segment_Type.DEADSPACE\n",
"deadSpace22 Segment_Type.DEADSPACE\n",
"deadSpace23 Segment_Type.DEADSPACE\n",
"deadSpace24 Segment_Type.DEADSPACE\n",
"deadSpace25 Segment_Type.DEADSPACE\n",
"deadSpace26 Segment_Type.DEADSPACE\n",
"toggle1 Segment_Type.CREATE\n",
"toggle2 Segment_Type.CREATE\n",
"toggle3 Segment_Type.CREATE\n",
"toggle4 Segment_Type.CREATE\n",
"toggle1_2 Segment_Type.UNION\n",
"map_1 Segment_Type.GENERATE\n",
"map_2 Segment_Type.GENERATE\n",
"map_3 Segment_Type.GENERATE\n",
"map_4 Segment_Type.GENERATE\n",
"map_5 Segment_Type.GENERATE\n",
"map_6 Segment_Type.GENERATE\n",
"map_7 Segment_Type.GENERATE\n",
"map_8 Segment_Type.GENERATE\n",
"map_9 Segment_Type.GENERATE\n",
"map_10 Segment_Type.GENERATE\n",
"map_11 Segment_Type.GENERATE\n",
"map_12 Segment_Type.GENERATE\n",
"map_13 Segment_Type.GENERATE\n",
"map_14 Segment_Type.GENERATE\n",
"map_15 Segment_Type.GENERATE\n",
"map_16 Segment_Type.GENERATE\n",
"map_17 Segment_Type.GENERATE\n",
"map_18 Segment_Type.GENERATE\n",
"map_19 Segment_Type.GENERATE\n",
"map_20 Segment_Type.GENERATE\n",
"map_21 Segment_Type.GENERATE\n",
"map_22 Segment_Type.GENERATE\n",
"map_23 Segment_Type.GENERATE\n",
"map_24 Segment_Type.GENERATE\n",
"map_25 Segment_Type.GENERATE\n",
"map_26 Segment_Type.GENERATE\n",
"map_27 Segment_Type.GENERATE\n",
"map_28 Segment_Type.GENERATE\n",
"map_29 Segment_Type.GENERATE\n",
"map_30 Segment_Type.GENERATE\n",
"map_31 Segment_Type.GENERATE\n",
"map_32 Segment_Type.GENERATE\n",
"map_33 Segment_Type.GENERATE\n",
"map_34 Segment_Type.GENERATE\n",
"map_35 Segment_Type.GENERATE\n",
"map_36 Segment_Type.GENERATE\n",
"map_37 Segment_Type.GENERATE\n",
"map_38 Segment_Type.GENERATE\n",
"map_39 Segment_Type.GENERATE\n",
"map_40 Segment_Type.GENERATE\n",
"map_41 Segment_Type.GENERATE\n",
"map_42 Segment_Type.GENERATE\n",
"map_43 Segment_Type.GENERATE\n",
"map_44 Segment_Type.GENERATE\n",
"map_45 Segment_Type.GENERATE\n",
"map_46 Segment_Type.GENERATE\n",
"map_47 Segment_Type.GENERATE\n",
"map_48 Segment_Type.GENERATE\n",
"map_49 Segment_Type.GENERATE\n",
"map_50 Segment_Type.GENERATE\n",
"map_51 Segment_Type.GENERATE\n",
"map_52 Segment_Type.GENERATE\n",
"map_53 Segment_Type.GENERATE\n",
"map_54 Segment_Type.GENERATE\n",
"map_55 Segment_Type.GENERATE\n",
"map_56 Segment_Type.GENERATE\n",
"map_57 Segment_Type.GENERATE\n",
"map_58 Segment_Type.GENERATE\n",
"map_59 Segment_Type.GENERATE\n",
"map_60 Segment_Type.GENERATE\n",
"map_61 Segment_Type.GENERATE\n",
"map_62 Segment_Type.GENERATE\n",
"map_63 Segment_Type.GENERATE\n",
"map_64 Segment_Type.GENERATE\n",
"map_65 Segment_Type.GENERATE\n",
"map_66 Segment_Type.GENERATE\n",
"map_67 Segment_Type.GENERATE\n",
"map_68 Segment_Type.GENERATE\n",
"map_69 Segment_Type.GENERATE\n",
"map_70 Segment_Type.GENERATE\n",
"map_71 Segment_Type.GENERATE\n",
"map_72 Segment_Type.GENERATE\n",
"map_73 Segment_Type.GENERATE\n",
"map_74 Segment_Type.GENERATE\n",
"map_75 Segment_Type.GENERATE\n",
"map_76 Segment_Type.GENERATE\n",
"map_77 Segment_Type.GENERATE\n",
"map_78 Segment_Type.GENERATE\n",
"map_79 Segment_Type.GENERATE\n",
"map_80 Segment_Type.GENERATE\n",
"map_81 Segment_Type.GENERATE\n",
"map_82 Segment_Type.GENERATE\n",
"map_83 Segment_Type.GENERATE\n",
"map_84 Segment_Type.GENERATE\n",
"map_85 Segment_Type.GENERATE\n",
"map_86 Segment_Type.GENERATE\n",
"map_87 Segment_Type.GENERATE\n"
]
}
],
"source": [
"superSegments.update(mapSegments)\n",
"for d in superSegments.values():\n",
" print(d.segment_name, d.segment_type)"
]
},
{
"cell_type": "code",
"execution_count": 153,
"id": "20ab036f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"51"
]
},
"execution_count": 153,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Keep only the map segments that contain more than 50 logs\n",
"mapSegments_list = []\n",
"mapSegment_times = []\n",
"for d in mapSegments.values():\n",
"    if d.num_logs > 50:\n",
"        mapSegments_list.append(d.segment_name)\n",
"        mapSegment_times.append(d.start_end_val)\n",
"len(mapSegment_times)"
]
},
{
"cell_type": "code",
"execution_count": 154,
"id": "3ad9c6af",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"map_1 183\n",
"map_9 185\n",
"map_13 183\n",
"map_28 170\n",
"map_36 280\n",
"map_41 284\n",
"map_42 180\n",
"map_43 216\n",
"map_44 168\n",
"map_45 191\n",
"map_46 258\n",
"map_56 219\n",
"map_57 167\n",
"map_61 291\n",
"map_83 183\n",
"map_86 362\n"
]
}
],
"source": [
"mapSegments_data = distill.write_segment(sorted_data_paths, mapSegments_list, mapSegment_times)\n",
"# List the segments that hold more than 150 logs\n",
"for name, logs in mapSegments_data.items():\n",
"    if len(logs) > 150:\n",
"        print(name, len(logs))"
]
},
{
"cell_type": "markdown",
"id": "26915660",
"metadata": {},
"source": [
"## Graphs and Stats"
]
},
{
"cell_type": "code",
"execution_count": 155,
"id": "17476fd9",
"metadata": {},
"outputs": [],
"source": [
"# Note: the *_map_2 variables are built from segment 'map_9', not 'map_2'\n",
"edges_map_1 = distill.pairwiseSeq(['|'.join(log['path']) for log in mapSegments_data['map_1'].values()])\n",
"edges_list_map_1 = list(edges_map_1)\n",
"edges_map_2 = distill.pairwiseSeq(['|'.join(log['path']) for log in mapSegments_data['map_9'].values()])\n",
"edges_list_map_2 = list(edges_map_2)"
]
},
{
"cell_type": "code",
"execution_count": 156,
"id": "bb75016c",
"metadata": {},
"outputs": [],
"source": [
"nodes_map_1 = {'|'.join(log['path']) for log in mapSegments_data['map_1'].values()}\n",
"nodes_list_map_1 = list(nodes_map_1)\n",
"nodes_map_2 = {'|'.join(log['path']) for log in mapSegments_data['map_9'].values()}\n",
"nodes_list_map_2 = list(nodes_map_2)"
]
},
{
"cell_type": "code",
"execution_count": 157,
"id": "a1140db2",
"metadata": {},
"outputs": [],
"source": [
"G_map1 = distill.createDiGraph(nodes_list_map_1, edges_list_map_1, drop_recursions=False)\n",
"G_map2 = distill.createDiGraph(nodes_list_map_2, edges_list_map_2, drop_recursions=False)"
]
},
{
"cell_type": "code",
"execution_count": 158,
"id": "e7b2ecbf",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"nx.draw(G_map1, with_labels=False)"
]
},
{
"cell_type": "code",
"execution_count": 159,
"id": "a81165c5",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"nx.draw(G_map2, with_labels=False)"
]
},
{
"cell_type": "code",
"execution_count": 160,
"id": "839167fc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9979518689196109"
]
},
"execution_count": 160,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nx.average_node_connectivity(G_map2)"
]
},
{
"cell_type": "markdown",
"id": "db537c67",
"metadata": {},
"source": [
"## Exploratory Visualization"
]
},
{
"cell_type": "code",
"execution_count": 195,
"id": "a7231ba0",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"link": {
"source": [
0,
1,
2,
0,
3,
4,
5,
6,
7,
8,
7,
5,
4,
7,
0,
2,
3,
9,
10,
2,
11,
12,
13,
14,
15,
14,
16,
13,
13,
11,
16,
12,
2,
1,
2,
9,
17,
18,
19,
20,
21,
18,
17,
9,
2,
12,
8,
22,
23,
8,
0,
24,
23,
9,
12,
24,
2,
23,
25,
8,
0,
26,
9,
27,
7,
9
],
"target": [
1,
2,
0,
3,
4,
5,
6,
7,
8,
7,
5,
4,
7,
2,
2,
3,
9,
10,
2,
11,
12,
13,
14,
15,
14,
16,
13,
16,
11,
16,
12,
0,
1,
0,
9,
17,
18,
19,
20,
21,
18,
17,
9,
2,
12,
8,
22,
23,
8,
12,
24,
23,
9,
12,
24,
2,
23,
25,
23,
9,
26,
1,
27,
7,
4,
7
],
"value": [
8,
5,
16,
1,
1,
1,
1,
1,
2,
2,
1,
1,
3,
2,
16,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
2,
4,
8,
4,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
2,
1,
1,
1,
1,
2,
2,
2,
1,
1,
1,
1,
1,
1,
1,
1,
3,
1
]
},
"node": {
"label": [
"path.[object SVGAnimatedString]",
"circle.[object SVGAnimatedString]",
"svg.[object SVGAnimatedString]",
"div.superset-legacy-chart-world-map",
"td. ",
"th. ",
"div.header-title",
"div.dashboard-component dashboard-component-chart-holder fade-out",
"div.dashboard-content css-wp1ax2",
"div.dashboard-component dashboard-component-chart-holder fade-out",
"div.grid-column background--transparent",
"#document",
"div.header-title",
"svg.[object SVGAnimatedString]",
"rect.[object SVGAnimatedString]",
"path.[object SVGAnimatedString]",
"text.[object SVGAnimatedString]",
"div.grid-row background--transparent",
"div.dashboard-component dashboard-component-chart-holder fade-out",
"div.header-controls",
"div.chart-slice",
"div.text-container",
"div.dashboard-component-header header-large",
"div.dashboard-header css-1qf0uii",
"div.chart-slice",
"ul.ant-menu ant-menu-light main-nav css-188dvs4 ant-menu-root ant-menu-horizontal",
"div.hoverinfo",
"div.grid-row background--transparent"
]
},
"type": "sankey"
}
],
"layout": {
"template": {
"data": {
"bar": [
{
"error_x": {
"color": "#2a3f5f"
},
"error_y": {
"color": "#2a3f5f"
},
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "bar"
}
],
"barpolar": [
{
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "barpolar"
}
],
"carpet": [
{
"aaxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"baxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"type": "carpet"
}
],
"choropleth": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "choropleth"
}
],
"contour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "contour"
}
],
"contourcarpet": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "contourcarpet"
}
],
"heatmap": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmap"
}
],
"heatmapgl": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmapgl"
}
],
"histogram": [
{
"marker": {
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "histogram"
}
],
"histogram2d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2d"
}
],
"histogram2dcontour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2dcontour"
}
],
"mesh3d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "mesh3d"
}
],
"parcoords": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "parcoords"
}
],
"pie": [
{
"automargin": true,
"type": "pie"
}
],
"scatter": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter"
}
],
"scatter3d": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter3d"
}
],
"scattercarpet": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattercarpet"
}
],
"scattergeo": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergeo"
}
],
"scattergl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergl"
}
],
"scattermapbox": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermapbox"
}
],
"scatterpolar": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolar"
}
],
"scatterpolargl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolargl"
}
],
"scatterternary": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterternary"
}
],
"surface": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "surface"
}
],
"table": [
{
"cells": {
"fill": {
"color": "#EBF0F8"
},
"line": {
"color": "white"
}
},
"header": {
"fill": {
"color": "#C8D4E3"
},
"line": {
"color": "white"
}
},
"type": "table"
}
]
},
"layout": {
"annotationdefaults": {
"arrowcolor": "#2a3f5f",
"arrowhead": 0,
"arrowwidth": 1
},
"autotypenumbers": "strict",
"coloraxis": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"colorscale": {
"diverging": [
[
0,
"#8e0152"
],
[
0.1,
"#c51b7d"
],
[
0.2,
"#de77ae"
],
[
0.3,
"#f1b6da"
],
[
0.4,
"#fde0ef"
],
[
0.5,
"#f7f7f7"
],
[
0.6,
"#e6f5d0"
],
[
0.7,
"#b8e186"
],
[
0.8,
"#7fbc41"
],
[
0.9,
"#4d9221"
],
[
1,
"#276419"
]
],
"sequential": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"sequentialminus": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
]
},
"colorway": [
"#636efa",
"#EF553B",
"#00cc96",
"#ab63fa",
"#FFA15A",
"#19d3f3",
"#FF6692",
"#B6E880",
"#FF97FF",
"#FECB52"
],
"font": {
"color": "#2a3f5f"
},
"geo": {
"bgcolor": "white",
"lakecolor": "white",
"landcolor": "#E5ECF6",
"showlakes": true,
"showland": true,
"subunitcolor": "white"
},
"hoverlabel": {
"align": "left"
},
"hovermode": "closest",
"mapbox": {
"style": "light"
},
"paper_bgcolor": "white",
"plot_bgcolor": "#E5ECF6",
"polar": {
"angularaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"radialaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"scene": {
"xaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"yaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"zaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
}
},
"shapedefaults": {
"line": {
"color": "#2a3f5f"
}
},
"ternary": {
"aaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"baxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"caxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"title": {
"x": 0.05
},
"xaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
},
"yaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
}
}
}
}
},
"text/html": [
"<div> <div id=\"3c045324-ae46-488c-bea6-b7dea93ac970\" class=\"plotly-graph-div\" style=\"height:525px; width:100%;\"></div> <script type=\"text/javascript\"> require([\"plotly\"], function(Plotly) { window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById(\"3c045324-ae46-488c-bea6-b7dea93ac970\")) { Plotly.newPlot( \"3c045324-ae46-488c-bea6-b7dea93ac970\", [{\"link\":{\"source\":[0,1,2,0,3,4,5,6,7,8,7,5,4,7,0,2,3,9,10,2,11,12,13,14,15,14,16,13,13,11,16,12,2,1,2,9,17,18,19,20,21,18,17,9,2,12,8,22,23,8,0,24,23,9,12,24,2,23,25,8,0,26,9,27,7,9],\"target\":[1,2,0,3,4,5,6,7,8,7,5,4,7,2,2,3,9,10,2,11,12,13,14,15,14,16,13,16,11,16,12,0,1,0,9,17,18,19,20,21,18,17,9,2,12,8,22,23,8,12,24,23,9,12,24,2,23,25,23,9,26,1,27,7,4,7],\"value\":[8,5,16,1,1,1,1,1,2,2,1,1,3,2,16,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2,4,8,4,1,1,1,1,1,1,1,1,2,1,1,1,1,2,1,1,1,1,2,2,2,1,1,1,1,1,1,1,1,3,1]},\"node\":{\"label\":[\"path.[object SVGAnimatedString]\",\"circle.[object SVGAnimatedString]\",\"svg.[object SVGAnimatedString]\",\"div.superset-legacy-chart-world-map\",\"td. \",\"th. 
\",\"div.header-title\",\"div.dashboard-component dashboard-component-chart-holder fade-out\",\"div.dashboard-content css-wp1ax2\",\"div.dashboard-component dashboard-component-chart-holder fade-out\",\"div.grid-column background--transparent\",\"#document\",\"div.header-title\",\"svg.[object SVGAnimatedString]\",\"rect.[object SVGAnimatedString]\",\"path.[object SVGAnimatedString]\",\"text.[object SVGAnimatedString]\",\"div.grid-row background--transparent\",\"div.dashboard-component dashboard-component-chart-holder fade-out\",\"div.header-controls\",\"div.chart-slice\",\"div.text-container\",\"div.dashboard-component-header header-large\",\"div.dashboard-header css-1qf0uii\",\"div.chart-slice\",\"ul.ant-menu ant-menu-light main-nav css-188dvs4 ant-menu-root ant-menu-horizontal\",\"div.hoverinfo\",\"div.grid-row background--transparent\"]},\"type\":\"sankey\"}], {\"template\":{\"data\":{\"bar\":[{\"error_x\":{\"color\":\"#2a3f5f\"},\"error_y\":{\"color\":\"#2a3f5f\"},\"marker\":{\"line\":{\"color\":\"#E5ECF6\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"bar\"}],\"barpolar\":[{\"marker\":{\"line\":{\"color\":\"#E5ECF6\",\"width\":0.5},\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"barpolar\"}],\"carpet\":[{\"aaxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"minorgridcolor\":\"white\",\"startlinecolor\":\"#2a3f5f\"},\"baxis\":{\"endlinecolor\":\"#2a3f5f\",\"gridcolor\":\"white\",\"linecolor\":\"white\",\"minorgridcolor\":\"white\",\"startlinecolor\":\"#2a3f5f\"},\"type\":\"carpet\"}],\"choropleth\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"type\":\"choropleth\"}],\"contour\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666
666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"contour\"}],\"contourcarpet\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"type\":\"contourcarpet\"}],\"heatmap\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"heatmap\"}],\"heatmapgl\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"heatmapgl\"}],\"histogram\":[{\"marker\":{\"pattern\":{\"fillmode\":\"overlay\",\"size\":10,\"solidity\":0.2}},\"type\":\"histogram\"}],\"histogram2d\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"histogram2d\"}],\"histogram2dcontour\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"histogram2dcontour\"}],\"mesh3d\
":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"type\":\"mesh3d\"}],\"parcoords\":[{\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"parcoords\"}],\"pie\":[{\"automargin\":true,\"type\":\"pie\"}],\"scatter\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatter\"}],\"scatter3d\":[{\"line\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatter3d\"}],\"scattercarpet\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattercarpet\"}],\"scattergeo\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattergeo\"}],\"scattergl\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattergl\"}],\"scattermapbox\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scattermapbox\"}],\"scatterpolar\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatterpolar\"}],\"scatterpolargl\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatterpolargl\"}],\"scatterternary\":[{\"marker\":{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"}},\"type\":\"scatterternary\"}],\"surface\":[{\"colorbar\":{\"outlinewidth\":0,\"ticks\":\"\"},\"colorscale\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"type\":\"surface\"}],\"table\":[{\"cells\":{\"fill\":{\"color\":\"#EBF0F8\"},\"line\":{\"color\":\"white\"}},\"header\":{\"fill\":{\"color\":\"#C8D4E3\"},\"line\":{\"color\":\"white\"}},\"type\":\"table\"}]},\"layout\":{\"annotationdefaults\":{\"arrowcolor\":\"#2a3f5f\",\"arrowhead\":0,\"arrowwidth\":1},\"autotypenumbers\":\"strict\",\"coloraxis\":{\"colorbar\":{\
"outlinewidth\":0,\"ticks\":\"\"}},\"colorscale\":{\"diverging\":[[0,\"#8e0152\"],[0.1,\"#c51b7d\"],[0.2,\"#de77ae\"],[0.3,\"#f1b6da\"],[0.4,\"#fde0ef\"],[0.5,\"#f7f7f7\"],[0.6,\"#e6f5d0\"],[0.7,\"#b8e186\"],[0.8,\"#7fbc41\"],[0.9,\"#4d9221\"],[1,\"#276419\"]],\"sequential\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]],\"sequentialminus\":[[0.0,\"#0d0887\"],[0.1111111111111111,\"#46039f\"],[0.2222222222222222,\"#7201a8\"],[0.3333333333333333,\"#9c179e\"],[0.4444444444444444,\"#bd3786\"],[0.5555555555555556,\"#d8576b\"],[0.6666666666666666,\"#ed7953\"],[0.7777777777777778,\"#fb9f3a\"],[0.8888888888888888,\"#fdca26\"],[1.0,\"#f0f921\"]]},\"colorway\":[\"#636efa\",\"#EF553B\",\"#00cc96\",\"#ab63fa\",\"#FFA15A\",\"#19d3f3\",\"#FF6692\",\"#B6E880\",\"#FF97FF\",\"#FECB52\"],\"font\":{\"color\":\"#2a3f5f\"},\"geo\":{\"bgcolor\":\"white\",\"lakecolor\":\"white\",\"landcolor\":\"#E5ECF6\",\"showlakes\":true,\"showland\":true,\"subunitcolor\":\"white\"},\"hoverlabel\":{\"align\":\"left\"},\"hovermode\":\"closest\",\"mapbox\":{\"style\":\"light\"},\"paper_bgcolor\":\"white\",\"plot_bgcolor\":\"#E5ECF6\",\"polar\":{\"angularaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"bgcolor\":\"#E5ECF6\",\"radialaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"}},\"scene\":{\"xaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"gridwidth\":2,\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\"},\"yaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"gridwidth\":2,\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\"},\"zaxis\":{\"backgroundcolor\":\"#E5ECF6\",\"gridcolor\":\"white\",\"gridwi
dth\":2,\"linecolor\":\"white\",\"showbackground\":true,\"ticks\":\"\",\"zerolinecolor\":\"white\"}},\"shapedefaults\":{\"line\":{\"color\":\"#2a3f5f\"}},\"ternary\":{\"aaxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"baxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"},\"bgcolor\":\"#E5ECF6\",\"caxis\":{\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\"}},\"title\":{\"x\":0.05},\"xaxis\":{\"automargin\":true,\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"white\",\"zerolinewidth\":2},\"yaxis\":{\"automargin\":true,\"gridcolor\":\"white\",\"linecolor\":\"white\",\"ticks\":\"\",\"title\":{\"standoff\":15},\"zerolinecolor\":\"white\",\"zerolinewidth\":2}}}}, {\"responsive\": true} ).then(function(){\n",
" \n",
"var gd = document.getElementById('3c045324-ae46-488c-bea6-b7dea93ac970');\n",
"var x = new MutationObserver(function (mutations, observer) {{\n",
" var display = window.getComputedStyle(gd).display;\n",
" if (!display || display === 'none') {{\n",
" console.log([gd, 'removed!']);\n",
" Plotly.purge(gd);\n",
" observer.disconnect();\n",
" }}\n",
"}});\n",
"\n",
"// Listen for the removal of the full notebook cells\n",
"var notebookContainer = gd.closest('#notebook-container');\n",
"if (notebookContainer) {{\n",
" x.observe(notebookContainer, {childList: true});\n",
"}}\n",
"\n",
"// Listen for the clearing of the current output cell\n",
"var outputEl = gd.closest('.output');\n",
"if (outputEl) {{\n",
" x.observe(outputEl, {childList: true});\n",
"}}\n",
"\n",
" }) }; }); </script> </div>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"distill.sankey(edges_map_1, [node.split(\"|\")[0] for node in nodes_list_map_2])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fb3d8d8",
"metadata": {},
"outputs": [],
"source": [
"distill.funnel"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e12b4d7d",
"metadata": {},
"outputs": [],
"source": [
"#clickRate"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "348e3a5b",
"metadata": {},
"outputs": [],
"source": [
"# Drop self-loops (edges whose source and target are the same node)\n",
"edge_list = [row for row in edges_segmentN if row[0] != row[1]]\n",
"\n",
"# Count how many times each (source, target) edge occurs\n",
"edge_list_counter = Counter(edge_list)\n",
"\n",
"source_list = [edge[0] for edge in edge_list_counter]\n",
"target_list = [edge[1] for edge in edge_list_counter]\n",
"values = list(edge_list_counter.values())\n",
"\n",
"# Collect the unique nodes in order of first appearance\n",
"nodes = []\n",
"for row in edge_list:\n",
"    for col in row:\n",
"        if col not in nodes:\n",
"            nodes.append(col)\n",
"\n",
"# Map each edge endpoint to its index in the node list\n",
"sources = [nodes.index(i) for i in source_list]\n",
"targets = [nodes.index(i) for i in target_list]\n",
"\n",
"fig = go.Figure(data=[go.Sankey(\n",
"    node=dict(\n",
"        # Show only the part of each node name before the first '|'\n",
"        label=[node.split(\"|\")[0] for node in nodes],\n",
"    ),\n",
"    link=dict(\n",
"        source=sources,\n",
"        target=targets,\n",
"        value=values\n",
"    ))])\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"id": "9bd03731",
"metadata": {},
"source": [
"# WIP"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc8b6cdd",
"metadata": {},
"outputs": [],
"source": [
"import hashlib\n",
"# Hash each segment's joined path, then find the digests present in both lists\n",
"x = [hashlib.md5('_'.join(log['path']).encode('utf-8')).digest() for log in finalSegments['...'].values()]\n",
"y = [hashlib.md5('_'.join(log['path']).encode('utf-8')).digest() for log in finalSegments['...'].values()]\n",
"set(x) & set(y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8fbf9bbb",
"metadata": {},
"outputs": [],
"source": [
"x = ['_'.join(log['path']) for log in finalSegments['...'].values()]\n",
"y = ['_'.join(log['path']) for log in finalSegments['...'].values()]\n",
"set(x) & set(y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb1e095c",
"metadata": {},
"outputs": [],
"source": [
"nx.graph_edit_distance(G_segmentN, G_segmentN)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f79c272",
"metadata": {},
"outputs": [],
"source": [
"# optimize_graph_edit_distance yields successively better approximations; keep the last one\n",
"for v in nx.optimize_graph_edit_distance(G_segmentN, G_segmentN):\n",
"    minv = v\n",
"minv"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a01f8ef7",
"metadata": {},
"outputs": [],
"source": [
"# Rename a dictionary key: remove the old entry and store its value under the new key\n",
"dictionary[new_key] = dictionary.pop(old_key)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20cf3107",
"metadata": {},
"outputs": [],
"source": [
"superSegments"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}