<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# Example Usage of Distill

In this example, we run through a simulated user experiment using
UserALE data generated within an instantiation of Superset. This data
reflects four simulated user sessions in which the user performs three
tasks within the Video Game Sales example dashboard:

1. Filter the Video Game Sales dashboard by Wii, Racing, and Nintendo.
2. Find Mario Kart in the list of games.
3. Determine the difference in global sales between the 3DS game
   Nintendogs + cats and Wii Sports.

A screenshot of this Superset dashboard can be seen below:

![image](./images/Superset_Dashboard.png){width="700px"}

The data from these four sessions is captured in a JSON file entitled
`task_example.json`. In the following example, we will:
- Show how to use Distill's Segmentation package to create useful
  `Segments` of data.
- Visualize `Segment` objects using timeline/gantt and digraph
  visualizations.
- Compare each of these user sessions through the investigation of
  edit distances.

**Note: The data used in this example was not collected in any user
study. Rather, it was simulated through developer interactions with the
Superset dashboard.**
## Imports
The first step in this example is to import all of the packages that we
need. We do this with the code below:
``` python
import datetime
import distill
import json
import networkx as nx
import os
import pandas as pd
import plotly.express as px
import re
```
## Processing and Segmentation
Now that we have imported all of the necessary packages, we can begin to
process and segment the data. This can be done by creating
`Segment`/`Segments` objects. These objects will help us to visualize
the data and understand processes that the users took to perform each of
the three tasks.
### Processing the JSON File
The `setup` function is used to convert a JSON file into the required
format for segmentation. It also allows us to assert the date format
that we want to use for our analysis (i.e., integer or `datetime`).
Below we define this function:
``` python
def setup(file, date_type):
    with open(file) as json_file:
        raw_data = json.load(json_file)
        data = {}
        for log in raw_data:
            data[distill.getUUID(log)] = log

    # Convert clientTime to the specified type
    for uid in data:
        log = data[uid]
        client_time = log['clientTime']
        if date_type == "integer":
            log['clientTime'] = distill.epoch_to_datetime(client_time)
        elif date_type == "datetime":
            log['clientTime'] = pd.to_datetime(client_time, unit='ms', origin='unix')

    # Sort the logs by clientTime
    sorted_data = sorted(data.items(), key=lambda kv: kv[1]['clientTime'])
    sorted_dict = dict(sorted_data)
    return (sorted_data, sorted_dict)
```
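To make the conversion concrete, here is a small, self-contained sketch of the `datetime` branch above: pandas converts UserALE's millisecond epoch `clientTime` values into `Timestamp` objects. The epoch value below corresponds to the Session0 start time listed later in this example.

``` python
import pandas as pd

# A millisecond epoch timestamp converted to a pandas Timestamp,
# mirroring the "datetime" branch of setup()
ts = pd.to_datetime(1652736357935, unit='ms', origin='unix')
print(ts)  # 2022-05-16 21:25:57.935000
```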
Using this function, we can process the UserALE data and create
`Segment` objects that represent each of the four user sessions. This is
shown below through the utilization of the
`generate_collapsing_window_segments` function.
``` python
data_many_session = setup("./data/task_example.json", "datetime")
sorted_dict = data_many_session[1]

# Create segments based on sessionID
segments = distill.Segments()
session_ids = sorted(distill.find_meta_values('sessionID', sorted_dict))
for session_id in session_ids:
    segments.append_segments(distill.generate_collapsing_window_segments(sorted_dict, 'sessionID', [session_id], session_id))

# Improve readability of Segment names
for index in range(len(segments)):
    segments[index].segment_name = "Session" + str(index)
```
Below we list out each of the created `Segment` objects along with their
number of logs and start times.
``` console
Session0 Length: 427 Start Time: 2022-05-16 21:25:57.935000
Session1 Length: 236 Start Time: 2022-05-16 21:27:38.283000
Session2 Length: 332 Start Time: 2022-05-16 21:28:59.774000
Session3 Length: 219 Start Time: 2022-05-16 21:30:25.633000
```
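The collapsing-window idea behind `generate_collapsing_window_segments` can be sketched in plain Python. This is a simplified, hypothetical illustration (not Distill's actual implementation): consecutive logs that match a field value are merged into one window, and a new window begins once the run of matches is broken.

``` python
def collapse_windows(logs, field, match_values):
    """Group consecutive matching logs into (start, end) index windows."""
    windows = []
    current = None
    for i, log in enumerate(logs):
        if log.get(field) in match_values:
            if current is None:
                current = [i, i]   # open a new window
            else:
                current[1] = i     # extend the current window
        elif current is not None:
            windows.append(tuple(current))
            current = None
    if current is not None:
        windows.append(tuple(current))
    return windows

logs = [{'path': 'a'}, {'path': 'a'}, {'path': 'b'}, {'path': 'a'}]
print(collapse_windows(logs, 'path', ['a']))  # [(0, 1), (3, 3)]
```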
### Further Segmentation of Sessions

Now that there are `Segment` objects that represent each session, let's
write these `Segment` objects out with Distill's `write_segment`
function. This will allow us to further segment each session and
analyze the user's activity within it. This can be done with the
following code:
``` python
segment_names = [segment.segment_name for segment in segments]
start_end_vals = [segment.start_end_val for segment in segments]
segment_map = distill.write_segment(sorted_dict, segment_names, start_end_vals)
```
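Conceptually, the resulting segment map behaves like a dictionary keyed by segment name, where each value holds only the logs that fall inside that segment's start/end window. Below is a simplified, hypothetical sketch of that behavior, not Distill's actual implementation:

``` python
def write_segment_sketch(sorted_logs, segment_names, start_end_vals):
    """Map each segment name to the logs inside its (start, end) window."""
    segment_map = {}
    for name, (start, end) in zip(segment_names, start_end_vals):
        segment_map[name] = {
            uid: log for uid, log in sorted_logs.items()
            if start <= log['clientTime'] <= end
        }
    return segment_map

logs = {'u1': {'clientTime': 1}, 'u2': {'clientTime': 5}, 'u3': {'clientTime': 9}}
segment_map = write_segment_sketch(logs, ['A'], [(0, 6)])
print(sorted(segment_map['A']))  # ['u1', 'u2']
```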
We can now generate `Segments` objects within each of those session
segments that represent user interactions on two different elements of
the Superset dashboard.
The first element involves user interactions with the filter window that
filters the list of video games (shown in the screenshot below). The
element in the path that represents these interactions is
"div.filter-container css-ffe7is".
![image](./images/Video_Game_Filter.png){width="500px"}
The second element involves interactions with the actual list of video
games (shown in the screenshot below), represented by the
"div#chart-id-110.superset-chart-table" path element.
![image](./images/Games_List.png){width="500px"}
By creating `Segment` objects that show user interaction on these two
windows, we can get an understanding of how the user is using the
Superset dashboard to complete the three tasks. We create these
`Segment` objects with the following code:
``` python
session_0_segments = distill.generate_collapsing_window_segments(segment_map['Session0'], 'path', ['div.filter-container css-ffe7is'], "Game_Filter")
session_1_segments = distill.generate_collapsing_window_segments(segment_map['Session1'], 'path', ['div.filter-container css-ffe7is'], "Game_Filter")
session_2_segments = distill.generate_collapsing_window_segments(segment_map['Session2'], 'path', ['div.filter-container css-ffe7is'], "Game_Filter")
session_3_segments = distill.generate_collapsing_window_segments(segment_map['Session3'], 'path', ['div.filter-container css-ffe7is'], "Game_Filter")
session_0_segments.append_segments(distill.generate_collapsing_window_segments(segment_map['Session0'], 'path', ['div#chart-id-110.superset-chart-table'], "Games"))
session_1_segments.append_segments(distill.generate_collapsing_window_segments(segment_map['Session1'], 'path', ['div#chart-id-110.superset-chart-table'], "Games"))
session_2_segments.append_segments(distill.generate_collapsing_window_segments(segment_map['Session2'], 'path', ['div#chart-id-110.superset-chart-table'], "Games"))
session_3_segments.append_segments(distill.generate_collapsing_window_segments(segment_map['Session3'], 'path', ['div#chart-id-110.superset-chart-table'], "Games"))
```
Now, we append each of those newly generated `Segments` objects to the
overarching `segments` variable. This will create one large `Segments`
object that contains all `Segment` objects from all sessions.
``` python
segments.append_segments(session_0_segments)
segments.append_segments(session_1_segments)
segments.append_segments(session_2_segments)
segments.append_segments(session_3_segments)
```
## Visualization of `Segment` Objects
To understand these `Segment` objects better, we can visualize them.
First, we will visualize them using Plotly's timeline function, then we
will analyze them by creating DiGraphs.

### Visualization with Plotly's Timeline

The following code can be used to define a function that will display a
Plotly timeline of each of the `Segment` objects:
``` python
def display_segments(segments):
    segment_list = []
    for segment in segments:
        if not isinstance(segment.start_end_val[0], datetime.datetime) or not isinstance(segment.start_end_val[1], datetime.datetime):
            # Copy the Segment, converting its start/end values to datetimes
            new_segment = distill.Segment()
            new_segment.segment_name = segment.segment_name
            new_segment.num_logs = segment.num_logs
            new_segment.uids = segment.uids
            new_segment.generate_field_name = segment.generate_field_name
            new_segment.generate_matched_values = segment.generate_matched_values
            new_segment.segment_type = segment.segment_type
            new_segment.start_end_val = (pd.to_datetime(segment.start_end_val[0], unit='ms', origin='unix'),
                                         pd.to_datetime(segment.start_end_val[1], unit='ms', origin='unix'))
            segment_list.append(new_segment)
        else:
            segment_list.append(segment)

    new_segments = distill.Segments(segments=segment_list)
    distill.export_segments("./test.csv", new_segments)
    df = pd.read_csv("./test.csv")
    fig = px.timeline(df, x_start="Start Time", x_end="End Time", y="Segment Name", color="Number of Logs")
    fig.update_yaxes(autorange="reversed")
    os.remove("./test.csv")
    fig.show()
```
Using this code, we can visualize the `Segment` objects we created.
``` python
display_segments(segments)
```
This will produce the following timeline graph:
![image](./images/Timeline_Graph.png){width="700px"}
This graph shows the number of logs in each `Segment` while also showing
the length of time each `Segment` represents. We can also begin to
understand some of the interactions that each user had with the
dashboard by analyzing the `Segment` objects that exist within each
overarching session `Segment`.
### Visualizing User Workflows with DiGraphs

Another way we can visualize user workflows is through the creation and
analysis of DiGraphs. The function below (`draw_digraph`) draws a
DiGraph based on the passed-in `Segments` object. The graphs are colored
such that interactions with the video game filter appear in green, while
interactions with the list of video games appear in blue.
``` python
def draw_digraph(segments):
    nodes = sorted(segments.get_segment_list(), key=lambda segment: segment.start_end_val[0])
    edges = distill.pairwiseSeq(segments.get_segment_list())

    # Color nodes based on the Superset dashboard element they represent
    color_map = []
    for segment in segments:
        if re.match(r"Game_Filter\S*", segment.segment_name):
            color_map.append('green')
        else:
            color_map.append('blue')

    graph = distill.createDiGraph(nodes, edges)
    nx.draw(graph, node_color=color_map)
    return graph
```
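To see the node-and-edge structure this produces without Distill, an equivalent graph can be assembled directly with NetworkX, using hypothetical segment names and the same coloring rule:

``` python
import networkx as nx

# Hypothetical segment names standing in for Segment objects
nodes = ["Game_Filter0", "Games0", "Game_Filter1"]
edges = [("Game_Filter0", "Games0"), ("Games0", "Game_Filter1")]

graph = nx.DiGraph()
graph.add_nodes_from(nodes)
graph.add_edges_from(edges)

# Green for filter interactions, blue for game-list interactions
color_map = ['green' if n.startswith("Game_Filter") else 'blue' for n in graph.nodes]
print(graph.number_of_nodes(), graph.number_of_edges())  # 3 2
```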
We can now use this function to create DiGraphs of each of the user
sessions.
**Graph 0 - Session 0**
``` python
G0 = draw_digraph(session_0_segments)
```
![image](./images/Graph_0.png){width="400px"}
**Graph 1 - Session 1**
``` python
G1 = draw_digraph(session_1_segments)
```
![image](./images/Graph_1.png){width="400px"}
**Graph 2 - Session 2**
``` python
G2 = draw_digraph(session_2_segments)
```
![image](./images/Graph_2.png){width="400px"}
**Graph 3 - Session 3**
``` python
G3 = draw_digraph(session_3_segments)
```
![image](./images/Graph_3.png){width="400px"}
By analyzing these graphs, we can understand the general interactions
that users had with the two elements of the Superset dashboard. For
instance, in each of these graphs, the user starts by filtering the
dashboard. Given the tasks that the user is meant to perform, this makes
sense, since the most logical way to filter the dashboard is through the
filtering window. However, these graphs begin to differ in the number of
interactions that the user has with the actual list of video games.
While users always follow the workflow filter --> game list --> filter
--> game list, some sessions involve more interaction with the game
list than others.
## Measuring Similarity with Edit Distance
One way to understand the differences between the previously generated
DiGraphs is to look at their edit distance. Edit distance is a metric
that measures how many distortions are necessary to turn one graph into
another, thus measuring similarity. For instance, taking the edit
distance between a graph and itself yields 0, since the graphs are
identical. We can show this using NetworkX's `graph_edit_distance`
function:
**Input**
``` python
nx.graph_edit_distance(G0, G0) # 0.0
```
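The same property can be checked on tiny stand-in graphs, where the exact computation is fast:

``` python
import networkx as nx

# A directed path on 3 nodes vs. itself and vs. a path on 4 nodes
g1 = nx.path_graph(3, create_using=nx.DiGraph)
g2 = nx.path_graph(4, create_using=nx.DiGraph)

print(nx.graph_edit_distance(g1, g1))  # 0.0
print(nx.graph_edit_distance(g1, g2))  # 2.0 (one node and one edge added)
```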
Let's now calculate the edit distances between each pair of graphs so
that we can compute an average. Note, however, that some of these
calculations present a bit of an issue: computing an exact graph edit
distance is computationally complex, so it can take a long time and
require large amounts of computational resources. To simplify this
problem, we can use NetworkX's `optimize_graph_edit_distance` function,
which produces an approximation of the graph edit distance. In the
calculations below, we use whichever function is practical given how
long `graph_edit_distance` takes in each case.
**Input: G0, G1**
``` python
next(nx.optimize_graph_edit_distance(G0, G1)) # 32.0
```
**Input: G0, G2**
``` python
next(nx.optimize_graph_edit_distance(G0, G2)) # 34.0
```
**Input: G0, G3**
``` python
next(nx.optimize_graph_edit_distance(G0, G3)) # 38.0
```
**Input: G1, G2**
``` python
nx.graph_edit_distance(G1, G2) # 2.0
```
**Input: G1, G3**
``` python
next(nx.optimize_graph_edit_distance(G1, G3)) # 18.0
```
**Input: G2, G3**
``` python
nx.graph_edit_distance(G2, G3) # 4.0
```
Using these outputs we can now calculate the average edit distance with
the following calculation:
``` python
(32.0 + 34.0 + 38.0 + 2.0 + 18.0 + 4.0)/6 # 21.33
```
This shows that the average edit distance between each of these session
DiGraphs is 21.33.
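The same average can be reproduced with Python's standard library:

``` python
from statistics import mean

# The six pairwise edit distances computed above
distances = [32.0, 34.0, 38.0, 2.0, 18.0, 4.0]
print(round(mean(distances), 2))  # 21.33
```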