blob: ec8fd2f67a2b97bd9e8c8ed983fb14dc4b165067 [file] [log] [blame]
<!-- HTML header for doxygen 1.8.4-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.13"/>
<meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
<title>MADlib: All Pairs Shortest Path</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { init_search(); });
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
jax: ["input/TeX","output/HTML-CSS"],
});
</script><script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js"></script>
<!-- hack in the navigation tree -->
<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
<!-- google analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-45382226-1', 'madlib.apache.org');
ga('send', 'pageview');
</script>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><a href="http://madlib.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
<td style="padding-left: 0.5em;">
<div id="projectname">
<span id="projectnumber">1.17.0</span>
</div>
<div id="projectbrief">User Documentation for Apache MADlib</div>
</td>
<td> <div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.13 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('group__grp__apsp.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">All Pairs Shortest Path<div class="ingroups"><a class="el" href="group__grp__graph.html">Graph</a></div></div> </div>
</div><!--header-->
<div class="contents">
<div class="toc"><b>Contents</b> <ul>
<li>
<a href="#apsp">APSP</a> </li>
<li>
<a href="#notes">Notes</a> </li>
<li>
<a href="#examples">Examples</a> </li>
<li>
<a href="#literature">Literature</a> </li>
</ul>
</div><p>The all pairs shortest paths (APSP) algorithm finds the length (summed weights) of the shortest paths between all pairs of vertices, such that the sum of the weights of the path edges is minimized.</p>
<dl class="section warning"><dt>Warning</dt><dd>APSP is an expensive algorithm for run-time because it finds the shortest path between all nodes in the graph. It is recommended that you start with a small graph to get a sense of run-time for your use case, then increase size carefully from there. The worst case run-time for this implementation is O(V^2 * E) where V is the number of vertices and E is the number of edges. In practice, run-time will be generally be much less than this, but it depends on the graph. On a Greenplum cluster, the edge table should be distributed by the source vertex id column for better performance.</dd></dl>
<p><a class="anchor" id="apsp"></a></p><dl class="section user"><dt>APSP</dt><dd><pre class="syntax">
graph_apsp( vertex_table,
vertex_id,
edge_table,
edge_args,
out_table,
grouping_cols
)
</pre></dd></dl>
<p><b>Arguments</b> </p><dl class="arglist">
<dt>vertex_table </dt>
<dd><p class="startdd">TEXT. Name of the table containing the vertex data for the graph. Must contain the column specified in the 'vertex_id' parameter below.</p>
<p class="enddd"></p>
</dd>
<dt>vertex_id </dt>
<dd><p class="startdd">TEXT, default = 'id'. Name of the column in 'vertex_table' containing vertex ids. The vertex ids are of type BIGINT with no duplicates. They do not need to be contiguous.</p>
<p class="enddd"></p>
</dd>
<dt>edge_table </dt>
<dd><p class="startdd">TEXT. Name of the table containing the edge data. The edge table must contain columns for source vertex, destination vertex and edge weight. Column naming convention is described below in the 'edge_args' parameter.</p>
<p class="enddd"></p>
</dd>
<dt>edge_args </dt>
<dd><p class="startdd">TEXT. A comma-delimited string containing multiple named arguments of the form "name=value". The following parameters are supported for this string argument:</p><ul>
<li>src (BIGINT): Name of the column containing the source vertex ids in the edge table. Default column name is 'src'.</li>
<li>dest (BIGINT): Name of the column containing the destination vertex ids in the edge table. Default column name is 'dest'.</li>
<li>weight (FLOAT8): Name of the column containing the edge weights in the edge table. Default column name is 'weight'.</li>
</ul>
<p class="enddd"></p>
</dd>
<dt>out_table </dt>
<dd><p class="startdd">TEXT. Name of the table to store the result of APSP. It contains a row for every vertex of every group and have the following columns (in addition to the grouping columns):</p><ul>
<li>source_vertex: The id for the source vertex. Will use the input edge column 'src' for column naming.</li>
<li>dest_vertex: The id for the destination vertex. Will use the input edge column 'dest' for column naming.</li>
<li>weight: The total weight of the shortest path from the source vertex to the destination vertex. Will use the input parameter 'weight' for column naming.</li>
<li>parent: The parent of the destination vertex in the shortest path from source. Parent will equal dest_vertex if there are no intermediate vertices. Will use 'parent' for column naming.</li>
</ul>
<p>A summary table named &lt;out_table&gt;_summary is also created. This is an internal table that keeps a record of the input parameters and is used by the path retrieval function described below. </p>
<p class="enddd"></p>
</dd>
<dt>grouping_cols (optional) </dt>
<dd>TEXT, default = NULL. List of columns used to group the input into discrete subgraphs. These columns must exist in the edge table. When this value is null, no grouping is used and a single APSP result is generated. </dd>
</dl>
<dl class="section user"><dt>Path Retrieval</dt><dd></dd></dl>
<p>The path retrieval function returns the shortest path from the source vertex to a specified desination vertex.</p>
<pre class="syntax">
graph_apsp_get_path( apsp_table,
source_vertex,
dest_vertex,
path_table
)
</pre><p><b>Arguments</b> </p><dl class="arglist">
<dt>apsp_table </dt>
<dd><p class="startdd">TEXT. Name of the table that contains the APSP output.</p>
<p class="enddd"></p>
</dd>
<dt>source_vertex </dt>
<dd><p class="startdd">BIGINT. The vertex that will be the source of the desired path.</p>
<p class="enddd"></p>
</dd>
<dt>dest_vertex </dt>
<dd><p class="startdd">BIGINT. The vertex that will be the destination of the desired path.</p>
<p class="enddd"></p>
</dd>
<dt>path_table </dt>
<dd><p class="startdd">TEXT. Name of the output table that contains the path. It contains a row for every group and has the following columns:</p><ul>
<li>grouping_cols: The grouping columns given in the creation of the APSP table. If there are no grouping columns, these columns will not exist and the table will have a single row.</li>
<li>path (ARRAY): The shortest path from the source vertex to the destination vertex. </li>
</ul>
<p class="enddd"></p>
</dd>
</dl>
<p><a class="anchor" id="notes"></a></p><dl class="section user"><dt>Notes</dt><dd></dd></dl>
<p>Graphs with negative edges are supported but graphs with negative cycles are not.</p>
<p>The implementation is analogous to a matrix multiplication procedure. Please refer to the MADlib design document and references [1] and [2] for more details.</p>
<p>Also see the Grail project [3] for more background on graph analytics processing in relational databases.</p>
<p><a class="anchor" id="examples"></a></p><dl class="section user"><dt>Examples</dt><dd></dd></dl>
<ol type="1">
<li>Create vertex and edge tables to represent the graph: <pre class="syntax">
DROP TABLE IF EXISTS vertex, edge;
CREATE TABLE vertex(
id INTEGER
);
CREATE TABLE edge(
src INTEGER,
dest INTEGER,
weight FLOAT8
);
INSERT INTO vertex VALUES
(0),
(1),
(2),
(3),
(4),
(5),
(6),
(7);
INSERT INTO edge VALUES
(0, 1, 1.0),
(0, 2, 1.0),
(0, 4, 10.0),
(1, 2, 2.0),
(1, 3, 10.0),
(2, 3, 1.0),
(2, 5, 1.0),
(2, 6, 3.0),
(3, 0, 1.0),
(4, 0, -2.0),
(5, 6, 1.0),
(6, 7, 1.0);
</pre></li>
<li>Calculate the shortest paths: <pre class="syntax">
DROP TABLE IF EXISTS out, out_summary;
SELECT madlib.graph_apsp(
'vertex', -- Vertex table
NULL, -- Vertix id column (NULL means use default naming)
'edge', -- Edge table
NULL, -- Edge arguments (NULL means use default naming)
'out'); -- Output table of shortest paths
SELECT * FROM out ORDER BY src,dest;
</pre> <pre class="result">
src | dest | weight | parent
-----+------+----------+--------
0 | 0 | 0 | 0
0 | 1 | 1 | 1
0 | 2 | 1 | 2
0 | 3 | 2 | 2
0 | 4 | 10 | 4
0 | 5 | 2 | 2
0 | 6 | 3 | 5
0 | 7 | 4 | 6
1 | 0 | 4 | 3
1 | 1 | 0 | 1
1 | 2 | 2 | 2
1 | 3 | 3 | 2
1 | 4 | 14 | 0
1 | 5 | 3 | 2
1 | 6 | 4 | 5
1 | 7 | 5 | 6
(showing only 16 of 64 rows)
</pre></li>
<li>Get the shortest path from vertex 0 to vertex 5: <pre class="syntax">
DROP TABLE IF EXISTS out_path;
SELECT madlib.graph_apsp_get_path('out',0,5,'out_path');
SELECT * FROM out_path;
</pre> <pre class="result">
path
---------
{0,2,5}
</pre></li>
<li>Now let's do a similar example except using different column names in the tables (i.e., not the defaults). Create the vertex and edge tables: <pre class="syntax">
DROP TABLE IF EXISTS vertex_alt, edge_alt;
CREATE TABLE vertex_alt AS SELECT id AS v_id FROM vertex;
CREATE TABLE edge_alt AS SELECT src AS e_src, dest, weight AS e_weight FROM edge;
</pre></li>
<li>Calculate the shortest paths: <pre class="syntax">
DROP TABLE IF EXISTS out_alt, out_alt_summary;
SELECT madlib.graph_apsp(
'vertex_alt', -- Vertex table
'v_id', -- Vertex id column
'edge_alt', -- Edge table
'src=e_src, weight=e_weight', -- Edge arguments
'out_alt'); -- Output table of shortest paths
SELECT * FROM out_alt ORDER BY e_src, dest;
</pre> <pre class="result">
e_src | dest | e_weight | parent
-------+------+----------+--------
0 | 0 | 0 | 0
0 | 1 | 1 | 1
0 | 2 | 1 | 2
0 | 3 | 2 | 2
0 | 4 | 10 | 4
0 | 5 | 2 | 2
0 | 6 | 3 | 5
0 | 7 | 4 | 6
1 | 0 | 4 | 3
1 | 1 | 0 | 1
1 | 2 | 2 | 2
1 | 3 | 3 | 2
1 | 4 | 14 | 0
1 | 5 | 3 | 2
1 | 6 | 4 | 5
1 | 7 | 5 | 6
(showing only 16 of 64 rows)
</pre></li>
<li>Create a graph with 2 groups and find APSP for each group: <pre class="syntax">
DROP TABLE IF EXISTS edge_gr;
CREATE TABLE edge_gr AS
(
SELECT *, 0 AS grp FROM edge
UNION
SELECT *, 1 AS grp FROM edge WHERE src &lt; 6 AND dest &lt; 6
);
INSERT INTO edge_gr VALUES
(4,5,-20,1);
</pre></li>
<li>Find APSP for all groups: <pre class="syntax">
DROP TABLE IF EXISTS out_gr, out_gr_summary;
SELECT madlib.graph_apsp(
'vertex', -- Vertex table
NULL, -- Vertex id column (NULL means use default naming)
'edge_gr', -- Edge table
NULL, -- Edge arguments (NULL means use default naming)
'out_gr', -- Output table of shortest paths
'grp' -- Grouping columns
);
SELECT * FROM out_gr WHERE src &lt; 2 ORDER BY grp,src,dest;
</pre> <pre class="result">
grp | src | dest | weight | parent
-----+-----+------+--------+--------
0 | 0 | 0 | 0 | 0
0 | 0 | 1 | 1 | 1
0 | 0 | 2 | 1 | 2
0 | 0 | 3 | 2 | 2
0 | 0 | 4 | 10 | 4
0 | 0 | 5 | 2 | 2
0 | 0 | 6 | 4 | 2
0 | 0 | 7 | 5 | 6
0 | 1 | 0 | 4 | 3
0 | 1 | 1 | 0 | 1
0 | 1 | 2 | 2 | 2
0 | 1 | 3 | 3 | 2
0 | 1 | 4 | 14 | 0
0 | 1 | 5 | 3 | 2
0 | 1 | 6 | 4 | 5
0 | 1 | 7 | 5 | 6
1 | 0 | 0 | 0 | 0
1 | 0 | 1 | 1 | 1
1 | 0 | 2 | 1 | 2
1 | 0 | 3 | 2 | 2
1 | 0 | 4 | 10 | 4
1 | 0 | 5 | -10 | 4
1 | 1 | 0 | 4 | 3
1 | 1 | 1 | 0 | 1
1 | 1 | 2 | 2 | 2
1 | 1 | 3 | 3 | 2
1 | 1 | 4 | 14 | 0
1 | 1 | 5 | -6 | 4
(28 rows)
</pre></li>
<li>Find the path from vertex 0 to vertex 5 in every group <pre class="syntax">
DROP TABLE IF EXISTS out_gr_path;
SELECT madlib.graph_apsp_get_path('out_gr',0,5,'out_gr_path');
SELECT * FROM out_gr_path ORDER BY grp;
</pre> <pre class="result">
grp | path
-----+---------
0 | {0,2,5}
1 | {0,4,5}
</pre></li>
</ol>
<p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
<p>[1] <a href="http://www.columbia.edu/~cs2035/courses/ieor6614.S11/apsp.pdf">http://www.columbia.edu/~cs2035/courses/ieor6614.S11/apsp.pdf</a></p>
<p>[2] <a href="http://users.cecs.anu.edu.au/~Alistair.Rendell/Teaching/apac_comp3600/module4/all_pairs_shortest_paths.xhtml">http://users.cecs.anu.edu.au/~Alistair.Rendell/Teaching/apac_comp3600/module4/all_pairs_shortest_paths.xhtml</a></p>
<p>[3] The case against specialized graph analytics engines, J. Fan, G. Soosai Raj, and J. M. Patel. CIDR 2015. <a href="http://cidrdb.org/cidr2015/Papers/CIDR15_Paper20.pdf">http://cidrdb.org/cidr2015/Papers/CIDR15_Paper20.pdf</a> </p>
</div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated on Mon Apr 6 2020 21:46:58 for MADlib by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
</ul>
</div>
</body>
</html>