blob: 956675ae117337a2a0861196d42b63f901e247a4 [file] [log] [blame]
---
layout: post
status: PUBLISHED
published: true
title: TPC in Apache Phoenix
id: 68f0f74b-4da7-4eb8-a444-9b43f69cfdf8
date: '2015-04-29 03:12:10 -0400'
categories: phoenix
tags:
- db
- query
- phoenix
- tpc
- database
- mvcc
- hbase
- sql
- hadoop
- performance
- big
- data
- join
permalink: phoenix/entry/tpc_in_apache_phoenix
---
<p>I often get asked "<em>Does <a href="http://phoenix.apache.org/">Apache&nbsp;Phoenix</a> support joins?</em>" That's when I know that marketing is hard as <b>Phoenix&nbsp;has supported joins for over a year! </b>In fact, thanks to the excellent work of Maryann Xue, one of our project management committee (<a href="https://www.apache.org/dev/pmc.html#what-is-a-pmc">PMC</a>) members, Phoenix can run many of the complex <a href="Transaction_Processing_Performance_Council" target="_blank">TPC</a> queries. TPC benchmarks are used in evaluating the performance of computer systems. From wikipedia:</p>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<p><i>Transaction Processing Performance Council (TPC) is a non-profit organization founded in 1988 to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry.&nbsp;&nbsp;</i></p>
</blockquote>
<p>Here are some of the query constructs that Phoenix support that you may not know about:</p>
<p><font size="3"><strong>Derived Tables</strong> </font>to create on-the-fly views:</p>
<pre><font face="courier new, courier, monospace"> SELECT LastName, FirstName
FROM
( SELECT BusinessEntityID, LastName, FirstName FROM Employee
WHERE State = 'NY' ) AS EmployeeDerivedTable
WHERE LastName = "Smith"
ORDER BY FirstName;
</font></pre></p>
<p><font size="3"><strong>Correlated Sub-queries</strong> </font>to correlate column values in inner and outer queries:</p>
<pre><font face="courier new, courier, monospace"> SELECT DISTINCT e.LastName, e.FirstName, e.BusinessEntityID, sp.Bonus
FROM Employee AS e
JOIN SalesPerson AS sp ON c.BusinessEntityID = sp.BusinessEntityID
WHERE c.Bonus >=
( SELECT average(sp2.Bonus)
FROM SalesPerson sp2
JOIN Employee AS e2 ON e2.BusinessEntityID = sp2.BusinessEntityID
WHERE e.DepartmentID = e2.DepartmentID );
</font></pre>
<p><font size="3"><strong>Semi/Anti Joins</strong> </font> to test for the existence or nonexistence of a row:</p>
<pre><font face="courier new, courier, monospace"> SELECT p.Name, p.Job, p.EmployerID
FROM People AS p
WHERE EXISTS ( SELECT 1 FROM Employers AS e WHERE e.ID = p.EmployerID );
</font></pre>
<p><font size="3"><strong>Union All</strong> </font>to concatenate rows from multiple tables (available in 4.4.0 release):</p>
<pre><font face="courier new, courier, monospace"> SELECT * FROM Employee e
WHERE e.BusinessEntityID IN
( SELECT BusinessEntityID FROM SalesPerson WHERE Ranking >= 5.0
UNION ALL
SELECT BusinessEntityID FROM CustomerReview WHERE Score >= 8.0 )</font><span style="font-family: verdana, arial, 'Bitstream Vera Sans', helvetica, sans-serif;">&nbsp;</span></pre>
<p>Phoenix has support for both a broadcast hash join strategy, when one side of the join is small enough to fit into memory, as well as a sort merge join strategy for cases when it won't. Put all this functionality together, and <strong>Phoenix is capable of running many TPC queries</strong>.</p>
<p>Here's an example query from one of the TPC queries:</p>
<pre><font face="courier new, courier, monospace"> SELECT supp_nation, cust_nation, l_year, sum(volume) as revenue
FROM ( SELECT
n1.n_name as supp_nation, n2.n_name as cust_nation,
year(l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
FROM Supplier
JOIN LineItem ON s_suppkey = l_suppkey
JOIN Orders ON o_orderkey = l_orderkey
JOIN Customer ON c_custkey = o_custkey
JOIN Nation n1 ON s_nationkey = n1.n_nationkey
JOIN Nation n2 ON c_nationkey = n2.n_nationkey
WHERE ( n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]' )
OR (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]') )
AND l_shipdate between date '1995-01-01' and date '1996-12-31'
) AS Shipping
GROUP BY supp_nation, cust_nation, l_year
ORDER BY supp_nation, cust_nation, l_year;</font></pre>
<p>So, please repeat after me: "<em><strong>YES</strong>, Phoenix supports joins</em>" :-)</p>
<p>Interested in learning more? Come hear Maryann Xue talk at the upcoming <a href="http://www.meetup.com/San-Francisco-Apache-Phoenix-Meetup/events/221247399/" target="_blank">Apache Phoenix meetup</a> on May 5th at the Hortonworks HQ in Santa Clara, or hear our joint talk at <a href="http://www.hbasecon.com/" target="_blank">HBaseCon 2015</a>&nbsp;on May 7th where we'll discuss our plan for getting the rest of the way to complete ANSI SQL 92 support.</p></p>