| --- |
| layout: post |
| status: PUBLISHED |
| published: true |
| title: TPC in Apache Phoenix |
| id: 68f0f74b-4da7-4eb8-a444-9b43f69cfdf8 |
| date: '2015-04-29 03:12:10 -0400' |
| categories: phoenix |
| tags: |
| - db |
| - query |
| - phoenix |
| - tpc |
| - database |
| - mvcc |
| - hbase |
| - sql |
| - hadoop |
| - performance |
| - big |
| - data |
| - join |
| permalink: phoenix/entry/tpc_in_apache_phoenix |
| --- |
| <p>I often get asked "<em>Does <a href="http://phoenix.apache.org/">Apache Phoenix</a> support joins?</em>" That's when I know that marketing is hard as <b>Phoenix has supported joins for over a year! </b>In fact, thanks to the excellent work of Maryann Xue, one of our project management committee (<a href="https://www.apache.org/dev/pmc.html#what-is-a-pmc">PMC</a>) members, Phoenix can run many of the complex <a href="Transaction_Processing_Performance_Council" target="_blank">TPC</a> queries. TPC benchmarks are used in evaluating the performance of computer systems. From wikipedia:</p> |
| <blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"> |
| <p><i>Transaction Processing Performance Council (TPC) is a non-profit organization founded in 1988 to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry. </i></p> |
| </blockquote> |
| <p>Here are some of the query constructs that Phoenix support that you may not know about:</p> |
| <p><font size="3"><strong>Derived Tables</strong> </font>to create on-the-fly views:</p> |
| <pre><font face="courier new, courier, monospace"> SELECT LastName, FirstName
|
| FROM
|
| ( SELECT BusinessEntityID, LastName, FirstName FROM Employee
|
| WHERE State = 'NY' ) AS EmployeeDerivedTable
|
| WHERE LastName = "Smith"
|
| ORDER BY FirstName;
|
| </font></pre></p> |
| <p><font size="3"><strong>Correlated Sub-queries</strong> </font>to correlate column values in inner and outer queries:</p> |
| <pre><font face="courier new, courier, monospace"> SELECT DISTINCT e.LastName, e.FirstName, e.BusinessEntityID, sp.Bonus
|
| FROM Employee AS e
|
| JOIN SalesPerson AS sp ON c.BusinessEntityID = sp.BusinessEntityID
|
| WHERE c.Bonus >=
|
| ( SELECT average(sp2.Bonus)
|
| FROM SalesPerson sp2
|
| JOIN Employee AS e2 ON e2.BusinessEntityID = sp2.BusinessEntityID
|
| WHERE e.DepartmentID = e2.DepartmentID );
|
| </font></pre> |
| <p><font size="3"><strong>Semi/Anti Joins</strong> </font> to test for the existence or nonexistence of a row:</p> |
| <pre><font face="courier new, courier, monospace"> SELECT p.Name, p.Job, p.EmployerID
|
| FROM People AS p
|
| WHERE EXISTS ( SELECT 1 FROM Employers AS e WHERE e.ID = p.EmployerID );
|
| </font></pre> |
| <p><font size="3"><strong>Union All</strong> </font>to concatenate rows from multiple tables (available in 4.4.0 release):</p> |
| <pre><font face="courier new, courier, monospace"> SELECT * FROM Employee e
|
| WHERE e.BusinessEntityID IN
|
| ( SELECT BusinessEntityID FROM SalesPerson WHERE Ranking >= 5.0
|
| UNION ALL
|
| SELECT BusinessEntityID FROM CustomerReview WHERE Score >= 8.0 )</font><span style="font-family: verdana, arial, 'Bitstream Vera Sans', helvetica, sans-serif;"> </span></pre> |
| <p>Phoenix has support for both a broadcast hash join strategy, when one side of the join is small enough to fit into memory, as well as a sort merge join strategy for cases when it won't. Put all this functionality together, and <strong>Phoenix is capable of running many TPC queries</strong>.</p> |
| <p>Here's an example query from one of the TPC queries:</p> |
| <pre><font face="courier new, courier, monospace"> SELECT supp_nation, cust_nation, l_year, sum(volume) as revenue
|
| FROM ( SELECT
|
| n1.n_name as supp_nation, n2.n_name as cust_nation,
|
| year(l_shipdate) as l_year,
|
| l_extendedprice * (1 - l_discount) as volume
|
| FROM Supplier
|
| JOIN LineItem ON s_suppkey = l_suppkey
|
| JOIN Orders ON o_orderkey = l_orderkey
|
| JOIN Customer ON c_custkey = o_custkey
|
| JOIN Nation n1 ON s_nationkey = n1.n_nationkey
|
| JOIN Nation n2 ON c_nationkey = n2.n_nationkey
|
| WHERE ( n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]' )
|
| OR (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]') )
|
| AND l_shipdate between date '1995-01-01' and date '1996-12-31'
|
| ) AS Shipping
|
| GROUP BY supp_nation, cust_nation, l_year
|
| ORDER BY supp_nation, cust_nation, l_year;</font></pre> |
| <p>So, please repeat after me: "<em><strong>YES</strong>, Phoenix supports joins</em>" :-)</p> |
| <p>Interested in learning more? Come hear Maryann Xue talk at the upcoming <a href="http://www.meetup.com/San-Francisco-Apache-Phoenix-Meetup/events/221247399/" target="_blank">Apache Phoenix meetup</a> on May 5th at the Hortonworks HQ in Santa Clara, or hear our joint talk at <a href="http://www.hbasecon.com/" target="_blank">HBaseCon 2015</a> on May 7th where we'll discuss our plan for getting the rest of the way to complete ANSI SQL 92 support.</p></p> |