Merge branch 'master' into release/2.2.0.0-incubating
diff --git a/README.md b/README.md
index 331d272..79cddb6 100644
--- a/README.md
+++ b/README.md
@@ -33,24 +33,24 @@
 2. Install bookbinder and its dependent gems. Make sure you are in the `book` directory and enter:
 
     ``` bash
-$ bundle install
-```
+    $ bundle install
+    ```
 
 3. The installed `config.yml` file configures the book for building from your local HAWQ source files.  Build the output HTML files by executing the command:
 
     ``` bash
-$ bundle exec bookbinder bind local
-```
+    $ bundle exec bookbinder bind local
+    ```
 
   Bookbinder converts the markdown source into HTML and puts the final output in the `final_app` directory.
   
 5. The `final_app` directory stages the HTML into a web application that you can view using the rack gem. To view the documentation build:
 
     ``` bash
-$ cd final_app
-$ bundle install
-$ rackup
-```
+    $ cd final_app
+    $ bundle install
+    $ rackup
+    ```
 
   Your local documentation is now available for viewing at [http://localhost:9292](http://localhost:9292).
 
diff --git a/book/master_middleman/source/subnavs/apache-hawq-nav.erb b/book/master_middleman/source/subnavs/apache-hawq-nav.erb
index 489f0c4..a69da5c 100644
--- a/book/master_middleman/source/subnavs/apache-hawq-nav.erb
+++ b/book/master_middleman/source/subnavs/apache-hawq-nav.erb
@@ -39,6 +39,30 @@
         </ul>
       </li>
       <li class="has_submenu">
+        <a href="/docs/userguide/2.2.0.0-incubating/tutorial/overview.html">Getting Started with HAWQ Tutorial</a>
+          <ul>
+            <li>
+              <a href="/docs/userguide/2.2.0.0-incubating/tutorial/gettingstarted/introhawqenv.html">Lesson 1 - Runtime Environment</a>
+            </li>
+            <li>
+              <a href="/docs/userguide/2.2.0.0-incubating/tutorial/gettingstarted/basichawqadmin.html">Lesson 2 - Cluster Administration</a>
+            </li>
+            <li>
+              <a href="/docs/userguide/2.2.0.0-incubating/tutorial/gettingstarted/basicdbadmin.html">Lesson 3 - Database Administration</a>
+            </li>
+            <li>
+              <a href="/docs/userguide/2.2.0.0-incubating/tutorial/gettingstarted/dataandscripts.html">Lesson 4 - Sample Data Set and HAWQ Schemas</a>
+            </li>
+            <li>
+              <a href="/docs/userguide/2.2.0.0-incubating/tutorial/gettingstarted/introhawqtbls.html">Lesson 5 - HAWQ Tables</a>
+            </li>
+            <li>
+              <a href="/docs/userguide/2.2.0.0-incubating/tutorial/gettingstarted/intropxfhdfs.html">Lesson 6 - HAWQ Extension Framework (PXF)</a>
+            </li>
+          </ul>
+        </li>
+
+      <li class="has_submenu">
         <span>
           Running a HAWQ Cluster
         </span>
@@ -214,6 +238,9 @@
             <a href="/docs/userguide/2.2.0.0-incubating/ddl/ddl-table.html">Creating and Managing Tables</a>
           </li>
           <li>
+            <a href="/docs/userguide/2.2.0.0-incubating/ddl/locate-table-hdfs.html"> Identifying HAWQ Table HDFS Files</a>
+          </li>
+          <li>
             <a href="/docs/userguide/2.2.0.0-incubating/ddl/ddl-storage.html">Choosing the Table Storage Model</a>
           </li>
           <li>
diff --git a/markdown/clientaccess/roles_privs.html.md.erb b/markdown/clientaccess/roles_privs.html.md.erb
index 2675c75..e9e9aa2 100644
--- a/markdown/clientaccess/roles_privs.html.md.erb
+++ b/markdown/clientaccess/roles_privs.html.md.erb
@@ -154,9 +154,9 @@
 
 ## <a id="topic8"></a>Encrypting Data 
 
-PostgreSQL provides an optional package of encryption/decryption functions called `pgcrypto`, which can also be installed and used in HAWQ. The `pgcrypto` package is not installed by default with HAWQ. However, you can download a `pgcrypto` package from [Pivotal Network](https://network.pivotal.io). 
+PostgreSQL provides an optional package of encryption/decryption functions called `pgcrypto`, which you can enable in HAWQ.
 
-If you are building HAWQ from source files, then you should enable `pgcrypto` support as an option when compiling HAWQ.
+If you are building HAWQ from source, then you should enable `pgcrypto` support as an option when compiling HAWQ.
 
 The `pgcrypto` functions allow database administrators to store certain columns of data in encrypted form. This adds an extra layer of protection for sensitive data, as data stored in HAWQ in encrypted form cannot be read by users who do not have the encryption key, nor be read directly from the disks.
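+
+A minimal sketch of column-level encryption with the `pgcrypto` symmetric-key functions (the table, column, and passphrase below are hypothetical, and assume `pgcrypto` is enabled in your HAWQ build):
+
+``` sql
+-- hypothetical table holding an encrypted column
+CREATE TABLE customer_secrets (id int, ssn_enc bytea);
+
+-- encrypt the value on insert using a symmetric passphrase
+INSERT INTO customer_secrets
+  VALUES (1, pgp_sym_encrypt('123-45-6789', 'my_passphrase'));
+
+-- decrypt on select; only sessions that supply the passphrase can read the plaintext
+SELECT pgp_sym_decrypt(ssn_enc, 'my_passphrase') FROM customer_secrets WHERE id = 1;
+```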
 
diff --git a/markdown/ddl/locate-table-hdfs.html.md.erb b/markdown/ddl/locate-table-hdfs.html.md.erb
new file mode 100644
index 0000000..a38aa1a
--- /dev/null
+++ b/markdown/ddl/locate-table-hdfs.html.md.erb
@@ -0,0 +1,160 @@
+---
+title: Identifying HAWQ Table HDFS Files
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+You can determine the HDFS location of the data file(s) associated with a specific HAWQ table using the HAWQ filespace HDFS location, the table identifier, and the identifiers for the tablespace and database in which the table resides. 
+
+The number of HDFS data files associated with a HAWQ table is determined by the distribution mechanism (hash or random) identified when the table was first created or altered.
+
+Only an HDFS or HAWQ superuser may access HAWQ table HDFS files.
+
+## <a id="idhdfsloc"></a> HDFS Location
+
+The format of the HDFS file path for a HAWQ table is:
+
+``` pre
+hdfs://<name-node>:<port>/<hawq-filespace-dir>/<tablespace-oid>/<database-oid>/<table-relfilenode>/<file-number>
+```
+
+The HDFS file path components are described in the table below.
+
+|   Path Component   | Description  |
+|---------------------|----------------------------|
+| \<name-node\>  |  The HDFS NameNode host.  |
+| \<port\>  |  The HDFS NameNode port. |
+| \<hawq-filespace-dir\>  |  The HDFS directory location of the HAWQ filespace. The default HAWQ filespace HDFS directory is `hawq_default`. |
+| \<tablespace-oid\>  |  The tablespace object identifier. The default HAWQ tablespace identifier is `16385`. |
+| \<database-oid\>  |  The database object identifier. |
+| \<table-relfilenode\>  |  The table object identifier. |
+| \<file-number\>  |  The file number. |
+
+**Note**: You specify the HAWQ filespace name and its HDFS directory location when you create a new HAWQ filespace. You must know both to locate the HDFS files for a specific HAWQ table.
+
+Together, the \<name-node\>, \<port\>, and \<hawq-filespace-dir\> components make up the `hawq_dfs_url` server configuration parameter. To display the value of the HAWQ default filespace URL:
+
+``` shell
+gpadmin@master$ hawq config -s hawq_dfs_url
+GUC      : hawq_dfs_url
+Value    : <name-node>:8020/hawq_default
+```
+
+Alternatively, view the value in the **HAWQ** service **Configs > Advanced**, **General** pane of your Ambari console.
+
+You can determine the tablespace, database, and table object identifiers through HAWQ catalog queries. See the [Example](#ex_hdfslochash) below.
+
+
+## <a id="idnumfiles"></a> Number of Data Files
+
+The number of data files that are created for a HAWQ table differs for hash-distributed and randomly-distributed HAWQ tables.
+
+Hash-distributed HAWQ tables use a fixed number of virtual segments (vsegs). This number is determined by the `default_hash_table_bucket_number` server configuration parameter setting or the `BUCKETNUM` value you provide in the `CREATE TABLE` call. The number of HDFS files that HAWQ creates for a hash-distributed table also depends on the maximum number of concurrent inserts that have been executed against the table: the total number of HDFS files equals the `default_hash_table_bucket_number` or `BUCKETNUM` value multiplied by the maximum number of concurrent inserts.
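+
+A minimal sketch of the arithmetic, using a hypothetical table name and numbers:
+
+``` sql
+-- display the cluster-wide default bucket number
+SHOW default_hash_table_bucket_number;
+
+-- hypothetical table with 8 buckets; if at most 2 INSERTs have run concurrently
+-- against it, expect up to 8 * 2 = 16 data files under the table's HDFS directory
+CREATE TABLE hash_demo (id int) WITH (BUCKETNUM=8) DISTRIBUTED BY (id);
+```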
+
+The number of HDFS files generated for a randomly-distributed HAWQ table varies depending on the total number of virtual segments that have written data to the table.
+
+
+## <a id="ex_hdfslochash"></a> Example: Locating HDFS Files for a HAWQ Table
+
+Perform the following steps to identify the HDFS location of the data files associated with a hash-distributed HAWQ table. The SQL queries used in this example are applicable to randomly-distributed HAWQ tables as well.
+
+**Note**: Your HAWQ catalog object identifier query results may differ.
+
+1. Start the `psql` subsystem:
+
+    ``` shell
+    gpadmin@master$ psql -d testdb
+    ```
+    
+2. Create a hash-distributed table with 4 buckets and insert some data:
+
+    ``` sql
+    testdb=# CREATE TABLE hash_tbl (id int) WITH (BUCKETNUM=4) DISTRIBUTED BY (id);
+    CREATE TABLE
+    testdb=# INSERT INTO hash_tbl SELECT i FROM generate_series(1,100) AS i;
+    INSERT 0 100
+    ```
+
+3. Determine the tablespace identifier for your filespace. You must know both the filespace and tablespace names. For example:
+
+    ``` sql
+    testdb=# SELECT fsname, spcname AS tablespace_name, tablespace_oid 
+               FROM  pg_filespace, gp_persistent_tablespace_node, pg_tablespace 
+               WHERE pg_tablespace.spcfsoid = gp_persistent_tablespace_node.filespace_oid 
+                 AND pg_filespace.oid = pg_tablespace.spcfsoid 
+                 AND fsname !~ '^pg_' ORDER BY 1;
+       fsname   | tablespace_name | tablespace_oid 
+    ------------+-----------------+----------------
+     dfs_system | dfs_default     |          16385
+     tryfs      | try_tablespace  |          16619
+    (2 rows)
+    ```
+    
+    The default HAWQ filespace name is `dfs_system`. The tablespace identifier associated with the default HAWQ tablespace named `dfs_default` is `16385`. Make note of this identifier.
+    
+    The example above includes a second HAWQ filespace named `tryfs`. The tablespace identifier associated with the tablespace named `try_tablespace` is `16619`.
+    
+4. Determine the object identifier of the database `testdb`:
+
+    ``` sql
+    testdb=# SELECT oid FROM pg_database WHERE datname = 'testdb';
+      oid  
+    -------
+     16508
+    (1 row)
+    ```
+    
+    Make note of this identifier.
+    
+5. Tables of the same name may reside in different schemas. The catalog query you use to determine the identifier for the `hash_tbl` table also includes the schema name (`public`):
+
+    ``` sql
+    testdb=# SELECT relname, relfilenode, nspname, relnamespace  
+               FROM pg_class, pg_namespace  
+               WHERE relname = 'hash_tbl' AND nspname = 'public' AND relnamespace=pg_namespace.oid;
+     relname  | relfilenode |  nspname  | relnamespace 
+    ----------+-------------+-----------+--------------
+     hash_tbl |       55784 | public    |         2200
+    (1 row)
+    ```
+    
+    Make note of the `relfilenode` value for `hash_tbl`.
+
+6. Construct the HDFS directory path for the `hash_tbl` data files. For example, using the HDFS directory location of the HAWQ default filespace:
+
+    ``` pre
+    hdfs://<name-node>:<port>/<hawq-filespace-dir>/<tablespace-oid>/<database-oid>/<table-relfilenode>
+    hdfs://<name-node>:8020/hawq_default/16385/16508/55784
+    ```
+    
+    Substitute your HDFS NameNode for \<name-node\>.
+
+7. Locate the HDFS file(s):
+
+    ``` shell
+    gpadmin@master$ hdfs dfs -ls hdfs://<name-node>:8020/hawq_default/16385/16508/55784
+    Found 4 items
+    -rw-------   3 gpadmin gpadmin        176 2017-04-17 15:24 hdfs://name-node:8020/hawq_default/16385/16508/55784/1
+    -rw-------   3 gpadmin gpadmin        168 2017-04-17 15:24 hdfs://name-node:8020/hawq_default/16385/16508/55784/2
+    -rw-------   3 gpadmin gpadmin        192 2017-04-17 15:24 hdfs://name-node:8020/hawq_default/16385/16508/55784/3
+    -rw-------   3 gpadmin gpadmin        168 2017-04-17 15:24 hdfs://name-node:8020/hawq_default/16385/16508/55784/4
+    ```
+    
+    As expected, `hash_tbl` comprises 4 HDFS data files, a multiple of the `BUCKETNUM` value you specified when creating the table in Step 2.
diff --git a/markdown/ranger/madlib-ranger.html.md.erb b/markdown/ranger/madlib-ranger.html.md.erb
index 8f6d55a..f074c30 100644
--- a/markdown/ranger/madlib-ranger.html.md.erb
+++ b/markdown/ranger/madlib-ranger.html.md.erb
@@ -22,21 +22,19 @@
 -->

 
 
-You can use MADlib, an open source library for in-database analytics, with your HAWQ installation. MADlib functions typically operate on source, output, and model tables. When Ranger is enabled for HAWQ authorization, you will need to provide access to all MADLib-related databases, schemas, tables, and functions to the appropriate users.  
+You can use MADlib, an open source library for in-database analytics, with your HAWQ installation. MADlib functions typically operate on source, output, and model tables. When Ranger is enabled for HAWQ authorization, you will need to explicitly provide access to all MADlib-related databases, schemas, tables, and functions to the appropriate users.  
 
-Consider the following when setting up HAWQ policies for MADlib access:
+Consider the following when setting up HAWQ Ranger policies for MADlib access:
 
-- Assign `temp` permission to the database on which users will run MADlib functions.
-- MADlib users often share their output tables. If this is the case in your deployment, create a shared schema dedicated to output tables, assigning `usage-schema` and `create` privileges for all MADlib users to this shared schema.
-- Assign `create-schema` database permission to those MADlib users that do not choose to share their output tables.
+- Assign `temp` permission to the database(s) on which users will run MADlib functions. This permission is required because MADlib creates temporary tables at runtime.
+- MADlib users often share their output tables. If this is the case in your deployment, create a shared schema dedicated to output tables, and assign `usage-schema` and `create` privileges on this shared schema to all MADlib users.
+    - When calling a MADlib function, prepend the output table name with the shared schema name; for example, `shared_schema.output_table1` (see the sketch following this list). This ensures that all tables created by the MADlib function (model summary tables, dictionary tables, etc.) are written to the same, accessible shared schema.
+    - MADlib sometimes creates output tables in addition to the one specified by the user. Prepending the shared schema name to the output table name ensures that these MADlib-generated output tables are accessible. 
+- Assign the `create-schema` database permission to those MADlib users who choose not to share their output tables. This permits those users to create private schemas for their MADlib output tables, rendering them inaccessible to other users.
 
 - `madlib` Schema-Level Permissions
+    - By default, MADlib is installed in a schema named `madlib`. You can choose to install MADlib in a different schema. References to `madlib` in the list below apply to the schema in which you installed MADlib.
     - Assign `usage-schema` and `create` privileges to the `madlib` schema.
     - Assign `execute` permissions on all functions within the `madlib` schema, including any functions called within.
     - Assign `insert` and `select` permissions to all tables within the `madlib` schema.
     - Assign the `usage-schema` and `create` permissions for the current schema, and any schema in which the source, output, and model tables may reside.
-
-- Function-Specific Permissions 
-    - Assign `insert` and `select` permissions for the source, output, and model tables.
-    - Assign `insert` and `select` permissions for the output \_summary and \__group tables.
-
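+
+For example, a minimal sketch of invoking a MADlib function so that its output lands in the shared schema (the schema, table, and column names are hypothetical and assume the MADlib linear-regression module):
+
+``` sql
+-- train a linear-regression model, directing all output to the shared schema
+SELECT madlib.linregr_train(
+    'public.sales_src',              -- source table
+    'shared_schema.sales_model',     -- output model table, schema-qualified
+    'revenue',                       -- dependent variable
+    'ARRAY[1, units, discount]'      -- independent variables
+);
+-- MADlib also writes shared_schema.sales_model_summary, which remains
+-- accessible to all users granted privileges on the shared schema
+```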
diff --git a/markdown/ranger/ranger-policy-creation.html.md.erb b/markdown/ranger/ranger-policy-creation.html.md.erb
index 5bd12b4..ec78c35 100644
--- a/markdown/ranger/ranger-policy-creation.html.md.erb
+++ b/markdown/ranger/ranger-policy-creation.html.md.erb
@@ -319,10 +319,13 @@
 
 - `CREATE LANGUAGE` commands (superuser-only) issued for non-built-in languages (pljava, plpython, ..) require the `usage` permission for the `c` language.
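+
+    For example, a minimal sketch (PL/Python shown; assumes the issuing superuser's Ranger policies grant `usage` on the `c` language):
+
+    ``` sql
+    CREATE LANGUAGE plpythonu;
+    ```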
 
-- If Ranger is enabled for Hive authorization in your HAWQ cluster:
-    -  Create Hive policy(s) providing the user `pxf` access to any Hive tables you want to expose via PXF HCatalog integration or HAWQ PXF external tables.
-    - The HAWQ policies providing access to PXF HCatalog integration must identify database `hcatalog`, schema `<hive-schema-name>`, and table `<hive-table-name>` resources.  These privileges are required in addition to any Hive policies for user `pxf` when Ranger is enabled for Hive authorization.
+- Using built-in functions may generate the message:  “WARNING: usage privilege of namespace \<schema-name\> is required.” This message is displayed even though the usage permission on \<schema-name\> is not actually required to execute the built-in function.
 
-- If you have enabled Ranger authorization for HDFS in your HAWQ cluster:
-    -  Create an HDFS policy(s) providing user `gpadmin` access to the HDFS HAWQ filespace.
-    -  If you plan to use PXF external tables to read and write HDFS data, create HDFS policies providing user `pxf` access to the HDFS files backing your PXF external tables.
+- When Ranger authorization is enabled for HDFS in your HAWQ cluster:
+    - The HDFS `xasecure.add-hadoop-authorization` property determines whether or not HDFS access controls are used as a fallback when no policy exists for a given HDFS resource. HAWQ access to HDFS is not affected when the `xasecure.add-hadoop-authorization` property is set to `true`. When this property is set to `false`, you must define HDFS Ranger policies permitting the `gpadmin` HAWQ user read/write/execute access to the HAWQ HDFS filespace.
+    - Access to HDFS-backed PXF external tables is not affected by the `xasecure.add-hadoop-authorization` property value, since the `pxf` user is a member of the `hdfs` superuser group.
+
+- Hive Ranger policies cannot control PXF access to Hive tables.
+    -  When Ranger authorization is enabled for HAWQ, the `gpadmin` user has access permissions to all Hive tables exposed through PXF external tables and HCatalog integration.
+    - Other HAWQ users may gain access to Hive-backed PXF external tables when provided `usage-schema` and `create` permissions on the `public` or any private schema. To restrict this access, selectively assign permissions to the `pxf` protocol. 
+    - HCatalog access to Hive tables is restricted by default when Ranger authorization is enabled for HAWQ; you must create policies to explicitly allow this access.
diff --git a/markdown/reference/guc/parameter_definitions.html.md.erb b/markdown/reference/guc/parameter_definitions.html.md.erb
index 1f94c5a..70416d6 100644
--- a/markdown/reference/guc/parameter_definitions.html.md.erb
+++ b/markdown/reference/guc/parameter_definitions.html.md.erb
@@ -2043,7 +2043,7 @@
 
 ## <a name="hawq_rm_return_percent_on_overcommit"></a>hawq\_rm\_return\_percent\_on\_overcommit
 
-Determines how many containers the global resource manager should return to the global resource manager (YARN for example.) This configuration only applies when HAWQ's YARN queue is busy, and HAWQ makes the YARN queue overuse its resources. The default value is 10, which means HAWQ will return 10% of acquired YARN containers by pausing the allocation of resources to HAWQ queries.
+Determines how many containers HAWQ should return to the global resource manager (YARN, for example). This configuration parameter applies only when HAWQ's YARN queue is busy and HAWQ has caused the YARN queue to overuse its resources. The default value is 10, which means HAWQ returns 10% of acquired YARN containers by pausing the allocation of resources to HAWQ queries.
 
 In a typical deployment, you do not need to modify the default value of this parameter.
 
diff --git a/markdown/tutorial/gettingstarted/basicdbadmin.html.md.erb b/markdown/tutorial/gettingstarted/basicdbadmin.html.md.erb
new file mode 100644
index 0000000..04fdaab
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/basicdbadmin.html.md.erb
@@ -0,0 +1,233 @@
+---
+title: Lesson 3 - Database Administration
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The HAWQ `gpadmin` user and other users who are granted the necessary privileges can execute SQL commands to create HAWQ databases and tables. These commands may be invoked via scripts, programs, and from the `psql` client utility.
+
+This lesson introduces basic HAWQ database administration commands and tasks using `psql`. You will create a database and a simple table, and add data to and query the table.
+
+## <a id="tut_adminprereq"></a> Prerequisites
+
+Ensure that you have [Set Up your HAWQ Runtime Environment](introhawqenv.html#tut_runtime_setup) and that your HAWQ cluster is up and running.
+
+
+## <a id="tut_ex_createdb"></a>Exercise: Create the HAWQ Tutorial Database
+
+In this exercise, you use the `psql` command line utility to create a HAWQ database.
+
+1. Start the `psql` subsystem:
+
+    ``` shell
+    gpadmin@master$ psql -d postgres
+    ```
+
+    You enter the `psql` interpreter, connecting to the `postgres` database. `postgres` is a default template database created during HAWQ installation.
+    
+    ``` sql
+    psql (8.2.15)
+    Type "help" for help.
+
+    postgres=# 
+    ```
+    
+    The `psql` prompt is the database name followed by `=#` or `=>`. `=#` identifies the session as that of a database superuser. The default `psql` prompt for a non-superuser is `=>`.
+
+2. Create a database named `hawqgsdb`:
+
+    ``` sql
+    postgres=# CREATE DATABASE hawqgsdb;
+    CREATE DATABASE
+    ```
+    
+    The `;` at the end of the `CREATE DATABASE` statement instructs `psql` to execute the command. SQL statements that span multiple lines are not executed until the terminating `;` is entered.
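+
+    For example, a statement entered across multiple lines runs only after you enter the terminating `;` (note the `-#` continuation prompt):
+
+    ``` sql
+    postgres=# SELECT
+    postgres-# 1 + 1;
+     ?column? 
+    ----------
+            2
+    (1 row)
+    ```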
+
+3. Connect to the `hawqgsdb` database you just created:
+
+    ``` sql
+    postgres=# \c hawqgsdb
+    You are now connected to database "hawqgsdb" as user "gpadmin".
+    hawqgsdb=#
+    ```
+
+4. Use the `psql` `\l` meta-command to list all HAWQ databases:
+
+    ``` sql
+    hawqgsdb=# \l
+                         List of databases
+          Name       |  Owner  | Encoding | Access privileges 
+    -----------------+---------+----------+-------------------
+     hawqgsdb        | gpadmin | UTF8     | 
+     postgres        | gpadmin | UTF8     | 
+     template0       | gpadmin | UTF8     | 
+     template1       | gpadmin | UTF8     | 
+    (4 rows)
+    ```
+    
+    HAWQ creates two additional template databases during installation, `template0` and `template1`, as you see above. Your HAWQ cluster may list additional databases.
+
+5. Exit `psql`:
+
+    ``` sql
+    hawqgsdb=# \q
+    ```
+
+## <a id="tut_ex_usepsql"></a>Exercise: Use psql for Table Operations
+
+You manage and access HAWQ databases and tables via the `psql` utility, an interactive front-end to the HAWQ database. In this exercise, you use `psql` to create, add data to, and query a simple HAWQ table.
+
+1. Start the `psql` subsystem:
+
+    ``` shell
+    gpadmin@master$ psql -d hawqgsdb
+    ```
+
+    The `-d hawqgsdb` option instructs `psql` to connect directly to the `hawqgsdb` database.
+  
+
+2. Create a table named `first_tbl` that has a single integer column named `i`:
+
+    ``` sql
+    hawqgsdb=# CREATE TABLE first_tbl( i int );
+    CREATE TABLE 
+    ```
+
+3. Display descriptive information about table `first_tbl`:
+
+    ``` sql
+    hawqgsdb=# \d first_tbl
+    Append-Only Table "public.first_tbl"
+     Column |  Type   | Modifiers 
+    --------+---------+-----------
+     i      | integer | 
+    Compression Type: None
+    Compression Level: 0
+    Block Size: 32768
+    Checksum: f
+    Distributed randomly
+    ```
+    
+    `first_tbl` is a table in the HAWQ `public` schema. `first_tbl` has a single integer column, was created with no compression, and is distributed randomly.
+
+4. Add some data to `first_tbl`:
+
+    ``` sql
+    hawqgsdb=# INSERT INTO first_tbl VALUES(1);
+    INSERT 0 1
+    hawqgsdb=# INSERT INTO first_tbl VALUES(2);
+    INSERT 0 1 
+    ```
+    
+    Each `INSERT` command adds a row to `first_tbl`, the first adding a row with the value `i=1`, and the second, a row with the value `i=2`. Each `INSERT` also displays the number of rows added (1).
+
+4. HAWQ provides several built-in functions for data manipulation. The  `generate_series(<start>, <end>)` function generates a series of numbers beginning with `<start>` and finishing at `<end>`. Use the `generate_series()` HAWQ built-in function to add rows for `i=3`, `i=4`, and `i=5` to `first_tbl`:
+
+    ``` sql
+    hawqgsdb=# INSERT INTO first_tbl SELECT generate_series(3, 5);
+    INSERT 0 3
+    ```
+    
+    This `INSERT` command uses the `generate_series()` built-in function to add 3 rows to `first_tbl`, starting with `i=3` and incrementing `i` by 1 for each new row.
+        
+5. Perform a query to return all rows in the `first_tbl` table:
+
+    ``` sql
+    hawqgsdb=# SELECT * FROM first_tbl;
+     i  
+    ----
+      1
+      2
+      3
+      4
+      5
+    (5 rows)
+    ```
+    
+    The `SELECT *` command queries `first_tbl`, returning all columns and all rows. `SELECT` also displays the total number of rows returned in the query.
+
+6. Perform a query to return column `i` for all rows in `first_tbl` where `i` is greater than 3:
+
+    ``` sql
+    hawqgsdb=# SELECT i FROM first_tbl WHERE i>3;
+     i  
+    ----
+      4
+      5
+    (2 rows)
+    ```
+    
+    The `SELECT` command returns the 2 rows (`i=4` and `i=5`) in the table where `i` is larger than 3 and displays the value of `i`.
+
+7. Exit the `psql` subsystem:
+
+    ``` sql
+    hawqgsdb=# \q
+    ```
+    
+8. `psql` includes an option, `-c`, to run a single SQL command from the shell command line. Perform the same query you ran in Step 7 using the `-c <sql-command>` option:
+
+    ``` shell
+    gpadmin@master$ psql -d hawqgsdb -c 'SELECT i FROM first_tbl WHERE i>3'
+    ```
+    
+    Notice that you enclose the SQL command in single quotes.
+
+9. Set the HAWQ `PGDATABASE` environment variable to identify `hawqgsdb`:
+
+    ``` shell
+    gpadmin@master$ export PGDATABASE=hawqgsdb
+    ```
+
+    `$PGDATABASE` identifies the default database to which to connect when invoking the HAWQ `psql` command.
+
+10. Run the query from the command line again, this time omitting the `-d` option:
+
+    ``` shell
+    gpadmin@master$ psql -c 'SELECT i FROM first_tbl WHERE i>3'
+    ```
+    
+    When no database is specified on the command line, `psql` attempts to connect to the database identified by `$PGDATABASE`.
+
+11. Add the `PGDATABASE` setting to your `.bash_profile`:
+
+    ``` shell
+    export PGDATABASE=hawqgsdb
+    ```  
+
+    
+## <a id="tut_dbadmin_summary"></a>Summary
+You created the database you will use in later lessons. You also created, inserted data into, and queried a simple HAWQ table using `psql`.
+
+For information on SQL command support in HAWQ, refer to the [SQL Command](../../reference/SQLCommandReference.html) reference. 
+
+For detailed information on the `psql` subsystem, refer to the [psql](../../reference/cli/client_utilities/psql.html) reference page. Commonly-used `psql` meta\-commands are identified in the table below.
+
+| Action                                                    | Command                                                                                                                                                                                            |
+|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| List databases | `\l` |
+| List tables in current database   | `\dt`                                                                                         |
+| Describe a specific table   | `\d <table-name>`                                                                                         |
+| Execute an SQL script     | `\i <script-name>`                                                                                         |
+| Quit/Exit    | `\q`                                                                                         |
+
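+For example, a short `psql` session that uses several of these meta-commands (the script file name `my_queries.sql` is hypothetical):
+
+``` sql
+hawqgsdb=# \dt
+hawqgsdb=# \d first_tbl
+hawqgsdb=# \i my_queries.sql
+hawqgsdb=# \q
+```
+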
+Lesson 4 introduces the Retail demo, a more complicated data set used in upcoming lessons. You will download and examine the data set and work files. You will also load some of the data set into HDFS.
+ 
+**Lesson 4**: [Sample Data Set and HAWQ Schemas](dataandscripts.html)
diff --git a/markdown/tutorial/gettingstarted/basichawqadmin.html.md.erb b/markdown/tutorial/gettingstarted/basichawqadmin.html.md.erb
new file mode 100644
index 0000000..84ecb5f
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/basichawqadmin.html.md.erb
@@ -0,0 +1,225 @@
+---
+title: Lesson 2 - Cluster Administration
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The HAWQ `gpadmin` administrative user has super-user capabilities on all HAWQ databases and HAWQ cluster management commands.
+
+HAWQ configuration parameters affect the behavior of both the HAWQ cluster and individual HAWQ nodes.
+
+This lesson introduces basic HAWQ cluster administration tasks. You will view and update HAWQ configuration parameters.
+
+**Note**: Before installing HAWQ, you or your administrator chose to configure and manage the HAWQ cluster either from the command line or with the Ambari UI. This lesson includes both command line and Ambari exercises for managing your HAWQ cluster. Although you are introduced to both, you should not mix command line and Ambari management modes for the same cluster.
+
+## <a id="tut_adminprereq"></a> Prerequisites
+
+Ensure that you have [Set Up your HAWQ Runtime Environment](introhawqenv.html#tut_runtime_setup) and that your HAWQ cluster is up and running.
+
+## <a id="tut_ex_cmdline_cfg"></a>Exercise: View and Update HAWQ Configuration from the Command Line
+
+If you choose to manage your HAWQ cluster from the command line, you will perform many administrative functions using the `hawq` utility. The `hawq` command line utility provides subcommands including `start`, `stop`, `config`, and `state`.
+
+In this exercise, you will use the command line to view and set HAWQ server configuration parameters. 
+
+Perform the following steps to view the HAWQ HDFS filespace URL and set the `pljava_classpath` server configuration parameter:
+
+1. The `hawq_dfs_url` configuration parameter identifies the HDFS NameNode (or HDFS NameService if HDFS High Availability is enabled) host, port, and the HAWQ filespace location within HDFS. Display the value of this parameter:
+
+    ``` shell
+    gpadmin@master$ hawq config -s hawq_dfs_url
+    GUC	   : hawq_dfs_url
+    Value  : <hdfs-namenode>:8020/hawq_data
+    ```
+    
+    Make note of the \<hdfs-namenode\> hostname or IP address returned; you will need it in *Lesson 6: HAWQ Extension Framework (PXF)*.
+
+2. The HAWQ PL/Java `pljava_classpath` server configuration parameter identifies the classpath used by the HAWQ PL/Java extension. View the current `pljava_classpath` configuration parameter setting:
+
+    ``` shell
+    gpadmin@master$ hawq config -s pljava_classpath
+    GUC		: pljava_classpath
+    Value   :
+    ```
+    
+    The value is currently not set, as indicated by the empty `Value`.
+
+3. Your HAWQ installation includes an example PL/Java JAR file. Set `pljava_classpath` to include the `examples.jar` file installed with HAWQ:
+
+    ``` shell
+    gpadmin@master$ hawq config -c pljava_classpath -v 'examples.jar'
+    GUC pljava_classpath does not exist in hawq-site.xml
+    Try to add it with value: examples.jar
+    GUC	    : pljava_classpath
+    Value   : examples.jar
+    ```
+
+    The message 'GUC pljava\_classpath does not exist in hawq-site.xml; Try to add it with value: examples.jar' indicates that HAWQ could not find a previous setting for `pljava_classpath` and attempts to set this configuration parameter to `examples.jar`, the value you provided with the `-v` option.
+
+3. You must reload the HAWQ configuration after setting a configuration parameter: 
+
+    ``` shell
+    gpadmin@master$ hawq stop cluster --reload
+    20170411:19:58:17:428600 hawq_stop:master:gpadmin-[INFO]:-Prepare to do 'hawq stop'
+    20170411:19:58:17:428600 hawq_stop:master:gpadmin-[INFO]:-You can find log in:
+    20170411:19:58:17:428600 hawq_stop:master:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_stop_20170411.log
+    20170411:19:58:17:428600 hawq_stop:master:gpadmin-[INFO]:-GPHOME is set to:
+    20170411:19:58:17:428600 hawq_stop:master:gpadmin-[INFO]:-/usr/local/hawq/.
+    20170411:19:58:17:428600 hawq_stop:master:gpadmin-[INFO]:-Reloading configuration without restarting hawq cluster
+
+    Continue with HAWQ service stop Yy|Nn (default=N):
+    > 
+    ```
+    
+    Reloading configuration does not actually stop the cluster, as noted in the INFO messages above.
+    
+    HAWQ prompts you to confirm the operation. Enter `y` to confirm:
+    
+    ``` shell
+    > y
+    20170411:19:58:22:428600 hawq_stop:master:gpadmin-[INFO]:-No standby host configured
+    20170411:19:58:23:428600 hawq_stop:master:gpadmin-[INFO]:-Reload hawq cluster
+    20170411:19:58:23:428600 hawq_stop:master:gpadmin-[INFO]:-Reload hawq master
+    20170411:19:58:23:428600 hawq_stop:master:gpadmin-[INFO]:-Master reloaded successfully
+    20170411:19:58:23:428600 hawq_stop:master:gpadmin-[INFO]:-Reload hawq segment
+   20170411:19:58:23:428600 hawq_stop:master:gpadmin-[INFO]:-Reload segments in list: ['segment']
+   20170411:19:58:23:428600 hawq_stop:master:gpadmin-[INFO]:-Total segment number is: 1
+..
+    20170411:19:58:25:428600 hawq_stop:master:gpadmin-[INFO]:-1 of 1 segments reload successfully
+    20170411:19:58:25:428600 hawq_stop:master:gpadmin-[INFO]:-Segments reloaded successfully
+    20170411:19:58:25:428600 hawq_stop:master:gpadmin-[INFO]:-Cluster reloaded successfully
+    ```
+
+    Configuration parameter value changes made by `hawq config` are system-wide; they are propagated to all segments across the cluster.
+
+
+## <a id="tut_ex_hawqstatecmdline"></a>Exercise: View the State of Your HAWQ Cluster via Ambari
+
+You may choose to use Ambari to manage the HAWQ deployment. The Ambari Web UI provides a graphical front-end to HAWQ cluster management activities.
+
+Perform the following steps to view the state of your HAWQ cluster via the Ambari web console:
+
+1. Open the Ambari web UI in your browser:
+
+    ``` shell
+    <ambari-server-node>:8080
+    ```
+    
+    Ambari runs on port 8080.
+
+2. Log in to the Ambari UI using the Ambari user credentials.
+
+    The Ambari UI dashboard window displays.
+
+3. Select the **HAWQ** service from the service list in the left pane.
+
+    The HAWQ service page **Summary** tab is displayed.  This page includes a **Summary** pane identifying the HAWQ master and all HAWQ segment nodes in your cluster. The **Metrics** pane includes a set of HAWQ-specific metrics tiles.
+
+4. Perform a HAWQ service check operation by selecting the **Run Service Check** item from the **Service Actions** button drop-down menu and **Confirm**ing the operation.
+
+    ![HAWQ Service Actions](imgs/hawqsvcacts.png)
+
+    The **Background Operations Running** dialog displays. This dialog identifies all service-related operations performed on your HAWQ cluster.
+    
+    ![Ambari Background Operations](imgs/ambbgops.png)
+    
+5. Select the most recent **HAWQ Service Check** operation from the top of the **Operations** column. Select the HAWQ master host name from the **HAWQ Service Check** dialog, and then select the **Check HAWQ** task.
+
+    ![HAWQ Service Check Output](imgs/hawqsvccheckout.png)
+
+    The **Check HAWQ** task dialog displays the output of the service check operation. This operation returns the state of your HAWQ cluster, as well as the results of HAWQ database operation tests performed by Ambari.
+
+
+## <a id="tut_ex_ambari_cfg"></a>Exercise: View and Update HAWQ Configuration via Ambari
+
+Perform the following steps to view the HDFS NodeName and set the HAWQ PL/Java `pljava_classpath` configuration parameter and value via Ambari:
+
+1. Navigate to the **HAWQ** service page.
+    
+2. Select the **Configs** tab to view the current HAWQ-specific configuration settings.
+
+    HAWQ general settings displayed include master and segment data and temp directory locations, as well as specific resource management parameters.
+    
+3. Select the **Advanced** tab to view additional HAWQ parameter settings.
+
+    ![HAWQ Advanced Configs](imgs/hawqcfgsadv.png)
+
+    The **General** drop down pane opens. This tab displays information including the HAWQ master hostname and master and segment port numbers.
+    
+4. Locate the **HAWQ DFS URL** configuration parameter setting in the **General** pane. This value should match that returned by `hawq config -s hawq_dfs_url` in the previous exercise. Make note of the HDFS NameNode hostname or IP address if you have not done so previously.
+
+    **Note**: The **HDFS** service, **Configs > Advanced Configs** tab also identifies the HDFS NameNode hostname.
+    
+4. **Advanced \<config\>** and **Custom \<config\>** drop down panes provide access to advanced configuration settings for HAWQ and other cluster components. Select the **Advanced hawq-site** drop down.
+
+    ![Advanced hawq-site](imgs/advhawqsite.png)
+
+    Specific HAWQ configuration parameters and values are displayed in the pane. Hover the mouse cursor over the value field to display a tooltip description of a specific configuration parameter.
+
+5. Select the **Custom hawq-site** drop down.
+
+    Currently configured custom parameters and values are displayed in the pane.  If no configuration parameters are set, the pane will be empty.
+
+6. Select **Add Property ...**.
+
+    The **Add Property** dialog displays. This dialog includes **Type**, **Key**, and **Value** entry fields.
+
+7. Select the single property add mode (single label icon) in the **Add Property** dialog and fill in the fields:
+
+    **Key**: pljava_classpath  
+    **Value**: examples.jar
+    
+    ![Add Property](imgs/addprop.png)
+    
+8. **Add** the custom property, then **Save** the updated configuration, optionally providing a **Note** in the **Save Configuration** dialog.
+    
+    ![Restart Button](imgs/orangerestart.png)
+    
+    Notice the now orange-colored **Restart** button in the right corner of the window. You must restart the HAWQ service after adding or updating configuration parameter values through Ambari.
+
+9. Select the orange **Restart** button to **Restart All Affected** HAWQ nodes.
+
+    You can monitor the restart operation from the **Background Operations Running** dialog.
+
+10. When the restart operation completes, log out of the Ambari console by clicking the **admin** button and selecting the **Sign out** drop down menu item.
+
+## <a id="tut_hawqadmin_summary"></a>Summary
+
+In this lesson, you viewed the state of the HAWQ cluster and learned how to change cluster configuration parameters. 
+
+For additional information on HAWQ server configuration parameters, see [Server Configuration Parameter Reference](../../reference/HAWQSiteConfig.html).
+
+The following table identifies HAWQ management commands used in the tutorial exercises. For detailed information on specific HAWQ management commands, refer to the [HAWQ Management Tools Reference](../../reference/cli/management_tools.html).
+
+<a id="topic_table_clustmgmtcmd"></a>
+
+| Action                                                    | Command                                                                                                                                                                                            |
+|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Get HAWQ cluster status | `$ hawq state` |
+| Start/stop/restart HAWQ \<object\> (cluster, master, segment, standby, allsegments) | `$ hawq start <object>` <p> `$ hawq stop <object>` <p> `$ hawq restart <object>` |
+| List all HAWQ configuration parameters and their current settings     | `$ hawq config -l`                                                                                         |
+| Display the current setting of a specific HAWQ configuration parameter    | `$ hawq config -s <param-name>`                                                                                         |
+| Add/change the value of HAWQ configuration parameter (command-line managed HAWQ clusters only)  | `$ hawq config -c <param-name> -v <value>`                                                                                         |
+| Reload HAWQ configuration        | `$ hawq stop cluster --reload`                                                                                         |
+
+
+Lesson 3 introduces basic HAWQ database administration activities and commands.
+
+**Lesson 3**: [Database Administration](basicdbadmin.html)
diff --git a/markdown/tutorial/gettingstarted/dataandscripts.html.md.erb b/markdown/tutorial/gettingstarted/dataandscripts.html.md.erb
new file mode 100644
index 0000000..d50162a
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/dataandscripts.html.md.erb
@@ -0,0 +1,266 @@
+---
+title: Lesson 4 - Sample Data Set and HAWQ Schemas
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The sample Retail demo data set used in the tutorial exercises models an online retail store operation. The store carries different categories of products. Customers order the products. The company delivers the products to the customers.
+
+This and later exercises operate on this example data set. The data set is provided in a set of `gzip`'d `.tsv` (tab-separated values) text files. The exercises also reference scripts and other supporting files that operate on the data set.
+
+In this section, you are introduced to the Retail demo data schema. You will download and examine the data set and work files. You will also load some of the data into HDFS.
+
+## <a id="tut_dataset_prereq"></a>Prerequisites
+
+Ensure that you have [Created the HAWQ Tutorial Database](basicdbadmin.html#tut_ex_createdb) and that your HAWQ cluster is up and running.
+
+
+## <a id="tut_exdownloadfilessteps"></a>Exercise: Download the Retail Demo Data and Script Files
+
+Perform the following steps to download the sample data set and scripts:
+
+1. Open a terminal window and log in to the HAWQ master node as the `gpadmin` user:
+
+    ``` shell
+    $ ssh gpadmin@<master>
+    ```
+
+3. Create a working directory for the data files and scripts:
+
+    ``` shell
+    gpadmin@master$ mkdir /tmp/hawq_getstart
+    gpadmin@master$ cd /tmp/hawq_getstart
+    ```
+    
+    You may choose a different base work directory. If you do, ensure that all path components up to and including the `hawq_getstart` directory have read and execute permissions for all.
+
+4. Download the tutorial work and data files from GitHub, checking out the appropriate tag/branch:
+
+    ``` shell
+    gpadmin@master$ git clone https://github.com/pivotalsoftware/hawq-samples.git
+    Cloning into 'hawq-samples'...
+    remote: Counting objects: 42, done.
+    remote: Total 42 (delta 0), reused 0 (delta 0), pack-reused 42
+    Unpacking objects: 100% (42/42), done.
+    Checking out files: 100% (18/18), done.
+    gpadmin@master$ cd hawq-samples
+    gpadmin@master$ git checkout hawq2x_tutorial
+    ```
+
+5. Save the path to the work files base directory:
+
+    ``` shell
+    gpadmin@master$ export HAWQGSBASE=/tmp/hawq_getstart/hawq-samples
+    ```
+    
+    (If you chose a different base work directory, modify the command as appropriate.) 
+    
+6. Add the `$HAWQGSBASE` environment variable setting to your `.bash_profile`.
+
+7. Examine the tutorial files. Exercises in this guide reference data files and SQL and shell scripts residing in the `hawq-samples` repository.  Specifically:
+  
+    | Directory                                                    | Content                                                                                                                                                                                            |
+    |----------------------------------------|----------------------------------------------------------------------------------|
+    | datasets/retail/ | Retail demo data set data files (`.tsv.gz` format) |
+    | tutorials/getstart/        | *Getting Started with HAWQ* guide work files |
+    | tutorials/getstart/hawq/  | SQL and shell scripts used by the HAWQ tables exercises                    |
+    | tutorials/getstart/pxf/   | SQL and shell scripts used by the PXF exercises                                                                                                                                                                                 |
+    <p>
+
+    (`hawq-samples` repository directories not mentioned in the table above are not used by the *Getting Started with HAWQ* exercises.)
+
+
+## <a id="tut_dsschema_ex"></a>Exercise: Create the Retail Demo HAWQ Schema
+
+A HAWQ schema is a namespace for a database. It contains named objects like tables, data types, functions, and operators. You access these objects by qualifying their names with the `<schema-name>.` prefix.
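+
+For example, after you create the `retail_demo` schema and its `first_tbl` table later in this lesson, you reference that table with a schema-qualified name:
+
+``` sql
+SELECT * FROM retail_demo.first_tbl;
+```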
+
+Perform the following steps to create the Retail demo data schema:
+
+1. Start the `psql` subsystem:
+
+    ``` shell
+    gpadmin@master$ psql
+    hawqgsdb=#
+    ```
+    
+    You are connected to the `hawqgsdb` database.
+
+2. List the HAWQ schemas:
+
+    ``` sql
+    hawqgsdb=# \dn
+           List of schemas
+            Name        |  Owner  
+    --------------------+---------
+     hawq_toolkit       | gpadmin
+     information_schema | gpadmin
+     pg_aoseg           | gpadmin
+     pg_bitmapindex     | gpadmin
+     pg_catalog         | gpadmin
+     pg_toast           | gpadmin
+     public             | gpadmin
+    (7 rows)
+    ```
+    
+    Every database includes a schema named `public`. Database objects you create without specifying a schema are created in the default schema. The default HAWQ schema is the `public` schema, unless you explicitly set it to another schema. (More about this later.)
+
+3. Display the tables in the `public` schema:
+
+    ``` sql
+    hawqgsdb=# \dt public.*
+               List of relations
+     Schema |    Name   | Type  |  Owner  |   Storage   
+    --------+-----------+-------+---------+-------------
+     public | first_tbl | table | gpadmin | append only
+    (1 row)
+    ```
+    
+    In Lesson 3, you created the `first_tbl` table in the `public` schema.
+
+4. Create a schema named `retail_demo` to represent the Retail demo namespace:
+
+    ``` sql
+    hawqgsdb=# CREATE SCHEMA retail_demo;
+    CREATE SCHEMA
+    ```
+
+5. The `search_path` server configuration parameter identifies the order in which HAWQ should search or apply schemas for objects. Set the schema search path to include the new `retail_demo` schema first:
+
+    ``` sql
+    hawqgsdb=# SET search_path TO retail_demo, public;
+    SET
+    ```
+    
+    `retail_demo`, the first schema in your `search_path`, becomes your default schema.
+    
+    **Note**: Setting `search_path` in this manner sets the parameter only for the current `psql` session. You must re-set `search_path` in subsequent `psql` sessions.
+
+4. Create another table named `first_tbl`:
+
+    ``` sql
+    hawqgsdb=# CREATE TABLE first_tbl( i int );
+    CREATE TABLE
+    hawqgsdb=# INSERT INTO first_tbl SELECT generate_series(100,103);
+    INSERT 0 4
+    hawqgsdb=# SELECT * FROM first_tbl;
+      i  
+    -----
+     100
+     101
+     102
+     103
+    (4 rows)
+    ```
+    
+    HAWQ creates this table named `first_tbl` in your default schema because you did not explicitly identify a schema for the table. Your default schema is `retail_demo` due to your current `search_path` schema ordering.
+
+5. Verify that this `first_tbl` was created in the `retail_demo` schema by displaying the tables in this schema:
+
+    ``` sql
+    hawqgsdb=# \dt retail_demo.*
+                         List of relations
+       Schema    |         Name         | Type  |  Owner  |   Storage   
+    -------------+----------------------+-------+---------+-------------
+     retail_demo | first_tbl            | table | gpadmin | append only
+    (1 row)
+    ```
+
+6. Query the `first_tbl` table that you created in Lesson 3:
+
+    ``` sql
+    hawqgsdb=# SELECT * from public.first_tbl;
+      i 
+    ---
+     1
+     2
+     3
+     4
+     5
+    (5 rows)
+    ```
+
+    You must prepend the table name with `public.` to explicitly identify the `first_tbl` table in which you are interested. 
+
+7. Exit `psql`:
+
+    ``` sql
+    hawqgsdb=# \q
+    ```
+
+## <a id="tut_loadhdfs_ex"></a>Exercise: Load the Dimension Data to HDFS
+
+The Retail demo data set includes the entities described in the table below. A fact table consists of business facts. Orders and order line items are fact tables. Dimension tables provide descriptive information for the measurements in a fact table. The other entities are represented in dimension tables. 
+
+|   Entity   | Description  |
+|---------------------|----------------------------|
+| customers\_dim  |  Customer data: first/last name, id, gender  |
+| customer\_addresses\_dim  |  Address and phone number of each customer |
+| email\_addresses\_dim  |  Customer e-mail addresses |
+| categories\_dim  |  Product category name, id |
+| products\_dim  |  Product details including name, id, category, and price |
+| date\_dim  |  Date information including year, quarter, month, week, day of week |
+| payment\_methods  |  Payment method code, id |
+| orders  |  Details of an order such as the id, payment method, billing address, day/time, and other fields. Each order is associated with a specific customer. |
+| order\_lineitems  |  Details of an order line item such as the id, item id, category, store, shipping address, and other fields. Each line item references a specific product from a specific order from a specific customer. |
+
+Perform the following steps to load the Retail demo dimension data into HDFS for later consumption:
+
+1. Navigate to the PXF script directory:
+
+    ``` shell
+    gpadmin@master$ cd $HAWQGSBASE/tutorials/getstart/pxf
+    ```
+
+2. Using the provided script, load the sample data files representing dimension data into an HDFS directory named `/retail_demo`. The script removes any existing `/retail_demo` directory and contents before loading the data: 
+
+    ``` shell
+    gpadmin@master$ ./load_data_to_HDFS.sh
+    running: sudo -u hdfs hdfs dfs -rm -r -f -skipTrash /retail_demo
+    sudo -u hdfs hdfs dfs -mkdir /retail_demo/categories_dim
+    sudo -u hdfs hdfs dfs -put /tmp/hawq_getstart/hawq-samples/datasets/retail/categories_dim.tsv.gz /retail_demo/categories_dim/
+    sudo -u hdfs hdfs dfs -mkdir /retail_demo/customer_addresses_dim
+    sudo -u hdfs hdfs dfs -put /tmp/hawq_getstart/hawq-samples/datasets/retail/customer_addresses_dim.tsv.gz /retail_demo/customer_addresses_dim/
+    ...
+    ```
+
+    `load_data_to_HDFS.sh` loads the dimension data `.tsv.gz` files directly into HDFS. Each file is loaded to its respective `/retail_demo/<basename>/<basename>.tsv.gz` file path.
+
+3. View the contents of the HDFS `/retail_demo` directory hierarchy:
+
+    ``` shell
+    gpadmin@master$ sudo -u hdfs hdfs dfs -ls /retail_demo/*
+    -rw-r--r--   3 hdfs hdfs        590 2017-04-10 19:59 /retail_demo/categories_dim/categories_dim.tsv.gz
+    Found 1 items
+    -rw-r--r--   3 hdfs hdfs   53995977 2017-04-10 19:59 /retail_demo/customer_addresses_dim/customer_addresses_dim.tsv.gz
+    Found 1 items
+    -rw-r--r--   3 hdfs hdfs    4646775 2017-04-10 19:59 /retail_demo/customers_dim/customers_dim.tsv.gz
+    Found 1 items
+    ...
+    ```
+
+    Because the retail demo data exists only as `.tsv.gz` files in HDFS, you cannot immediately query the data using HAWQ. In Lesson 6, you create HAWQ PXF external tables that reference these data files, after which you can query them.
+
+## <a id="tut_dataset_summary"></a>Summary
+
+In this lesson, you downloaded the tutorial data set and work files, created the `retail_demo` HAWQ schema, and loaded the Retail demo dimension data into HDFS. 
+
+In Lessons 5 and 6, you will create and query HAWQ internal and external tables in the `retail_demo` schema.
+
+**Lesson 5**: [HAWQ Tables](introhawqtbls.html)
diff --git a/markdown/tutorial/gettingstarted/imgs/addprop.png b/markdown/tutorial/gettingstarted/imgs/addprop.png
new file mode 100644
index 0000000..930bc92
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/imgs/addprop.png
Binary files differ
diff --git a/markdown/tutorial/gettingstarted/imgs/advhawqsite.png b/markdown/tutorial/gettingstarted/imgs/advhawqsite.png
new file mode 100644
index 0000000..4d4afa0
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/imgs/advhawqsite.png
Binary files differ
diff --git a/markdown/tutorial/gettingstarted/imgs/ambariconsole.png b/markdown/tutorial/gettingstarted/imgs/ambariconsole.png
new file mode 100644
index 0000000..45a5202
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/imgs/ambariconsole.png
Binary files differ
diff --git a/markdown/tutorial/gettingstarted/imgs/ambbgops.png b/markdown/tutorial/gettingstarted/imgs/ambbgops.png
new file mode 100644
index 0000000..9882371
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/imgs/ambbgops.png
Binary files differ
diff --git a/markdown/tutorial/gettingstarted/imgs/hawqcfgsadv.png b/markdown/tutorial/gettingstarted/imgs/hawqcfgsadv.png
new file mode 100644
index 0000000..5bccd19
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/imgs/hawqcfgsadv.png
Binary files differ
diff --git a/markdown/tutorial/gettingstarted/imgs/hawqsvcacts.png b/markdown/tutorial/gettingstarted/imgs/hawqsvcacts.png
new file mode 100644
index 0000000..775220c
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/imgs/hawqsvcacts.png
Binary files differ
diff --git a/markdown/tutorial/gettingstarted/imgs/hawqsvccheckout.png b/markdown/tutorial/gettingstarted/imgs/hawqsvccheckout.png
new file mode 100644
index 0000000..70d91b1
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/imgs/hawqsvccheckout.png
Binary files differ
diff --git a/markdown/tutorial/gettingstarted/imgs/orangerestart.png b/markdown/tutorial/gettingstarted/imgs/orangerestart.png
new file mode 100644
index 0000000..94f7836
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/imgs/orangerestart.png
Binary files differ
diff --git a/markdown/tutorial/gettingstarted/introhawqenv.html.md.erb b/markdown/tutorial/gettingstarted/introhawqenv.html.md.erb
new file mode 100644
index 0000000..1749d2c
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/introhawqenv.html.md.erb
@@ -0,0 +1,188 @@
+---
+title: Lesson 1 - Runtime Environment
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This section introduces you to the HAWQ runtime environment. You will examine your HAWQ installation, set up your HAWQ environment, and execute HAWQ management commands. If installed in your environment, you will also explore the Ambari management console.
+
+## <a id="tut_runtime_usercred"></a>Prerequisites
+
+- Install a HAWQ commercial product distribution, a HAWQ sandbox virtual machine or Docker environment, or build and install HAWQ from source. Ensure that your HAWQ installation is configured appropriately.
+
+- Make note of the HAWQ master node hostname or IP address.
+
+- The HAWQ administrative user is named `gpadmin`. This is the user account from which you will administer your HAWQ cluster. To perform the exercises in this tutorial, you must:
+
+    - Obtain the `gpadmin` user credentials.
+
+    - Ensure that the HAWQ admin user `gpadmin` can run commands as the HDFS and Hadoop system accounts (`hdfs`, `hadoop`) via `sudo` without providing a password (see the example sudoers entry after this list).
+
+    - Obtain the Ambari UI user name and password (optional, if Ambari is installed in your HAWQ deployment). The default Ambari user name and password are both `admin`.
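+
+One common way to satisfy the passwordless `sudo` requirement is a sudoers entry like the following. This is only a sketch; your site's security policy may call for a narrower rule that limits the commands `gpadmin` can run.
+
+``` shell
+# /etc/sudoers.d/gpadmin -- example only; adjust to your security policy.
+# Lets the gpadmin user run commands as other users (for example, hdfs) without a password.
+gpadmin ALL=(ALL) NOPASSWD: ALL
+```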
+
+## <a id="tut_runtime_setup"></a> Exercise: Set Up your HAWQ Runtime Environment
+
+HAWQ installs a script that you can use to set up your HAWQ cluster environment. The `greenplum_path.sh` script, located in your HAWQ root install directory, sets `$PATH` and other environment variables to find HAWQ files.  Most importantly, `greenplum_path.sh` sets the `$GPHOME` environment variable to point to the root directory of the HAWQ installation.  If you installed HAWQ from a product distribution or are running a HAWQ sandbox environment, the HAWQ root is typically `/usr/local/hawq`. If you built HAWQ from source or downloaded the tarball, your `$GPHOME` may differ.
+
+Perform the following steps to set up your HAWQ runtime environment:
+
+1. Log in to the HAWQ master node using the `gpadmin` user credentials; you may not need to provide a password:
+
+    ``` shell
+    $ ssh gpadmin@<master>
+    Password:
+    gpadmin@master$ 
+    ```
+
+2. Set up your HAWQ operating environment by sourcing the `greenplum_path.sh` file. If you built HAWQ from source or downloaded the tarball, substitute the path to the installed or extracted `greenplum_path.sh` file \(for example `/opt/hawq-2.1.0.0/greenplum_path.sh`\):
+
+    ``` shell
+    gpadmin@master$ source /usr/local/hawq/greenplum_path.sh
+    ```
+    
+    Sourcing `greenplum_path.sh` sets:
+    - `$GPHOME`
+    - `$PATH` to include the HAWQ `$GPHOME/bin/` directory 
+    - `$LD_LIBRARY_PATH` to include the HAWQ libraries in `$GPHOME/lib/`
+
+    For example:
+
+    ``` shell
+    gpadmin@master$ echo $GPHOME
+    /usr/local/hawq/.
+    gpadmin@master$ echo $PATH
+    /usr/local/hawq/./bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/gpadmin/bin
+    gpadmin@master$ echo $LD_LIBRARY_PATH
+    /usr/local/hawq/./lib
+    ```
+    
+    **Note**: You must source `greenplum_path.sh` before invoking any HAWQ commands. 
+
+3. Edit your (`gpadmin`) `.bash_profile` or other shell initialization file to source `greenplum_path.sh` on login.  For example, add:
+
+    ``` shell
+    source /usr/local/hawq/greenplum_path.sh
+    ```
+    
+4. Set the HAWQ-specific environment variables relevant to your deployment in your shell initialization file. These include `PGDATABASE`, `PGHOST`, `PGOPTIONS`, `PGPORT`, and `PGUSER`. You may not need to set any of these environment variables. For example, if you use a custom HAWQ master port number, make this port number the default by setting the `PGPORT` environment variable in your shell initialization file; add:
+
+    ``` shell
+    export PGPORT=5432
+    ```
+    
+    Setting `PGPORT` simplifies `psql` invocation by providing a default value for the port option. Similarly, setting `PGDATABASE` provides a default for the database option, as shown in the example below.
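+
+    With both variables exported in your shell initialization file, `psql` connects with no command-line options. The values below are examples only; substitute those appropriate for your deployment:
+
+    ``` shell
+    # in ~/.bash_profile -- example values only
+    export PGPORT=5432
+    export PGDATABASE=hawqgsdb
+    ```
+
+    With these defaults in place, a bare `psql` command connects to the `hawqgsdb` database (created later in this tutorial) on port 5432.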
+
+
+5. Examine your HAWQ installation:
+
+    ``` shell
+    gpadmin@master$ ls $GPHOME
+    bin  docs  etc  greenplum_path.sh  include  lib  sbin  share
+    ```
+    
+    The HAWQ command line utilities are located in `$GPHOME/bin`. `$GPHOME/lib` includes HAWQ and PostgreSQL libraries.
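+
+    If you want to see what is installed, you can list these directories yourself; the exact contents depend on your HAWQ version, so treat this as a quick check rather than a definitive listing:
+
+    ``` shell
+    gpadmin@master$ ls $GPHOME/bin | grep hawq
+    gpadmin@master$ ls $GPHOME/lib
+    ```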
+  
+6. View the current state of your HAWQ cluster, and if it is not already running, start the cluster. In practice, you will perform different procedures depending upon whether you manage your cluster from the command line or use Ambari. While this tutorial introduces both, the lessons focus on command-line instructions because not every HAWQ deployment includes Ambari.<p>
+
+    *Command Line*:
+
+    ``` shell
+    gpadmin@master$ hawq state
+    Failed to connect to database, this script can only be run when the database is up.
+    ```
+    
+    If your cluster is not running, start it:
+    
+    ``` shell
+    gpadmin@master$ hawq start cluster
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Prepare to do 'hawq start'
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-You can find log in:
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_start_20170411.log
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-GPHOME is set to:
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-/usr/local/hawq/.
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Start hawq with args: ['start', 'cluster']
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Gathering information and validating the environment...
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-No standby host configured
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Start all the nodes in hawq cluster
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Starting master node 'master'
+    20170411:15:54:47:357122 hawq_start:master:gpadmin-[INFO]:-Start master service
+    20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Master started successfully
+    20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Start all the segments in hawq cluster
+    20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Start segments in list: ['segment']
+    20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Start segment service
+    20170411:15:54:48:357122 hawq_start:master:gpadmin-[INFO]:-Total segment number is: 1
+    .....
+    20170411:15:54:53:357122 hawq_start:master:gpadmin-[INFO]:-1 of 1 segments start successfully
+    20170411:15:54:53:357122 hawq_start:master:gpadmin-[INFO]:-Segments started successfully
+    20170411:15:54:53:357122 hawq_start:master:gpadmin-[INFO]:-HAWQ cluster started successfully
+    ```
+    
+    Get the status of your cluster:
+    
+    ``` shell
+    gpadmin@master$ hawq state
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:-- HAWQ instance status summary
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:------------------------------------------------------
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:--   Master instance                                = Active
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:--   No Standby master defined                           
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:--   Total segment instance count from config file  = 1
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:------------------------------------------------------ 
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:--   Segment Status                                    
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:------------------------------------------------------ 
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:--   Total segments count from catalog      = 1
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:--   Total segment valid (at master)        = 1
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:--   Total segment failures (at master)     = 0
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:--   Total number of postmaster.pid files missing   = 0
+    20170411:16:39:18:370305 hawq_state:master:gpadmin-[INFO]:--   Total number of postmaster.pid files found     = 1
+    ```
+    
+    The state information returned includes the status of the master node and standby master, the number of segment instances, and the counts of valid and failed segments.<p>
+
+    *Ambari*:
+    
+    If your deployment includes an Ambari server, perform the following steps to start and view the current state of your HAWQ cluster. 
+    
+    1. Start the Ambari management console by entering the following URL in your favorite (supported) browser window:
+
+        ``` shell
+        <ambari-server-node>:8080
+        ```
+
+    2. Log in with the Ambari credentials (default `admin`:`admin`) and view the Ambari dashboard:
+
+        ![Ambari Dashboard](imgs/ambariconsole.png)
+ 
+        The Ambari dashboard provides an at-a-glance status of the health of your HAWQ cluster. A list of each running service and its status is provided in the left panel. The main display area includes a set of configurable tiles providing specific information about your cluster, including HAWQ segment status, HDFS disk usage, and resource manager metrics. 
+        
+    3. Navigate to the **HAWQ** service listed in the left pane. If the service is not running (that is, there is no green checkmark to the left of the service name), start your HAWQ cluster by clicking the **HAWQ** service name and then selecting the **Start** operation from the **Service Actions** menu button.
+
+    4. Log out of the Ambari console by clicking the **admin** button and selecting the **Sign out** drop-down menu item.
+
+## <a id="tut_runtime_sumary"></a>Summary
+Your HAWQ cluster is now running. For additional information:
+
+- [HAWQ Files and Directories](../../admin/setuphawqopenv.html#hawq_env_files_and_dirs) identifies HAWQ files and directories and their install locations.
+- [Environment Variables](../../reference/HAWQEnvironmentVariables.html#optionalenvironmentvariables) includes a complete list of HAWQ deployment-specific environment variables.
+- [Running a HAWQ Cluster](../../admin/RunningHAWQ.html) provides an overview of the components comprising a HAWQ cluster, including the users (administrative and operating), deployment systems (HAWQ master, standby, and segments), databases, and data sources.
+
+Lesson 2 introduces basic HAWQ cluster administration activities and commands.
+ 
+**Lesson 2**: [Cluster Administration](basichawqadmin.html)
diff --git a/markdown/tutorial/gettingstarted/introhawqtbls.html.md.erb b/markdown/tutorial/gettingstarted/introhawqtbls.html.md.erb
new file mode 100644
index 0000000..c2a72dc
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/introhawqtbls.html.md.erb
@@ -0,0 +1,222 @@
+---
+title: Lesson 5 - HAWQ Tables
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+HAWQ writes data to, and reads data from, HDFS natively. HAWQ tables are similar to tables in any relational database, except that table rows (data) are distributed across the different segments in the cluster.
+
+In this exercise, you will run scripts that use the SQL `CREATE TABLE` command to create HAWQ tables. You will load the Retail demo fact data into the HAWQ tables using the SQL `COPY` command. You will then perform simple and complex queries on the data.
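+
+As a minimal illustration of the distribution concept (hypothetical tables, not part of the tutorial scripts), a HAWQ `CREATE TABLE` statement can either hash-distribute rows on a column or spread them randomly across segments:
+
+``` sql
+-- Hypothetical examples only: hash distribution on a key column,
+-- versus random distribution of rows across segments.
+CREATE TABLE demo_orders_hash   (order_id int, note text) DISTRIBUTED BY (order_id);
+CREATE TABLE demo_orders_random (order_id int, note text) DISTRIBUTED RANDOMLY;
+```
+
+The `order_lineitems_hawq` table you examine below, for example, uses `DISTRIBUTED RANDOMLY`.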
+
+
+## <a id="tut_introhawqtblprereq"></a>Prerequisites
+
+Ensure that you have:
+
+- [Set Up your HAWQ Runtime Environment](introhawqenv.html#tut_runtime_setup)
+- [Created the HAWQ Tutorial Database](basicdbadmin.html#tut_ex_createdb)
+- [Downloaded the Retail Data and Script Files](dataandscripts.html#tut_exdownloadfilessteps)
+- [Created the Retail Demo HAWQ Schema](dataandscripts.html#tut_dsschema_ex)
+- Started your HAWQ cluster.
+
+## <a id="tut_excreatehawqtblsteps"></a>Exercise: Create, Add Data to, and Query HAWQ Retail Demo Tables
+
+Perform the following steps to create and load HAWQ tables from the sample Retail demo data set. 
+
+1. Navigate to the HAWQ script directory:
+
+    ``` shell
+    gpadmin@master$ cd $HAWQGSBASE/tutorials/getstart/hawq
+    ```
+
+2. Create tables for the Retail demo fact data using the script provided:
+    
+    ``` shell
+    gpadmin@master$ psql -f ./create_hawq_tables.sql 
+    psql:./create_hawq_tables.sql:2: NOTICE:  table "order_lineitems_hawq" does not exist, skipping
+    DROP TABLE
+    CREATE TABLE
+    psql:./create_hawq_tables.sql:41: NOTICE:  table "orders_hawq" does not exist, skipping
+    DROP TABLE
+    CREATE TABLE
+    ```
+
+    **Note**: The `create_hawq_tables.sql` script deletes each table before attempting to create it. If this is your first time performing this exercise, you can safely ignore the `psql` "table does not exist, skipping" messages.
+    
+3. Examine the `create_hawq_tables.sql` script:
+
+    ``` shell
+    gpadmin@master$ vi create_hawq_tables.sql
+    ```
+
+    Notice the use of the `retail_demo.` schema name prefix to the `order_lineitems_hawq` table name:
+    
+    ``` sql
+    DROP TABLE IF EXISTS retail_demo.order_lineitems_hawq;
+    CREATE  TABLE retail_demo.order_lineitems_hawq
+    (
+        order_id TEXT,
+        order_item_id TEXT,
+        product_id TEXT,
+        product_name TEXT,
+        customer_id TEXT,
+        store_id TEXT,
+        item_shipment_status_code TEXT,
+        order_datetime TEXT,
+        ship_datetime TEXT,
+        item_return_datetime TEXT,
+        item_refund_datetime TEXT,
+        product_category_id TEXT,
+        product_category_name TEXT,
+        payment_method_code TEXT,
+        tax_amount TEXT,
+        item_quantity TEXT,
+        item_price TEXT,
+        discount_amount TEXT,
+        coupon_code TEXT,
+        coupon_amount TEXT,
+        ship_address_line1 TEXT,
+        ship_address_line2 TEXT,
+        ship_address_line3 TEXT,
+        ship_address_city TEXT,
+        ship_address_state TEXT,
+        ship_address_postal_code TEXT,
+        ship_address_country TEXT,
+        ship_phone_number TEXT,
+        ship_customer_name TEXT,
+        ship_customer_email_address TEXT,
+        ordering_session_id TEXT,
+        website_url TEXT
+    )
+    WITH (appendonly=true, compresstype=zlib) DISTRIBUTED RANDOMLY;
+    ```
+    
+    The `CREATE TABLE` statement above creates a table named `order_lineitems_hawq` in the `retail_demo` schema, defining a text column for each field in the order line item data. The `order_id` and `customer_id` columns provide keys into the orders fact and customers dimension tables. The data in `order_lineitems_hawq` is distributed randomly across the segments and is compressed using the `zlib` compression algorithm.
+    
+    The `create_hawq_tables.sql` script also creates the `orders_hawq` fact table.
+
+4. Take a look at the `load_hawq_tables.sh` script:
+
+    ``` shell
+    gpadmin@master$ vi load_hawq_tables.sh
+    ```
+
+    Again, notice the use of the `retail_demo.` schema name prefix to the table names. 
+    
+    Examine the `psql -c` `COPY` commands:
+    
+    ``` shell
+    zcat $DATADIR/order_lineitems.tsv.gz | psql -d hawqgsdb -c "COPY retail_demo.order_lineitems_hawq FROM STDIN DELIMITER E'\t' NULL E'';"
+    zcat $DATADIR/orders.tsv.gz | psql -d hawqgsdb -c "COPY retail_demo.orders_hawq FROM STDIN DELIMITER E'\t' NULL E'';"
+    ```
+    The `load_hawq_tables.sh` shell script uses the `zcat` command to uncompress the `.tsv.gz` data files. The SQL `COPY` command copies `STDIN` (that is, the output of the `zcat` command) to the HAWQ table. The `COPY` command also identifies the `DELIMITER` used in the file (tab) and the `NULL` string ('').
+    
+5. Use the `load_hawq_tables.sh` script to load the Retail demo fact data into the newly-created tables. This process may take some time to complete.
+
+    ``` shell
+    gpadmin@master$ ./load_hawq_tables.sh
+    ```
+
+6. Use the provided script to verify that the Retail demo fact tables were loaded successfully:
+
+    ``` shell
+    gpadmin@master$ ./verify_load_hawq_tables.sh
+    ```
+
+    The output of the `verify_load_hawq_tables.sh` script should match the following:
+
+    ``` pre
+        Table Name                |    Count 
+    ------------------------------+------------------------
+     order_lineitems_hawq         |   744196
+     orders_hawq                  |   512071
+    ------------------------------+------------------------
+    ```
+    
+7. Run a query on the `order_lineitems_hawq` table that returns the `product_id`, `item_quantity`, `item_price`, and `coupon_amount` for all order line items associated with order id `8467975147`:
+
+    ``` shell
+    gpadmin@master$ psql
+    hawqgsdb=# SELECT product_id, item_quantity, item_price, coupon_amount 
+                 FROM retail_demo.order_lineitems_hawq 
+                 WHERE order_id='8467975147' ORDER BY item_price;
+     product_id | item_quantity | item_price | coupon_amount 
+    ------------+---------------+------------+---------------
+     1611429    | 1             | 11.38      | 0.00000
+     1035114    | 1             | 12.95      | 0.15000
+     1382850    | 1             | 17.56      | 0.50000
+     1562908    | 1             | 18.50      | 0.00000
+     1248913    | 1             | 34.99      | 0.50000
+     741706     | 1             | 45.99      | 0.00000
+    (6 rows)
+    ```
+    
+    The `ORDER BY` clause identifies the sort column, `item_price`. If you do not specify an `ORDER BY` clause, rows are returned in an unspecified order; because HAWQ distributes table rows across segments, you cannot rely on rows coming back in the order in which they were loaded.
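+
+    To see the most expensive line items first, you can reverse the sort and cap the result set. The following variation is not part of the tutorial scripts, just an illustration you can try:
+
+    ``` sql
+    hawqgsdb=# SELECT product_id, item_price
+                 FROM retail_demo.order_lineitems_hawq
+                 WHERE order_id='8467975147'
+                 ORDER BY item_price DESC LIMIT 3;
+    ```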
+
+8. Determine the top three postal codes by order revenue by running the following query on the `orders_hawq` table:
+
+    ``` sql
+    hawqgsdb=# SELECT billing_address_postal_code,
+                 sum(total_paid_amount::float8) AS total,
+                 sum(total_tax_amount::float8) AS tax
+               FROM retail_demo.orders_hawq
+                 GROUP BY billing_address_postal_code
+                 ORDER BY total DESC LIMIT 3;
+    ```
+    
+    Notice the use of the `sum()` aggregate function to add the order totals (`total_paid_amount`) and tax totals (`total_tax_amount`) for all orders. These totals are grouped and summed for each `billing_address_postal_code`.
+    
+    Compare your output to the following:
+ 
+    ``` pre
+     billing_address_postal_code |   total   |    tax    
+    ----------------------------+-----------+-----------
+     48001                       | 111868.32 | 6712.0992
+     15329                       | 107958.24 | 6477.4944
+     42714                       | 103244.58 | 6194.6748
+    (3 rows)
+    ```
+
+9. Run the following query on the `orders_hawq` and `order_lineitems_hawq` tables to display the `order_id`, `product_id`, `item_quantity`, and `item_price` for all line items with a `product_id` of `1869831`:
+
+    ``` sql
+    hawqgsdb=# SELECT retail_demo.order_lineitems_hawq.order_id, product_id, item_quantity, item_price
+                 FROM retail_demo.order_lineitems_hawq, retail_demo.orders_hawq
+               WHERE retail_demo.order_lineitems_hawq.order_id=retail_demo.orders_hawq.order_id AND retail_demo.order_lineitems_hawq.product_id=1869831
+                 ORDER BY retail_demo.order_lineitems_hawq.order_id, product_id;
+      order_id  | product_id | item_quantity | item_price 
+    ------------+------------+---------------+------------
+     4831097728 | 1869831    | 1             | 11.87
+     6734073469 | 1869831    | 1             | 11.87
+    (2 rows)
+    ```
+   
+10. Exit the `psql` subsystem:
+
+    ``` sql
+    hawqgsdb=# \q
+    ```
+
+## <a id="tut_introhawqtbl_summary"></a>Summary
+In this lesson, you created and loaded Retail order and order line item data into HAWQ fact tables. You also queried these tables, learning how to filter the data to your needs. 
+
+In Lesson 6, you use PXF external tables to similarly access dimension data stored in HDFS.
+ 
+**Lesson 6**: [HAWQ Extension Framework (PXF)](intropxfhdfs.html)
diff --git a/markdown/tutorial/gettingstarted/intropxfhdfs.html.md.erb b/markdown/tutorial/gettingstarted/intropxfhdfs.html.md.erb
new file mode 100644
index 0000000..029ff2b
--- /dev/null
+++ b/markdown/tutorial/gettingstarted/intropxfhdfs.html.md.erb
@@ -0,0 +1,224 @@
+---
+title: Lesson 6 - HAWQ Extension Framework (PXF)
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Data in many HAWQ deployments may already reside in external sources. The HAWQ Extension Framework (PXF) provides access to this external data via built-in connectors called plug-ins. PXF plug-ins facilitate mapping a data source to a HAWQ external table definition. PXF is installed with HDFS, Hive, HBase, and JSON plug-ins.
+
+In this exercise, you use the PXF HDFS plug-in to: 
+
+- Create PXF external table definitions
+- Perform queries on the data you loaded into HDFS
+- Run more complex queries on HAWQ and PXF tables
+
+## <a id="tut_intropxfprereq"></a>Prerequisites
+
+Ensure that you have:
+
+- [Set Up your HAWQ Runtime Environment](introhawqenv.html#tut_runtime_setup)
+- [Created the HAWQ Tutorial Database](basicdbadmin.html#tut_ex_createdb)
+- [Downloaded the Retail Data and Script Files](dataandscripts.html#tut_exdownloadfilessteps)
+- [Created the Retail Demo HAWQ Schema](dataandscripts.html#tut_dsschema_ex)
+- [Loaded the Dimension Data to HDFS](dataandscripts.html#tut_loadhdfs_ex)
+- [Created the HAWQ Retail Demo Fact Tables](introhawqtbls.html#tut_excreatehawqtblsteps)
+- Started your HAWQ cluster. 
+
+You should also retrieve the hostname or IP address of the HDFS NameNode that you noted in [View and Update HAWQ Configuration](basichawqadmin.html#tut_ex_cmdline_cfg).
+
+## <a id="tut_excreatepxftblsteps"></a>Exercise: Create and Query PXF External Tables
+
+Perform the following steps to create HAWQ external table definitions to read the dimension data you previously loaded into HDFS.
+
+1. Log in to the HAWQ master node as the `gpadmin` user:
+
+    ``` shell
+    $ ssh gpadmin@<master>
+    ```
+
+2. Navigate to the PXF script directory:
+
+    ``` shell
+    gpadmin@master$ cd $HAWQGSBASE/tutorials/getstart/pxf
+    ```
+
+3. Start the `psql` subsystem:
+
+    ``` shell
+    gpadmin@master$ psql
+    hawqgsdb=#
+    ```
+
+4. Create a HAWQ external table definition to represent the Retail demo `customers_dim` dimension data you loaded into HDFS in Lesson 4; substitute your NameNode hostname or IP address in the \<namenode\> field of the `LOCATION` clause:
+
+    ``` sql
+    hawqgsdb=# CREATE EXTERNAL TABLE retail_demo.customers_dim_pxf
+                (customer_id TEXT, first_name TEXT,
+                 last_name TEXT, gender TEXT)
+               LOCATION ('pxf://<namenode>:51200/retail_demo/customers_dim/customers_dim.tsv.gz?profile=HdfsTextSimple')
+               FORMAT 'TEXT' (DELIMITER = E'\t');
+    CREATE EXTERNAL TABLE
+    ```
+
+    The `LOCATION` clause of a `CREATE EXTERNAL TABLE` statement specifying the `pxf` protocol must include:
+    - The hostname or IP address of your HAWQ cluster's HDFS \<namenode\>.
+    - The location and/or name of the external data source. You specified the HDFS file path to the `customers_dim` data file above.
+    - The PXF `profile` to use to access the external data. The PXF HDFS plug-in supports the `HdfsTextSimple` profile to access delimited text format data.
+
+    The `FORMAT` clause of a `CREATE EXTERNAL TABLE` statement specifying the `pxf` protocol and `HdfsTextSimple` profile must identify `TEXT` format and include the `DELIMITER` character used to access the external data source. You identified a tab delimiter character above.
+
+5. The `create_pxf_tables.sql` SQL script creates HAWQ external table definitions for the remainder of the Retail dimension data. In another terminal window, edit `create_pxf_tables.sql`, replacing each occurrence of `NAMENODE` with the hostname or IP address you specified in the previous step. For example:
+
+    ``` shell
+    gpadmin@master$ cd $HAWQGSBASE/tutorials/getstart/pxf
+    gpadmin@master$ vi create_pxf_tables.sql
+    ```
+
+6. Run the `create_pxf_tables.sql` SQL script to create the remainder of the HAWQ external table definitions, then exit the `psql` subsystem:
+
+    ``` sql
+    hawqgsdb=# \i create_pxf_tables.sql
+    hawqgsdb=# \q
+    ```
+    	
+    **Note**: The `create_pxf_tables.sql` script deletes each external table before attempting to create it. If this is your first time performing this exercise, you can safely ignore the `psql` "table does not exist, skipping" messages.
+    
+7. Run the following script to verify that you successfully created the external table definitions:
+
+    ``` shell
+    gpadmin@master$ ./verify_create_pxf_tables.sh 
+    ```
+   	 
+    The output of the script should match the following:
+
+    ``` pre
+        Table Name                 |    Count 
+    -------------------------------+------------------------
+     customers_dim_pxf             |   401430  
+     categories_dim_pxf            |   56 
+     customer_addresses_dim_pxf    |   1130639
+     email_addresses_dim_pxf       |   401430
+     payment_methods_pxf           |   5
+     products_dim_pxf              |   698911
+    -------------------------------+------------------------
+    ```
+
+8. Display the allowed payment methods by running the following query on the `payment_methods_pxf` table:
+
+    ``` sql
+    gpadmin@master$ psql
+    hawqgsdb=# SELECT * FROM retail_demo.payment_methods_pxf;
+     payment_method_id | payment_method_code 
+    -------------------+---------------------
+                     4 | GiftCertificate
+                     3 | CreditCard
+                     5 | FreeReplacement
+                     2 | Credit
+                     1 | COD
+    (5 rows)
+    ```
+
+9. Run the following query on the `customers_dim_pxf` and `customer_addresses_dim_pxf` tables to display the names of all male customers in the 06119 zip code:
+
+    ``` sql
+    hawqgsdb=# SELECT last_name, first_name
+                 FROM retail_demo.customers_dim_pxf, retail_demo.customer_addresses_dim_pxf
+               WHERE retail_demo.customers_dim_pxf.customer_id=retail_demo.customer_addresses_dim_pxf.customer_id AND
+                 retail_demo.customer_addresses_dim_pxf.zip_code='06119' AND 
+                 retail_demo.customers_dim_pxf.gender='M';
+    ```
+
+    Compare your output to the following:
+ 
+    ``` shell
+     last_name | first_name 
+    -----------+------------
+     Gigliotti | Maurice
+     Detweiler | Rashaad
+     Nusbaum   | Morton
+     Mann      | Damian
+     ...
+    ```
+
+10. Exit the `psql` subsystem:
+
+    ``` sql
+    hawqgsdb=# \q
+    ```
+
+
+## <a id="tut_exhawqpxfquerysteps"></a>Exercise: Query HAWQ and PXF Tables
+
+Often, data will reside in both HAWQ tables and external data sources. In these instances, you can use both HAWQ internal and PXF external tables to relate and query the data.
+
+Perform the following steps to identify the names and email addresses of all customers who made gift certificate purchases, providing an overall order total for such purchases. The orders fact data resides in a HAWQ-managed table and the customers data resides in HDFS.
+
+1. Start the `psql` subsystem:
+
+    ``` shell
+    gpadmin@master$ psql
+    hawqgsdb=#
+    ```
+
+2. The orders fact data is accessible via the `orders_hawq` table created in the previous lesson, and the customers data is accessible via the `customers_dim_pxf` table created in the previous exercise. Using these internal and external HAWQ tables, construct a query to identify the names and email addresses of all customers who made gift certificate purchases, along with an overall order total for such purchases:
+
+    ``` sql
+    hawqgsdb=# SELECT substring(retail_demo.orders_hawq.customer_email_address for 37) AS email_address, last_name, 
+                 sum(retail_demo.orders_hawq.total_paid_amount::float8) AS gift_cert_total
+               FROM retail_demo.customers_dim_pxf, retail_demo.orders_hawq
+               WHERE retail_demo.orders_hawq.payment_method_code='GiftCertificate' AND 
+                     retail_demo.orders_hawq.customer_id=retail_demo.customers_dim_pxf.customer_id
+               GROUP BY retail_demo.orders_hawq.customer_email_address, last_name ORDER BY last_name;
+    ```
+    
+    The `SELECT` statement above combines columns from the HAWQ-managed `orders_hawq` table and the PXF external `customers_dim_pxf` table. The join compares the `customer_id` fields of the two tables to associate each order with a customer, and the `WHERE` clause restricts the results to orders whose `payment_method_code` is `GiftCertificate`.
+    
+    Query output:
+    
+    ``` pre
+                 email_address             |   last_name    |   gift_cert_total    
+    ---------------------------------------+----------------+-------------------
+     Christopher.Aaron@phpmydirectory.com  | Aaron          |             17.16
+     Libbie.Aaron@qatarw.com               | Aaron          |            102.33
+     Jay.Aaron@aljsad.net                  | Aaron          |             72.36
+     Marybelle.Abad@idividi.com.mk         | Abad           |             14.97
+     Suellen.Abad@anatranny.com            | Abad           |            125.93
+     Luvenia.Abad@mediabiz.de              | Abad           |            107.99
+     ...
+    ```
+    
+    Enter `q` at any time to exit the query results.
+
+3. Exit the `psql` subsystem:
+
+    ``` sql
+    hawqgsdb=# \q
+    ```
+
+## <a id="tut_intropxf_summary"></a>Summary    
+In this lesson, you created PXF external tables to access HDFS data and queried these tables. You also performed a query using this external data and the HAWQ internal fact tables created previously, executing business logic on both your managed and unmanaged data.
+
+For additional information about PXF, refer to [Using PXF with Unmanaged Data](../../pxf/HawqExtensionFrameworkPXF.html).
+
+Refer to [Accessing HDFS File Data](../../pxf/HDFSFileDataPXF.html) for detailed information about the PXF HDFS Plug-in.
+
+This lesson wraps up the *Getting Started with HAWQ* tutorial. Now that you are familiar with basic environment set-up, cluster, database, and data management activities, you should feel more confident interacting with your HAWQ cluster.
+ 
+**Next Steps**: View HAWQ documentation related to [Running a HAWQ Cluster](../../admin/RunningHAWQ.html).
diff --git a/markdown/tutorial/overview.html.md.erb b/markdown/tutorial/overview.html.md.erb
new file mode 100644
index 0000000..7216b62
--- /dev/null
+++ b/markdown/tutorial/overview.html.md.erb
@@ -0,0 +1,46 @@
+---
+title: Getting Started with HAWQ
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## <a id="tut_getstartov"></a>Overview
+
+This tutorial provides a quick introduction to get you up and running with your HAWQ installation.  You will be introduced to basic HAWQ functionality, including cluster management, database creation, and simple querying. You will also become acquainted with using the HAWQ Extension Framework (PXF) to access and query external HDFS data sources.
+
+
+## <a id="tut_getstartov_prereq"></a>Prerequisites
+
+Ensure that you have a running HAWQ 2.x single or multi-node cluster. You may choose to use a:
+
+- HAWQ commercial product distribution, such as [Pivotal HDB](https://pivotal.io/pivotal-hdb).
+- [HAWQ sandbox virtual machine](https://network.pivotal.io/products/pivotal-hdb) or [HAWQ docker environment](https://github.com/apache/incubator-hawq/tree/master/contrib/hawq-docker).
+- HAWQ installation you built from [source](https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install).
+
+## <a id="tut_hawqexlist"></a>Lessons 
+
+This guide includes the following content and exercises:
+
+[Lesson 1: Runtime Environment](gettingstarted/introhawqenv.html) - Examine and set up the HAWQ runtime environment.  
+[Lesson 2: Cluster Administration](gettingstarted/basichawqadmin.html) - Perform common HAWQ cluster management activities.  
+[Lesson 3: Database Administration](gettingstarted/basicdbadmin.html) - Perform common HAWQ database management activities.  
+[Lesson 4: Sample Data Set and HAWQ Schemas](gettingstarted/dataandscripts.html) - Download tutorial data and work files, create the Retail demo schema, load data to HDFS.  
+[Lesson 5: HAWQ Tables](gettingstarted/introhawqtbls.html) - Create and query HAWQ-managed tables.  
+[Lesson 6: HAWQ Extension Framework (PXF)](gettingstarted/intropxfhdfs.html) - Use PXF to access external HDFS data.