blob: 18f655d2d33afd10cc7a51d5cd20842cab510570 [file] [log] [blame]
---
title: Using Profiles to Read and Write Data
---
PXF profiles are collections of common metadata attributes that can be used to simplify the reading and writing of data. You can use any of the built-in profiles that come with PXF or you can create your own.
For example, if you are writing single line records to text files on HDFS, you could use the built-in HdfsTextSimple profile. You specify this profile when you create the PXF external table used to write the data to HDFS.
## <a id="built-inprofiles"></a>Built-In Profiles
PXF comes with a number of built-in profiles that group together a collection of metadata attributes. PXF built-in profiles simplify access to the following types of data storage systems:
- HDFS File Data (Read + Write)
- Hive (Read only)
- HBase (Read only)
- JSON (Read only)
You can specify a built-in profile when you want to read data that exists inside HDFS files, Hive tables, HBase tables, and JSON files and for writing data into HDFS files.
<table>
<colgroup>
<col width="33%" />
<col width="33%" />
<col width="33%" />
</colgroup>
<thead>
<tr class="header">
<th>Profile</th>
<th>Description</th>
<th>Fragmenter/Accessor/Resolver</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>HdfsTextSimple</td>
<td>Read or write delimited single line records from or to plain text files on HDFS.</td>
<td><ul>
<li>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</li>
<li>org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor</li>
<li>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</li>
</ul></td>
</tr>
<tr class="even">
<td>HdfsTextMulti</td>
<td>Read delimited single or multi-line records (with quoted linefeeds) from plain text files on HDFS. This profile is not splittable (non parallel); reading is slower than reading with HdfsTextSimple.</td>
<td><ul>
<li>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</li>
<li>org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor</li>
<li>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</li>
</ul></td>
</tr>
<tr class="odd">
<td>Hive</td>
<td>Read a Hive table with any of the available storage formats: text, RC, ORC, Sequence, or Parquet.</td>
<td><ul>
<li>org.apache.hawq.pxf.plugins.hive.HiveDataFragmenter</li>
<li>org.apache.hawq.pxf.plugins.hive.HiveAccessor</li>
<li>org.apache.hawq.pxf.plugins.hive.HiveResolver</li>
</ul></td>
</tr>
<tr class="even">
<td>HiveRC</td>
<td>Optimized read of a Hive table where each partition is stored as an RCFile.
<div class="note note">
Note: The <code class="ph codeph">DELIMITER</code> parameter is mandatory.
</div></td>
<td><ul>
<li>org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter</li>
<li>org.apache.hawq.pxf.plugins.hive.HiveRCFileAccessor</li>
<li>org.apache.hawq.pxf.plugins.hive.HiveColumnarSerdeResolver</li>
</ul></td>
</tr>
<tr class="odd">
<td>HiveText</td>
<td>Optimized read of a Hive table where each partition is stored as a text file.
<div class="note note">
Note: The <code class="ph codeph">DELIMITER</code> parameter is mandatory.
</div></td>
<td><ul>
<li>org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter</li>
<li>org.apache.hawq.pxf.plugins.hive.HiveLineBreakAccessor</li>
<li>org.apache.hawq.pxf.plugins.hive.HiveStringPassResolver</li>
</ul></td>
</tr>
<tr class="even">
<td>HBase</td>
<td>Read an HBase data store engine.</td>
<td><ul>
<li>org.apache.hawq.pxf.plugins.hbase.HBaseDataFragmenter</li>
<li>org.apache.hawq.pxf.plugins.hbase.HBaseAccessor</li>
<li>org.apache.hawq.pxf.plugins.hbase.HBaseResolver</li>
</ul></td>
</tr>
<tr class="odd">
<td>Avro</td>
<td>Read Avro files (fileName.avro).</td>
<td><ul>
<li>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</li>
<li>org.apache.hawq.pxf.plugins.hdfs.AvroFileAccessor</li>
<li>org.apache.hawq.pxf.plugins.hdfs.AvroResolver</li>
</ul></td>
</tr>
<tr class="odd">
<td>JSON</td>
<td>Read JSON files (fileName.json) from HDFS.</td>
<td><ul>
<li>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</li>
<li>org.apache.hawq.pxf.plugins.json.JsonAccessor</li>
<li>org.apache.hawq.pxf.plugins.json.JsonResolver</li>
</ul></td>
</tr>
</tbody>
</table>
## <a id="addingandupdatingprofiles"></a>Adding and Updating Profiles
Each profile has a mandatory unique name and an optional description. In addition, each profile contains a set of plug-ins that are an extensible set of metadata attributes. Administrators can add new profiles or edit the built-in profiles defined in `/etc/pxf/conf/pxf-profiles.xml`.
**Note:** Add the JAR files associated with custom PXF plug-ins to the `/etc/pxf/conf/pxf-public.classpath` configuration file.
After you make changes in `pxf-profiles.xml` (or any other PXF configuration file), propagate the changes to all nodes with PXF installed, and then restart the PXF service on all nodes.