src/site/xdoc/user-guide.xml - commons-csv - Git at Google

 <?xml version="1.0"?>
 <!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
 <document>
   <properties>
     <title>User Guide</title>
     <author email="dev@commons.apache.org">Apache Commons Documentation Team</author>
   </properties>
   <body>
     <!-- ================================================== -->

     <h1>Apache Commons CSV User Guide</h1>

     <macro name="toc">
     </macro>

     <section name="Parsing files">

       Parsing files with Apache Commons CSV is relatively straight forward.
       The CSVFormat class provides some commonly used CSV variants:

       <dl>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#DEFAULT">DEFAULT</a></dt><dd>Standard Comma Separated Value format, as for RFC4180 but allowing empty lines.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#EXCEL">EXCEL</a></dt><dd>The Microsoft Excel CSV format.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD">INFORMIX_UNLOAD<sup>1.3</sup></a></dt><dd>Informix <a href="http://www.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_InOutSql_UNLOAD.htm">UNLOAD</a> format used by the <code>UNLOAD TO file_name</code> operation.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD_CSV">INFORMIX_UNLOAD_CSV<sup>1.3</sup></a></dt><dd>Informix <a href="http://www.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_InOutSql_UNLOAD.htm">CSV UNLOAD</a> format used by the <code>UNLOAD TO file_name</code> operation (escaping is disabled.)</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#MYSQL">MONGO_CSV<sup>1.7</sup></a></dt><dd>MongoDB CSV format used by the <code>mongoexport</code> operation.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#MYSQL">MONGO_TSV<sup>1.7</sup></a></dt><dd>MongoDB TSV format used by the <code>mongoexport</code> operation.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#MYSQL">MYSQL</a></dt><dd>The MySQL CSV format.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#ORACLE">ORACLE<sup>1.6</sup></a></dt><dd>Default Oracle format used by the SQL*Loader utility.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#POSTGRESSQL_CSV">POSTGRESSQL_CSV<sup>1.5</sup></a></dt><dd>Default PostgreSQL CSV format used by the COPY operation.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#POSTGRESSQL_TEXT">POSTGRESSQL_TEXT<sup>1.5</sup></a></dt><dd>Default PostgreSQL text format used by the COPY operation.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#RFC4180">RFC-4180</a></dt><dd>The RFC-4180 format defined by <a href="https://datatracker.ietf.org/doc/html/rfc4180">RFC-4180</a>.</dd>
         <dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#TDF">TDF</a></dt><dd>A tab delimited format.</dd>
       </dl>

       <subsection name="Example: Parsing an Excel CSV File">
         <p>To parse an Excel CSV file, write:</p>
         <source>Reader in = new FileReader(&quot;path/to/file.csv&quot;);
 Iterable&lt;CSVRecord&gt; records = CSVFormat.EXCEL.parse(in);
 for (CSVRecord record : records) {
     String lastName = record.get("Last Name");
     String firstName = record.get("First Name");
 }
         </source>
       </subsection>
       <subsection name="Handling Byte Order Marks">
         <p>
           To handle files that start with a Byte Order Mark (BOM) like some Excel CSV files, you need an extra step to
           deal with these optional bytes.
           You can use the
           <a href="https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html">
             BOMInputStream
           </a>
           class from
           <a href="https://commons.apache.org/proper/commons-io/">Apache Commons IO</a>
           for example:
         </p>
         <source>final URL url = ...;
 try (final Reader reader = new InputStreamReader(new BOMInputStream(url.openStream()), "UTF-8");
      final CSVParser parser = CSVFormat.EXCEL.builder()
        .setHeader()
        .build()
        .parse(reader)) {
     for (final CSVRecord record : parser) {
         final String string = record.get("SomeColumn");
         ...
     }
 }
         </source>
         <p>
           You might find it handy to create something like this:
         </p>
         <source>/**
  * Creates a reader capable of handling BOMs.
  */
 public InputStreamReader newReader(final InputStream inputStream) {
     return new InputStreamReader(new BOMInputStream(inputStream), StandardCharsets.UTF_8);
 }
         </source>
       </subsection>
     </section>

     <section name="Working with headers">

       Apache Commons CSV provides several ways to access record values.
       The simplest way is to access values by their index in the record.
       However, columns in CSV files often have a name, for example: ID, CustomerNo, Birthday, etc.
       The CSVFormat class provides an API for specifying these <i>header</i> names and CSVRecord on
       the other hand has methods to access values by their corresponding header name.

       <subsection name="Accessing column values by index">
         To access a record value by index, no special configuration of the CSVFormat is necessary:
         <source>Reader in = new FileReader(&quot;path/to/file.csv&quot;);
 Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.parse(in);
 for (CSVRecord record : records) {
     String columnOne = record.get(0);
     String columnTwo = record.get(1);
 }
         </source>
       </subsection>
       <subsection name="Defining a header manually">
         Indices may not be the most intuitive way to access record values. For this reason it is possible to
         assign names to each column in the file:
         <source>Reader in = new FileReader(&quot;path/to/file.csv&quot;);
 Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.builder()
   .setHeader("ID", "CustomerNo", "Name")
   .build()
   .parse(in);
 for (CSVRecord record : records) {
     String id = record.get("ID");
     String customerNo = record.get("CustomerNo");
     String name = record.get("Name");
 }
         </source>
         Note that column values can still be accessed using their index.
       </subsection>
       <subsection name="Using an enum to define a header">
         Using String values all over the code to reference columns can be error prone. For this reason,
         it is possible to define an enum to specify header names. Note that the enum constant names are
         used to access column values. This may lead to enums constant names which do not follow the Java
         coding standard of defining constants in upper case with underscores:
         <source>public enum Headers {
     ID, CustomerNo, Name
 }
 Reader in = new FileReader(&quot;path/to/file.csv&quot;);
 Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.builder()
   .setHeader(Headers.class)
   .build()
   .parse(in);
 for (CSVRecord record : records) {
     String id = record.get(Headers.ID);
     String customerNo = record.get(Headers.CustomerNo);
     String name = record.get(Headers.Name);
 }
         </source>
         Again it is possible to access values by their index and by using a String (for example "CustomerNo").
       </subsection>
       <subsection name="Header auto detection">
         Some CSV files define header names in their first record. If configured, Apache Commons CSV can parse
         the header names from the first record:
         <source>Reader in = new FileReader(&quot;path/to/file.csv&quot;);
 Iterable&lt;CSVRecord&gt; records = CSVFormat.RFC4180.builder()
   .setHeader()
   .setSkipHeaderRecord(true)
   .build()
   .parse(in);
 for (CSVRecord record : records) {
     String id = record.get("ID");
     String customerNo = record.get("CustomerNo");
     String name = record.get("Name");
 }
         </source>
         This will use the values from the first record as header names and skip the first record when iterating.
       </subsection>
       <subsection name="Printing with headers">
         <p>
           To print a CSV file with headers, you specify the headers in the format:
         </p>
         <source>final Appendable out = ...;
 final CSVPrinter printer = CSVFormat.DEFAULT.builder()
   .setHeader("H1", "H2")
   .build()
   .print(out);
         </source>
         <p>
           To print a CSV file with JDBC column labels, you specify the ResultSet in the format:
         </p>
         <source>try (final ResultSet resultSet = ...) {
   final CSVPrinter printer = CSVFormat.DEFAULT.builder()
     .setHeader(resultSet)
     .build()
     .print(out);
 }
         </source>
       </subsection>
     </section>
   </body>
 </document>
	<?xml version="1.0"?>
	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->
	<document>
	<properties>
	<title>User Guide</title>
	<author email="dev@commons.apache.org">Apache Commons Documentation Team</author>
	</properties>
	<body>
	<!-- ================================================== -->

	<h1>Apache Commons CSV User Guide</h1>

	<macro name="toc">
	</macro>

	<section name="Parsing files">

	Parsing files with Apache Commons CSV is relatively straight forward.
	The CSVFormat class provides some commonly used CSV variants:

	<dl>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#DEFAULT">DEFAULT</a></dt><dd>Standard Comma Separated Value format, as for RFC4180 but allowing empty lines.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#EXCEL">EXCEL</a></dt><dd>The Microsoft Excel CSV format.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD">INFORMIX_UNLOAD<sup>1.3</sup></a></dt><dd>Informix <a href="http://www.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_InOutSql_UNLOAD.htm">UNLOAD</a> format used by the <code>UNLOAD TO file_name</code> operation.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#INFORMIX_UNLOAD_CSV">INFORMIX_UNLOAD_CSV<sup>1.3</sup></a></dt><dd>Informix <a href="http://www.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_InOutSql_UNLOAD.htm">CSV UNLOAD</a> format used by the <code>UNLOAD TO file_name</code> operation (escaping is disabled.)</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#MYSQL">MONGO_CSV<sup>1.7</sup></a></dt><dd>MongoDB CSV format used by the <code>mongoexport</code> operation.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#MYSQL">MONGO_TSV<sup>1.7</sup></a></dt><dd>MongoDB TSV format used by the <code>mongoexport</code> operation.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#MYSQL">MYSQL</a></dt><dd>The MySQL CSV format.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#ORACLE">ORACLE<sup>1.6</sup></a></dt><dd>Default Oracle format used by the SQL*Loader utility.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#POSTGRESSQL_CSV">POSTGRESSQL_CSV<sup>1.5</sup></a></dt><dd>Default PostgreSQL CSV format used by the COPY operation.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#POSTGRESSQL_TEXT">POSTGRESSQL_TEXT<sup>1.5</sup></a></dt><dd>Default PostgreSQL text format used by the COPY operation.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#RFC4180">RFC-4180</a></dt><dd>The RFC-4180 format defined by <a href="https://datatracker.ietf.org/doc/html/rfc4180">RFC-4180</a>.</dd>
	<dt><a href="https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#TDF">TDF</a></dt><dd>A tab delimited format.</dd>
	</dl>

	<subsection name="Example: Parsing an Excel CSV File">
	<p>To parse an Excel CSV file, write:</p>
	<source>Reader in = new FileReader("path/to/file.csv");
	Iterable<CSVRecord> records = CSVFormat.EXCEL.parse(in);
	for (CSVRecord record : records) {
	String lastName = record.get("Last Name");
	String firstName = record.get("First Name");
	}
	</source>
	</subsection>
	<subsection name="Handling Byte Order Marks">
	<p>
	To handle files that start with a Byte Order Mark (BOM) like some Excel CSV files, you need an extra step to
	deal with these optional bytes.
	You can use the
	<a href="https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html">
	BOMInputStream
	</a>
	class from
	<a href="https://commons.apache.org/proper/commons-io/">Apache Commons IO</a>
	for example:
	</p>
	<source>final URL url = ...;
	try (final Reader reader = new InputStreamReader(new BOMInputStream(url.openStream()), "UTF-8");
	final CSVParser parser = CSVFormat.EXCEL.builder()
	.setHeader()
	.build()
	.parse(reader)) {
	for (final CSVRecord record : parser) {
	final String string = record.get("SomeColumn");
	...
	}
	}
	</source>
	<p>
	You might find it handy to create something like this:
	</p>
	<source>/**
	* Creates a reader capable of handling BOMs.
	*/
	public InputStreamReader newReader(final InputStream inputStream) {
	return new InputStreamReader(new BOMInputStream(inputStream), StandardCharsets.UTF_8);
	}
	</source>
	</subsection>
	</section>

	<section name="Working with headers">

	Apache Commons CSV provides several ways to access record values.
	The simplest way is to access values by their index in the record.
	However, columns in CSV files often have a name, for example: ID, CustomerNo, Birthday, etc.
	The CSVFormat class provides an API for specifying these <i>header</i> names and CSVRecord on
	the other hand has methods to access values by their corresponding header name.

	<subsection name="Accessing column values by index">
	To access a record value by index, no special configuration of the CSVFormat is necessary:
	<source>Reader in = new FileReader("path/to/file.csv");
	Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
	for (CSVRecord record : records) {
	String columnOne = record.get(0);
	String columnTwo = record.get(1);
	}
	</source>
	</subsection>
	<subsection name="Defining a header manually">
	Indices may not be the most intuitive way to access record values. For this reason it is possible to
	assign names to each column in the file:
	<source>Reader in = new FileReader("path/to/file.csv");
	Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
	.setHeader("ID", "CustomerNo", "Name")
	.build()
	.parse(in);
	for (CSVRecord record : records) {
	String id = record.get("ID");
	String customerNo = record.get("CustomerNo");
	String name = record.get("Name");
	}
	</source>
	Note that column values can still be accessed using their index.
	</subsection>
	<subsection name="Using an enum to define a header">
	Using String values all over the code to reference columns can be error prone. For this reason,
	it is possible to define an enum to specify header names. Note that the enum constant names are
	used to access column values. This may lead to enums constant names which do not follow the Java
	coding standard of defining constants in upper case with underscores:
	<source>public enum Headers {
	ID, CustomerNo, Name
	}
	Reader in = new FileReader("path/to/file.csv");
	Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
	.setHeader(Headers.class)
	.build()
	.parse(in);
	for (CSVRecord record : records) {
	String id = record.get(Headers.ID);
	String customerNo = record.get(Headers.CustomerNo);
	String name = record.get(Headers.Name);
	}
	</source>
	Again it is possible to access values by their index and by using a String (for example "CustomerNo").
	</subsection>
	<subsection name="Header auto detection">
	Some CSV files define header names in their first record. If configured, Apache Commons CSV can parse
	the header names from the first record:
	<source>Reader in = new FileReader("path/to/file.csv");
	Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
	.setHeader()
	.setSkipHeaderRecord(true)
	.build()
	.parse(in);
	for (CSVRecord record : records) {
	String id = record.get("ID");
	String customerNo = record.get("CustomerNo");
	String name = record.get("Name");
	}
	</source>
	This will use the values from the first record as header names and skip the first record when iterating.
	</subsection>
	<subsection name="Printing with headers">
	<p>
	To print a CSV file with headers, you specify the headers in the format:
	</p>
	<source>final Appendable out = ...;
	final CSVPrinter printer = CSVFormat.DEFAULT.builder()
	.setHeader("H1", "H2")
	.build()
	.print(out);
	</source>
	<p>
	To print a CSV file with JDBC column labels, you specify the ResultSet in the format:
	</p>
	<source>try (final ResultSet resultSet = ...) {
	final CSVPrinter printer = CSVFormat.DEFAULT.builder()
	.setHeader(resultSet)
	.build()
	.print(out);
	}
	</source>
	</subsection>
	</section>
	</body>
	</document>