---
layout: global
title: JSON Files
displayTitle: JSON Files
---

<div class="codetabs">

<div data-lang="scala" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset[Row]`.
This conversion can be done using `SparkSession.read.json()` on either a `Dataset[String]`,
or a JSON file.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).

For a regular multi-line JSON file, set the `multiLine` option to `true`.

{% include_example json_dataset scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
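
As a minimal sketch of the calls described above (assuming an existing `SparkSession` named `spark`; the multi-line file path is hypothetical):

{% highlight scala %}
// Read a JSON Lines file; the schema is inferred automatically.
val peopleDF = spark.read.json("examples/src/main/resources/people.json")
peopleDF.printSchema()

// JSON can also be read from a Dataset[String], one JSON object per element.
import spark.implicits._
val otherPeopleDS = Seq("""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""").toDS()
val otherPeople = spark.read.json(otherPeopleDS)

// For a file where a single JSON document spans multiple lines (path is hypothetical).
val multiLineDF = spark.read.option("multiLine", true).json("path/to/multiline.json")
{% endhighlight %}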
</div>

<div data-lang="java" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset<Row>`.
This conversion can be done using `SparkSession.read().json()` on either a `Dataset<String>`,
or a JSON file.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).

For a regular multi-line JSON file, set the `multiLine` option to `true`.

{% include_example json_dataset java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
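
A minimal sketch of the same calls in Java (assuming an existing `SparkSession` named `spark`; the multi-line file path is hypothetical):

{% highlight java %}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Read a JSON Lines file; the schema is inferred automatically.
Dataset<Row> people = spark.read().json("examples/src/main/resources/people.json");
people.printSchema();

// For a file where a single JSON document spans multiple lines (path is hypothetical).
Dataset<Row> multiLinePeople = spark.read()
  .option("multiLine", true)
  .json("path/to/multiline.json");
{% endhighlight %}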
</div>

<div data-lang="python" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame.
This conversion can be done using `SparkSession.read.json` on a JSON file.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).

For a regular multi-line JSON file, set the `multiLine` parameter to `True`.

{% include_example json_dataset python/sql/datasource.py %}
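
A minimal sketch in Python (assuming an existing `SparkSession` named `spark`; the multi-line file path is hypothetical):

{% highlight python %}
# Read a JSON Lines file; the schema is inferred automatically.
people = spark.read.json("examples/src/main/resources/people.json")
people.printSchema()

# For a file where a single JSON document spans multiple lines (path is hypothetical).
multi_line_people = spark.read.json("path/to/multiline.json", multiLine=True)
{% endhighlight %}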
</div>

<div data-lang="r" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame using
the `read.json()` function, which loads data from a directory of JSON files where each line of the
files is a JSON object.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).

For a regular multi-line JSON file, set a named parameter `multiLine` to `TRUE`.

{% include_example json_dataset r/RSparkSQLExample.R %}
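
A minimal sketch in SparkR (assuming a session has been started with `sparkR.session()`; the multi-line file path is hypothetical):

{% highlight r %}
# Read a JSON Lines file; the schema is inferred automatically.
people <- read.json("examples/src/main/resources/people.json")
printSchema(people)

# For a file where a single JSON document spans multiple lines (path is hypothetical).
multiLinePeople <- read.json("path/to/multiline.json", multiLine = TRUE)
{% endhighlight %}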

</div>

<div data-lang="sql" markdown="1">

{% highlight sql %}

CREATE TEMPORARY VIEW jsonTable
USING org.apache.spark.sql.json
OPTIONS (
  path "examples/src/main/resources/people.json"
)

SELECT * FROM jsonTable

{% endhighlight %}

</div>

</div>