| <!DOCTYPE html> |
| <html> |
| |
| <head> |
| |
| <meta charset="UTF-8"> |
| <meta name=viewport content="width=device-width, initial-scale=1"> |
| |
| |
| <title>Schema-free JSON Data Infrastructure - Apache Drill</title> |
| |
| <link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css" rel="stylesheet" type="text/css"/> |
| <link href="/css/site.css" rel="stylesheet" type="text/css"/> |
| |
| <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"/> |
| <link rel="icon" href="/favicon.ico" type="image/x-icon"/> |
| |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.1/jquery.min.js" language="javascript" type="text/javascript"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-easing/1.3/jquery.easing.min.js" language="javascript" type="text/javascript"></script> |
| <script language="javascript" type="text/javascript" src="/js/modernizr.custom.js"></script> |
| <script language="javascript" type="text/javascript" src="/js/script.js"></script> |
| <script language="javascript" type="text/javascript" src="/js/drill.js"></script> |
| |
| </head> |
| |
| |
| <body onResize="resized();"> |
| <div class="page-wrap"> |
| <div class="bui"></div> |
| |
| <div id="menu" class="mw"> |
| <ul> |
| <li class='toc-categories'> |
| <a class="expand-toc-icon" href="javascript:void(0);"><i class="fa fa-bars"></i></a> |
| </li> |
| <li class="logo"><a href="/"></a></li> |
| <li class='expand-menu'> |
| <a href="javascript:void(0);"><span class='menu-text'>Menu</span><span class='expand-icon'><i class="fa fa-bars"></i></span></a> |
| </li> |
| <li class="clear-float"></li> |
| <li class="nav"> |
| <a>Language</a> |
| <ul> |
| |
| <li> |
| <a style="font-weight: bold;" href="/blog/2015/01/27/schema-free-json-data-infrastructure/" >en</a> |
| </li> |
| |
| <li> |
| <a href="/zh/blog/2015/01/27/schema-free-json-data-infrastructure/" >zh</a> |
| </li> |
| |
| </ul> |
| </li> |
| <li class="apache-link"> |
| <a href="/apacheASF/">Apache</a> |
| </li> |
| <li class="poweredby"> |
| <a href="/poweredBy">Powered By</a> |
| </li> |
| <li class="documentation-menu"> |
| <a href="/docs/">Documentation</a> |
| <ul> |
| |
| |
| <li><a href="/docs/getting-started/">Getting Started</a></li> |
| |
| <li><a href="/docs/architecture/">Architecture</a></li> |
| |
| <li><a href="/docs/tutorials/">Tutorials</a></li> |
| |
| <li><a href="/docs/drill-on-yarn/">Drill-on-YARN</a></li> |
| |
| <li><a href="/docs/install-drill/">Install Drill</a></li> |
| |
| <li><a href="/docs/configure-drill/">Configure Drill</a></li> |
| |
| <li><a href="/docs/connect-a-data-source/">Connect a Data Source</a></li> |
| |
| <li><a href="/docs/odbc-jdbc-interfaces/">ODBC/JDBC Interfaces</a></li> |
| |
| <li><a href="/docs/query-data/">Query Data</a></li> |
| |
| <li><a href="/docs/performance-tuning/">Performance Tuning</a></li> |
| |
| <li><a href="/docs/log-and-debug/">Log and Debug</a></li> |
| |
| <li><a href="/docs/sql-reference/">SQL Reference</a></li> |
| |
| <li><a href="/docs/data-sources-and-file-formats/">Data Sources and File Formats</a></li> |
| |
| <li><a href="/docs/develop-custom-functions/">Develop Custom Functions</a></li> |
| |
| <li><a href="/docs/troubleshooting/">Troubleshooting</a></li> |
| |
| <li><a href="/docs/developer-information/">Developer Information</a></li> |
| |
| <li><a href="/docs/release-notes/">Release Notes</a></li> |
| |
| <li><a href="/docs/sample-datasets/">Sample Datasets</a></li> |
| |
| <li><a href="/docs/project-bylaws/">Project Bylaws</a></li> |
| |
| <li><a href="/docs/ecosystem/">Ecosystem</a></li> |
| |
| </ul> |
| </li> |
| <li class='nav'> |
| <a href="/community-resources/">Community</a> |
| <ul> |
| <li><a href="/team/">Team</a></li> |
| <li><a href="/mailinglists/">Mailing Lists</a></li> |
| <li><a href="/community-resources/">Community Resources</a></li> |
| </ul> |
| </li> |
| <li class='nav'><a href="/faq/">FAQ</a></li> |
| <li class='nav'><a href="/blog/">Blog</a></li> |
| <li class="social-menu-item"><a href="https://twitter.com/apachedrill" title="apachedrill on twitter" target="_blank"><img src="/images/twitter_32_26_white.png" alt="twitter logo" align="center"></a> </li> |
| <li class="social-menu-item"><a href="https://join.slack.com/t/apache-drill/shared_invite/enQtNTQ4MjM1MDA3MzQ2LTJlYmUxMTRkMmUwYmQ2NTllYmFmMjU4MDk0NjYwZjBmYjg0MDZmOTE2ZDg0ZjBlYmI3Yjc4Y2I2NTQyNGVlZTc" title="Apache Drill Slack channels" |
| target="_blank"><img src="/images/slack-logo.svg" alt="Slack logo" align="center"></a> </li> |
| <li class='search-bar'> |
| <form id="drill-search-form"> |
| <input type="text" placeholder="Search Apache Drill" id="drill-search-term" /> |
| <button type="submit"> |
| <i class="fa fa-search"></i> |
| </button> |
| </form> |
| </li> |
| <li class="d"> |
| <a href="/download/"> |
| <i class="fa fa-cloud-download"></i> Download |
| </a> |
| </li> |
| </ul> |
| </div> |
| |
| <link href="/css/content.css" rel="stylesheet" type="text/css"> |
| |
| <div class="post int_text"> |
| <header class="post-header"> |
| <div class="int_title"> |
| <h1 class="post-title">Schema-free JSON Data Infrastructure</h1> |
| </div> |
| <p class="post-meta"> |
| |
| |
| |
| <strong>Author:</strong> Tomer Shiran (Founder, PMC Member and Committer, Apache Drill)<br /> |
| |
| <strong>Date:</strong> Jan 27, 2015 |
| </p> |
| </header> |
| <div class="addthis_sharing_toolbox"></div> |
| |
| <article class="post-content"> |
| <p>JSON has emerged in recent years as the de-facto standard data exchange format. It is being used everywhere. Front-end Web applications use JSON to maintain data and communicate with back-end applications. Web APIs are JSON-based (eg, <a href="https://dev.twitter.com/rest/public">Twitter REST APIs</a>, <a href="http://developers.marketo.com/documentation/rest/">Marketo REST APIs</a>, <a href="https://developer.github.com/v3/">GitHub API</a>). It’s the format of choice for public datasets, operational log files and more.</p> |
| |
| <h1 id="why-is-json-a-convenient-data-exchange-format">Why is JSON a Convenient Data Exchange Format?</h1> |
| |
| <p>While I won’t dive into the historical roots of JSON (JavaScript Object Notation, <a href="http://en.wikipedia.org/wiki/JSON#JavaScript_eval.28.29"><code class="language-plaintext highlighter-rouge">eval()</code></a>, etc.), I do want to highlight several attributes of JSON that make it a convenient data exchange format:</p> |
| |
| <ul> |
| <li><strong>JSON is self-describing</strong>. You can look at a JSON document and understand what it represents. The field names are included in the document. You don’t need an external schema or definition to interpret JSON-encoded data. This makes life easier for anyone who wants to deal with the data, and it also means that a collection of JSON documents represents what many people call a “schema-less dataset” (where structure can evolve, and different records can have different fields).</li> |
| <li><strong>JSON is simple</strong>. Other self-describing formats such as XML are much more complicated. A JSON document is made up of arrays and maps (or objects, in JSON terminology), and that’s about it.</li> |
| <li><strong>JSON can naturally represent real-world objects</strong>. Try representing your application’s <code class="language-plaintext highlighter-rouge">Customer</code> object (with the person’s address, order history, etc.) in a CSV file or a relational database. It’s hard. In fact, ORM systems were invented to help alleviate this issue.</li> |
| <li><strong>JSON libraries are available in virtually every programming language</strong>. Take a look at <a href="http://www.json.org/">the list of supported languages on JSON.org</a>. I counted 15 languages that start with the letters A, B or C.</li> |
| <li><strong>JSON is idiomatic in loosely typed languages</strong>. Many loosely typed languages, such as Python, Ruby and JavaScript, have data structures that are similar to JSON objects, making it very natural to handle JSON data in those languages. For example, a Python dictionary looks just like a JSON object. This makes it easy for developers to utilize JSON in their applications.</li> |
| </ul> |
| |
| <h1 id="json-data-infrastructure">JSON Data Infrastructure</h1> |
| |
| <p>Traditional data infrastructure, such as relational databases, has some features that make it easier to store and process JSON-encoded data. For example, Oracle has <a href="https://docs.oracle.com/database/121/ADXDB/json.htm">a JSON data type and a set of functions for handling JSON data</a>.</p> |
| |
| <p>However, a new class of data infrastructure is providing a much more seamless experience via a full-fledged JSON data model. For example:</p> |
| |
| <ul> |
| <li>Drill is a SQL engine in which each record is conceptually a JSON document.</li> |
| <li>Elasticsearch is a search engine in which each indexed document is conceptually a JSON document.</li> |
| <li>MongoDB is an operational database in which each record is conceptually a JSON document.</li> |
| </ul> |
| |
| <p>These systems view JSON as a data model as opposed to one of many data types, realizing that JSON offers a simple way to represent real-world objects.</p> |
| |
| <table> |
| <thead> |
| <tr> |
| <th> </th> |
| <th>Traditional Infrastructure</th> |
| <th>JSON Infrastructure</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><strong>Examples:</strong></td> |
| <td>Oracle, SQL Server</td> |
| <td>Drill, Elasticsearch, MongoDB</td> |
| </tr> |
| <tr> |
| <td><strong>Record:</strong></td> |
| <td>Tuple</td> |
| <td>JSON document</td> |
| </tr> |
| <tr> |
| <td><strong>Variable schema:</strong></td> |
| <td>No</td> |
| <td>Yes</td> |
| </tr> |
| </tbody> |
| </table> |
| |
| <p>If you happen to be in the Bay Area tomorrow, please join Gaurav Gupta (VP Product Management, Elasticsearch), Paul Pedersen (Deputy CTO, MongoDB), Robert Greene (Senior Principal Product Manager, Oracle), Sukanta Ganguly (VP Solutions Architecture, Aerospike) and me for a panel moderated by Gartner’s Nick Heudecker on this new world of schema-free JSON. Check out <a href="http://www.meetup.com/SF-Bay-Areas-Big-Data-Think-Tank/">The Hive Big Data Think Tank</a> for more information.</p> |
| |
| </article> |
| <div id="disqus_thread"></div> |
| <script type="text/javascript"> |
| /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ |
| var disqus_shortname = 'drill'; // required: replace example with your forum shortname |
| |
| /* * * DON'T EDIT BELOW THIS LINE * * */ |
| (function() { |
| var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; |
| dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; |
| (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); |
| })(); |
| </script> |
| <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript> |
| |
| </div> |
| <script type="text/javascript" src="https://s7.addthis.com/js/300/addthis_widget.js#pubid=ra-548b2caa33765e8d" async="async"></script> |
| |
| </div> |
| <p class="push"></p> |
| <div id="footer" class="mw"> |
| <div class="wrapper"> |
| Copyright © 2012-2025 The Apache Software Foundation, licensed under the Apache License, Version 2.0.<br> |
| Apache and the Apache feather logo are trademarks of The Apache Software Foundation. Other names appearing on the site may be trademarks of their respective owners.<br/><br/> |
| </div> |
| </div> |
| |
| <script type="text/javascript" src="https://s7.addthis.com/js/300/addthis_widget.js#pubid=ra-548b2caa33765e8d" async="async"></script> |
| |
| </body> |
| </html> |