
Please follow the steps below to query CarbonData in Presto.

Configure the Presto server

  • Download the Presto server (0.187 is suggested and supported): https://repo1.maven.org/maven2/com/facebook/presto/presto-server/

  • Finish the Presto configuration following https://prestodb.io/docs/current/installation/deployment.html. A configuration example:

    config.properties:
    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8086
    query.max-memory=5GB
    query.max-memory-per-node=1GB
    discovery-server.enabled=true
    discovery.uri=http://localhost:8086
    reorder-joins=true
    
    
    jvm.config:
    -server
    -Xmx4G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:OnOutOfMemoryError=kill -9 %p
    -XX:+TraceClassLoading
    
    log.properties:
    com.facebook.presto=DEBUG
    com.facebook.presto.server.PluginManager=DEBUG
    
    node.properties:
    node.environment=carbondata
    node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
    node.data-dir=/Users/apple/DEMO/presto_test/data
    
  • Configure the carbondata connector for Presto

    Firstly: Compile CarbonData, including the carbondata-presto integration module:

    $ git clone https://github.com/apache/carbondata
    $ cd carbondata
    $ mvn -DskipTests -P{spark-version} -Dspark.version={spark-version-number} -Dhadoop.version={hadoop-version-number} clean package
    

    Replace the Spark and Hadoop versions with the versions used in your cluster. For example, if you are using Spark 2.2.1 and Hadoop 2.7.2, compile with:

    mvn -DskipTests -Pspark-2.2 -Dspark.version=2.2.1 -Dhadoop.version=2.7.2 clean package
    

    Secondly: Create a folder named 'carbondata' under $PRESTO_HOME$/plugin and copy all jars from carbondata/integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT to $PRESTO_HOME$/plugin/carbondata.

    Thirdly: Create a carbondata.properties file under $PRESTO_HOME$/etc/catalog/ with the following contents:

    connector.name=carbondata
    carbondata-store={schema-store-path}
    enable.unsafe.in.query.processing=false
    carbon.unsafe.working.memory.in.mb={value}
    

    Replace {schema-store-path} with the absolute path of the parent directory of the schema. For example, if you have a schema named 'default' stored in hdfs://namenode:9000/test/carbondata/, then set carbondata-store=hdfs://namenode:9000/test/carbondata.

    The enable.unsafe.in.query.processing property is true by default in CarbonData. The carbon.unsafe.working.memory.in.mb property defines the limit for unsafe memory usage in megabytes; the default value is 512 MB. If your tables are big, you can increase the unsafe memory, or disable unsafe processing entirely by setting enable.unsafe.in.query.processing=false.

    If you update the jars or configuration files, make sure you distribute them to all Presto nodes and restart the Presto servers on those nodes; the updates will not take effect until the servers are restarted. A consolidated shell sketch of the steps in this section follows.
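    As a minimal sketch of the connector setup described above (the Presto install path is an illustrative assumption, and x.x.x stands for whatever version you compiled):

    $ export PRESTO_HOME=/opt/presto-server-0.187
    # Create the plugin folder and copy in the connector jars built above
    # (replace x.x.x with the version you compiled)
    $ mkdir -p $PRESTO_HOME/plugin/carbondata
    $ cp carbondata/integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT/*.jar $PRESTO_HOME/plugin/carbondata/
    # Place the carbondata.properties shown above into the catalog folder
    $ mkdir -p $PRESTO_HOME/etc/catalog
    $ cp carbondata.properties $PRESTO_HOME/etc/catalog/
    # Restart Presto so the plugin and catalog are picked up
    $ $PRESTO_HOME/bin/launcher restart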

Generate CarbonData files

Please refer to the quick start guide: https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md. The CREATE TABLE and LOAD DATA statements in Spark can be used to create CarbonData tables and load data into them; the generated CarbonData files can then be found under the store path. A sketch follows.
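A minimal sketch from the Spark shell, assuming a CarbonData build for Spark 2.x where CarbonSession is available (the table name, columns, and CSV path are illustrative placeholders; see the quick start guide for the authoritative steps):

    $ spark-shell
    scala> import org.apache.spark.sql.SparkSession
    scala> import org.apache.spark.sql.CarbonSession._
    scala> // Point the CarbonSession at the same store path that carbondata-store uses
    scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://namenode:9000/test/carbondata")
    scala> carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id STRING, name STRING, age INT) STORED BY 'carbondata'")
    scala> carbon.sql("LOAD DATA INPATH 'hdfs://namenode:9000/data/sample.csv' INTO TABLE test_table")

The table's CarbonData files are then written under the store path (here, hdfs://namenode:9000/test/carbondata/default/test_table), which is the location the Presto connector reads.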

Query CarbonData in the Presto CLI
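A minimal sketch, assuming the Presto CLI executable has been downloaded following https://prestodb.io/docs/current/installation/cli.html and the server is running on localhost:8086 as configured above (test_table is the placeholder table created in the previous section):

    $ ./presto --server localhost:8086 --catalog carbondata --schema default
    presto:default> SELECT COUNT(*) FROM test_table;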