Ingest and back up Amazon S3 data to Hadoop HDFS.
This application transfers files from the configured S3 location to the destination path in HDFS. The source code is available at: https://github.com/DataTorrent/examples/tree/master/tutorials/s3-to-hdfs-sync

Send feedback or feature requests to feedback@datatorrent.com.
Sample application to show how to use the S3 tuple output module.
The application reads records from HDFS using FSRecordReaderModule. These records are then written to Amazon S3 using S3BytesOutputModule.
The properties file META-INF/properties.xml shows how to configure the respective operators.
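A minimal sketch of such a properties file is shown below. The operator names (`recordReader`, `s3Output`) and property keys (`files`, `accessKey`, `secretAccessKey`, `bucketName`) are illustrative assumptions, not taken from the example source; check the actual `META-INF/properties.xml` in the repository for the exact names.

```xml
<configuration>
  <!-- HDFS path to read records from (assumed operator/property names) -->
  <property>
    <name>dt.operator.recordReader.prop.files</name>
    <value>/user/apex/input</value>
  </property>
  <!-- S3 credentials and destination bucket (assumed property names) -->
  <property>
    <name>dt.operator.s3Output.prop.accessKey</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>dt.operator.s3Output.prop.secretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
  <property>
    <name>dt.operator.s3Output.prop.bucketName</name>
    <value>your-bucket</value>
  </property>
</configuration>
```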
shell> mvn clean package
This will generate the application package s3-tuple-output-1.0-SNAPSHOT.apa inside the target directory.
Use the application package generated above to launch the application from the UI console (if available) or the Apex command-line interface.
apex> launch target/s3-tuple-output-1.0-SNAPSHOT.apa
Sample application to show how to use the S3OutputModule to upload files into Amazon S3 Bucket.
The operators in the sample application are as follows:
Configure the following S3OutputModule properties in src/main/resources/META-INF/properties.xml before launching the application:
accessKey - String
secretAccessKey - String
For more information about the access key and secret access key, please refer to IAM.
bucketName - String
outputDirectoryPath - String
Suppose app.hdfs2s3 is the name of the bucket and you want to copy the files to the S3 location (app.hdfs2s3/apex/s3output); then configure the properties as below:
<property>
  <name>dt.operator.S3OutputModule.prop.bucketName</name>
  <value>app.hdfs2s3</value>
</property>
<property>
  <name>dt.operator.S3OutputModule.prop.outputDirectoryPath</name>
  <value>apex/s3output</value>
</property>
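Putting all four required properties together, a complete properties.xml fragment might look like the following. The bucketName and outputDirectoryPath entries come from the example above; the accessKey and secretAccessKey key names are an assumption, formed by following the same `dt.operator.S3OutputModule.prop.*` pattern.

```xml
<configuration>
  <!-- Assumed key names, following the dt.operator.S3OutputModule.prop.* pattern -->
  <property>
    <name>dt.operator.S3OutputModule.prop.accessKey</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>dt.operator.S3OutputModule.prop.secretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
  <!-- Key names documented in the example above -->
  <property>
    <name>dt.operator.S3OutputModule.prop.bucketName</name>
    <value>app.hdfs2s3</value>
  </property>
  <property>
    <name>dt.operator.S3OutputModule.prop.outputDirectoryPath</name>
    <value>apex/s3output</value>
  </property>
</configuration>
```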