tree: 1fcf83684e02d19b954acd6c9066c3f522436b0d [path history] [tgz]
  1. examples/
  2. src/
  3. pom.xml
  4. README.md
streaming-twitter/README.md

Spark Streaming Twitter Connector

A library for reading social data from twitter using Spark Streaming.

Linking

Using SBT:

libraryDependencies += "org.apache.bahir" %% "spark-streaming-twitter" % "{{site.SPARK_VERSION}}"

Using Maven:

<dependency>
    <groupId>org.apache.bahir</groupId>
    <artifactId>spark-streaming-twitter_{{site.SCALA_BINARY_VERSION}}</artifactId>
    <version>{{site.SPARK_VERSION}}</version>
</dependency>

This library can also be added to Spark jobs launched through spark-shell or spark-submit by using the --packages command line option. For example, to include it when starting the spark shell:

$ bin/spark-shell --packages org.apache.bahir:spark-streaming-twitter_{{site.SCALA_BINARY_VERSION}}:{{site.SPARK_VERSION}}

Unlike using --jars, using --packages ensures that this library and its dependencies will be added to the classpath. The --packages argument can also be used with bin/spark-submit.

This library is cross-published for Scala 2.11 and Scala 2.12, so users should replace the proper Scala version in the commands listed above.

Examples

TwitterUtils uses Twitter4j to get the public stream of tweets using Twitter's Streaming API. Authentication information can be provided by any of the methods supported by Twitter4J library. You can import the TwitterUtils class and create a DStream with TwitterUtils.createStream as shown below.

Scala API

import org.apache.spark.streaming.twitter._

TwitterUtils.createStream(ssc, None)

Java API

import org.apache.spark.streaming.twitter.*;

TwitterUtils.createStream(jssc);

You can also either get the public stream, or get the filtered stream based on keywords. See end-to-end examples at Twitter Examples.

Unit Test

Executing integration tests requires users to register custom application at Twitter Developer Portal and obtain private OAuth credentials. Below listing present how to run complete test suite on local workstation.

cd streaming-twitter
env ENABLE_TWITTER_TESTS=1 \
    twitter4j.oauth.consumerKey=${customer key} \
    twitter4j.oauth.consumerSecret=${customer secret} \
    twitter4j.oauth.accessToken=${access token} \
    twitter4j.oauth.accessTokenSecret=${access token secret} \
    mvn clean test