:py:mod:`tests.system.providers.apache.hive.example_twitter_dag`
================================================================
.. py:module:: tests.system.providers.apache.hive.example_twitter_dag
.. autoapi-nested-parse::

   This is an example DAG for managing Twitter data.
Module Contents
---------------
Functions
~~~~~~~~~
.. autoapisummary::

   tests.system.providers.apache.hive.example_twitter_dag.fetch_tweets
   tests.system.providers.apache.hive.example_twitter_dag.clean_tweets
   tests.system.providers.apache.hive.example_twitter_dag.analyze_tweets
   tests.system.providers.apache.hive.example_twitter_dag.transfer_to_db
Attributes
~~~~~~~~~~
.. autoapisummary::

   tests.system.providers.apache.hive.example_twitter_dag.ENV_ID
   tests.system.providers.apache.hive.example_twitter_dag.DAG_ID
   tests.system.providers.apache.hive.example_twitter_dag.fetch
   tests.system.providers.apache.hive.example_twitter_dag.test_run
.. py:data:: ENV_ID
.. py:data:: DAG_ID
   :annotation: = example_twitter_dag
.. py:function:: fetch_tweets()

   This task should call the Twitter API and retrieve yesterday's tweets sent from and to each of the
   four Twitter users (Twitter_A, ..., Twitter_D). It should generate eight CSV output files, named
   ``direction_twitterHandle_date.csv``, where ``direction`` is either ``from`` or ``to``.
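The naming convention can be illustrated with a minimal sketch. The helper name ``expected_filenames``, the intermediate handles ``Twitter_B`` and ``Twitter_C``, and the ``YYYYMMDD`` date format are assumptions made for illustration; the example DAG itself does not define them.

```python
from datetime import date, timedelta

# Placeholder handles: only Twitter_A and Twitter_D are named in the docstring,
# the two in between are assumed.
TWITTER_HANDLES = ["Twitter_A", "Twitter_B", "Twitter_C", "Twitter_D"]
DIRECTIONS = ["from", "to"]


def expected_filenames(run_date):
    """Return the eight CSV names for the day before ``run_date``."""
    yesterday = run_date - timedelta(days=1)
    stamp = yesterday.strftime("%Y%m%d")  # date format is an assumption
    return [
        f"{direction}_{handle}_{stamp}.csv"
        for direction in DIRECTIONS
        for handle in TWITTER_HANDLES
    ]


if __name__ == "__main__":
    for name in expected_filenames(date.today()):
        print(name)
```

Two directions times four handles gives the eight files the docstring mentions.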
.. py:function:: clean_tweets()

   This is a placeholder for cleaning the eight files. In this step you can drop or cherry-pick columns
   and strip out unwanted parts of the tweet text.
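One possible shape for such a cleaning step, using only the standard library, is sketched below. The function name ``clean_rows``, the column names, and the choice to strip URLs are all hypothetical; the example DAG leaves this step empty.

```python
import csv
import io
import re

# Matches http(s) links so they can be removed from the tweet text.
URL_PATTERN = re.compile(r"https?://\S+")


def clean_rows(source, keep_columns, text_column="text"):
    """Yield rows reduced to ``keep_columns`` with URLs removed from the text."""
    reader = csv.DictReader(source)
    for row in reader:
        # Cherry-pick only the requested columns.
        cleaned = {name: row[name] for name in keep_columns}
        if text_column in cleaned:
            without_urls = URL_PATTERN.sub("", cleaned[text_column])
            # Collapse the whitespace left behind by the removed URLs.
            cleaned[text_column] = " ".join(without_urls.split())
        yield cleaned


if __name__ == "__main__":
    # Hypothetical input: the real CSV layout is not specified by the DAG.
    sample = io.StringIO(
        "id,user,text,lang\n"
        "1,Twitter_A,Hello  world https://example.com,en\n"
    )
    for row in clean_rows(sample, ["id", "text"]):
        print(row)
```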
.. py:function:: analyze_tweets()

   This is a placeholder for analyzing the Twitter data. The analysis could be a simple sentiment
   analysis using an approach such as bag of words, or something more complicated. You can also
   delegate such tasks to external web services.
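As a rough illustration of the bag-of-words idea, the sketch below counts words in a tweet and scores it against two tiny word lists. The word lists and the names ``bag_of_words`` and ``sentiment_score`` are assumptions chosen for the example, not part of the DAG, and a real analysis would need a far larger lexicon.

```python
import re
from collections import Counter

# Tiny illustrative lexicons; these are assumptions, not a real sentiment lexicon.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "sad"}


def bag_of_words(text):
    """Return word counts for ``text``, ignoring case and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def sentiment_score(text):
    """Return positive word count minus negative word count."""
    counts = bag_of_words(text)
    positive = sum(counts[word] for word in POSITIVE_WORDS)
    negative = sum(counts[word] for word in NEGATIVE_WORDS)
    return positive - negative


if __name__ == "__main__":
    print(sentiment_score("I love this, great great day"))
    print(sentiment_score("bad and sad"))
```

A score above zero suggests a positive tweet and a score below zero a negative one; word order and negation are ignored, which is the main limitation of this approach.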
.. py:function:: transfer_to_db()

   This is a placeholder for extracting a summary from the Hive data and storing it in MySQL.
.. py:data:: fetch
.. py:data:: test_run