 Introduction
 ============

 WordCount Hadoop example: inserts a set of words across multiple rows and
 counts them, using RandomPartitioner. The word_count_counters example sums
 the values of the counter columns for a key.

 The scripts in bin/ assume that your current working directory is
 examples/hadoop_cql3_word_count.


 Running
 =======

 First build and start a Cassandra server with the default configuration*.
 Ensure that the Thrift interface is enabled, either by setting
 "start_rpc: true" in cassandra.yaml or by running `nodetool enablethrift`
 after startup. Once Cassandra has started and the Thrift interface is
 available, run:

 examples/hadoop_cql3_word_count$ ant
 examples/hadoop_cql3_word_count$ bin/word_count_setup
 examples/hadoop_cql3_word_count$ bin/word_count
 examples/hadoop_cql3_word_count$ bin/word_count_counters

 To view the results in Cassandra, start bin/cqlsh and run the following:
 $ bin/cqlsh localhost
 > use cql3_wordcount;
 > select * from output_words;

 The output of the word count is configurable. In the bin/word_count file,
 set OUTPUT_REDUCER to either 'filesystem' or 'cassandra'. The 'filesystem'
 option writes to the /tmp/word_count* directories. The 'cassandra' option
 writes to the 'output_words' column family in the 'cql3_wordcount' keyspace
 and is the default.
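
 As a rough sketch of how such a setting can be handled in a shell script,
 the fragment below defaults OUTPUT_REDUCER to 'cassandra' and rejects any
 value other than the two documented options. The check_output_reducer
 function is hypothetical and is not part of bin/word_count; read that
 script to see how the value is actually passed to the job.

```shell
# Hypothetical helper (not part of bin/word_count): accept only the two
# documented reducer options.
check_output_reducer() {
    case "$1" in
        filesystem|cassandra)
            return 0 ;;
        *)
            echo "unknown OUTPUT_REDUCER '$1' (expected 'filesystem' or 'cassandra')" >&2
            return 1 ;;
    esac
}

# 'cassandra' is the documented default when nothing else is set.
OUTPUT_REDUCER="${OUTPUT_REDUCER:-cassandra}"
check_output_reducer "$OUTPUT_REDUCER" && echo "OUTPUT_REDUCER=$OUTPUT_REDUCER"
```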

 Read the code in src/ for more details.

 The word_count_counters example sums the counter columns for a row. The output
 is written to a text file in /tmp/word_count_counters.
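
 To illustrate the computation only (this is not the Hadoop job itself), the
 sketch below sums counter values per row key with awk. The input layout of
 key, column name, and count separated by tabs is an assumption made for the
 example, and sum_counters is a hypothetical name.

```shell
# Hypothetical illustration: sum the third field (count) for each row key
# (first field). Input lines are "key<TAB>column<TAB>count".
sum_counters() {
    awk -F '\t' '{ total[$1] += $3 } END { for (k in total) print k "\t" total[k] }' | sort
}

# Two counter columns for key-a, one for key-b.
printf 'key-a\tc1\t3\nkey-a\tc2\t4\nkey-b\tc1\t10\n' | sum_counters
# prints:
# key-a	7
# key-b	10
```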

 *It is recommended to turn off vnodes when running Cassandra with Hadoop.
 To do so, set "num_tokens: 1" in cassandra.yaml. If you want to point
 wordcount at a real cluster, modify the seeds and listen_address settings
 accordingly.
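
 Before starting the server, it can help to confirm what cassandra.yaml
 currently contains. The sketch below prints the num_tokens and start_rpc
 values; the yaml_value helper and the conf/cassandra.yaml default path are
 assumptions, so set CASSANDRA_YAML to match your installation.

```shell
# Hypothetical helper: print the value of a top-level "key: value" line.
# Commented-out lines (starting with '#') are ignored because the key must
# start the line. Any trailing comment on the line is printed as well.
yaml_value() {
    sed -n "s/^$1:[[:space:]]*//p" "$2" | head -n 1
}

# Assumed location; override with the CASSANDRA_YAML environment variable.
CASSANDRA_YAML="${CASSANDRA_YAML:-conf/cassandra.yaml}"
if [ -f "$CASSANDRA_YAML" ]; then
    echo "num_tokens: $(yaml_value num_tokens "$CASSANDRA_YAML")"  # 1 turns off vnodes
    echo "start_rpc:  $(yaml_value start_rpc "$CASSANDRA_YAML")"   # true enables Thrift
fi
```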


 Troubleshooting
 ===============

 word_count uses conf/logback.xml to log to wc.out.
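
 A quick way to scan the log for problems is sketched below. The
 show_log_problems function is hypothetical, and the sketch assumes that
 wc.out is in the current directory and uses the usual ERROR and WARN level
 names.

```shell
# Hypothetical helper: print numbered lines that mention ERROR, WARN, or an
# exception. Returns non-zero when no such line is found.
show_log_problems() {
    grep -n -E 'ERROR|WARN|Exception' "${1:-wc.out}"
}

if [ -f wc.out ]; then
    show_log_problems wc.out || echo "no ERROR, WARN, or exception lines in wc.out"
fi
```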