Sub-task
[KAFKA-176] - Fix existing perf tools
[KAFKA-237] - create/delete ZK path for a topic in an admin tool
[KAFKA-239] - Wire existing producer and consumer to use the new ZK data structure
[KAFKA-240] - implement new producer and consumer request format
[KAFKA-329] - Remove the watches/broker for new topics and partitions and change create topic admin API to send start replica state change to all brokers
[KAFKA-335] - Implement an embedded controller
[KAFKA-336] - add an admin RPC to communicate state changes between the controller and the broker
[KAFKA-337] - upgrade ZKClient to allow conditional updates in ZK
[KAFKA-338] - controller failover
[KAFKA-339] - using MultiFetch in the follower
[KAFKA-342] - revisit the broker startup procedure according to V3 design
[KAFKA-343] - revisit the become leader and become follower state change operations using V3 design
[KAFKA-344] - migration tool from 0.7 to 0.8
[KAFKA-356] - Create a generic Kafka thread class that includes basic boiler plate code of instantiating and shutting down threads cleanly
[KAFKA-362] - ZookeeperConsumerConnector needs to connect to new leader after leadership change
[KAFKA-369] - remove ZK dependency on producer
[KAFKA-458] - remove errorcode from ByteBufferMessageSet
[KAFKA-482] - Make producer to run for the entire duration of the System Test
[KAFKA-488] - Port Mirroring System Test to this python system test framework
[KAFKA-492] - Sometimes the python system test framework doesn't terminate all running processes
[KAFKA-494] - Relative paths should be used for svg URLs in dashboards html
[KAFKA-502] - Simplify setup / initialization in replication_basic_test.py
[KAFKA-503] - Support "testcase_to_run" or "testcase_to_skip"
[KAFKA-507] - Shut down ZK last to avoid hanging brokers running processes
[KAFKA-513] - Add state change log to Kafka brokers
[KAFKA-571] - Add more test cases to System Test
[KAFKA-731] - ~/ivy2/cache should be a variable in the kafka-run-class bash script
[KAFKA-780] - Reassign partitions tool produces NPE in shutdown handler
[KAFKA-814] - Controller should not throw exception when a preferred replica is already the leader for a partition
[KAFKA-843] - Re-add the release-zip sbt target
Bug
[KAFKA-15] - SBT release-zip target doesn't include bin and config directories anymore
[KAFKA-42] - Support rebalancing the partitions with replication
[KAFKA-43] - Rebalance to preferred broke with intra-cluster replication support
[KAFKA-46] - Commit thread, ReplicaFetcherThread for intra-cluster replication
[KAFKA-49] - Add acknowledgement to the produce request.
[KAFKA-81] - wrong path in bin/kafka-run-class.sh
[KAFKA-97] - SocketServer.scala refers to Handler-specific variables
[KAFKA-171] - Kafka producer should do a single write to send message sets
[KAFKA-192] - CompressionUtilTest does not run and fails when it does
[KAFKA-215] - Improve system tests for the mirroring code
[KAFKA-229] - SimpleConsumer is not logging exceptions correctly so detailed stack trace is not coming in the logs
[KAFKA-259] - Give better error message when trying to run shell scripts without having built/downloaded the jars yet
[KAFKA-295] - Bug in async producer DefaultEventHandler retry logic
[KAFKA-305] - SyncProducer does not correctly timeout
[KAFKA-306] - broker failure system test broken on replication branch
[KAFKA-351] - Refactor some new components introduced for replication
[KAFKA-352] - Throw exception to client if it makes a produce/consume request to a Kafka broker for a topic that hasn't been created
[KAFKA-367] - StringEncoder/StringDecoder use platform default character set
[KAFKA-370] - Exception "java.util.NoSuchElementException: None.get" appears inconsistently in Mirror Maker log.
[KAFKA-371] - Creating topic of empty string puts broker in a bad state
[KAFKA-376] - expose different data to fetch requests from the follower replicas and consumer clients
[KAFKA-379] - TopicCount.constructTopicCount isn't thread-safe
[KAFKA-382] - Write ordering guarantee violated
[KAFKA-385] - RequestPurgatory enhancements - expire/checkSatisfy issue; add jmx beans
[KAFKA-386] - Race condition in accessing ISR
[KAFKA-391] - Producer request and response classes should use maps
[KAFKA-396] - Mirroring system test fails on 0.8
[KAFKA-412] - deal with empty TopicData list in producer and fetch request
[KAFKA-413] - single_host_multi_brokers system test fails on laptop
[KAFKA-415] - Controller throws NoSuchElementException while marking a broker failed
[KAFKA-416] - Controller tests throw several zookeeper errors
[KAFKA-418] - NullPointerException in ConsumerFetcherManager
[KAFKA-420] - maintain HW correctly with only 1 replica in ISR
[KAFKA-422] - LazyInitProducerTest has transient test failure
[KAFKA-424] - Remove invalid mirroring arguments from kafka-server-start.sh
[KAFKA-425] - Wrong class name in performance test scripts
[KAFKA-427] - LogRecoverTest.testHWCheckpointWithFailuresSingleLogSegment has transient failure
[KAFKA-428] - need to update leaderAndISR path in ZK conditionally in ReplicaManager
[KAFKA-431] - LogCorruptionTest.testMessageSizeTooLarge fails occasionally
[KAFKA-433] - ensurePartitionLeaderOnThisBroker should not access ZK
[KAFKA-434] - IllegalMonitorStateException in ReplicaManager.makerFollower
[KAFKA-452] - follower replica may need to backoff the fetching if leader is not ready yet
[KAFKA-453] - follower replica may need to backoff the fetching if leader is not ready yet
[KAFKA-456] - ProducerSendThread calls ListBuffer.size a whole bunch. That is a O(n) operation
[KAFKA-459] - KafkaController.RequestSendThread can throw exception on broker socket
[KAFKA-460] - ControllerChannelManager needs synchronization btw shutdown and add/remove broker
[KAFKA-461] - remove support for format for magic byte 0 in 0.8
[KAFKA-463] - log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
[KAFKA-464] - KafkaController NPE in SessionExpireListener
[KAFKA-466] - Controller tests throw IllegalStateException
[KAFKA-467] - Controller based leader election failed ERROR messages in LazyInitProducerTest
[KAFKA-468] - String#getBytes is platform dependent
[KAFKA-470] - transient unit test failure in RequestPurgatoryTest
[KAFKA-471] - Transient failure in ProducerTest
[KAFKA-473] - Use getMetadata Api in ZookeeperConsumerConnector
[KAFKA-474] - support changing host/port of a broker
[KAFKA-481] - Require values in Utils.getTopic* methods to be positive
[KAFKA-490] - Check max message size on server instead of producer
[KAFKA-491] - KafkaRequestHandler needs to handle exceptions
[KAFKA-495] - Handle topic names with "/" on Kafka server
[KAFKA-497] - recover consumer during unclean leadership change
[KAFKA-499] - Refactor controller state machine
[KAFKA-500] - javaapi support for getTopoicMetaData
[KAFKA-501] - getOfffset Api needs to return different latest offset to regular and follower consumers
[KAFKA-506] - Store logical offset in log
[KAFKA-508] - split out partiondata from fetchresponse and producerrequest
[KAFKA-509] - server should shut down on encountering invalid highwatermark file
[KAFKA-510] - broker needs to know the replication factor per partition
[KAFKA-511] - offset returned in Producer response may not be correct
[KAFKA-512] - Remove checksum from ByteBufferMessageSet.iterator
[KAFKA-514] - Replication with Leader Failure Test: Log segment files checksum mismatch
[KAFKA-516] - Consider catching all exceptions in ShutdownableThread
[KAFKA-525] - newly created partitions are not added to ReplicaStateMachine
[KAFKA-528] - IndexOutOfBoundsException thrown by kafka.consumer.ConsumerFetcherThread
[KAFKA-531] - kafka.server.ReplicaManager: java.nio.channels.NonWritableChannelException
[KAFKA-537] - expose clientId and correlationId in ConsumerConfig
[KAFKA-539] - Replica.hw should be initialized to the smaller of checkedpointed HW and log end offset
[KAFKA-540] - log.append() should halt on IOException
[KAFKA-553] - confusing reference to zk.connect in config/producer.properties
[KAFKA-556] - Change MessageSet.sizeInBytes to Int
[KAFKA-557] - Replica fetch thread doesn't need to recompute message id
[KAFKA-562] - Non-failure System Test Log Segment File Checksums mismatched
[KAFKA-563] - KafkaScheduler shutdown in ZookeeperConsumerConnector should check for config.autocommit
[KAFKA-567] - Replication Data Loss in Mirror Maker Bouncing testcase
[KAFKA-573] - System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
[KAFKA-575] - Partition.makeFollower() reads broker info from ZK
[KAFKA-576] - SimpleConsumer throws UnsupportedOperationException: empty.head
[KAFKA-577] - extend DumpLogSegments to verify consistency btw data and index
[KAFKA-578] - Leader finder thread in ConsumerFetcherManager needs to handle exceptions
[KAFKA-579] - remove connection timeout in SyncProducer
[KAFKA-580] - system test testcase_0122 under replication fails due to large # of data loss
[KAFKA-584] - produce/fetch remote time metric not set correctly when num.acks = 1
[KAFKA-586] - system test configs are broken
[KAFKA-591] - Add test cases to test log size retention and more
[KAFKA-592] - Register metrics beans at kafka server startup
[KAFKA-596] - LogSegment.firstAppendTime not reset after truncate to
[KAFKA-604] - Add missing metrics in 0.8
[KAFKA-608] - getTopicMetadata does not respect producer config settings
[KAFKA-612] - move shutting down of fetcher thread out of critical path
[KAFKA-613] - MigrationTool should disable shallow iteration in the 0.7 consumer
[KAFKA-614] - DumpLogSegment offset verification is incorrect for compressed messages
[KAFKA-618] - Deadlock between leader-finder-thread and consumer-fetcher-thread during broker failure
[KAFKA-622] - Create mbeans per client
[KAFKA-625] - Improve MessageAndMetadata to expose the partition
[KAFKA-634] - ConsoleProducer compresses messages and ignores the --compress flag
[KAFKA-646] - Provide aggregate stats at the high level Producer and ZookeeperConsumerConnector level
[KAFKA-648] - Use uniform convention for naming properties keys
[KAFKA-664] - Kafka server threads die due to OOME during long running test
[KAFKA-668] - Controlled shutdown admin tool should not require controller JMX url/port to be supplied
[KAFKA-669] - Irrecoverable error on leader while rolling to a new segment
[KAFKA-673] - Broker recovery check logic is reversed
[KAFKA-680] - ApiUtils#writeShortString uses String length instead of byte length
[KAFKA-681] - Unclean shutdown testing - truncateAndStartWithNewOffset is not invoked when it is expected to
[KAFKA-684] - ConsoleProducer does not have the queue-size option
[KAFKA-690] - TopicMetadataRequest throws exception when no topics are specified
[KAFKA-691] - Fault tolerance broken with replication factor 1
[KAFKA-692] - ConsoleConsumer outputs diagnostic message to stdout instead of stderr
[KAFKA-693] - Consumer rebalance fails if no leader available for a partition and stops all fetchers
[KAFKA-695] - Broker shuts down due to attempt to read a closed index file
[KAFKA-698] - broker may expose uncommitted data to a consumer
[KAFKA-701] - ConsoleProducer does not exit correctly and fix some config properties following KAFKA-648
[KAFKA-702] - Deadlock between request handler/processor threads
[KAFKA-708] - ISR becomes empty while marking a partition offline
[KAFKA-710] - Some arguments are always set to default in ProducerPerformance
[KAFKA-713] - Update Hadoop producer for Kafka 0.8 changes
[KAFKA-726] - Add ReplicaFetcherThread name to mbean names
[KAFKA-732] - MirrorMaker with shallow.iterator.enable=true produces unreadble messages
[KAFKA-738] - correlationId is not set in FetchRequest in AbstractFetcherThread
[KAFKA-743] - PreferredReplicaLeaderElectionCommand has command line error
[KAFKA-748] - Append to index fails due to invalid offset
[KAFKA-750] - inconsistent index offset during broker startup
[KAFKA-751] - Fix windows build script - kafka-run-class.bat
[KAFKA-753] - Kafka broker shuts down while loading segments
[KAFKA-755] - standardizing json values stored in ZK
[KAFKA-756] - Processor thread blocks due to infinite loop during fetch response send
[KAFKA-757] - System Test Hard Failure cases : "Fatal error during KafkaServerStable startup" when hard-failed broker is re-started
[KAFKA-758] - startHighWaterMarksCheckPointThread is never called
[KAFKA-767] - Message Size check should be done after assigning the offsets
[KAFKA-768] - broker should exit if hitting exceptions durin startup
[KAFKA-769] - On startup, a brokers highwatermark for every topic partition gets reset to zero
[KAFKA-770] - KafkaConfig properties should be verified in the constructor
[KAFKA-772] - System Test Transient Failure on testcase_0122
[KAFKA-776] - Changing ZK format breaks some tools
[KAFKA-779] - Standardize Zk data structures for Re-assign partitions and Preferred replication election
[KAFKA-785] - Resolve bugs in PreferredReplicaLeaderElection admin tool
[KAFKA-786] - Use "withRequiredArg" while parsing jopt options in all tools
[KAFKA-793] - Include controllerId in all requests sent by controller
[KAFKA-798] - Use biased histograms instead of uniform histograms in KafkaMetricsGroup
[KAFKA-800] - inSyncReplica in Partition needs some tweaks
[KAFKA-801] - Fix MessagesInPerSec mbean to count uncompressed message rate
[KAFKA-804] - Incorrect index in the log of a follower
[KAFKA-807] - LineMessageReader doesn't correctly parse the key separator
[KAFKA-809] - Dependency on zkclient 0.1 (redundant) prevents building in IntelliJ
[KAFKA-811] - Fix clientId in migration tool
[KAFKA-813] - Minor cleanup in Controller
[KAFKA-825] - KafkaController.isActive() needs to be synchronized
[KAFKA-826] - Make Kafka 0.8 depend on metrics 2.2.0 instead of 3.x
[KAFKA-827] - improve list topic output format
[KAFKA-828] - Preferred Replica Election does not delete the admin path on controller failover
[KAFKA-830] - partition replica assignment map in the controller should be a Set
[KAFKA-832] - 0.8 Consumer prevents rebalance if consumer thread is blocked or slow
[KAFKA-840] - Controller tries to perform preferred replica election on failover before state machines have started up
[KAFKA-846] - AbstractFetcherThread should do shallow instead of deep iteration
[KAFKA-856] - Correlation id for OffsetFetch request (#2) always responds with 0
[KAFKA-861] - IndexOutOfBoundsException while fetching data from leader
[KAFKA-866] - Recover segment does shallow iteration to fix index causing inconsistencies
[KAFKA-871] - Rename ZkConfig properties
[KAFKA-872] - Socket server does not set send/recv buffer sizes
[KAFKA-880] - NoLeaderPartitionSet should be cleared before leader finder thread is started up
[KAFKA-884] - Get java.lang.NoSuchMethodError: com.yammer.metrics.core.TimerContext.stop()J when stopping kafka brokers
[KAFKA-895] - Protocol documentation is not clear about requiredAcks = 0.
[KAFKA-899] - LeaderNotAvailableException the first time a new message for a partition is processed.
[KAFKA-900] - ClosedByInterruptException when high-level consumer shutdown normally
[KAFKA-903] - [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed
[KAFKA-905] - Logs can have same offsets causing recovery failure
[KAFKA-907] - controller needs to close socket channel to brokers on exception
[KAFKA-914] - Deadlock between initial rebalance and watcher-triggered rebalances
[KAFKA-916] - Deadlock between fetcher shutdown and handling partitions with error
[KAFKA-919] - Disabling of auto commit is ignored during consumer group rebalancing
[KAFKA-920] - zkclient jar 0.2.0 is not compatible with 0.1.0
[KAFKA-921] - Expose max lag mbean for consumers and replica fetchers
[KAFKA-927] - Integrate controlled shutdown into kafka shutdown hook
[KAFKA-937] - ConsumerFetcherThread can deadlock
[KAFKA-938] - High CPU usage when more or less idle
[KAFKA-940] - Scala match error in javaapi.Implicits
[KAFKA-941] - Add Apache 2.0 license to missing code source files
[KAFKA-942] - the version of the jar should be 0.8.0-beta1 not 0.8.0-SNAPSHOT
[KAFKA-944] - the pom output from publish and publish-local is not accurate
Improvement
[KAFKA-77] - Implement "group commit" for kafka logs
[KAFKA-100] - ProducerShell should use high-level producer instead of SyncProducer
[KAFKA-133] - Publish kafka jar to a public maven repository
[KAFKA-134] - Upgrade Kafka to sbt 0.11.3
[KAFKA-139] - cross-compile multiple Scala versions and upgrade to SBT 0.12.1
[KAFKA-155] - Support graceful Decommissioning of Broker
[KAFKA-165] - Add helper script for zkCli.sh
[KAFKA-181] - Log errors for unrecognized config options
[KAFKA-187] - Add Snappy Compression as a Codec and refactor CompressionUtil and option on startup to select what the default codec
[KAFKA-193] - use by name parameter helper for logging and trait to include lazy logging and refactor code to use the new LogHelper
[KAFKA-246] - log configuration values used
[KAFKA-253] - Refactor the async producer to have only one queue instead of one queue per broker in a Kafka cluster
[KAFKA-258] - Remove broker.id from the broker.list config in the Producer
[KAFKA-267] - Enhance ProducerPerformance to generate unique random Long value for payload
[KAFKA-271] - Modify new FetchResponse object to remove the initial offset field
[KAFKA-281] - support multiple root log directories
[KAFKA-285] - Increase maximum value of log.retention.size
[KAFKA-296] - Update Go Client to new version of Go
[KAFKA-311] - Allow KafkaLog4jAppender to take in a configurable producer.type
[KAFKA-314] - Go Client Multi-produce
[KAFKA-323] - Add the ability to use the async producer in the Log4j appender
[KAFKA-324] - enforce broker.id to be a non-negative integer
[KAFKA-325] - revisit broker config in 0.8
[KAFKA-349] - Create individual "Response" types for each kind of request and wrap them with "BoundedByteBufferSend", remove "xxResponseSend" types for all requests except "FetchRequest"
[KAFKA-365] - change copyright in NOTICE to 2012
[KAFKA-366] - add jmx beans in broker to track # bytes in consumer
[KAFKA-368] - use the pig core jar from maven instead of distributing it
[KAFKA-393] - Add constructor for message which takes both byte array offset and length
[KAFKA-408] - ProducerPerformance does not work with all producer config options
[KAFKA-437] - Unused var statement in ZookeeperConsumerConnectorTest
[KAFKA-439] - @returns was used in scala doc when it should have been @return
[KAFKA-505] - Remove errorcode from TopicMetaDataResponse
[KAFKA-548] - remove partition from ProducerRequestPartitionData and FetchResponsePartitionData
[KAFKA-581] - provides windows batch script for starting Kafka/Zookeeper
[KAFKA-632] - ProducerRequest should take ByteBufferMessageSet instead of MessageSet
[KAFKA-638] - remove ProducerShell
[KAFKA-667] - Rename .highwatermark file
[KAFKA-675] - Only bind to the interface declared in the 'hostname' config property
[KAFKA-699] - Disallow clients to set replicaId in FetchRequest
[KAFKA-733] - Fat jar option for build, or override for ivy cache location
[KAFKA-762] - Improve second replica assignment
[KAFKA-763] - Add an option to replica from the largest offset during unclean leader election
[KAFKA-812] - Support deep iteration in DumpLogSegments tool
[KAFKA-850] - add an option to show under replicated partitions in list topic command
[KAFKA-931] - make zookeeper.connect a required property
New Feature
[KAFKA-50] - kafka intra-cluster replication support
[KAFKA-188] - Support multiple data directories
[KAFKA-202] - Make the request processing in kafka asynchonous
[KAFKA-203] - Improve Kafka internal metrics
[KAFKA-235] - Add a 'log.file.age' configuration parameter to force rotation of log files after they've reached a certain age
[KAFKA-429] - Expose JMX operation to set logger level dynamically
[KAFKA-475] - Time based log segment rollout
[KAFKA-545] - Add a Performance Suite for the Log subsystem
[KAFKA-546] - Fix commit() in zk consumer for compressed messages
Task
[KAFKA-93] - Change code header to follow standard ASF source header
[KAFKA-317] - Add support for new wire protocol to Go client
[KAFKA-341] - Create a new single host system test to validate all replicas on 0.8 branch
[KAFKA-348] - rebase 0.8 branch from trunk
[KAFKA-380] - Enhance single_host_multi_brokers test with failure to trigger leader re-election in replication
[KAFKA-440] - Create a regression test framework for distributed environment testing
[KAFKA-526] - System Test should remove the top level data log directory
[KAFKA-594] - Update System Test due to new argument "--sync" in ProducerPerformance
[KAFKA-605] - System Test - Log Retention Cases should wait longer before getting the common starting offset in replica log segments
[KAFKA-688] - System Test - Update all testcase_xxxx_properties.json for properties keys uniform naming convention
[KAFKA-737] - System Test : Disable shallow.iterator in Mirror Maker test cases to make compression work correctly
[KAFKA-791] - Fix validation bugs in System Test
[KAFKA-792] - Update multiple attributes in testcase_xxxx_properties.json
[KAFKA-819] - System Test : Add validation of log segment index to offset
KAFKA-944 fixing up for release pom output from publish and publish-local is not accurate
1 file changed
tree: 2d06d9c90438294ef4edf3827d19c8ef46d606c2
  1. bin/
  2. config/
  3. contrib/
  4. core/
  5. examples/
  6. lib/
  7. perf/
  8. project/
  9. system_test/
  10. .gitignore
  11. .rat-excludes
  12. DISCLAIMER
  13. LICENSE
  14. NOTICE
  15. README.md
  16. sbt
  17. sbt.bat
README.md

Kafka is a distributed publish/subscribe messaging system

It is designed to support the following

  • Persistent messaging with O(1) disk structures that provide constant time performance even with many TB of stored messages.
  • High-throughput: even with very modest hardware Kafka can support hundreds of thousands of messages per second.
  • Explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
  • Support for parallel data load into Hadoop.

Kafka is aimed at providing a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) are a key ingredient in many of the social feature on the modern web. This data is typically handled by “logging” and ad hoc log aggregation solutions due to the throughput requirements. This kind of ad hoc solution is a viable solution to providing logging data to an offline analysis system like Hadoop, but is very limiting for building real-time processing. Kafka aims to unify offline and online processing by providing a mechanism for parallel load into Hadoop as well as the ability to partition real-time consumption over a cluster of machines.

See our web site for more details on the project.

Contribution

Kafka is a new project, and we are interested in building the community; we would welcome any thoughts or patches. You can reach us on the Apache mailing lists.

The Kafka code is available from:

To contribute you can follow:

To build for all supported versions of Scala:

  1. ./sbt +package

To build for a particular version of Scala (either 2.8.0, 2.8.2, 2.9.1 or 2.9.2):

  1. ./sbt “++2.8.0 package” or ./sbt “++2.8.2 package” or ./sbt “++2.9.1 package” or ./sbt “++2.9.2 package”

Here are some useful sbt commands, to be executed at the sbt command prompt (./sbt). Prefixing with "++ " runs the command for a specific Scala version, prefixing with “+” will perform the action for all versions of Scala, and no prefix runs the command for the default (2.8.0) version of Scala. -

tasks : Lists all the sbt commands and their descriptions

clean : Deletes all generated files (the target directory).

compile : Compile all the sub projects, but not create the jars

test : Run all unit tests in all sub projects

release-zip : Create all the jars, run unit tests and create a deployable release zip

package: Creates jars for src, test, docs etc

projects : List all the sub projects

project sub_project_name : Switch to a particular sub-project. For example, to switch to the core kafka code, use “project core-kafka”

Following commands can be run only on a particular sub project -

test-only package.test.TestName : Runs only the specified test in the current sub project

run : Provides options to run any of the classes that have a main method. For example, you can switch to project java-examples, and run the examples there by executing “project java-examples” followed by “run”

For more details please see the SBT documentation