BookKeeper is a replicated log storage system that Pulsar uses for durable storage of all messages.
The port on which the bookie server listens.
Default: 3181
Whether the bookie is allowed to use a loopback interface as its primary interface (that is the interface used to establish its identity). By default, loopback interfaces are not allowed to work as the primary interface. Using a loopback interface as the primary interface usually indicates a configuration error. For example, it’s fairly common in some VPS setups to not configure a hostname or to have the hostname resolve to 127.0.0.1
. If this is the case, then all bookies in the cluster will establish their identities as 127.0.0.1:3181
and only one will be able to join the cluster. For VPSs configured like this, you should explicitly set the listening interface.
Default: false
The network interface on which the bookie listens. By default, the bookie listens on all interfaces.
Default: eth0
Configure a specific hostname or IP address that the bookie should use to advertise itself to clients. By default, the bookie advertises either its own IP address or hostname according to the listeningInterface
and useHostNameAsBookieID
settings.
Default: N/A
Configure the bookie to enable/disable multiple ledger/index/journal directories in the same filesystem disk partition.
Default: false
The minimum safe usable size available in index directory for bookie to create index files while replaying journal at the time of bookie starts in Readonly Mode (in bytes).
Default: 1073741824
The directory where BookKeeper outputs its write-ahead log (WAL).
Default: data/bookkeeper/journal
Directories that BookKeeper outputs its write ahead log. Multiple directories are available, being separated by ,
. For example: journalDirectories=/tmp/bk-journal1,/tmp/bk-journal2
. If journalDirectories
is set, the bookies skip journalDirectory
and use this setting directory.
Default: /tmp/bk-journal
The directory where BookKeeper outputs ledger snapshots. This could define multiple directories to store snapshots separated by ,
, for example ledgerDirectories=/tmp/bk1-data,/tmp/bk2-data
. Ideally, ledger dirs and the journal dir are each in a different device, which reduces the contention between random I/O and sequential write. It is possible to run with a single disk, but performance will be significantly lower.
Default: data/bookkeeper/ledgers
The type of ledger manager used to manage how ledgers are stored, managed, and garbage collected. See BookKeeper Internals for more info.
Default: hierarchical
The root ZooKeeper path used to store ledger metadata. This parameter is used by the ZooKeeper-based ledger manager as a root znode to store all ledgers.
Default: /ledgers
Ledger storage implementation class
Default: org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage
Enable or disable entry logger preallocation
Default: true
Max file size of the entry logger, in bytes. A new entry log file will be created when the old one reaches the file size limitation.
Default: 1073741824
Threshold of minor compaction. Entry log files whose remaining size percentage reaches below this threshold will be compacted in a minor compaction. If set to less than zero, the minor compaction is disabled.
Default: 0.2
Time interval to run minor compaction, in seconds. If set to less than zero, the minor compaction is disabled. Note: should be greater than gcWaitTime.
Default: 3600
The threshold of major compaction. Entry log files whose remaining size percentage reaches below this threshold will be compacted in a major compaction. Those entry log files whose remaining size percentage is still higher than the threshold will never be compacted. If set to less than zero, the major compaction is disabled.
Default: 0.5
The time interval to run major compaction, in seconds. If set to less than zero, the major compaction is disabled. Note: should be greater than gcWaitTime.
Default: 86400
If readOnlyModeEnabled=true
, then on all full ledger disks, bookie will be converted to read-only mode and serve only read requests. Otherwise the bookie will be shutdown.
Default: true
Whether the bookie is force started in read only mode.
Default: false
Persist the bookie status locally on the disks. So the bookies can keep their status upon restarts.
Default: false
Sets the maximum number of entries that can be compacted without flushing. When compacting, the entries are written to the entrylog and the new offsets are cached in memory. Once the entrylog is flushed the index is updated with the new offsets. This parameter controls the number of entries added to the entrylog before a flush is forced. A higher value for this parameter means more memory will be used for offsets. Each offset consists of 3 longs. This parameter should not be modified unless you’re fully aware of the consequences.
Default: 100000
The rate at which compaction will read entries, in adds per second.
Default: 1000
Throttle compaction by bytes or by entries.
Default: false
The rate at which compaction will read entries, in adds per second.
Default: 1000
Set the rate at which compaction reads entries. The unit is bytes added per second.
Default: 1000000
Max file size of journal file, in megabytes. A new journal file will be created when the old one reaches the file size limitation.
Default: 2048
The max number of old journal files to keep. Keeping a number of old journal files would help data recovery in special cases.
Default: 5
How space to pre-allocate at a time in the journal.
Default: 16
The of the write buffers used for the journal.
Default: 64
Whether pages should be removed from the page cache after force write.
Default: true
Whether to group journal force writes, which optimizes group commit for higher throughput.
Default: true
The maximum latency to impose on a journal write to achieve grouping.
Default: 1
All the journal writes and commits should be aligned to given size
Default: 4096
Maximum writes to buffer to achieve grouping
Default: 524288
If we should flush the journal when journal queue is empty
Default: false
The number of threads that should handle journal callbacks
Default: 8
The grace period, in milliseconds, that the replication worker waits before fencing and replicating a ledger fragment that's still being written to upon bookie failure.
Default: 30000
The number of max entries to keep in fragment for re-replication
Default: 100
Whether the bookie itself can start auto-recovery service.
Default: true
How long to wait, in seconds, before starting auto recovery of a lost bookie.
Default: 0
How long the interval to trigger next garbage collection, in milliseconds. Since garbage collection is running in background, too frequent gc will heart performance. It is better to give a higher number of gc interval if there is enough disk capacity.
Default: 900000
How long the interval to trigger next garbage collection of overreplicated ledgers, in milliseconds. This should not be run very frequently since we read the metadata for all the ledgers on the bookie from zk.
Default: 86400000
How long the interval to flush ledger index pages to disk, in milliseconds. Flushing index files will introduce much random disk I/O. If separating journal dir and ledger dirs each on different devices, flushing would not affect performance. But if putting journal dir and ledger dirs on same device, performance degrade significantly on too frequent flushing. You can consider increment flush interval to get better performance, but you need to pay more time on bookie server restart after failure.
Default: 60000
Interval to watch whether bookie is dead or not, in milliseconds
Default: 1000
Allow the bookie storage to expand. Newly added ledger and index dirs must be empty.
Default: false
A list of one of more servers on which zookeeper is running. The server list can be comma separated values, for example: zkServers=zk1:2181,zk2:2181,zk3:2181.
Default: localhost:2181
ZooKeeper client session timeout in milliseconds Bookie server will exit if it received SESSION_EXPIRED because it was partitioned off from ZooKeeper for more than the session timeout JVM garbage collection, disk I/O will cause SESSION_EXPIRED. Increment this value could help avoiding this issue
Default: 30000
The start time that the Zookeeper client backoff retries in milliseconds.
Default: 1000
The maximum time that the Zookeeper client backoff retries in milliseconds.
Default: 10000
Set ACLs on every node written on ZooKeeper, allowing users to read and write BookKeeper metadata stored on ZooKeeper. In order to make ACLs work you need to setup ZooKeeper JAAS authentication. All the bookies and Client need to share the same user, and this is usually done using Kerberos authentication. See ZooKeeper documentation.
Default: false
The flag enables/disables starting the admin http server.
Default: false
The HTTP server port to listen on. By default, the value is 8080
. If you want to keep it consistent with the Prometheus stats provider, you can set it to 8000
.
Default: 8080
The http server class.
Default: org.apache.bookkeeper.http.vertx.VertxHttpServer
This settings is used to enabled/disabled Nagle’s algorithm, which is a means of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent over the network. If you are sending many small messages, such that more than one can fit in a single IP packet, setting server.tcpnodelay to false to enable Nagle algorithm can provide better performance.
Default: true
This setting is used to send keep-alive messages on connection-oriented sockets.
Default: true
The socket linger timeout on close. When enabled, a close or shutdown will not return until all queued messages for the socket have been successfully sent or the linger timeout has been reached. Otherwise, the call returns immediately and the closing is done in the background.
Default: 0
The maximum buf size of the received ByteBuf allocator.
Default: 1048576
The maximum netty frame size in bytes. Any message received larger than this will be rejected.
Default: 5253120
Max number of ledger index files could be opened in bookie server If number of ledger index files reaches this limitation, bookie server started to swap some ledgers from memory to disk. Too frequent swap will affect performance. You can tune this number to gain performance according your requirements.
Default: 0
Size of a index page in ledger cache, in bytes A larger index page can improve performance writing page to disk, which is efficient when you have small number of ledgers and these ledgers have similar number of entries. If you have large number of ledgers and each ledger has fewer entries, smaller index page would improve memory usage.
Default: 8192
How many index pages provided in ledger cache If number of index pages reaches this limitation, bookie server starts to swap some ledgers from memory to disk. You can increment this value when you found swap became more frequent. But make sure pageLimit*pageSize should not more than JVM max memory limitation, otherwise you would got OutOfMemoryException. In general, incrementing pageLimit, using smaller index page would gain better performance in lager number of ledgers with fewer entries case If pageLimit is -1, bookie server will use 1/3 of JVM memory to compute the limitation of number of index pages.
Default: 0
If all ledger directories configured are full, then support only read requests for clients. If “readOnlyModeEnabled=true” then on all ledger disks full, bookie will be converted to read-only mode and serve only read requests. Otherwise the bookie will be shutdown. By default this will be disabled.
Default: true
For each ledger dir, maximum disk space which can be used. Default is 0.95f. i.e. 95% of disk can be used at most after which nothing will be written to that partition. If all ledger dir partitions are full, then bookie will turn to readonly mode if ‘readOnlyModeEnabled=true’ is set, else it will shutdown. Valid values should be in between 0 and 1 (exclusive).
Default: 0.95
Disk check interval in milli seconds, interval to check the ledger dirs usage.
Default: 10000
Interval at which the auditor will do a check of all ledgers in the cluster. By default this runs once a week. The interval is set in seconds. To disable the periodic check completely, set this to 0. Note that periodic checking will put extra load on the cluster, so it should not be run more frequently than once a day.
Default: 604800
Whether sorted-ledger storage is enabled.
Default: true
The interval between auditor bookie checks. The auditor bookie check, checks ledger metadata to see which bookies should contain entries for each ledger. If a bookie which should contain entries is unavailable, thea the ledger containing that entry is marked for recovery. Setting this to 0 disabled the periodic check. Bookie checks will still run when a bookie fails. The interval is specified in seconds.
Default: 86400
The number of threads that should handle write requests. if zero, the writes would be handled by netty threads directly.
Default: 0
The number of threads that should handle read requests. if zero, the reads would be handled by netty threads directly.
Default: 8
The umber of threads that should be used for high priority requests (i.e. recovery reads and adds, and fencing).
Default: 8
If read workers threads are enabled, limit the number of pending requests, to avoid the executor queue to grow indefinitely.
Default: 2500
The limited number of pending requests, which is used to avoid the executor queue to grow indefinitely when add workers threads are enabled.
Default: 10000
Whether force compaction is allowed when the disk is full or almost full. Forcing GC could get some space back, but could also fill up the disk space more quickly. This is because new log files are created before GC, while old garbage log files are deleted after GC.
Default: false
True if the bookie should double check readMetadata
prior to GC.
Default: false
Entry log flush interval in bytes. Flushing in smaller chunks but more frequently reduces spikes in disk I/O. Flushing too frequently may also affect performance negatively.
Default: 268435456
The number of bytes we should use as capacity for BufferedReadChannel.
Default: 4096
The number of bytes used as capacity for the write buffer
Default: 65536
Whether the bookie should use its hostname to register with the coordination service (e.g.: zookeeper service). When false, bookie will use its ip address for the registration.
Default: false
If you want to custom a bookie ID or use a dynamic network address for the bookie, you can set the bookieId
.
Bookie advertises itself using the bookieId
rather than the BookieSocketAddress
(hostname:port
or IP:port
). If you set the bookieId
, then the useHostNameAsBookieID
does not take effect.
The bookieId
is a non-empty string that can contain ASCII digits and letters ([a-zA-Z9-0]), colons, dashes, and dots.
For more information about bookieId
, see here.
Default: N/A
Whether the bookie is allowed to use an ephemeral port (port 0) as its server port. By default, an ephemeral port is not allowed. Using an ephemeral port as the service port usually indicates a configuration error. However, in unit tests, using an ephemeral port will address port conflict problems and allow running tests in parallel.
Default: false
Whether the bookie is allowed to listen for the BookKeeper clients executed on the local JVM.
Default: false
Whether the bookie is allowed to disable bind on network interfaces. This bookie will be available only to BookKeeper clients executed on the local JVM.
Default: false
The number of bytes that we should use as chunk allocation for org.apache.bookkeeper.bookie.SkipListArena
.
Default: 4194304
The maximum size that we should allocate from the skiplist arena. Allocations larger than this should be allocated directly by the VM to avoid fragmentation.
Default: 131072
The factory class name of the bookie authentication provider. If this is null, then there is no authentication.
Default: null
**Default**: org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider **Default**:
**Default**: 8000 **Default**:
Size of Write Cache. Memory is allocated from JVM direct memory. Write cache is used to buffer entries before flushing into the entry log. For good performance, it should be big enough to hold a substantial amount of entries in the flush interval.
Default: 25% of direct memory
Size of Read cache. Memory is allocated from JVM direct memory. This read cache is pre-filled doing read-ahead whenever a cache miss happens. By default, it is allocated to 25% of the available direct memory.
Default: N/A
How many entries to pre-fill in cache after a read cache miss
Default: 1000
Size of RocksDB block-cache. For best performance, this cache should be big enough to hold a significant portion of the index database which can reach ~2GB in some cases. By default, it uses 10% of direct memory.
Default: N/A
Default: 64
Default: 64
Default: 65536
Default: 10
Default: -1
Default: 4
Default: 256