blob: 944d9570838fa1119e9d8142c876b3e8befb2305 [file] [log] [blame]
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
Design for Hybrid SQL/CLFS-Based Store in Qpid
==============================================
CLFS (Common Log File System) is a new facility in recent Windows versions.
CLFS is an ARIES-compliant log intended to support high performance and
transactional applications. CLFS is available in Windows Server 2003R2 and
higher, as well as Windows Vista and Windows 7.
There is currently an all-SQL store in Qpid. The new hybrid SQL-CLFS store
moves the message, messages-mapping to queues, and transaction aspects
of the SQL store into CLFS logs. Records of queues, exchanges, bindings,
and configurations will remain in SQL. The main goal of this change is
to yield higher performance on the time-critical messaging operations.
CLFS and, therefore, the new hybrid store, is not available on Windows XP
and Windows Server prior to 2003R2; these platforms will need to run the
all-SQL store.
Note for future consideration: it is possible to maintain all durable
objects in CLFS, which would remove the need for SQL completely. It would
require added log handling as well as the logic to ensure referential
integrity between exchanges and queues via bindings as SQL does today.
Also, the CLFS store counts on the SQL-stored queue records being correct
when recovering messages; if a message operation in the log refers to a queue
ID that's unknown, the CLFS store assumes the queue was deleted in the
previous broker session and the log wasn't updated. That sort of assumption
would need to be revisited if all content moves to a log.
CLFS Capabilities
-----------------
This section explains some of the key CLFS concepts that are important
in order to understand the designed use of CLFS for the store. It is
not a complete explanation and is not feature-complete. Please see the
CLFS documentation at MSDN for complete details
(http://msdn.microsoft.com/en-us/library/bb986747%28v=VS.85%29.aspx).
CLFS provides logs; each log can be dedicated or multiplexed. A multiplexed
log has multiple streams of independent log records; a dedicated log has
only one stream. Each log uses containers to hold the actual data; a log
requires a minimum of two containers, each of which must be at least 512KB.
Thus, the smallest log possible is 1MB. They can, of course, be larger, but
with 1 MB as minimum size for a log, they shouldn't be used willy-nilly.
The maximum number of streams per log is approximately 100.
As records are written to the log CLFS assigns Log Sequence Numbers (LSNs).
The first valid LSN in a log stream is called the Base, or Tail. CLFS
can automatically reclaim and reuse container space for the log as the
base LSN is moved when records are no longer needed. When a log is multiplexed,
a stream which doesn't move its tail can prevent CLFS from reclaiming space
and cause the log to grow indefinitely. Thus, mixing streams which don't
update (and, thus, move their tails) with streams that are very dynamic in
a single log will probably cause the log to continue to expand even though
much of the space will be unused.
CLFS provides three LSN types that are used to chain records together:
- Next: This is a forward sequence maintained by CLFS itself by the order
records are put into the stream.
- Undo-next, Undo-prev: These are backward-looking chains that are used
to link a new record to some previous record(s) in the same stream.
Also note that although log files are simply located in the file system,
easily locatable, streams within a log are not easily known or listable
outside of some application-specific recording of the stream names somewhere.
Log Usage
---------
There are two logs in use.
- Message: Each message will be represented by a chain of log records. All
messages will be intermixed in the same dedicated stream. Each portion of
a message content (sometimes they are written in multiple chunks) as well
as each operation involving a message (enqueue, dequeue, etc.) will be
in a log record chained to the others related to the same message.
- Transaction: Each transaction, local and distributed, will be represented
by a chain of log records. The record content will denote the transaction
as local or distributed.
Both transaction and message logs use the LSN of the first record for a
given object (message or transaction) as the persistence ID for that object.
The LSN is a CLFS-maintained, always-increasing value that is 64 bits long,
the same as a persistence ID.
Log records that relate to a transaction or message previously logged use the
log record undo-prev LSN to indicate which transaction/message the record
relates to.
Message Log Records
-------------------
Message log records will be one of the following types:
- Message-Start: the first (and possibly only) section of message content
- Message-Chunk: second and succeeding message content chunks
- Message-Delete: marks the end of the message's lifetime
- Message-Enqueue: records the message's placement on a queue
- Message-Dequeue: records the message's removal from a queue
The LSN of the Message-Start record is the persistence ID for the message.
The log record undo-prev LSN is used to link each subsequent record for that
message to the Message-Start record.
A message's sequence of log records is extended for each operation on that
message, until the message is deleted whereupon a Message-Delete record is
written. When the Message-Delete is written, the log's base LSN can be moved
up to the next earliest message if the deleted one opens up a set of
records at the tail of the log that are no longer needed. To help maintain
the order and know when the base can be moved, the store keeps message
information in a STL map whose key is the message ID (Message-Start LSN).
Thus, the first entry in the map is the earliest ID/LSN in use.
During recovery, messages still residing in the log can be ignored when the
record sequence for the message ends with Message-Delete. Similarly, there
may be log records for messages that are deleted; in this case the previous
LSN won't be one that's still within the log and, therefore, there won't have
been a Message Start record recovered and the record can be ignored.
Transaction Log Records
-----------------------
Transaction log records will be one of the following types:
- Dtx-Start: Start of a distributed transaction
- Tx-Start: Start of a local transaction
- End: End of the transaction
- Rollback: Marks that the transaction is rolled back
- Prepare: Marks the dtx as prepared
- Commit: Marks the transaction as committed
- Delete: Notes that the transaction is no longer valid
Transactions are also identified by the LSN of the start (Dtx-Start or
Tx-Start) record. Successive records associated with the same transaction
are linked backwards using the undo-prev LSN.
The association between messages and transactions is maintained in the
message log; if the message enqueue/dequeue operation is part of a transaction,
the operation includes a transaction ID. The transaction log maintains the
state of the transaction itself. Thus, each operation (enqueue, dequeue,
prepare, rollback, commit) is a single log record.
A few notes:
- The transactions need to be recovered and sorted out prior to recovering
the messages. The message recovery needs to know if a enqueue/dequeue
associated with a transaction can be discarded or should be acted on.
- Transaction IDs need to remain valid as long as any messages exist that
refer to them. This prevents the problem of trying to recover a message
with a transaction ID that doesn't exist - was it finalized? was it aborted?
Reference to a missing transaction ID can be ignored with assurance that
the message was deleted further along or the transaction would still be there.
- Transaction IDs needing to be valid requires that a refcount be kept on each
transaction at run time. As messages are deleted, the transaction set can
be notified that the message is gone. To enforce this, Message objects have
a boost::shared_ptr to each Transaction they're associated with. When the
Message is destroyed, refs to Transactions go down too. When Transaction is
destroyed, it's done so write its delete to the log.
In-Memory Objects
-----------------
The store holds the message and transaction relationships in memory. CLFS is
a backing store for that information so it can be reliably reconstructed in
the event of a failure. This is a change from the SQL-only store where all
of the information is maintained in SQL and none is kept in memory. The
CLFS-using store is designed for high-throughput operation where it is assumed
that messages will transit the broker (and, therefore, the store) quickly.
- Message list: this is a map of persistence ID (message LSN) to a list of
queues where the message is located and an indication that there is
(or isn't) a transaction involved and in which direction (enqueue/dequeue)
so a dequeued message doesn't get deleted while a transacted enqueue is
pending.
- Transaction list: also probably a map of id/LSN to a transaction object.
The transaction object needs to keep a list of messages/queues that are
impacted as well as the transaction state and Xid (for dtx).
- Right now log records are written as need with no preallocation or
reservation. It may be better to pre-reserve records in some cases, such
as a transaction prepare where the space for commit or rollback may be
reserved at the same time. This may be the only case where losing a
record may be an issue - needs some more thought.
Recovery
--------
During recovery, need to verify recovered messages' queues exist; if there's a
failure after a queue's deletion is final but before the messages are recorded
as dequeued (and possibly deleted) the remainder of those dequeues (and
possibly deleting the message) needs to be handled during recovery by not
restoring them for the broker, and also logging their deletion. Could also
skip the logging of deletion and let the normal tail-maintenance eventually
move up over the old message entries. Since the invalid messages won't be
kept in the message map, their IDs won't be taken into account when maintaining
the tail - the tail will move up over them as soon as enough messages come
and go.
Plugin Options
--------------
The command-line options added by the CLFS plugin are;
--connect The SQL connect string for the SQL parts; same as the
SQL plugin.
--catalog The SQL database (catalog) name; same as the SQL plugin.
--store-dir The directory to store the logs in. Defaults to the
broker --data-dir value. If --no-data-dir specified,
--store-dir must be.
--container-size The size of each container in the log, in bytes. The
minimum size is 512K (smaller sizes will be rounded up).
Additionally, the size will be rounded up to a multiple
of the sector size on the disk holding the log. Once
the log is created, each newly added container will
be the same size as the initial container(s). Default
is 1MB.
--initial-containers The number of containers to populate a new log with
if a new log is created. Ignored if the log exists.
Default is 2.
--max-write-buffers The maximum number of write buffers that the plugin can
use before CLFS automatically flushes the log to disk.
Lower values flush more often; higher values have
higher performance. Default is 10.
Maybe need an option to hold messages of a certain size in memory? I think
maybe the broker proper holds the message content, so the store need not.
Testing
-------
More tests will need to be written to stress the log container extension
capability and ensure that moving the base LSN works properly and the store
doesn't continually grow the log without bounds.
Note that running "qpid-perftest --durable yes" stresses the log extension
and tail maintenance. It doesn't get run as a normal regression test but should
be run when playing with the container/tail maintenance logic to ensure it's
not broken.