| # Scraper - Analyze qpid-dispatch log files |
| |
| Scraper provides two analysis modes: |
| |
| * Normal mode: combine logs and show details. |
| * Split mode: split a single log into per-connection data and show details. |
| |
Details are written to stdout in HTML format.
| |
| ## Apache License, Version 2.0 |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use |
| this file except in compliance with the License. You may obtain a copy of the |
| License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software distributed |
| under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR |
| CONDITIONS OF ANY KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations under the License. |
| |
| |
| ## Normal Mode |
| |
| Scraper is a data scraping program. It reads qpid-dispatch router log files, |
| categorizes and sorts the data, and produces an HTML summary. In normal mode |
| Scraper can operate on a single log file or on many log files. |
| |
| Normal mode does not write any files to your file system. The result is printed on stdout. |
| |
| From each log file Scraper extracts: |
| |
| * Router version |
| * Router container name |
| * Router restart times. A single log file may contain data from several router |
| reboot instances. |
| * Router link state calculation reports |
| * Interrouter and client connections |
| * AMQP facts |
| * Connection peers |
| * Link pair establishment |
| * Transfer traffic |
| * Message disposition |
| * Flow and credit propagation |
| |
| Scraper sorts these facts with microsecond precision using the log timestamps. |
| |
Then Scraper merges the data from any number of independent log files into a single
view. (The practical limit is 26 files, since each router is identified by a single letter.)
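
The sketch below illustrates the idea of the timestamp merge. It is not Scraper's actual
code; it only assumes that each log line begins with a timestamp in the same
"2018-08-13 13:15:00.123456" format that the --time-start and --time-end switches accept,
and that each file is already in time order.

    # Illustrative only: merge log lines from several files in timestamp order.
    import heapq
    from datetime import datetime

    def stamped_lines(path):
        with open(path) as f:
            for line in f:
                try:
                    # the first 26 characters hold the microsecond timestamp
                    ts = datetime.strptime(line[:26], "%Y-%m-%d %H:%M:%S.%f")
                except ValueError:
                    continue  # skip lines without a leading timestamp
                yield ts, line

    def merged(paths):
        # heapq.merge preserves the microsecond ordering across all files
        return heapq.merge(*(stamped_lines(p) for p in paths))

    for ts, line in merged(["A.log", "B.log", "C.log"]):
        print(line, end="")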
| |
| Next Scraper performs some higher-level analysis. |
| |
| * Routers are identified by letter rather than by the container name: 'A', 'B', and |
| so on. Log data in a file is grouped into instances and is identified by a number |
| for that router instance: 'A0', 'A1', and so on. |
| * Per router each AMQP data log entry is sorted into per-connection data lists. |
| * Connection data lists are searched to discover router-to-router and router-to-client |
| connection pairs. |
* Per-connection data are subdivided into per-session and per-link lists, sorting
  the AMQP data into per-link-only views; a sketch of this grouping appears at the
  end of this list. This information is shown in the _Connection Details_ section.
  Clicking on a connection's lozenge expands it into a view that shows all the
  sessions, and clicking on a session lozenge expands it into a view that shows all
  the links.
| |
Each link shows a lozenge that toggles the link data visibility and a hyperlink
that jumps to the link data. When the table of links is small enough to fit on one
screen it is easy to see what is going on. But when the table of links is large,
say 2000 links, the data for each link is thousands of lines further down the web
page, and clicking the lozenge alone does not bring it into view. For larger data
sets the usage rule is to first click the lozenge to expose the data, and then
click the hyperlink to make the data visible.
| |
| * For each link the first AMQP Attach is examined to discover the link's source or |
| target address. All the addresses named in all the links are aggregated into a table. |
| |
To view the link traffic, first click the lozenge in the Link column to make the
data visible, and then click the hyperlinked link name to jump to it.
| |
| * Bulk AMQP data may be shown or hidden on arbitrary per-connection selections. |
| Note: This feature is slow when a few tens of thousands of lines are being shown or |
| hidden. |
* Noteworthy AMQP frames are identified; these are hard to find by hand.
| * AMQP errors |
| * Presettled transfers |
| * Transfers with 'more' bit set |
| * Resumed transfers |
| * Aborted transfers |
| * Flow with 'drain' set |
| * Probable unsettled transfers. |
* Possible unsettled transfers. When log files are truncated and not all of the log
  lines for a connection, session, and link are present, settlement calculations
  are compromised; a transfer may appear to be unsettled simply because the
  settlement could not be correlated with the transfer.
| * Transfer messages are sorted by signature. Then a table is made showing where |
| each message leaves or arrives over a connection. The signature is limited to the |
| body of the message as shown in the Qpid Proton log text. |
| * Settlement state for each unsettled transfer is identified, displayed, and |
| shown with delta and elapsed time values. See example in the Advanced section. |
| * Link name propagation for each named link is shown in a table. |
| * Router, peer, and link names can get really long. Nicknames for each are used |
| with popups showing the full name. |
* Transfer data is identified with nicknames but without the popups; the popups
  grew so large that Firefox refused to show them, and they were not particularly
  useful anyway.
| * Router link state cost calculations are merged with router restart records to |
| create a comprehensive link state cost view. Routers may publish cost reports that |
| do not include all other routers. In this case the other routers are identified |
| visually to indicate that they are unreachable. |
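
As an illustration of the per-connection, per-session (channel), per-link (handle)
grouping described in this list, here is a minimal sketch. The frame records are
hypothetical and this is not Scraper's actual data model.

    from collections import defaultdict

    frames = [  # hypothetical parsed AMQP log entries
        {"conn": "A0_11", "channel": 0, "handle": 1, "perf": "attach"},
        {"conn": "A0_11", "channel": 0, "handle": 1, "perf": "transfer"},
        {"conn": "A0_6",  "channel": 0, "handle": 2, "perf": "flow"},
    ]

    links = defaultdict(list)  # (connection, session, link) -> performatives
    for f in frames:
        links[(f["conn"], f["channel"], f["handle"])].append(f["perf"])

    for (conn, chan, handle), perfs in sorted(links.items()):
        print(conn, "session", chan, "link", handle, perfs)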
| |
| ## Split Mode |
| |
| Split mode was developed to help analyze log files that are generally too big to be |
| processed by Scraper normal mode. Split mode works equally well on both large and small |
| log files. |
| |
Split mode rewrites all of the log file's AMQP log data into many smaller files,
possibly tens of thousands of them. The data is confined to a subdirectory named
after the original log file. Scraper will not overwrite existing output directories.
| |
| Log file | Scraper output directory |
|----------|--------------------------|
| EB2.log  | EB2.log.splits           |
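
For example, a split-mode run might look like this:

    tools/scraper/scraper.py --split -f EB2.log > EB2.html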
| |
| Split mode directs Scraper to accept a single log file and |
| break it into per-connection data files. The connections are counted and the volume of |
| data handled by each connection is noted. Then each connection's data is written |
| to a subfolder whose name indicates the amount of data the connection handled. |
| |
| Log lines | Directory |
|-----------|-----------|
| 250       | 10e3      |
| 45,000    | 10e5      |
| 1,234,567 | 10e7      |
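
The subfolder name reflects the order of magnitude of the connection's log line count.
A minimal sketch of that bucketing, assuming the rule implied by the table above
(illustrative only, not Scraper's code):

    import math

    def bucket_dir(line_count):
        # 250 -> "10e3", 45000 -> "10e5", 1234567 -> "10e7"
        return "10e%d" % math.ceil(math.log10(max(line_count, 2)))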
| |
| |
| Split mode also analyzes: |
| |
* Each router reboot starts a new router _instance_. Split mode observes the reboots
  and divides the connection data. Connection names include the router instance ID.
| |
    A\<instance\>_\<connectionId\>
| |
| * Connection peers are identified. Scraper currently recognizes these peers: |
| |
| * Inter-router |
| * Apache activemq-artemis broker |
| * Apache qpidd broker |
| * Apache Qpid-JMS client |
| |
| * Scraper searches all log lines for AMQP errors and produces a table identifying them. |
| * Connections are listed in order based on |
| * Total transfer count |
| * Total log line count |
| * Connections with zero transfers |
| * Per-connection files are linked from the connection lists for one-click viewing |
* AMQP addresses from every AMQP Attach are indexed. A table for each address
  shows when the address was referenced and some connection details.
| |
| ## AMQP performative legend |
| |
| Scraper decorates performative display lines with important AMQP values. |
| |
| Performative | Decorations |
|--------------|-------------|
| open | [0] |
| close | [0] |
| begin | [channel, remoteChannel] |
| end | [channel] |
| attach | [channel, handle] role linkName (source: src, target: tgt) |
| detach | [channel, handle] |
| flow | [channel, handle] (deliveryCount, linkCredit) |
| transfer | [channel, handle] (deliveryId) .. [flags] length (settlement state) |
| disposition | [channel] (role firstId-lastId settleFlags settleState) |
| |
| |
| ## Sequence generator |
| |
| The sequence generator (switches --sequence or -sq) generates some extra output that is |
| useful for making sequence diagrams from the raw data. _This is an experimental feature._ |
| |
When a sequence switch is set, the AMQP performatives are simplified and dumped
into the HTML data stream near the end.
| |
The user then identifies which parts of the log data are needed and extracts the
lines in question from the HTML data stream file. Careful attention to the start
and end timestamps will help locate the data. Save the selected data into an
intermediate file.
| |
Finally the intermediate file is processed by the new script _tools/scraper/seq-diag-gen.py_
to produce the sequence diagram source file.
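
A hedged sketch of the workflow; the seq-diag-gen.py invocation and the file names
below are assumptions, so check that script for its actual command line interface:

    tools/scraper/scraper.py -sq -f A.log B.log > ab.html
    # copy the simplified performative lines of interest from ab.html into selected.txt
    tools/scraper/seq-diag-gen.py selected.txt > sequence-diagram.txt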
| |
| Items to do to make this a reliable feature: |
| |
* In most cases the non-instantaneous messages do not line up precisely between
  the sender's and receiver's time lines.
| * The vertical time scale is wildly exaggerated. There is no scale: equal intervals in |
| vertical space could be microseconds or days. |
| * Messages from non-router actors are shown as horizontal lines. This tool has no access |
| to those actors' trace logs or timestamps. |
| * You may want to manually specify actors and participants to get the vertical time lines |
| in an acceptable order. |
| |
| ## Quick Start |
| |
| * Enable router logging |
| |
| The routers need to generate proper logging for Scraper. |
| Expose the information in router logs that Scraper requires by |
| enabling specific log levels for some of the router logging modules. |
| |
| Log module and level | Information exposed in logs |
|----------------------|-----------------------------|
| ROUTER info | Router version |
| SERVER info | Router restart discovery |
| SERVER trace | AMQP control and data - versions 1.10 and earlier |
| PROTOCOL trace | AMQP control and data - versions 1.11 and later |
| |
| * Enable version-specific router logging |
| |
| Version | SERVER trace | PROTOCOL trace |
|------------------|--------------|----------------|
| 1.10 and earlier | Required | Error - do not specify |
| 1.11 and later | Do not specify - clutters log file with info that Scraper does not use | Required |
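
For example, with router versions 1.11 and later the log sections of qdrouterd.conf
might look like this (a sketch; adjust the enable levels to suit your needs):

    log {
        module: ROUTER
        enable: info+
    }
    log {
        module: SERVER
        enable: info+
    }
    log {
        module: PROTOCOL
        enable: trace+
    }
    log {
        module: ROUTER_LS
        enable: info+
    }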
| |
| * Run your tests to populate log files used as Scraper input. |
| |
| * Run Scraper to generate web content |
| |
    tools/scraper/scraper.py -f somefile.log > somefile.html

    tools/scraper/scraper.py -f *.log > somefile.html
| |
| * Profit |
| |
    firefox somefile.html
| |
| ## Scraper command line |
| |
    usage: scraper.py [-h] [--skip-all-data] [--skip-detail] [--skip-msg-progress]
                      [--split] [--time-start TIME_START] [--time-end TIME_END]
                      [--files FILES [FILES ...]]

    optional arguments:
      -h, --help            show this help message and exit
      --skip-all-data, -sa  Max load shedding: do not store/index transfer,
                            disposition, flow, or EMPTY_FRAME data
      --skip-detail, -sd    Load shedding: do not produce Connection Details
                            tables
      --skip-msg-progress, -sm
                            Load shedding: do not produce Message Progress tables
      --split, -sp          A single file is split into per-connection data.
      --time-start TIME_START, -ts TIME_START
                            Ignore log records earlier than this. Format:
                            "2018-08-13 13:15:00.123456"
      --time-end TIME_END, -te TIME_END
                            Ignore log records later than this. Format:
                            "2018-08-13 13:15:15.123456"
      --files FILES [FILES ...], -f FILES [FILES ...]
| |
| * Split mode works with a single file and ignores all other switches |
| * Normal mode (no --split switch) accepts other switches and multiple files |
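
For example, to restrict normal-mode analysis to a fifteen-second window:

    tools/scraper/scraper.py -f somefile.log -ts "2018-08-13 13:15:00.123456" -te "2018-08-13 13:15:15.123456" > somefile.html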
| |
| ### Switch --skip-all-data |
    tools/scraper/scraper.py --skip-all-data -f FILE [FILE ...]
| |
| With _--skip-all-data_ AMQP transfer, disposition, and flow frames in the log files are |
| discarded. The resulting web page still includes lots of useful information with |
| connection info, link name propagation, and link state analysis. |
| |
| ### Switch --skip-detail |
    tools/scraper/scraper.py --skip-detail -f FILE [FILE ...]
| |
| With _--skip-detail_ the display of per-connection, per-session, per-link tables is skipped. |
| |
| ### Switch --skip-msg-progress |
    tools/scraper/scraper.py --skip-msg-progress -f FILE [FILE ...]
| |
| With _--skip-msg-progress_ the display of transfer analysis tables is skipped. |

### Advanced
| |
| * Merging multiple qpid-dispatch log files |
| |
Scraper accepts multiple log file names on the command line and
merges the log data according to the router log timestamps.
| |
    tools/scraper/scraper.py --files A.log B.log C.log > abc.html
| |
| Note that the qpid-dispatch host system clocks for merged log files |
| must be synchronized to within a few microseconds in order for the |
| result to be useful. This is easiest to achieve when the routers are |
| run on the same CPU core on a single system. Running Fedora 27 and 28 |
| on two hosts in a router network where the routers run _ntp_ to the same |
| time provider produces perfectly acceptable results. |
| |
| Scraper does a decent job merging log files created within a |
| qpid-dispatch self test. |
| |
| * Wow, that's a lot of data |
| |
| Indeed it is and good luck figuring it out. Sometimes, though, it's too much. |
| The AMQP transfer data analysis is the worst offender in terms of CPU time, |
| run-time memory usage, and monstrous html output files. |
| Scraper provides command line switches to |
| turn off sections of the data analysis. |
| |
| * How to read the transfer analysis tables. Here's an instance: |
| |
| |
|Src |Time |Rtr|ConnId|Dir|ConnId|Peer |T delta |T elapsed |Settlement |S elapsed|
|-------|---------------|---|------|---|------|------|---------|----------|-----------------------------|---------|
|A0_2035|09:50:52.027975|A0 |A0_11 |<- | |peer_7|0.000000 |0.000000 |(accepted settled 0.005142 S)|0.005142 |
|A0_2071|09:50:52.028556|A0 |A0_6 |-> |D0_4 |D |0.000581 |0.000581 |(accepted settled 0.004253 S)|0.004834 |
|D0_1587|09:50:52.028696|D0 |D0_4 |<- |A0_6 |A |0.000140 |0.000721 |(accepted settled 0.003988 S)|0.004709 |
|D0_1612|09:50:52.029260|D0 |D0_1 |-> |C0_6 |C |0.000564 |0.001285 |(accepted settled 0.003044 S)|0.004329 |
|C0_1610|09:50:52.029350|C0 |C0_6 |<- |D0_1 |D |0.000090 |0.001375 |(accepted settled 0.002846 S)|0.004221 |
|C0_1625|09:50:52.029672|C0 |C0_1 |-> |B0_5 |B |0.000322 |0.001697 |(accepted settled 0.002189 S)|0.003886 |
|B0_1438|09:50:52.029760|B0 |B0_5 |<- |C0_1 |C |0.000088 |0.001785 |(accepted settled 0.002002 S)|0.003787 |
|B0_1451|09:50:52.030117|B0 |B0_7 |-> | |peer_7|0.000357 |0.002142 |(accepted settled 0.001318 S)|0.003460 |
| |
Each row in this table records when a single transfer and its corresponding settlement were seen entering or exiting a router.
| |
| Field | Contents |
|--------------|----------|
| Src | Router instance and file line number where the transfer was seen |
| Time | Timestamp |
| Rtr | Router letter id and instance |
| ConnId | Router connection id |
| Dir | Transfer direction. _<-_ indicates into the router, _->_ indicates out of the router |
| ConnId | Peer's connection id. Blank if the peer is a normal client and not a router. |
| Peer | Peer's name. peer_7 would show the peer's container name in a popup. |
| T delta | Time since the previous row |
| T elapsed | Time since the message first entered the system |
| Settlement | Settlement state and the time delta since this row's message time in column 2. The settlement disposition log line is hyperlinked from the word _accepted_. |
| S elapsed | Settlement elapsed time: the difference between the accepted disposition log record and the time when the message first entered the system. |
| |
Row by row, it is easiest to read each line from left to right:
* A0 connection 11 received the transfer from peer_7.
| * A0 connection 6 sent the message to D0 connection 4. |
| * D0 connection 4 received the message from A0 connection 6. |
| |
| and so on. This message came from a sender on peer_7, went through routers A, D, C, and B, and finally was |
| returned to a listener on peer_7. The receiver received the message 0.002142 S after the sender sent it. The |
| sender received the accepted disposition 0.005142 S after the sender sent the message. |
| |
| The transmit times are in order from top to bottom and the settlement times are in order from bottom to top. |
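
Note that each row's S elapsed is the sum of its T elapsed and its settlement delta;
for the last row, 0.002142 + 0.001318 = 0.003460 seconds.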