Flaky TestPaxos.test_replica_availability

patch by Berenguer Blasi; reviewed by Ekaterina Dimitrova for CASSANDRA-16693
1 file changed
tree: a576a5674c969c452e2c21e849d7175f821fc0ec
README.md

Cassandra Distributed Tests (DTests)

Cassandra Distributed Tests (better known as “DTests”) are a set of Python-based tests for Apache Cassandra clusters. DTests aim to test functionality that requires multiple Cassandra instances. Functionality that can be tested in isolation should ideally be covered by a unit test (unit tests live in the main Cassandra repository).

Setup and Prerequisites

Some environmental setup is required before you can start running DTests.

Native Dependencies

DTests require the following native dependencies:

  • Python 3
  • PIP for Python 3
  • libev
  • git
  • JDK 8 (Java)

Linux

  1. apt-get install git-core python3 python3-pip python3-dev libev4 libev-dev
  2. (Optional - solves warning: “jemalloc shared library could not be preloaded to speed up memory allocations”): apt-get install -y --no-install-recommends libjemalloc1

Mac

On Mac, the easiest path is to install the latest Xcode and Command Line Utilities to bootstrap your development environment, and then use Homebrew:

  1. (Optional) Make sure brew is in a good state on your system: brew doctor
  2. brew install python3 libev

Python Dependencies

There are multiple external Python dependencies required to run DTests. The current Python dependency list is maintained in a file named requirements.txt in the root of the cassandra-dtest repository.

The easiest way to install these dependencies is with pip and virtualenv.

Note: While virtualenv isn't strictly required, using it is almost always the quickest path to success, as it provides a common base setup across various configurations.

  1. Install virtualenv: pip install virtualenv
  2. Create a new virtualenv: virtualenv --python=python3 ~/dtest
  3. Switch/Activate the new virtualenv: source ~/dtest/bin/activate
  4. Install remaining DTest Python dependencies: pip install -r /path/to/cassandra-dtest/requirements.txt

Usage

The tests are executed by the pytest framework, whose documentation includes a helpful Usage and Invocations page. It is a great place to start for basic invocation options.

At minimum, the framework needs to know the location of the compiled Cassandra sources (hint: ant clean jar). There are two options:

Use existing sources:

pytest --cassandra-dir=~/path/to/cassandra

Use ccm's ability to download and compile released sources from archives.apache.org:

pytest --cassandra-version=1.0.0

A convenient option if tests are regularly run against the same existing directory is to set cassandra_dir in ~/path/to/cassandra-dtest/pytest.ini:

[pytest]
cassandra_dir=~/path/to/cassandra

The tests will use this directory by default, avoiding the need for any environment variable (which still takes precedence if it is set).

To run a specific test file, class or individual test, you only have to pass its path as an argument:

pytest --cassandra-dir=~/path/to/cassandra pending_range_test.py
pytest --cassandra-dir=~/path/to/cassandra pending_range_test.py::TestPendingRangeMovements
pytest --cassandra-dir=~/path/to/cassandra pending_range_test.py::TestPendingRangeMovements::test_pending_range

When adding a new test or modifying an existing one, it's always a good idea to run it several times to make sure it is stable. This can be easily done with the --count option. For example, to run a test class 10 times:

pytest --count=10 --cassandra-dir=~/path/to/cassandra pending_range_test.py

Existing tests are probably the best place to start when learning how to write tests.

Each test spawns a fresh cluster and tears it down after the test. If a test fails, the logs for the nodes are saved in a logs/<timestamp> directory for analysis (it’s not perfect but has been good enough so far; I’m open to better suggestions).

Writing Tests

If you'd like to know what to expect during a code review, please see the included CONTRIBUTING file.

Debugging Tests

Some general tips for debugging dtest/pytest tests

pytest.set_trace()

If an assertion fails on an unexpected value and you'd like to inspect the state of all the test's variables just before a particular assert, add pytest.set_trace() right before the problematic code. The next time you execute the test, pytest will drop you into an interactive Python debugger (pdb) when that line is reached. From there you can use the standard pdb commands to inspect methods and variables.
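A stray pytest.set_trace() is easy to forget in the code. As a minimal sketch, the call can be wrapped in a guard so it only fires when explicitly switched on. The helper name debug_break_if_enabled and the DTEST_DEBUG environment variable are hypothetical and not part of dtest.

```python
import os


def debug_break_if_enabled(env_var: str = "DTEST_DEBUG") -> bool:
    """Pause in pdb via pytest.set_trace() only when env_var is set to "1".

    DTEST_DEBUG is a hypothetical variable name chosen for this sketch.
    Returns True if the breakpoint was hit, False if it was skipped.
    """
    if os.environ.get(env_var) != "1":
        return False
    # Imported lazily so calling the helper with debugging switched off
    # does not require pytest to be importable.
    import pytest
    pytest.set_trace()
    return True


# Usage inside a test, right before the assertion you want to inspect:
#     debug_break_if_enabled()
#     assert len(rows) == expected_count
```

Running the test as DTEST_DEBUG=1 pytest --cassandra-dir=~/path/to/cassandra pending_range_test.py would stop at the breakpoint; without the variable the test runs straight through.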

Hung tests/hung pytest framework

Debugging hung tests can be very difficult but thanks to improvements in Python 3 it's now pretty painless to get a python thread dump of all the threads currently running in the pytest process.

import faulthandler
faulthandler.enable()

Adding the above code installs a signal handler into your process. When the process receives a SIGABRT signal, Python will print a traceback for every thread running in the process. DTests installs this by default with the install_debugging_signal_handler fixture.

The following is an example of what you might see if you send a SIGABRT signal to the pytest process while in a hung state during the test teardown phase after the successful completion of the actual dtest.

(env) cassandra-dtest vcooluser$ kill -SIGABRT 24142

Fatal Python error: Aborted

Thread 0x000070000f739000 (most recent call first):
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 295 in wait
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 551 in wait
  File "/Users/mkjellman/src/cassandra-dtest/tools/data.py", line 31 in query_c1c2
  File "/Users/mkjellman/src/cassandra-dtest/bootstrap_test.py", line 91 in <lambda>
  File "/Users/mkjellman/src/cassandra-dtest/dtest.py", line 245 in run
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x000070000e32d000 (most recent call first):
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncore.py", line 183 in poll2
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncore.py", line 207 in loop
  File "/Users/mkjellman/env3/src/cassandra-driver/cassandra/io/asyncorereactor.py", line 119 in loop
  File "/Users/mkjellman/env3/src/cassandra-driver/cassandra/io/asyncorereactor.py", line 258 in _run_loop
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 864 in run
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 884 in _bootstrap

Current thread 0x00007fffa00dd340 (most recent call first):
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 1072 in _wait_for_tstate_lock
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 1056 in join
  File "/Users/mkjellman/src/cassandra-dtest/dtest.py", line 253 in stop
  File "/Users/mkjellman/src/cassandra-dtest/dtest.py", line 580 in tearDown
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/unittest/case.py", line 608 in run
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/unittest/case.py", line 653 in __call__
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/unittest.py", line 174 in runtest
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/runner.py", line 107 in pytest_runtest_call
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/callers.py", line 180 in _multicall
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 216 in <lambda>
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 222 in _hookexec
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 617 in __call__
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/flaky/flaky_pytest_plugin.py", line 273 in <lambda>
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/runner.py", line 191 in __init__
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/flaky/flaky_pytest_plugin.py", line 274 in call_runtest_hook
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/flaky/flaky_pytest_plugin.py", line 118 in call_and_report
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/runner.py", line 77 in runtestprotocol
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/runner.py", line 63 in pytest_runtest_protocol
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/flaky/flaky_pytest_plugin.py", line 81 in pytest_runtest_protocol
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/callers.py", line 180 in _multicall
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 216 in <lambda>
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 222 in _hookexec
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 617 in __call__
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/main.py", line 164 in pytest_runtestloop
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/callers.py", line 180 in _multicall
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 216 in <lambda>
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 222 in _hookexec
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 617 in __call__
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/main.py", line 141 in _main
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/main.py", line 103 in wrap_session
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/main.py", line 134 in pytest_cmdline_main
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/callers.py", line 180 in _multicall
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 216 in <lambda>
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 222 in _hookexec
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/pluggy/__init__.py", line 617 in __call__
  File "/Users/mkjellman/env3/lib/python3.6/site-packages/_pytest/config.py", line 59 in main
  File "/Users/mkjellman/env3/bin/pytest", line 11 in <module>
Abort trap: 6
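When sending a signal by hand is not an option (for example on a CI machine), the same kind of dump can be produced from Python code. The following is a minimal sketch using only the standard faulthandler module: dump_thread_stacks captures every thread's stack as a string, and arm_hang_watchdog uses faulthandler.dump_traceback_later to print the stacks automatically if the process is still running after a timeout. The helper names are hypothetical and not part of dtest.

```python
import faulthandler
import sys
import tempfile


def dump_thread_stacks() -> str:
    """Return a faulthandler dump of every thread's stack as text.

    faulthandler writes directly to a file descriptor, so the dump goes
    to a temporary file first and is read back afterwards.
    """
    with tempfile.TemporaryFile() as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read().decode("utf-8", errors="replace")


def arm_hang_watchdog(timeout_s: float, file=None) -> None:
    """Dump all thread stacks once if the process is still alive after
    timeout_s seconds. Output goes to `file` (default: sys.stderr)."""
    faulthandler.dump_traceback_later(
        timeout_s, repeat=False, file=file if file is not None else sys.stderr
    )


def disarm_hang_watchdog() -> None:
    """Cancel a pending dump scheduled by arm_hang_watchdog."""
    faulthandler.cancel_dump_traceback_later()
```

One possible use is to arm the watchdog with a timeout comfortably longer than a test's expected run time and disarm it during teardown; the right timeout depends on the test.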

Debugging Issues with Fixtures and Test Setup/Teardown

pytest can appear to be doing “magic” more often than not. One place where it is hard to follow, by code inspection alone, what will actually be executed is determining which fixtures will run for a given test and in what order. pytest provides a --setup-plan command line argument for this: when invoked with it, pytest prints an execution plan listing all the fixtures and tests that actually running the test would invoke, without executing anything. In the plan, S, C and F mark session-, class- and function-scoped fixtures. Below is the execution plan pytest currently generates for auth_test.py::TestAuthRoles::test_create_drop_role:

(env3) Michaels-MacBook-Pro:cassandra-dtest mkjellman$ pytest --cassandra-dir=/Users/mkjellman/src/mkjellman-oss-github-cassandra-trunk auth_test.py::TestAuthRoles::test_create_drop_role --setup-plan
====================================================================== test session starts ======================================================================
platform darwin -- Python 3.6.3, pytest-3.3.0, py-1.5.2, pluggy-0.6.0
rootdir: /Users/mkjellman/src/cassandra-dtest, inifile: pytest.ini
plugins: timeout-1.2.1, raisesregexp-2.1, nose2pytest-1.0.8, flaky-3.4.0
collected 1 item                                                                                                                                                

auth_test.py 
SETUP    S install_debugging_signal_handler
    SETUP    C fixture_logging_setup
      SETUP    F fixture_dtest_setup_overrides
      SETUP    F fixture_log_test_name_and_date
      SETUP    F fixture_maybe_skip_tests_requiring_novnodes
      SETUP    F parse_dtest_config
      SETUP    F fixture_dtest_setup (fixtures used: fixture_dtest_setup_overrides, fixture_logging_setup, parse_dtest_config)
      SETUP    F fixture_since (fixtures used: fixture_dtest_setup)
      SETUP    F fixture_dtest_config (fixtures used: fixture_logging_setup)
      SETUP    F set_dtest_setup_on_function (fixtures used: fixture_dtest_config, fixture_dtest_setup)
        auth_test.py::TestAuthRoles::()::test_create_drop_role (fixtures used: fixture_dtest_config, fixture_dtest_setup, fixture_dtest_setup_overrides, fixture_log_test_name_and_date, fixture_logging_setup, fixture_maybe_skip_tests_requiring_novnodes, fixture_since, install_debugging_signal_handler, parse_dtest_config, set_dtest_setup_on_function)
      TEARDOWN F set_dtest_setup_on_function
      TEARDOWN F fixture_dtest_config
      TEARDOWN F fixture_since
      TEARDOWN F fixture_dtest_setup
      TEARDOWN F parse_dtest_config
      TEARDOWN F fixture_maybe_skip_tests_requiring_novnodes
      TEARDOWN F fixture_log_test_name_and_date
      TEARDOWN F fixture_dtest_setup_overrides
    TEARDOWN C fixture_logging_setup
TEARDOWN S install_debugging_signal_handler
===Flaky Test Report===


===End Flaky Test Report===

====================================================================== 0 tests deselected =======================================================================
================================================================= no tests ran in 0.12 seconds ==================================================================

Instances Failing to Start (Unclean Test Teardown)

It is easy to end up (especially while writing new tests or debugging problematic ones) in a state where pytest/dtest fails to fully tear down all of the local C* instances it started. You can use this handy one-liner to kill all C* instances in one go:

ps aux | grep -ie CassandraDaemon | grep java | awk '{print $2}' | xargs kill
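The same filter can be expressed in Python, for example inside a cleanup helper. This is a minimal, POSIX-only sketch (helper names hypothetical, not part of dtest) that mirrors the pipeline stage by stage: keep ps aux lines that mention CassandraDaemon case-insensitively and java case-sensitively, take the second column as the PID, then send SIGTERM, which is what plain kill sends by default.

```python
import os
import signal
import subprocess
from typing import List


def cassandra_daemon_pids(ps_output: str) -> List[int]:
    """Extract PIDs from `ps aux` output the way the shell pipeline does."""
    pids = []
    for line in ps_output.splitlines():
        # grep -ie CassandraDaemon is case-insensitive; grep java is not.
        if "cassandradaemon" in line.lower() and "java" in line:
            fields = line.split()
            # awk '{print $2}': the second column of `ps aux` is the PID.
            if len(fields) > 1 and fields[1].isdigit():
                pids.append(int(fields[1]))
    return pids


def kill_stray_cassandra_instances(sig: int = signal.SIGTERM) -> List[int]:
    """Send `sig` to every matching process and return the PIDs targeted."""
    ps_output = subprocess.run(
        ["ps", "aux"], stdout=subprocess.PIPE, universal_newlines=True, check=True
    ).stdout
    pids = cassandra_daemon_pids(ps_output)
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # The process exited between `ps` and the kill.
    return pids
```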

Links