<html>
<head>
<title>SVN Test</title>
</head>

<body bgcolor="white">

<h1>Design goals for the SVN test suite</h1>

<ul>
<li>
<A HREF="#WHY">Why Test?</A>
</li>
<li>
<A HREF="#AUDIENCE">Audience</A>
</li>
<li>
<A HREF="#REQUIREMENTS">Requirements</A>
</li>
<li>
<A HREF="#EASEOFUSE">Ease of Use</A>
</li>
<li>
<A HREF="#LOCATION">Location</A>
</li>
<li>
<A HREF="#EXTERNAL">External dependencies</A>
</li>
</ul>



<A NAME="WHY"><h2>Why Test?</h2></A>

<p>
Regression testing is an essential element of high quality software.
Unfortunately, some developers have not had first-hand exposure to a
high quality testing framework. Lack of familiarity with the positive
effects of testing can be blamed for statements like:
<br>
</p>
<blockquote>
"I don't need to test my code, I know it works."
</blockquote>
<p>
It is safe to say that the idea that developers do not introduce
bugs has been disproved.
</p>


<A NAME="AUDIENCE"><h2>Audience</h2></A>

<p>
The test suite will be used by both developers and end users.
</p>

<p>
<b>Developers</b> need a test suite to help with:
</p>

<p>
<b><i>Fixing Bugs:</i></b>
<br>
Each time a bug is fixed, a test case should be added to the test
suite. Creating a test case that reproduces a bug is a seemingly
obvious requirement. If a bug cannot be reproduced, there is no way to
be sure a given change will actually fix the problem. Once a test case
has been created, it can be used to validate the correctness of a
given patch. Adding a new test case for each bug also ensures that
if the same bug is ever reintroduced, it will be caught immediately.
</p>

<p>
<b><i>Impact Analysis:</i></b>
<br>
A developer fixing a bug or adding a new feature needs to know if a
given change breaks other parts of the code. It may seem obvious, but
keeping a developer from introducing new bugs is one of the primary
benefits of using a regression test system.
</p>

<p>
<b><i>Regression Analysis:</i></b>
<br>
When a test regression occurs, a developer will need to manually
determine what has caused the failure. The test system is not able to
determine why a test case failed. The test system should simply report
exactly which test results changed and when the last results were
generated.
</p>

<p>
<b>Users</b> need a test suite to help with:
</p>

<p>
<b><i>Building:</i></b>
<br>
Building software can be a scary process. Users that have never built
software may be unwilling to try. Others may have tried to build a
piece of software in the past, only to be thwarted by a difficult
build process. Even if the build completed without an error, how can a
user be confident that the generated executable actually works? The
only workable solution to this problem is to provide an easily
accessible set of tests that the user can run after building.
</p>

<p>
<b><i>Porting:</i></b>
<br>
Often, users become porters when the need to run on a previously
unsupported system arises. This porting process typically requires some
minor tweaking of include files. It is absolutely critical that
testing be available when porting since the primary developers may not
have any way to test changes submitted by someone doing a port.
</p>


<p>
<b><i>Testing:</i></b>
<br>
Different installations of the exact same OS can contain subtle
differences that cause software to operate incorrectly. Only testing
on different systems will expose problems of this nature. A test suite
can help identify these sorts of problems before a program is actually
put to use.
</p>




<A NAME="REQUIREMENTS"><h2>Requirements</h2></A>

<p>
Functional requirements of an acceptable test suite include:
</p>

<p>
<b><i>Unique Test Identifiers:</i></b>
<br>
Each test case must have a globally unique test identifier; this
identifier is just a string. A globally unique string is
required so that test cases can be individually identified by
name, sorted, and even looked up on the web. It seems simple,
perhaps even blatantly obvious, but some other test packages
have failed to maintain uniqueness in test identifiers and
developers have suffered because of it. It is even desirable for
the system to actively enforce this uniqueness requirement.
</p>
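
<p>
To make the enforcement idea concrete, here is a minimal sketch in
Python of a registry that refuses duplicate identifiers. The function
and variable names are purely illustrative, not part of any existing
svn tool.
</p>

<pre><code>
# Hypothetical sketch: a registry that rejects duplicate identifiers.
tests = {}

def register_test(test_id, test_func):
    # Fail loudly at registration time rather than silently
    # shadowing an earlier test that has the same name.
    if test_id in tests:
        raise ValueError("duplicate test identifier: " + test_id)
    tests[test_id] = test_func

register_test("client-1", lambda: 1)
register_test("client-2", lambda: 0)
# Registering "client-1" a second time would raise ValueError.
</code></pre>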

<p>
<b><i>Exact Results:</i></b>
<br>
A test case must have exactly one expected result. If the result of
running the test does not exactly match the expected result,
the test must fail.
</p>
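
<p>
Under this rule a test runner reduces to a strict equality check;
anything short of an exact match is a failure. A sketch, again with
illustrative names only:
</p>

<pre><code>
def run_test(test_id, test_func, expected):
    # An exact match is required; there is no "close enough".
    actual = test_func()
    status = "PASSED" if actual == expected else "FAILED"
    print(test_id, status)
    return status

run_test("client-1", lambda: 0, expected=1)   # prints: client-1 FAILED
</code></pre>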

<p>
<b><i>Reproducible Results:</i></b>
<br>
Test results should be reproducible. If a test result matches
the expected result, it should do so every time the test is
run. External factors like time stamps must not affect the
results of a test.
</p>
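
<p>
One common way to keep factors like time stamps out of the comparison
is to normalize them away before matching. A sketch, assuming output
lines that embed an ISO-style date:
</p>

<pre><code>
import re

# Replace anything that looks like a time stamp with a fixed token,
# so two otherwise identical runs compare equal.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def normalize(output):
    return TIMESTAMP_RE.sub("TIMESTAMP", output)

print(normalize("committed at 2001-03-14 09:26:53"))
# prints: committed at TIMESTAMP
</code></pre>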

<p>
<b><i>Self-Contained Tests:</i></b>
<br>
Each test should be self-contained. Results for one test should
not depend on side effects of previous tests. This is obviously
a good practice, since one is able to understand everything a
test is doing without having to look at other tests. The test
system should also support random access so that a single test
or set of tests can be run. If a test is not self-contained, it
cannot be run in isolation.
</p>
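
<p>
In practice this usually means each test creates and destroys its own
scratch area rather than inheriting state from an earlier test. A
minimal sketch, with an invented directory prefix:
</p>

<pre><code>
import shutil, tempfile

def run_in_sandbox(test_func):
    # Give the test a private, empty directory and remove it
    # afterwards, so no state leaks into the next test.
    sandbox = tempfile.mkdtemp(prefix="svntest-")
    try:
        return test_func(sandbox)
    finally:
        shutil.rmtree(sandbox, ignore_errors=True)

run_in_sandbox(lambda d: print("working in", d))
</code></pre>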

<p>
<b><i>Selective Execution:</i></b>
<br>
It may not be possible to run a given set of tests on certain
systems. The suite must provide a means of selectively running
test cases based on the environment. The test system must also
provide a way to selectively run a given test case or set of
test cases on a per-invocation basis. It would be incredibly
tedious to run the entire suite to see the results for a single
test.
</p>
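
<p>
The environment side of this might look like the following sketch,
where a test declares the platforms it can run on and is skipped
everywhere else. The names and platform strings are illustrative:
</p>

<pre><code>
import sys

def maybe_run(test_id, test_func, platforms=None):
    # Skip, rather than fail, when the host platform is unsupported.
    if platforms is not None and sys.platform not in platforms:
        print(test_id, "SKIPPED")
        return
    print(test_id, "PASSED" if test_func() else "FAILED")

maybe_run("client-exec-1", lambda: True, platforms=("linux", "sunos5"))
</code></pre>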

<p>
<b><i>No Monitoring:</i></b>
<br>
The tests must run from start to end without operator
intervention. Test results must be generated automatically. It
is critical that an operator not need to manually compare test
results to figure out which tests failed and which ones passed.
</p>


<p>
<b><i>Automatic Logging of Results:</i></b>
<br>
The system must store test results so that they can be compared
later. This applies to machine readable results as well as human
readable results. For example, assume we have a test named
<code>client-1</code> that expects a result of 1, but 0 is
returned by the test case instead. We should expect the system
to store two distinct pieces of information. First, that the
test failed. Second, how the test failed, meaning how the
expected result differed from the actual result.
</p>

<p>
The following example shows the kind of results we might record
in a results log file.
</p>

<pre><code>
client-1 FAILED
client-2 PASSED
client-3 PASSED
</code></pre>
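
<p>
A sketch of recording both pieces of information: a machine readable
summary line per test, plus a human readable detail file on failure.
The file layout shown is purely illustrative:
</p>

<pre><code>
def log_result(test_id, expected, actual, summary_log):
    status = "PASSED" if actual == expected else "FAILED"
    # Machine readable summary, one line per test.
    summary_log.write(f"{test_id} {status}\n")
    # Human readable detail, written only on failure.
    if status == "FAILED":
        with open(test_id + ".diff", "w") as detail:
            detail.write(f"expected: {expected!r}\n")
            detail.write(f"actual:   {actual!r}\n")

with open("results.log", "w") as log:
    log_result("client-1", expected=1, actual=0, summary_log=log)
</code></pre>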

<p>
<b><i>Automatic Recovery:</i></b>
<br>
The test system must be able to recover from crashes and
unexpected delays. For example, a child process might go into an
infinite loop and would need to be killed. The test shell itself
might also crash or go into an infinite loop. In these cases,
the test run must automatically recover and continue with the
tests directly after the one that crashed.
</p>

<p>
This is critical for a couple of reasons. Nasty crashes and
infinite loops most often appear on users' (not developers')
systems. Users are not well equipped to deal with these sorts of
exceptional situations. It is unrealistic to expect that users
will be able to manually recover from disaster and restart
crashed test cases. It is an accomplishment just to get them to
run the tests in the first place!
</p>

<p>
Ensuring that the test system actually runs each and every test
is critical, since a failing test near the end of the suite
might never be noticed if a crash halfway through kept all the
tests from being run. This process must be completely
automated; no operator intervention should be required.
</p>
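
<p>
A minimal sketch of crash and hang recovery: run each test in a child
process with a timeout, record the failure, and carry on. The command
names here are placeholders, not real test binaries:
</p>

<pre><code>
import subprocess

def run_isolated(test_id, command, timeout=60):
    # Each test runs in its own process, so a crash or hang
    # cannot take the rest of the suite down with it.
    try:
        proc = subprocess.run(command, capture_output=True,
                              timeout=timeout)
        return "PASSED" if proc.returncode == 0 else "FAILED"
    except subprocess.TimeoutExpired:
        # The child was killed after the timeout; move on.
        return "FAILED (timed out)"
    except OSError:
        # The test program could not even be started.
        return "FAILED (could not run)"

for test_id, command in [("client-1", ["./client-test", "1"])]:
    print(test_id, run_isolated(test_id, command))
</code></pre>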


<p>
<b><i>Report Results Only:</i></b>
<br>
When a regression is found, a developer will need to manually
determine the reason for the regression. The system should tell
the developer exactly what tests have failed, when the last set
of results was generated, and what the previous results
actually were. Any additional functionality is outside the
scope of the test system.
</p>

<p>
<b><i>Platform Specific Results:</i></b>
<br>
Each supported platform should have an associated set of test
results. The naive approach would be to maintain a single set of
results and compare the output for any platform to the known
results. The problem with this approach is that it does not
provide a way to keep track of how results differ from one
platform to another. The following example attempts to clarify
this.
</p>

<p>
Assume you have the following test results generated on a
reference platform before and after a set of changes were
committed.
</p>

<table BORDER=1 COLS=2>

<tr>
<td><b>Before</b> (Reference Platform)</td>

<td><b>After</b> (Reference Platform)</td>
</tr>

<tr>
<td><code>client-1 PASSED</code></td>
<td><code>client-1 PASSED</code></td>
</tr>

<tr>
<td><code>client-2 PASSED</code></td>
<td><code>client-2 FAILED</code></td>
</tr>

</table>

<p>
It is clear that the change you made introduced a regression in
the <code>client-2</code> test. The problem shows up when you
try to compare results generated from this modified code on some
other platform. For example, assume you got the following
results:
</p>

<table BORDER=1 COLS=2>

<tr>
<td><b>Before</b> (Reference Platform)</td>

<td><b>After</b> (Other Platform)</td>
</tr>

<tr>
<td><code>client-1 PASSED</code></td>
<td><code>client-1 FAILED</code></td>
</tr>

<tr>
<td><code>client-2 PASSED</code></td>
<td><code>client-2 PASSED</code></td>
</tr>

</table>

<p>
Now things are not at all clear. We know that
<code>client-1</code> is failing but we don't know if it is
related to the change we just made. We don't know if this test
failed the last time we ran the tests on this platform since we
only have results for the reference platform to compare to. We
might have fixed a bug in <code>client-2</code>, or we might
have done nothing to affect it.
</p>

<p>
If we instead keep track of test results on a platform-by-platform
basis, we can avoid much of this pain. It is easy to
imagine how this problem could get considerably worse if there
were 50 or 100 tests that behaved differently from one platform
to the next.
</p>
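
<p>
A sketch of the platform-by-platform bookkeeping, keying the stored
results on a platform identifier. The file naming scheme is
illustrative only:
</p>

<pre><code>
import sys

def results_filename(suite):
    # One results file per platform, e.g. results-client-linux.log,
    # so each platform is compared against its own history.
    return f"results-{suite}-{sys.platform}.log"

def changed_since_last_run(suite, new_results):
    # Compare against the last results recorded for this platform.
    try:
        with open(results_filename(suite)) as f:
            old_results = f.read()
    except FileNotFoundError:
        old_results = ""
    return new_results != old_results

print(changed_since_last_run("client", "client-1 PASSED\n"))
</code></pre>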

<p>
<b><i>Test Types:</i></b>
<br>
The test suite should support two types of tests. The first
makes use of an external program like the svn client. These
kinds of tests will need to exec an external program and check
the output and exit status of the child process. Note that it
will not be possible to run this sort of test on Mac OS. The
second type of test will load subversion shared libraries and
invoke methods in-process.
</p>

<p>
This provides the ability to do extensive testing of the various
subversion APIs without using the svn client. This also has the
nice benefit that it will work on Mac OS, as well as Windows and
Unix.
</p>
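
<p>
The first kind of test amounts to spawning the client and comparing
both its output and its exit status against expected values. A sketch;
the demonstration command is self-contained rather than a real svn
invocation:
</p>

<pre><code>
import subprocess, sys

def exec_test(command, expected_output, expected_status=0):
    # Run the external program and compare both the exit status
    # and the exact output, per the requirements above.
    proc = subprocess.run(command, capture_output=True, text=True)
    ok = (proc.returncode == expected_status and
          proc.stdout == expected_output)
    return "PASSED" if ok else "FAILED"

# Self-contained demonstration; a real test would exec the svn client
# and compare against its expected output instead.
print(exec_test([sys.executable, "-c", "print('hi')"], "hi\n"))
</code></pre>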

<A NAME="EASEOFUSE"><h2>Ease of Use</h2></A>

<p>
Developers will tend to avoid using a test suite if it is not
easy to add new tests and maintain old ones. If developers are
uninterested in using the test suite, it will quickly fall into
disrepair and become a burden instead of an aid.
</p>

<p>
Users will simply avoid running the test suite if it is not
extremely simple to use. A user should be able to build the
software and then run:
</p>

<blockquote>
<code>
% make check
</code>
</blockquote>

<p>
This should run the test suite and provide a very high level set
of results that includes how many test results have changed
since the last run.
</p>

<p>
While this high level report is useful to developers, they will
often need to examine results in more detail. The system should
provide a means to manually examine results, compare output,
invoke a debugger, and perform other sorts of low level operations.
</p>

<p>
The next example shows how a developer might run a specific
subset of tests from the command line. The pattern given would
be used to do a glob-style match on the test case identifiers,
and run any that matched.
</p>

<blockquote>
<code>
% svntest "client-*"
</code>
</blockquote>
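
<p>
Internally, such a pattern could be matched against the registered
identifiers with ordinary glob rules, for example with Python's
fnmatch module. A sketch:
</p>

<pre><code>
import fnmatch

test_ids = ["client-1", "client-2", "server-1"]

def select(pattern):
    # Glob-style match on test case identifiers.
    return [t for t in test_ids if fnmatch.fnmatch(t, pattern)]

print(select("client-*"))   # prints: ['client-1', 'client-2']
</code></pre>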

<A NAME="LOCATION"><h2>Location</h2></A>

<p>
The test suite should be packaged along with the source code
instead of being made available as a separate download. This
significantly simplifies the process of running tests since they
are already incorporated into the build tree.
</p>

<p>
The test suite must support building and running inside and
outside of the source directory. For example, a developer might
want to run tests on both Solaris and Linux. The developer
should be able to run the tests concurrently in two different
build directories without having the tests interfere with each
other.
</p>


<A NAME="EXTERNAL"><h2>External program dependencies</h2></A>

<p>
As much as possible, the test suite should avoid depending on
external programs or libraries.
</p>

<p>
Of course, there is a nasty bootstrap problem with a test suite
implemented in a scripting language. A wide variety of systems
provide no support for modern scripting languages. We will avoid
this issue for now and assume that the scripting language of
choice is supported by the system.
</p>

<p>
For example, the test suite should not depend on CVS to generate
test results. Many users will not have access to CVS on the
system they want to test subversion on.
</p>

</body>
</html>