This is the README file for the new MCell 3.1 testsuite.

Sections:
  1. Running the test suite
  2. Analyzing errors during a run of the test suite
  3. Extending the test suite

=========================
1. Running the test suite
=========================

The test suite is launched using the top-level test script in this
subdirectory, 'main.py'.  Run without any arguments, it will attempt to run
the entire test suite.  The script depends upon some Python modules which are
checked into the BZR repository under testsuite/system_tests, but it will
automatically find those modules if the main.py script is in its customary
location.

When you run the script, it will look for a configuration file called
'test.cfg' in the current directory.  A typical test.cfg will only need two
lines in it:

    [DEFAULT]
    mcellpath = /home/jed/src/mcell/3.1-pristine/build/debug/mcell

This specifies which mcell executable to test.  If there are other relevant
testing settings, they can be placed into this file and accessed by the test
suite.  The default configuration file may be overridden using the -c
command-line argument to the script.

The script can take a handful of command-line arguments, which are summarized
briefly in the help message provided when you run:

    ./main.py -h

The script can also take a '-T' argument to specify the location of the test
cases.  If you are running the script from this directory, you do not need to
specify the -T flag, as the tests will be found automatically with the default
layout.  For normal usage, the argument to -T should be the path to the
mdl/testsuite subdirectory of the source tree (or a copy thereof).

By default, all test results will be deposited under a subdirectory
'test_results' created below the current directory.  You may override this
using '-r' and another directory name.  BE CAREFUL!  Presently, whatever
directory is being used for test results will be entirely cleaned out before
the test suite begins to run.  XXX: Might it not be safer to have it move the
old directory to a new name?  Perhaps worth investigating.  On the other hand,
this will, by default, result in the rapid accumulation of results
directories.  Still, it is probably better to fail to delete a thousand
unneeded files than to delete one unintended one...

main.py also takes '-v' to increase verbosity.  Notable levels of verbosity:

    0: (default) Very little feedback as tests are running
    1: Brief feedback as tests are running (. for successful tests, F for
       failures, E for errors)
    2: Long feedback as tests are running (single lines indicating success or
       failure of each test, color coded if the output is a tty)
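For example, to test a different build with its own configuration file, a
copied set of test cases, a separate results directory, and more verbose
feedback, an invocation might look like this (the file and directory names
here are purely illustrative):

    ./main.py -c nightly.cfg -T /path/to/mdl/testsuite -r nightly_results -v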
Use '-l' to display a list of all of the tests and test collections that
main.py "knows" about.  Any of these may be included or excluded.  For
instance, right now, main.py -l shows:

    Found tests:
     - reactions : Reaction tests
        - numericsuite : Numeric validation for reactions
        - tempsuite : Shortcut to currently developing test
     - macromols : Macromolecule tests
        - numericsuite : Numeric validation for Macromolecules
        - errorsuite : Test error handling for invalid macromolecule constructs in MDL files
     - parser : Parser torture tests
        - vizsuite : VIZ output tests for DREAMM V3 modes
        - oldvizsuite : VIZ output tests for ASCII/RK/DX modes
        - fasttests : All quick running tests (valid+invalid MDL)
           - (quicksuite)
           - (errorsuite)
        - errorsuite : Test error handling for invalid MDL files
        - rtcheckpointsuite : Basic test of timed checkpoint functionality
        - quicksuite : A few quick running tests which cover most valid MDL options
        - allvizsuite : VIZ output tests for all modes (old+new)
           - (vizsuite)
           - (oldvizsuite)
        - kitchensinksuite : Kitchen Sink Test: (very nearly) every parser option
     - regression : Regression tests
        - suite : Regression test suite

The indentation is significant, as it indicates subgroupings within the test
collections.  Note that some of the test collection names are parenthesized.
These are collections which are redundant with the other collections in the
suite and will not be included a second time, but they were added to simplify
running simple subsets of the entire test suite.

By default, all tests will be run.  To select just a subset of the tests, use:

    ./main.py -i <name>

where <name> is a path-like identifier telling which test collection to run.
Given the above output from "main.py -l", valid options include:

    reactions
    reactions/numericsuite
    parser
    parser/fasttests
    parser/fasttests/quicksuite
    parser/allvizsuite
    regression

To exclude a subset of the tests, use:

    ./main.py -e <name>

Note that the -i and -e arguments are processed from left to right on the
command line, so you may do something like:

    ./main.py -i parser -e parser/allvizsuite

to run the "parser" tests, skipping over the "allvizsuite" subgroup.  Unless
some -i arguments are specified, the initial set of included tests is the
complete set, so you may also do:

    ./main.py -e macromols/numericsuite

to run all tests except for the numeric validation tests in the macromolecules
test suite.

Finally, the list of test suites to include may be configured by adding a
'run_tests' directive to the test.cfg file, consisting of a comma-separated
list:

    test.cfg:

        [DEFAULT]
        run_tests=parser/errorsuite,regression
        mcellpath=/path/to/mcell

This way, the exact set of tests to run can be tailored to the particular
configuration file.

=========================
2. Analyzing errors during a run of the test suite
=========================

If errors are reported during a run of the test suite, you should get an
informative message from the test suite.  In many cases, these messages will
be related to the exit code from mcell.  For instance, here is an example run,
edited for brevity, of the regression test suite on an old version of mcell:

  Running tests:
   - regression/suite
  .. ..
  ======================================================================
  FAIL: test_010 (test_regression.TestRegressions)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "../mdl/testsuite/regression/test_regression.py", line 128, in test_010
      mt.invoke(get_output_dir())
    File "./system_tests/testutils.py", line 332, in invoke
      self.__check_results()
    File "./system_tests/testutils.py", line 350, in __check_results
      assert os.WEXITSTATUS(self.got_exitcode) == self.expect_exitcode, "Expected exit code %d, got exit code %d" % (self.expect_exitcode, os.WEXITSTATUS(self.got_exitcode))
  AssertionError: ./test_results/test-0010: Expected exit code 0, got exit code 139

  ----------------------------------------------------------------------
  Ran 10 tests in 49.084s

  FAILED (failures=5)

The significant line to look for is the "AssertionError" line, which tells us
two things:

  AssertionError: ./test_results/test-0010: Expected exit code 0, got exit code 139

First, it tells us which subdirectory to look in for the exact details of the
run which caused the failure; second, it gives us a message which hints at the
problem.  In this case, the run exited with code 139, which corresponds to
signal 11 (SIGSEGV).  On a UNIX or Linux machine, the exit codes generally
follow this convention:

    0:        normal exit
    1-126:    miscellaneous errors (MCell always uses 1)
    127:      can't find executable file
    129-255:  execution terminated due to signal (exit_code - 128)

The signals which may kill an execution are listed below.  (The numbering is
taken from a Linux machine, and some of the signals may be numbered or named
differently on other systems, though many of these signal numbers are
standard, such as 11 for SIGSEGV.  Type 'kill -l' or see the signals man page
on the system in question for more details on the name <-> number mappings.)

    sig  exit code  name/desc
     1     129      SIGHUP  - unlikely in common use
     2     130      SIGINT  - user hit Ctrl-C
     3     131      SIGQUIT - user hit Ctrl-\
     4     132      SIGILL  - illegal instruction -- exe may be for a different CPU
     5     133      SIGTRAP - unlikely in common use
     6     134      SIGABRT - abort - often caused by an assertion failure
     7     135      SIGBUS  - bus error -- less likely than SIGSEGV, but similar meaning
     8     136      SIGFPE  - floating point exception
     9     137      SIGKILL - killed by user/sysadmin
    10     138      SIGUSR1 - should not happen
    11     139      SIGSEGV - accessed a bad pointer
    12     140      SIGUSR2 - should not happen
    13     141      SIGPIPE - unlikely in context of test suite
    14     142      SIGALRM - should not happen
    15     143      SIGTERM - manually killed by user/sysadmin

Higher-numbered signals do exist, though the numbering becomes less consistent
above 15, and the likelihood of occurrence is also much lower.  In practice,
the only signals likely to be encountered, barring manual user intervention,
are SIGABRT, SIGFPE, and SIGSEGV, with SIGILL and SIGBUS thrown in very
rarely.
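If you prefer to decode exit codes programmatically rather than from the
table above, a small helper along the following lines will do it.  This is
only an illustrative sketch, not part of the test suite; it relies solely on
Python's standard 'signal' module:

    import signal

    # Build a number -> name map from the standard 'signal' module, so the
    # mapping matches the machine this is run on.
    _SIGNAMES = dict((getattr(signal, name), name)
                     for name in dir(signal)
                     if name.startswith("SIG") and not name.startswith("SIG_"))

    def describe_exit_code(code):
        """Translate a shell-style exit code into a readable description."""
        if code == 0:
            return "normal exit"
        elif code == 127:
            return "can't find executable file"
        elif code > 128:
            signum = code - 128
            return "terminated by signal %d (%s)" % (signum,
                                                     _SIGNAMES.get(signum, "unknown"))
        else:
            return "error exit (code %d)" % code

    print(describe_exit_code(139))   # -> terminated by signal 11 (SIGSEGV)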
Returning to the example above, let's look at the files produced by the run.
Looking at the contents of the test_results/test-0010 directory, I see:

    total 8
    -rw-r--r-- 1 jed cnl 293 2009-03-13 17:27 cmdline.txt
    -rw-r--r-- 1 jed cnl   0 2009-03-13 17:27 poly_w_cracker.txt
    -rw-r--r-- 1 jed cnl 163 2009-03-13 17:27 realerr
    -rw-r--r-- 1 jed cnl   0 2009-03-13 17:27 realout
    -rw-r--r-- 1 jed cnl   0 2009-03-13 17:27 stderr
    -rw-r--r-- 1 jed cnl   0 2009-03-13 17:27 stdout

The important files here are:

    cmdline.txt: the exact command line required to reproduce this bug
    realout:     what mcell printed to out_file
    realerr:     what mcell printed to err_file
    stdout:      what mcell sent to stdout (usually should be empty)
    stderr:      what mcell sent to stderr (usually should be empty)

The contents of cmdline.txt from this run are:

    executable: /home/jed/src/mcell/3.1-pristine/build/debug/mcell
    full cmdline: /home/jed/src/mcell/3.1-pristine/build/debug/mcell -seed 13059 -logfile realout -errfile realerr -quiet /netapp/cnl/home/jed/src/mcell/3.2-pristine/mdl/testsuite/regression/10-counting_crashes_on_coincident_wall.mdl

The full command line should use absolute paths, so you should be able to use
it to start a gdb session which replicates this exact problem, if it is
repeatable.  Likewise, any files which the test case should have produced will
appear under this directory.  This should allow you to examine the reaction
and viz output files for any necessary clues.

=========================
3. Extending the test suite
=========================

Generally, adding tests to the test suite requires three steps:

  1. write the MDL files
  2. write the Python code to validate the resultant output from the MDL files
  3. hook the test case into the system

-----------
1. writing the MDL files
-----------

Writing the MDL files should be largely self-explanatory, so I will focus on
the other two pieces.  A few brief notes are worth including here, though.

First, the MDL should be written to produce all output relative to the current
directory.  This makes it easy for the test suite scripts to manage the output
test results.  If additional command-line arguments are needed, they may be
specified in the Python portion of the test case.

Generally, I've started each test case in the test suite with a block comment
which explains the purpose of the test along with an English-language
description of the success and failure criteria.  For instance:

    /****************************************************************************
     * Regression test 01: Memory corruption when attempting to remove a
     *    per-species list from the hash table.
     *
     *    This is a bug encountered by Shirley Pepke (2008-04-24).  When a
     *    per-species list is removed from the hash table, if the hash table
     *    has a collision for the element being removed, and the element being
     *    removed was not the first element (i.e. was the element which
     *    originally experienced the collision), memory could be corrupted due
     *    to a bug in the hash table removal code.
     *
     *    Failure: MCell crashes
     *    Success: MCell does not crash (eventually all molecules should be
     *             consumed, and the sim should run very fast)
     *
     *    Author: Jed Wing
     *    Date:   2008-04-24
     ****************************************************************************/

This convention seems like a good way to keep documentation on the individual
tests.  Some of the subdirectories also have README files which contain brief
summaries of the purpose of the tests and some details about the individual
tests, but I consider those to be secondary.
Still, for any commentary which does not pertain to a single specific test, or
which is too long or complex to include in a block comment in the MDL file,
creating or adding to a README file is probably a good way to capture the
relevant information.

-----------
2. writing the Python code
-----------

I've written utilities to help with validating most aspects of MCell output;
the utilities are not comprehensive, but they allow many types of validation
of reaction data outputs, and make a valiant effort at validating viz output
types.  I will produce a reference document for the various utilities, but
will discuss a few of them briefly here.

a. Python unittest quick introduction

The MCell test suite is based on Python's unittest system, of which I'll give
only a quick summary here.  More documentation is available in the Python
documentation.

A Python unittest test case is a class, usually with a name which starts with
"Test", which subclasses unittest.TestCase:

    class TestCase1(unittest.TestCase):
        pass

Inside the test case will be one or more methods with names that start with
"test".  This is not optional (at least for the simple usage I'm describing
here); Python unittest uses the method names to automatically pick out all of
the individual tests.  Test cases may also include a setUp method and a
tearDown method.  Each test method is run bracketed by calls to setUp and
tearDown, if they are defined.

Inside the test methods must be a little bit of Python code which tests some
aspect of whatever you are testing.  Conditions are checked using either
Python 'assert' statements or various methods inherited from
unittest.TestCase whose names begin with 'fail', such as 'failIfEqual',
'failUnlessEqual', or simply 'fail':

    failUnlessAlmostEqual  # assert approximate equality
    failIfAlmostEqual      # assert approximate inequality
    failUnlessEqual        # assert exact equality
    failIfEqual            # assert exact inequality
    failIf                 # assert an arbitrary boolean is false
    failUnless             # assert an arbitrary boolean is true
    failUnlessRaises       # check for an expected exception
    fail                   # fail unilaterally

The existing tests are (for some reason?) written using assert rather than the
'fail' methods.  I'll have to ask myself why I did it that way next time I'm
talking to myself.  This means that if you run the test suite under Python's
-O flag, the assert-based checks will not work.  At some point soon, I may
convert it over to use the 'fail' methods, which are not disabled by the -O
flag.
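To make the -O caveat concrete, here is a small illustration (not taken from
the test suite itself).  Under 'python -O', the assert statement in the first
method is compiled away entirely, so that check silently disappears; the
'fail*' method in the second is an ordinary method call and keeps checking:

    import unittest

    class TestOptimizeFlag(unittest.TestCase):
        def test_with_assert(self):
            # Stripped out under 'python -O'; this test can no longer fail.
            assert 2 + 2 == 4, "arithmetic is broken"

        def test_with_fail_method(self):
            # Still executed under 'python -O'.
            self.failUnlessEqual(2 + 2, 4, "arithmetic is broken")

    if __name__ == "__main__":
        unittest.main()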
[Another performance tweak that might help the test suite to run a little
faster would be to enable psyco in main.py.  psyco generally improves
performance anywhere from a factor of 2 to a factor of a hundred.  Obviously,
it won't make the mcell runs themselves any faster...]

So, an example test case might be:

    class TestReality(unittest.TestCase):
        def setUp(self):
            self.a = 1234

        def tearDown(self):
            del self.a

        def test_a(self):
            self.failUnlessEqual(self.a, 1234)

        def test_b(self):
            if 3*3 != 9:
                self.fail("3*3 should be 9.")

        def test_c(self):
            # Check equality to 7 decimal places
            self.failUnlessAlmostEqual(1./3., 0.333333333, places=7)

Generally, we don't know the order in which the tests are run, but one
possible order for the method calls above is:

    setUp
    test_a
    tearDown
    setUp
    test_b
    tearDown
    setUp
    test_c
    tearDown

Traditionally, the bottom of a file containing one or more TestCase subclasses
will have the following two lines:

    if __name__ == "__main__":
        unittest.main()

This way, if the file is invoked as a script from the command line, it will
automatically run all of the tests found in the module.

Now, these automatically built test suites are not the only way to aggregate
tests.  unittest includes utilities for automatically scanning a test case for
all tests which match a particular pattern.  To do this, create a top-level
function (i.e. not a method on your test case) which looks like:

    def vizsuite():
        return unittest.makeSuite(TestParseVizDreamm, "test")

In the above example, TestParseVizDreamm is the name of a TestCase subclass,
and we've asked unittest to round up all of the methods from that test case
whose names start with 'test' and treat them as a test suite.

You may also create test suites which aggregate individual tests or other test
suites.  To do this, again create a top-level function which looks like:

    def fullsuite():
        # Create a new suite
        suite = unittest.TestSuite()

        # add test 'test_viz_dreamm1' from the class TestParseVizDreamm
        suite.addTest(TestParseVizDreamm("test_viz_dreamm1"))

        # add the test suite 'vizsuite' defined above in a function
        suite.addTest(vizsuite())

        # return the newly constructed suite
        return suite

These aggregated test suites may be exposed in various ways via the top-level
generic test runner I've written, which I'll explore a little bit in the next
section, and in more detail when I explain how to hook the new tests into the
system.

We will deviate from the established formulas in a few minor ways.  The most
important is that:

    if __name__ == "__main__":
        unittest.main()

becomes:

    if __name__ == "__main__":
        cleandir(get_output_dir())
        unittest.main()

b. MCell tests introduction

First, let's take a quick look at one of the regression tests.  We'll start
with the simplest possible test -- one which merely checks that the run didn't
crash.  To add this, all I did was to add the MDL files under
mdl/testsuite/regression, and then add a brief function to test_regression.py.
The test cases are generally named XX-whatever.mdl, though that may need to
change as the test suite grows.

The general structure of an MCell test case is:

  1. construct an McellTest object (or subclass thereof)
  2. populate the object with parameters detailing what we consider to be a
     successful run
  3. call obj.invoke(get_output_dir()) on the object to actually run the test

For many purposes, the McellTest class itself will suffice, but in some cases
you may find yourself including the same bits of setup in a number of
different tests, in which case it's probably worth moving that shared setup
into a custom subclass of McellTest (see the sketch below).
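For instance, a shared-setup subclass might look something like the following.
The class name and defaults here are purely illustrative -- check
system_tests/testutils.py for the real constructor signature before copying
this pattern:

    class QuietRegressionTest(McellTest):
        """Hypothetical helper: a regression-suite run that is always invoked
        with -quiet and always checks that stdout/stderr stay clean."""

        def __init__(self, mdlfile, extra_args=()):
            McellTest.__init__(self, "regression", mdlfile,
                               ["-quiet"] + list(extra_args))
            self.set_check_std_handles(1, 1, 1)

With something like this in place, an individual test method reduces to
constructing the subclass and calling invoke() on it.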
Here is an example test drawn from the regression test suite:

    def test_010(self):
        mt = McellTest("regression",
                       "10-counting_crashes_on_coincident_wall.mdl",
                       ["-quiet"])
        mt.set_check_std_handles(1, 1, 1)
        mt.invoke(get_output_dir())

As in normal Python unit tests, the exact name of the method doesn't matter,
but it must be a method on a class which subclasses unittest.TestCase, and the
name should start with 'test'.  The arguments you see above in the
construction of McellTest are:

  1. "regression": an indication of which section of the test suite the run is
     part of (not really important, but it allows overriding configuration
     options for specific subsections of the test suite)

  2. "10-counting_crashes_on_coincident_wall.mdl": the name of the top-level
     MDL file to run (the path is relative to the location of the script
     containing the Python code, and is generally in the same directory at
     present)

  3. ["-quiet"]: a list of additional arguments to give to the run.
     Generally, the run will also receive a random seed, and will have its log
     and error outputs redirected to a file.

I provide '-quiet' to most of the runs I produce so that only the
notifications I explicitly request will be turned on.

The next line:

    mt.set_check_std_handles(1, 1, 1)

tells the test to close stdin when it starts and to verify that nothing was
written to stdout or stderr.  This is almost always a good idea -- it lets us
be sure that all output is properly redirected either to the logfile or the
errfile, rather than being written directly to stdout/stderr using
printf/fprintf.

And finally:

    mt.invoke(get_output_dir())

runs the test.  Unless otherwise specified (see the reference document for
details), McellTest expects mcell to exit with an exit code of 0, so we don't
need to add any additional checks.

c. Brief introduction to test utilities

In most cases, our job isn't quite that simple, and there are various ways to
add additional success criteria.  For instance:

    mt.add_extra_check(RequireFileMatches("realout",
                       '\s*Probability.*set for a\{0\} \+ b\{0\} -> c\{0\}',
                       expectMaxMatches=1))

This statement says that in order for the run to be considered successful, the
file 'realout' (i.e. the logfile output from mcell) must contain a line which
matches the regular expression:

    '\s*Probability.*set for a\{0\} \+ b\{0\} -> c\{0\}'

and we've further specified that it must match at most once.  (By default, it
must match at least once, so this means it must match exactly once.)  Again,
see the reference document for details on RequireFileMatches and other similar
utilities.

There are also similar utilities for checking various aspects of reaction data
output and similarly formatted files.  For instance, consider:

    mt.add_extra_check(RequireCountConstraints("cannonballs.txt",
                       [(1, 1, -1, 0, 0,  0),   # 0
                        (0, 0,  0, 1, 1, -1),   # 0
                        (0, 0,  1, 0, 0,  0),   # 500
                        (0, 0,  0, 0, 0,  1)],  # 500
                       [0, 0, 500, 500],
                       header=True))

This represents a set of exact constraints on the output file
'cannonballs.txt'.  The matrix:

    [(1, 1, -1, 0, 0,  0),   # 0
     (0, 0,  0, 1, 1, -1),   # 0
     (0, 0,  1, 0, 0,  0),   # 500
     (0, 0,  0, 0, 0,  1)]   # 500

will be multiplied by each row in the output file (after removing the header
line and the "time" column), and each resulting vector must exactly match the
vector:

    [0, 0, 500, 500]

The file is assumed to have a header line, though no specific check is made of
the header line -- because of the 'header=True' directive, the first line is
simply not subjected to the test.

This type of constraint can be used to verify various kinds of behavioral
constraints on the counting.  For instance, it can verify that the total
number of bound and unbound molecules of a given type is constant.  Any
constraint which can be formalized in this way may be added just by adding
another row to the matrix and another item to the vector.
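To make the arithmetic concrete: each row of the matrix is a coefficient
vector, and its dot product with a data row (time column removed) must equal
the corresponding entry of the target vector.  Illustratively, using a made-up
data row (this is only a sketch of the arithmetic, not the actual
implementation in testutils.py):

    constraints = [(1, 1, -1, 0, 0,  0),
                   (0, 0,  0, 1, 1, -1),
                   (0, 0,  1, 0, 0,  0),
                   (0, 0,  0, 0, 0,  1)]
    targets = [0, 0, 500, 500]

    # A hypothetical data row from cannonballs.txt, time column already removed.
    counts = [123, 377, 500, 56, 444, 500]

    for row, expected in zip(constraints, targets):
        value = sum(c * x for c, x in zip(row, counts))
        assert value == expected, \
            "constraint %r gave %d, expected %d" % (row, value, expected)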
Similarly, equilibrium may be verified for count files using something like:

    mt.add_extra_check(RequireCountEquilibrium("dat/01-volume_highconc/V_out.dat",
                       [500] * 26,
                       # [25] * 26,
                       ([25] * 15) + ([500] * 3) + ([25] * 7) + [500],
                       header=True))

The first argument is the path to the output file, relative to the result
directory.  The second is the expected equilibrium for each of the columns
(again, excluding the time column).  The third is the allowable tolerance for
each column.  If, after the run finishes, the mean value of a column differs
from the desired equilibrium by more than the tolerance, the test is counted
as a failure.

In the above case, you can see that 4 of the 26 columns have been temporarily
set to a tolerance of 500 to prevent the test from failing on 4 cases which
fail due to known MCell issues whose fixes will be somewhat involved.

For both this and the previous check type, you may specify min_time and
max_time arguments to limit the checks to particular segments of time.  For
the equilibrium check, this restricts the rows over which we average to find
the mean value.

For more details on the specific test utilities I've provided, see the test
utilities reference document.

-----------
3. hooking the test case into the system
-----------

The test runner looks in the top-level directory of the test suite for a
test_info.py file.  test_info.py may define a few different variables which
determine what tests are included when you run main.py, and what descriptions
are shown when you run './main.py -l'.

The first such variable is:

    subdirs = {
      "macromols"  : "Macromolecule tests",
      "parser"     : "Parser torture tests",
      "reactions"  : "Reaction tests",
      "regression" : "Regression tests"
    }

This specifies that the test suite runner should look in 4 different
subdirectories of the directory where test_info.py was found: macromols,
parser, reactions, and regression.  It also provides descriptions for each of
these subdirectories, which are displayed in the '-l' output.  Each of these
subdirectories should have its own test_info.py file describing the tests (or
further subdirectories) it contains.

The second such variable is:

    tests = {
      "oldvizsuite"       : "VIZ output tests for ASCII/RK/DX modes",
      "vizsuite"          : "VIZ output tests for DREAMM V3 modes",
      "errorsuite"        : "Test error handling for invalid MDL files",
      "quicksuite"        : "A few quick running tests which cover most valid MDL options",
      "kitchensinksuite"  : "Kitchen Sink Test: (very nearly) every parser option",
      "rtcheckpointsuite" : "Basic test of timed checkpoint functionality"
    }

This gives several named test suites, along with descriptions to be displayed
in the '-l' output.  These test suites must be imported into the test_info.py
file.  The parser tests are defined in test_parser.py, so I included the
following import statements at the top of the file:

    from test_parser import oldvizsuite, vizsuite, errorsuite, quicksuite
    from test_parser import kitchensinksuite, rtcheckpointsuite

Any test suites included in the 'tests' map will be included in the full test
suite.  It may be desirable in some cases to define test suites which run a
subset of the functionality, and this brings us to the third variable in a
test_info.py file:

    collections = {
      "allvizsuite" : ("VIZ output tests for all modes (old+new)",
                       ["oldvizsuite", "vizsuite"]),
      "fasttests"   : ("All quick running tests (valid+invalid MDL)",
                       ["errorsuite", "quicksuite"]),
    }

This defines two collections of tests.  The suites collected here are already
included via the 'tests' variable above, but we may want other aggregations
that we can quickly and easily run.  Each entry in the collections map has the
syntax:

    name : (desc, [suite1, suite2, ...])

These collections will NOT be included in the top-level test suite by default,
as the individual component tests are already included; the collections would,
thus, be redundant.

Note that the 'tests' and 'collections' variables shown above are from the
parser subdirectory's test_info.py.
For that directory, the test suite system will create several collections of
tests which may be included or excluded from the run:

    parser                      # all suites in the 'tests' map from parser/test_info.py
    parser/oldvizsuite
    parser/vizsuite
    parser/errorsuite
    parser/quicksuite
    parser/kitchensinksuite
    parser/rtcheckpointsuite
    parser/allvizsuite
    parser/fasttests

This means that you do not need to explicitly create a separate collection to
include all of the suites in a directory.  Note that more levels of hierarchy
are possible -- parser could define 'subdirs' if we wanted to break the parser
tests into several different subdirectories, for instance.  But each level of
the directory hierarchy must have its own test_info.py file.
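Putting the pieces together, a minimal test_info.py for a hypothetical new
'mynewtests' subdirectory might look like the sketch below.  The directory
name, module name, and suite names are all invented for illustration, and the
parent directory's test_info.py would also need a "mynewtests" entry added to
its subdirs map so that the runner descends into the new directory.  (Whether
unused variables may simply be omitted depends on the runner, so the empty
maps are the conservative choice.)

    # test_info.py for a hypothetical 'mynewtests' subdirectory.

    # Suite functions defined in test_mynew.py, which lives alongside this file.
    from test_mynew import quicksuite, slowsuite

    # No further subdirectories below this one.
    subdirs = {}

    # Named suites that become part of the full test run, with '-l' descriptions.
    tests = {
      "quicksuite" : "Quick sanity checks for the new feature",
      "slowsuite"  : "Long-running numeric validation for the new feature"
    }

    # Redundant convenience grouping; not added to the full run a second time.
    collections = {
      "everything" : ("All tests for the new feature",
                      ["quicksuite", "slowsuite"])
    }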