'Gtk+ unit tests (brainstorming)'


List:       gtk-devel
Subject:    Gtk+ unit tests (brainstorming)
From:       Tim Janik <timj () imendio ! com>
Date:       2006-10-25 15:52:24
Message-ID: Pine.LNX.4.62.0610251725490.8500 () master ! birnet ! private

Hi all.

as mentioned in another email already, i've recently worked on improving
unit test integration in Beast and summarized this in my last blog entry:
   http://blogs.gnome.org/view/timj/2006/10/23/0 # Beast and unit testing

while analysing the need for a testing framework and whether it makes sense
for GLib and Gtk+ to depend on yet another package for the sole purpose of
testing, i made/had the following observations/thoughts:

- Unit tests should run fast - a test taking 1/10th of a second is a slow
   unit test, i've mentioned this in my blog entry already.

- the important aspect about a unit test is the testing it does, not the
   testing framework around it. as such, a testing framework doesn't need to
   be big, here is one that is implemented in a whole 4 lines of C source,
   it gets this point across very well: ;)
     http://www.jera.com/techinfo/jtns/jtn002.html

- in the common case, test results should be reduced to a single boolean:
     "all tests passed" vs. "at least one test failed"
   many test frameworks provide means to count and report failing tests
   (even automake's standard check:-rule), there's little to no merit to
   this functionality though.
   letting more than one test fail while work continues in an unrelated
   area rapidly leads to confusion about which tests are supposed to work
   and which aren't, especially in multi-contributor setups.
   figuring out whether the right tests passed then requires scanning
   the test logs and remembering the last count of tests that may validly
   fail. this defeats the purpose of using a single quick make check run to
   be confident that one's changes didn't introduce breakage.
   as a result, the whole test harness should always either succeed or
   be immediately fixed.

- for reasons also mentioned in the aforementioned blog entry it might
   be a good idea for Gtk+ as well to split up tests into things that
   can quickly be checked, thoroughly be checked but take long, and into
   performance/benchmark tests.
   these can be executed by make targets check, slowcheck and perf
   respectively.

- for tests that check abort()-like behavior, it can make sense to fork off
   a test program and check whether it fails in the correct place.
   although checks of this type are the minority, the basic
   fork-functionality shouldn't be reimplemented all over again and warrants
   a test utility function.

- for time bound tasks it can also make sense to fork a test and after
   a certain timeout, abort and fail the test.

- some test suites offer formal setup mechanisms for test "sessions".
   i fail to see the necessity for this. main() { } provides useful test
   grouping just as well, this idea is applied in an example below.

- multiple tests may need to support the same set of command line arguments
   e.g. --test-slow or --test-perf as outlined in the blog entry.
   it makes sense to combine this logic in a common test utility function,
   usually pretty small.

- homogeneous or consistent test output might be desirable in some contexts.
   so far, my experience has been that for simple make check runs, the most
   important things are that it's fast enough for people to run frequently
   and that it succeeds.
   if parts that are perceived as slow are hard to avoid, a progress indicator
   can help a lot to bridge the required waiting time. so, here the exact
   output isn't too important as long as some progress is displayed.
   for performance measurements it makes sense to use somewhat canonical
   output formats though (ideally machine parsable) and it can simplify the
   test implementations if performance results may be intermixed with existing
   test outputs (such as progress indicators).
   i've mentioned this in my blog entry as well, it boils down to using a
   small set of utility functions to format machine-detectable performance
   test result output.

- GLib based test programs should never produce a "CRITICAL **:" or
   "WARNING **:" message and succeed. the reasoning here is that CRITICALs
   and WARNINGs are indicators for an invalid program or library state,
   anything can follow from this.
   since tests are in place to verify correct implementation/operation, an
   invalid program state should never be reached. as a consequence, all tests
   should upon initialization make CRITICALs and WARNINGs fatal (as if
   --g-fatal-warnings was given).

- test programs should be good glib citizens by defining G_LOG_DOMAIN, so
   WARNING, CRITICAL, and ERROR printouts can correctly indicate the failing
   component. since multiple test programs usually go into the same directory,
   something like DEFS += -DG_LOG_DOMAIN='"$(basename $(@F))"' (for GNU make)
   or DEFS += -DG_LOG_DOMAIN='"$@"' (for portable makefiles) needs to be used.
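
the shared command line handling mentioned above could be as small as the
following sketch. the TestSettings record and test_parse_args() are
hypothetical names made up for illustration, they are not taken from Beast
or GLib. in a GLib based version, the same init function would also be the
natural place to make WARNINGs and CRITICALs fatal, e.g. via
g_log_set_always_fatal().

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical settings record; the field names are illustrative only. */
typedef struct {
  bool test_slow;   /* set by --test-slow */
  bool test_perf;   /* set by --test-perf */
} TestSettings;

/* Pre-parse the test related options and strip them from argv, so the
 * remaining arguments stay in their original order for the test itself. */
static TestSettings
test_parse_args (int *argc, char **argv)
{
  TestSettings settings = { false, false };
  int in, out = 1;                      /* argv[0] is always kept */

  for (in = 1; in < *argc; in++)
    {
      if (strcmp (argv[in], "--test-slow") == 0)
        settings.test_slow = true;
      else if (strcmp (argv[in], "--test-perf") == 0)
        settings.test_perf = true;
      else
        argv[out++] = argv[in];         /* not a test option, keep it */
    }
  if (*argc > 0)
    {
      argv[out] = NULL;                 /* keep the vector NULL terminated */
      *argc = out;
    }
  return settings;
}
```

a test program would call this once at the start of main() and then decide
from the returned flags which slow or benchmark parts to run.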

as far as a "testing framework" is needed for GLib/Gtk+, i think it would
be sufficient to have a pair of common testutils.[hc] files that provide:

1- an initialization function that calls gtk_init() and preparses
    arguments relevant for test programs. this should also make all WARNINGs
    and CRITICALs fatal.

2- a function to register all widget types provided by Gtk+, (useful for
    automated testing).

3- a function to fork off a test and assert it fails in the expected place
    (around a certain statement).

4- it may be helpful to have a fork-off and timeout helper function as well.

5- simple helper macros to indicate test start/progress/assertions/end.
    (we've at least found these useful to have in Beast.)

6- output formatting functions to consistently present performance measurements
    in a machine parsable manner.

if i'm not mistaken, test frameworks like Check would only help us out with
3, 4 and to some extent 5. i don't think this warrants a new package
dependency, especially since 5 might be highly customized and 3 or 4 could be
useful to provide generally in GLib.
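
to illustrate 3 and 4, here is a minimal sketch of a combined fork-off and
timeout helper using plain POSIX calls. test_run_forked() and the
TestChildResult values are hypothetical names, not an existing GLib API.
note that the sketch only detects *that* the child aborted, not *where*;
asserting the failure location would need extra reporting from the child.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L         /* for fork(), waitpid(), alarm() */
#endif
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*TestFunc) (void);

/* Possible outcomes of a forked test; the names are illustrative. */
typedef enum {
  TEST_CHILD_PASSED,    /* child exited with status 0 */
  TEST_CHILD_FAILED,    /* child exited with a non-zero status */
  TEST_CHILD_ABORTED,   /* child was killed by SIGABRT, e.g. via abort() */
  TEST_CHILD_TIMEOUT,   /* child exceeded the time limit (killed by SIGALRM) */
  TEST_CHILD_ERROR      /* fork/wait failed or the child died otherwise */
} TestChildResult;

/* Run func() in a child process; timeout_seconds == 0 means no timeout. */
static TestChildResult
test_run_forked (TestFunc func, unsigned int timeout_seconds)
{
  int status = 0;
  pid_t result, pid = fork ();

  if (pid < 0)
    return TEST_CHILD_ERROR;
  if (pid == 0)
    {
      /* child: the default SIGALRM action terminates the process */
      signal (SIGALRM, SIG_DFL);
      alarm (timeout_seconds);
      func ();
      _exit (0);                        /* skip the parent's atexit handlers */
    }
  do                                    /* parent: wait, retrying on EINTR */
    result = waitpid (pid, &status, 0);
  while (result < 0 && errno == EINTR);
  if (result != pid)
    return TEST_CHILD_ERROR;
  if (WIFEXITED (status))
    return WEXITSTATUS (status) == 0 ? TEST_CHILD_PASSED : TEST_CHILD_FAILED;
  if (WIFSIGNALED (status) && WTERMSIG (status) == SIGABRT)
    return TEST_CHILD_ABORTED;
  if (WIFSIGNALED (status) && WTERMSIG (status) == SIGALRM)
    return TEST_CHILD_TIMEOUT;
  return TEST_CHILD_ERROR;
}

/* Sample test bodies to exercise the helper. */
static void sample_ok    (void) { }
static void sample_abort (void) { abort (); }
static void sample_hang  (void) { for (;;) pause (); }
```

a test that expects abort()-like behavior would then assert that
test_run_forked (sample_abort, 5) yields TEST_CHILD_ABORTED, while a time
bound test fails if the result is TEST_CHILD_TIMEOUT.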

here is an example to be more concrete on what i think Gtk+ tests could look
like, i.e. it shows what we have in beast right now:

==========tests/Makefile.am===================================================
DEFS            += -DG_LOG_DOMAIN='"$(basename $(@F))"'
TESTS           += threads    # "threads" is started by make check
PERFTESTS       += threads    # "threads --test-perf" is started by make perf
==========tests/threads.cc====================================================
/* --- sample test function --- */
static void
test_threads (void)
{
   TSTART ("C++OwnedMutex");
   TASSERT (NULL != &Thread::self());
   static OwnedMutex static_omutex;
   static_omutex.lock();
   TASSERT (static_omutex.mine() == true);
   static_omutex.unlock();
   TASSERT (static_omutex.mine() == false);
   TDONE();
}
/* --- an automatic test session setup is constituted by main() --- */
int
main (int   argc,
       char *argv[])
{
   birnet_init_test (&argc, &argv); // does arg parsing etc.
   test_threads();
   test_atomic();
   if (init_settings().test_perf)   // true for --test-perf
     {
       bench_auto_locker_cxx();
       bench_other_stuff();
     }
   return 0;
}
==========stdout of simple test run (brief)===================================
TEST: threads		# printed by birnet_init_test()
C++OwnedMutex: [---]	# each TASSERT produces a '-'
PASS: threads		# printed by make check
==========
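
for readers without the Beast sources at hand, macros producing the output
shown above could look roughly like the following plain C sketch. these are
not the actual Birnet definitions, only an assumed reconstruction from the
sample output; the tassert_count counter is added purely for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Number of assertions that passed so far; illustrative bookkeeping only. */
static unsigned int tassert_count = 0;

/* Announce a test and open the progress bracket. */
#define TSTART(name)    do { printf ("%s: [", name); fflush (stdout); } while (0)

/* Check a condition: print one '-' on success, report and abort otherwise. */
#define TASSERT(cond)   do {                                    \
    if (cond)                                                   \
      { tassert_count++; putchar ('-'); fflush (stdout); }      \
    else                                                        \
      {                                                         \
        printf ("X]\n");                                        \
        fprintf (stderr, "%s:%d: assertion failed: %s\n",       \
                 __FILE__, __LINE__, #cond);                    \
        abort ();                                               \
      }                                                         \
  } while (0)

/* Close the progress bracket. */
#define TDONE()         do { printf ("]\n"); fflush (stdout); } while (0)

/* Sample test function using the macros. */
static void
test_arithmetic (void)
{
  TSTART ("Arithmetic");
  TASSERT (1 + 1 == 2);
  TASSERT (7 / 2 == 3);
  TASSERT (7 % 2 == 1);
  TDONE ();
}
```

since a failing TASSERT aborts right away, the test program as a whole
either succeeds or fails at the first broken assertion, which matches the
single boolean result argued for earlier.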

also, i've given some thought to the things that would be nice to have
covered by automatic unit tests in Gtk+:

- for a specific widget type, test input/output conditions of all API
   functions (only for valid use cases though)
- similarly, test all input/output conditions of the Gdk API
- try setting & getting all widget properties on all widgets over the full
   value ranges (sparsely covered by means of random numbers for instance)
- try setting & getting all container child properties analogously
- check layout algorithms by laying out a child widget that does nothing but
   check the coordinates it's laid out at. i've played around with such
   a test item in Rapicorn. as food for thought, here's a list of the
   properties it currently supports (assertions are carried out upon exposure):
     MakeProperty (TestItem, epsilon,       "Epsilon",
                   "Epsilon within which assertions must hold",
                   DFLTEPS,   0,         +MAXFLOAT, 0.01, "rw"),
     MakeProperty (TestItem, assert_left,   "Assert-Left",
                   "Assert positioning of the left item edge",
                   -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
     MakeProperty (TestItem, assert_right,  "Assert-Right",
                   "Assert positioning of the right item edge",
                   -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
     MakeProperty (TestItem, assert_bottom, "Assert-Bottom",
                   "Assert positioning of the bottom item edge",
                   -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
     MakeProperty (TestItem, assert_top,    "Assert-Top",
                   "Assert positioning of the top item edge",
                   -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
     MakeProperty (TestItem, assert_width,  "Assert-Width",
                   "Assert amount of the item width",
                   -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
     MakeProperty (TestItem, assert_height, "Assert-Height",
                   "Assert amount of the item height",
                   -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
     MakeProperty (TestItem, fatal_asserts, "Fatal-Asserts",
                   "Handle assertion failures as fatal errors",
                   false, "rw"),
- create all widgets with mnemonic constructors and check that their
   activation works.
- generically query all key bindings of stock Gtk+ widgets, and activate them,
   checking that no warnings/criticals are generated.
- create a test rcfile covering all rcfile mechanisms, which is parsed and
   whose values are asserted in the resulting GtkStyles.
- for all widget types, create and destroy them in a loop to:
   a) measure basic object setup performance
   b) catch obvious leaks
   (these would be slowcheck/perf tests)
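
the create/destroy loop from the last item, combined with machine parsable
result output, might be sketched as follows. the "PERF:" line layout, the
function names and the DummyWidget stand-in are assumptions for illustration
only; a real test would loop over the registered Gtk+ widget types instead,
and catching leaks would need allocation accounting or an external tool,
which is not shown here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef void* (*CreateFunc)  (void);
typedef void  (*DestroyFunc) (void *object);

/* Format one result line into buf; the "PERF:" layout is an assumed format
 * that a script could match with a simple pattern. */
static int
perf_format (char *buf, size_t len, const char *name,
             double value, const char *unit)
{
  return snprintf (buf, len, "PERF: %s: %.3f %s", name, value, unit);
}

/* Create and destroy an object n_loops times and return the CPU time
 * spent per iteration in microseconds. */
static double
bench_create_destroy (CreateFunc create, DestroyFunc destroy,
                      unsigned long n_loops)
{
  clock_t start, end;
  unsigned long i;

  if (n_loops == 0)
    return 0.0;
  start = clock ();
  for (i = 0; i < n_loops; i++)
    destroy (create ());
  end = clock ();
  return (double) (end - start) / CLOCKS_PER_SEC * 1e6 / n_loops;
}

/* Stand-ins for a widget constructor and destructor. */
typedef struct { int x, y, width, height; } DummyWidget;
static void* dummy_new     (void)         { return calloc (1, sizeof (DummyWidget)); }
static void  dummy_destroy (void *object) { free (object); }
```

a perf test would pass the measured value to perf_format() and print the
resulting line, which may be intermixed with progress output since the
"PERF:" prefix keeps it detectable.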

as always, feedback is appreciated, especially objections/concerns
regarding the ideas outlined ;)

---
ciaoTJ
