Devin McGloin

Often times testing is viewed as a guaranteed way to ensure your function behaves as you expect it to. However this is far from what is actually happening. A excellent example of this occurred a few years back and I want to take a look at the problem and see what standard testing practices would get you and the best way to approach the problem.

The Problem

During this time I was working for a web scraping company, and my role was to provide some means of anomaly detection and cluster data points into common records. The clustering problem is the question of interest here.

It's a relatively straightforward problem, given fields a, b, and c take the average price for records with the same values of a, b and c. The simplest way to do this is to hash on those values and cluster based on the shared hash value. Here is a simple but flawed approach to creating a perfect hash.

class Record:
    def __init__(self, a,b,c):
        self.a = a
        self.b = b
        self.c = c

    def __hash__(self):
    return int("{}{}{}".format(self.a, self.b, self.c))

We can write some unit tests for this simple hash as well.

def test_hash():
    assert hash(Record(1,2,3)) == hash(Record(1,2,3))
    assert hash(Record(23,45,6)) != hash(Record(45,23,3))
    assert hash(Record(11,11,2)) != hash(Record(11,11,3))

We can keep on creating more test cases, but before we do lets shed some light on a fatal flaw for our simple hash function. All of the following assertions are true with the simple hash.

assert hash(Record(11,21,3)) == hash(Record(11,2,13))

The question is how could we have caught that bug before shipping it into production?

Unit Testing

Consider the case in which a programmer wrote unit tests and had an insight to test for this degenerate case. This is the best outcome, but is not a deterministic one, the programmer could very easily not have had that insight. Our goal is to catch these bugs without relying on an insight from a programmer. However once this insight is had, it's a clear thing to test for and ought to be. It's clear unit testing doesn't help as much as we would like. It is a good exercise for a programmer to think about edge cases, but is not a silver bullet.

Property Based Testing

Consider a programmer who realizes that our unit tests only cover a small section of the input space, it would be good to assert behavior across a broader set of that space. We can write the following:

@given(
    a=integers(max_value=100, min_value=1),
    b=integers(max_value=100, min_value=1),
    c=integers(max_value=100, min_value=1),
    d=integers(max_value=100, min_value=1),
    e=integers(max_value=100, min_value=1),
    f=integers(max_value=100, min_value=1),
)
def test_hash(a, b, c, d, e, f):
    first_record = Record(a, b, c)
    second_record = Record(d, e, f)
    true_equal = a == d and b == e and c == f
    hash_equal = hash(first_record) == hash(second_record)

    if true_equal:
        assert hash_equal
    else:
        assert not hash_equal

    if hash_equal:
        assert true_equal

In order for this bug to be caught with property based testing a number of preconditions must be met.

Given input a, b, c, one of these inputs have to be greater than 10. But more importantly for any two inputs, we have to randomly select 2 specific numbers which are set to either d, e, f. The probability of randomly stumbling across this bug with a property based tester is around one in a million. When we add in one of those inputs needs to be larger than 10 we're left with 1 in 10 million. Needless to say those aren't very good odds.

Where does this leave us

So what then is the best way to approach such problems? It's clear that random property based testing is not effective, and the same is true of standard unit testing. Is this a class of problems that are especially difficult to test for?

The class of issues we're trying to hone in on are unknown unknowns, that is we don't know that we don't know them. If we know that they're a problem then we can clearly write unit tests that test for that issue specifically.

Hypothetically we could enumerate the entire domain of our simple hash function, but in this case we'll be looking at O(100^6) or O(1 Bil) operations. This is also for a relatively small space. So this approach is generally untenable.

Besides hoping something clicks in the mind of the programmer, I'm not sure what else one could do to prevent these sorts of issues. If you have a good approach please send me an email with it and I'll amend my post here with updated information.

Of course the simple fix for our problem is as follows:

class Record:
    """Given bounds on a <= 99999, b <= 99, and c <= 99"""
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __hash__(self):
        return int("{:5d}{:2d}{:2d}".format(self.a, self.b, self.c))

What Testing Gets You

The Problem

Unit Testing

Property Based Testing

Where does this leave us