Automatic Generation of Cucumber from Code

(All of the code mentioned here exists, and we’re using it. Our actual codebase is all in Perl – I’ve written out examples here in JavaScript for clarity and so there are no copyright issues. The actual implementation code for all this is pretty simple and not very clever, so I’m not planning to jump through the hoops needed to actually release it unless there is some massive unexpected demand…)

Introduction, in which we discover Cucumber

Let’s start with Gherkin. Gherkin is basically a constrained application of English meant for specifying test cases. For example, for a calculator you might write:

You then set up a number of step parsers, called step definitions, that match the steps and execute code based on them. For example:
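As a rough illustration, here is a toy runner in plain JavaScript. It is not Cucumber.js’s API: each step definition is just a regular expression paired with a function. The `Calculator` class, `defineStep` and `runScenario` are hypothetical, and the step wording follows the style of the classic calculator example.

```javascript
// A toy Gherkin runner, NOT the real Cucumber.js API: just enough to show how
// step definitions (regex + function) match plain-English steps.

// Hypothetical system under test.
class Calculator {
  constructor() { this.stack = []; }
  enter(n) { this.stack.push(n); }
  add() { return this.stack.reduce((a, b) => a + b, 0); }
}

// Each step definition pairs a regular expression with the code to run.
const stepDefinitions = [];
const defineStep = (pattern, fn) => stepDefinitions.push({ pattern, fn });

defineStep(/^I have entered (\d+) into the calculator$/, (world, n) => {
  world.calculator.enter(Number(n));
});
defineStep(/^I press add$/, (world) => {
  world.result = world.calculator.add();
});
defineStep(/^the result should be (\d+) on the screen$/, (world, n) => {
  if (world.result !== Number(n)) {
    throw new Error(`expected ${n}, got ${world.result}`);
  }
});

// Run each Given/When/Then/And line against the step definitions.
function runScenario(text) {
  const world = { calculator: new Calculator(), result: undefined };
  for (const rawLine of text.split("\n")) {
    const match = rawLine.trim().match(/^(Given|When|Then|And) (.*)$/);
    if (!match) continue; // skip "Scenario:" headers and blank lines
    const step = match[2];
    const def = stepDefinitions.find((d) => d.pattern.test(step));
    if (!def) throw new Error(`undefined step: ${step}`);
    // Captured groups are passed to the step function as arguments.
    def.fn(world, ...step.match(def.pattern).slice(1));
  }
  return world;
}

const scenario = `
  Scenario: Add two numbers
    Given I have entered 50 into the calculator
    And I have entered 70 into the calculator
    When I press add
    Then the result should be 120 on the screen
`;

console.log(runScenario(scenario).result); // 120
```

An unmatched step throws, which is roughly how an undefined step surfaces in a real runner.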

Gherkin is supported by a suite called Cucumber, and people tend to use the word Cucumber to describe the whole thing.

As you might be able to imagine, Cucumber gives Agile Consultants and gullible Business Analysts a vision of paradise. Look! Non-programmers can code! This solves all of our problems! We could write the whole test suite like this, and then it’s also documentation!

This basically doesn’t work in the real world. There are a range of nuanced reasons why it doesn’t work, starting with something called Step Explosion, stopping off along the way at the fiddliness of testing exceptions, and featuring the fact that you really really really don’t want non-programmers doing programming. This didn’t stop me writing a Perl implementation, however, because it’s fun, and why else would you program?

So in summary: Cucumber exists, it’s All The Rage amongst Agile Types, but it’s a curiosity rather than a testing panacea. It’s an interesting way of organizing a small number of test cases, but anyone suggesting its full embrace as a replacement for any other testing tool should be taken outside and … well, at the least, left outside.

The Plot Thickens, in which we set the scene

One of my clients at the moment has several large warehouses in several countries. Every day, they ship a huge number of items under several different brands to customers all around the world.

The business is obsessed with customer service. Fanatical about it. And so every item has to be shipped from the warehouse just right. The right number of bows and ribbons, packaging that cuts no corners, and a rigorous adherence to good taste that’s pervasive down to storage containers in the warehouse being in company colours. That’s why the customers keep coming back.

As a customer, amongst other things, you can ask for a Gift Message to be included in your order, and our hand-crafted warehousing software – with a pre-millennium pedigree – has to make some decisions about that. Circa 2001, the decisions were pretty simple:

But then between 2001 and 2011, things started to get a little more complicated. One of the brands put their foot down and insisted that their Gift Messages needed to be printed on cream-coloured paper, not ivory-coloured. And the physical layout of a new warehouse necessitated that we print out the Gift Messages when we pick orders, rather than when we pack them, like we do in the other warehouses.

And it turns out not all customers are entirely happy to use low-order ASCII for their messages, and that some languages require the message to be type-set by hand. And if the customer spends enough money with us a year, the Gift Message needs to be type-set by hand, and lovingly sprinkled with lavender water, before being signed using one of the pens originally used to sign The Constitution…

If you think the organic implementation of such business rules by a very dedicated but also very, very busy development team over the course of ten years might lead to software with the occasional rough edge, you’d be on the right track.

Not only does this level of complexity make it hard for programmers to extend the code, it also makes it hard for Testers to acceptance test, regression test, or understand it. And it also makes it very difficult for a Business Analyst to learn, document, and strategise how to extend and improve it.

Our Hero Arrives! in which we learn about Contracts

In order to help with some of the issues above, I’ve introduced a Business Rules library. If you’re at a decision point in the code that requires several pieces of information and can have several possible outputs, you pull the code out, give it a name, and assert the Contract.

The Contract is an idea from Programming by Contract. The Contract your business rule has with the rest of the code asserts that the calling code must specify all inputs at call-time, that these inputs must conform to several custom type constraints, and in return, you will get a response conforming to a specific type constraint, and the business rule will operate statelessly. That is: it will not look outside of its inputs for information, and it will not change any persisting values.

Here is an example:
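A minimal sketch of what such a rule could look like, assuming a hypothetical `defineRule` helper and made-up `Types`; the real library is in Perl and unpublished. The rule’s behaviour is chosen to match the truth table further on. The wrapper enforces that every input is supplied, that inputs and output satisfy their type constraints, and freezes the inputs so the rule cannot mutate them. Not reading anything outside the inputs remains a discipline: JavaScript cannot enforce it.

```javascript
// Custom type constraints (hypothetical). Enumerable ones list every legal value.
const Types = {
  Warehouse: { name: "Warehouse", values: ["WH1", "WH2"] },
  Bool: { name: "Bool", values: [false, true] },
};

function checkType(type, value, label) {
  if (!type.values.includes(value)) {
    throw new TypeError(`${label}: ${JSON.stringify(value)} is not a valid ${type.name}`);
  }
}

// Wrap a pure function so that it asserts its Contract on every call.
function defineRule({ name, inputs, output, body }) {
  const rule = (args) => {
    // Every declared input must be present and well-typed...
    for (const [key, type] of Object.entries(inputs)) {
      if (!(key in args)) throw new TypeError(`${name}: missing input ${key}`);
      checkType(type, args[key], `${name}.${key}`);
    }
    // ...and nothing undeclared may be passed in.
    for (const key of Object.keys(args)) {
      if (!(key in inputs)) throw new TypeError(`${name}: unexpected input ${key}`);
    }
    // Inputs are frozen (shallowly) so the rule cannot change them.
    const result = body(Object.freeze({ ...args }));
    checkType(output, result, `${name} result`);
    return result;
  };
  // Keep the type metadata on the rule so tools can enumerate its inputs.
  return Object.assign(rule, { ruleName: name, inputs, output });
}

// Hypothetical rule: true only for measurable products in WH2.
const exampleRule = defineRule({
  name: "exampleRule",
  inputs: { warehouse: Types.Warehouse, "product.measurable": Types.Bool },
  output: Types.Bool,
  body: (i) => i.warehouse === "WH2" && i["product.measurable"],
});

console.log(exampleRule({ warehouse: "WH2", "product.measurable": true })); // true
```

Keeping `inputs` and `output` on the returned function is the design choice that matters later: it is what lets other code enumerate the rule’s possible inputs.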

The constraints we’re working with here have some interesting implications. As we guarantee that there’s no access to values outside of the scope, except the values we pass in, we know that given the same inputs when called, we will get the same output.

Also, as we’re defining custom type constraints, where those types are enumerable (i.e. they’re either enumerations or Boolean), we can actually predict in advance what all the inputs could be.

And the implication of those taken together is that we can execute our rule with all possible inputs it’s allowed to have – the Cartesian product of its enumerable inputs. And that means we can build – automatically – a truth table for our code:

warehouse   product.measurable   result
WH1         false                false
WH1         true                 false
WH2         false                false
WH2         true                 true
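A sketch of how the table above can be built mechanically, assuming each input’s legal values are available as a list and the rule is a pure function. The `inputs` object and `rule` here are hypothetical stand-ins for the Contract metadata and the business rule.

```javascript
// Enumerable input types: every legal value is listed.
const inputs = {
  warehouse: ["WH1", "WH2"],
  "product.measurable": [false, true],
};

// Hypothetical stateless rule matching the truth table above.
const rule = (i) => i.warehouse === "WH2" && i["product.measurable"];

// Cartesian product of the value lists: one object per combination.
function cartesian(types) {
  return Object.entries(types).reduce(
    (rows, [key, values]) =>
      rows.flatMap((row) => values.map((v) => ({ ...row, [key]: v }))),
    [{}]
  );
}

// Because the rule is pure, running it once per combination captures
// its complete behaviour.
function truthTable(types, fn) {
  return cartesian(types).map((row) => ({ ...row, result: fn(row) }));
}

console.table(truthTable(inputs, rule));
```

The table grows as the product of the number of values per input, which is fine for the handful of enums a single decision point usually takes.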

That’s pretty cool, and one of the central reasons it’s pretty cool is that you can pass this truth table to your Tester or Business Analyst to check. Or heck, they can even define the truth table for you from the User Story, and you turn it into an automated test.

And that’s where this goes from being quite cool to pretty interesting…

The Plot Thickens (again?)

We can programmatically simplify the truth table above to its implications by iterating over the inputs and finding the smallest sets of inputs that always lead to the same result. The above table can be simplified (by a computer) into the following implications:

warehouse is ‘WH1’ => result is false
product.measurable is false => result is false
warehouse is ‘WH2’ && product.measurable is true => result is true

Astute readers will notice that an implication looks a great deal like a Cucumber scenario. After all, a Cucumber scenario simply states a series of preconditions, and the result that they imply.

If as part of your type constraints, you added in a few extra fields…

Then you could automatically generate Cucumber scenarios and the /step definitions/ needed to parse them:
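A sketch of both halves, assuming the “extra fields” are a Gherkin phrase template per input (with a `{value}` placeholder) and one for the outcome. The field names, phrases, `toFeature` and `parseStep` are all hypothetical; the implications are the three listed above.

```javascript
// Hypothetical extra fields on each input type: a phrase with a {value} slot.
const fields = {
  warehouse: { values: ["WH1", "WH2"], phrase: "the warehouse is {value}" },
  "product.measurable": {
    values: [false, true],
    phrase: "the product's measurable flag is {value}",
  },
};
const outcome = { values: [false, true], phrase: "the result should be {value}" };

// Implications as produced by reducing the truth table.
const implications = [
  { when: { warehouse: "WH1" }, result: false },
  { when: { "product.measurable": false }, result: false },
  { when: { warehouse: "WH2", "product.measurable": true }, result: true },
];

const fill = (phrase, value) => phrase.replace("{value}", String(value));

// One scenario per implication: each precondition becomes a Given/And step.
function toScenario(ruleName, imp, n) {
  const givens = Object.entries(imp.when).map(
    ([key, value], i) => `    ${i === 0 ? "Given" : "And"} ${fill(fields[key].phrase, value)}`
  );
  return [
    `  Scenario: ${ruleName}, case ${n}`,
    ...givens,
    `    When the ${ruleName} rule is applied`,
    `    Then ${fill(outcome.phrase, imp.result)}`,
  ].join("\n");
}

function toFeature(ruleName, imps) {
  const scenarios = imps.map((imp, i) => toScenario(ruleName, imp, i + 1));
  return [`Feature: ${ruleName}`, ...scenarios].join("\n\n");
}

// Step definitions: one regex per phrase that accepts only legal values,
// so a scenario edited to use an unknown value fails to match.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
function stepPattern({ phrase, values }) {
  const [before, after] = phrase.split("{value}");
  const choices = values.map((v) => escapeRegExp(String(v))).join("|");
  return new RegExp(`^${escapeRegExp(before)}(${choices})${escapeRegExp(after)}$`);
}
const stepDefinitions = Object.entries(fields).map(([input, field]) => ({
  input,
  pattern: stepPattern(field),
}));

// Turn a step's text back into { input, value }, or null if nothing matches.
function parseStep(text) {
  for (const { input, pattern } of stepDefinitions) {
    const m = text.match(pattern);
    if (m) return { input, value: fields[input].values.find((v) => String(v) === m[1]) };
  }
  return null;
}

console.log(toFeature("exampleRule", implications));
```

Because the same template drives both rendering and parsing, the generated scenarios are guaranteed to be parsed by the generated step definitions.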

These are Cucumber scenarios generated from your existing code. Next time you add a new warehouse, or a new type of packaging, or any other complicated decision, you can hand your Business Analyst an auto-generated Cucumber script that describes the current decision logic, ask them to fix it up and send it back … and it already runs (but fails). You just need to update the code. That’s both Behaviour-Driven Development and Test-Driven Development…

The End!

Some Practical Considerations…

So we already got to the end of this article. Here are some considerations for developers thinking of trying to implement this themselves… There’s no real structure or conclusion here, just a brain dump…

Enumerating free-form strings

You can’t enumerate all variations if one of your incoming types is a string (rather than an enum of strings). Some of our inputs are strings. Philosophically, you shouldn’t be making any decisions based on a string – if you know what your string might look like, and are taking a decision on it, you should pass it in as an enumerated type. If you’re doing some kind of smart matching on it, your business rule will be simplified by doing that first, and passing in the result of that.
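A sketch of “do the smart matching first”, using the low-order ASCII case from earlier. The `MessageCharset` type and `canPrintOnStandardPrinter` rule are made up for illustration; the point is that the rule receives an enumerable value rather than the raw message.

```javascript
// Hypothetical enumerated type: the only fact about the message text
// that this (made-up) rule actually depends on.
const MessageCharset = ["ascii", "non-ascii"];

// The "smart matching" happens here, outside the business rule.
function classifyMessage(message) {
  return /^[\x00-\x7F]*$/.test(message) ? "ascii" : "non-ascii";
}

// The rule only sees an enumerable input, so its truth table has two rows.
function canPrintOnStandardPrinter({ messageCharset }) {
  if (!MessageCharset.includes(messageCharset)) {
    throw new TypeError(`invalid MessageCharset: ${messageCharset}`);
  }
  return messageCharset === "ascii";
}

const message = "Joyeux anniversaire à toi";
console.log(canPrintOnStandardPrinter({ messageCharset: classifyMessage(message) })); // false
```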

This leaves us with the case where a string provided in the input is used as part of the output. For that, I pass in a canary string that embeds the name of the input column it stands in for, and then replace it with a marker in the output. For example, a free-form printer name that’s only used sometimes (but could be anything) is passed in as `str["printer_name"]` and then replaced in the output, giving an output like, say: “floor3_[printer_name]”. This has worked well so far.
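A sketch of the canary approach, under assumptions: the canary format `__canary[column]__` and the `choosePrinter` rule are hypothetical, chosen so the output matches the “floor3_[printer_name]” example.

```javascript
// Hypothetical canary format: unlikely to appear in real data, and it
// embeds the name of the column it stands in for.
const canary = (column) => `__canary[${column}]__`;

// Replace every canary in a rule's output with a readable [column] marker.
function unmask(output, columns) {
  if (typeof output !== "string") return output;
  return columns.reduce(
    (text, column) => text.split(canary(column)).join(`[${column}]`),
    output
  );
}

// Made-up rule: WH2 prints to a named floor-3 printer, WH1 does not.
const choosePrinter = ({ warehouse, printer_name }) =>
  warehouse === "WH2" ? `floor3_${printer_name}` : "packing_bench";

// Enumerate the enumerable input; feed the free-form one a canary.
for (const warehouse of ["WH1", "WH2"]) {
  const raw = choosePrinter({ warehouse, printer_name: canary("printer_name") });
  console.log(warehouse, unmask(raw, ["printer_name"]));
}
// WH1 packing_bench
// WH2 floor3_[printer_name]
```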

If you really had a string you had to pass in, was used in decisions, and wasn’t going to get passed out again, I’d consider embedding a list of testing strings in the type definition. Not perfect, but probably good enough. I’m yet to have a situation where an input is a naked integer, but that’s probably going to be my solution when I do…
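A sketch of embedding test values in the type definition, assuming a hypothetical `testValues` field alongside the usual `values`. The generated table is only as complete as the hand-picked samples.

```javascript
// Hypothetical type definitions. Enumerable types list every legal value;
// free-form ones (a naked string or integer) carry hand-picked test values.
const Types = {
  Warehouse: { values: ["WH1", "WH2"] },
  FreeText: { testValues: ["", "Happy birthday!", "Joyeux anniversaire à toi"] },
  Quantity: { testValues: [0, 1, 1000] },
};

// Values to enumerate for a type: all of them if we can, else the samples.
function valuesFor(type) {
  const values = type.values || type.testValues;
  if (!values) throw new Error("type is neither enumerable nor sampled");
  return values;
}

// Cartesian product over a rule's inputs, using valuesFor on each type.
function inputRows(inputTypes) {
  return Object.entries(inputTypes).reduce(
    (rows, [key, type]) =>
      rows.flatMap((row) => valuesFor(type).map((v) => ({ ...row, [key]: v }))),
    [{}]
  );
}

const rows = inputRows({
  warehouse: Types.Warehouse,
  message: Types.FreeText,
  quantity: Types.Quantity,
});
console.log(rows.length); // 18
```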

Reducing a truth table to implications

The algorithm that reduces a truth table to its implications – there’s probably a right way to do this, but I just made one up:

For every subset of the input columns, ordered by number of columns used:
  Generate all Cartesian products of the values of those columns
    For each combination, how many different answers do the matching rows give?
      If the answer is one, you have an implication
        If it catches rows that haven't been `seen` yet
          Mark those rows as `seen`
          Save the implication
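A direct JavaScript rendering of the pseudocode, assuming the truth table is an array of row objects with a `result` key. It is a greedy cover: every row ends up explained by some implication, but the set is not guaranteed to be minimal. For purely Boolean inputs, Quine–McCluskey is the textbook method for exact minimization.

```javascript
// The truth table from earlier: one row per combination of enumerable inputs.
const columns = ["warehouse", "product.measurable"];
const table = [
  { warehouse: "WH1", "product.measurable": false, result: false },
  { warehouse: "WH1", "product.measurable": true, result: false },
  { warehouse: "WH2", "product.measurable": false, result: false },
  { warehouse: "WH2", "product.measurable": true, result: true },
];

// All k-element subsets of `items`, preserving their order.
function combinations(items, k) {
  if (k === 0) return [[]];
  if (items.length < k) return [];
  const [first, ...rest] = items;
  return [
    ...combinations(rest, k - 1).map((c) => [first, ...c]),
    ...combinations(rest, k),
  ];
}

// Cartesian product of the observed values of the given columns.
function cartesian(cols, valuesOf) {
  return cols.reduce(
    (acc, col) => acc.flatMap((combo) => valuesOf[col].map((v) => ({ ...combo, [col]: v }))),
    [{}]
  );
}

function toImplications(table, columns) {
  const valuesOf = Object.fromEntries(
    columns.map((c) => [c, [...new Set(table.map((row) => row[c]))]])
  );
  const seen = new Set(); // indexes of rows already covered by an implication
  const found = [];
  // Smaller subsets first, so simpler implications are found first.
  for (let size = 1; size <= columns.length; size++) {
    for (const subset of combinations(columns, size)) {
      for (const when of cartesian(subset, valuesOf)) {
        const matching = table
          .map((row, index) => ({ row, index }))
          .filter(({ row }) => subset.every((c) => row[c] === when[c]));
        const answers = new Set(matching.map(({ row }) => row.result));
        if (answers.size !== 1) continue; // no rows, or more than one answer
        const fresh = matching.filter(({ index }) => !seen.has(index));
        if (fresh.length === 0) continue; // explains nothing new
        fresh.forEach(({ index }) => seen.add(index));
        found.push({ when, result: [...answers][0] });
      }
    }
  }
  return found;
}

for (const { when, result } of toImplications(table, columns)) {
  const lhs = Object.entries(when)
    .map(([k, v]) => `${k} is ${JSON.stringify(v)}`)
    .join(" && ");
  console.log(`${lhs} => result is ${result}`);
}
```

Run against the example table, this yields the same three implications listed earlier, in the same order.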

5 comments

itshouldbeuseful says:

January 5, 2012 at 3:49 am

Cucumber's main strength is its flexibility. Ironically this is also its main weakness.

Saying it does not work in the real world is a little like saying Java is now a redundant programming language. Both statements are true and false, depending on the context.

You seem to conveniently forget that your Cucumber suite is PART OF THE CODEBASE, and as such requires ongoing maintenance and refactoring, just like the rest of your codebase. If you don't maintain your Cucumber test suite with the rest of your codebase, you will build up technical debt that must be repaid. It's at this point that most people blame Cucumber instead of their own team/practices/processes and lack of Cucumber experience. Writing bad Java/Ruby/C# code doesn't make Java/Ruby/C# a bad language, it just means you're a bad programmer.

As for generating scenarios, I think this entirely misses the point of test/behaviour DRIVEN development.

Interesting post though, I just don't agree.

Alex Young says:

February 29, 2012 at 10:04 am

I think itshouldbeuseful has possibly missed the distinction between what a tool is *designed* for and what a tool *can be used* for.

Gtester says:

May 28, 2012 at 9:39 pm

The problem as I see it with using Gherkin/Cucumber is that you need to add in another layer of abstraction that gives you a higher level – forest level – view. At about 900 Gherkin Tests it takes on a life of its own and requires extra people to just manage it. This resolves a problem I see looking on the horizon in a graceful fashion. It’s a Perl thing.

Himanshu Aggarwal says:

September 24, 2016 at 1:06 am

Today we are facing the same problem that you solved many years back. We have code based on an FSM, and a Cucumber test pack with hundreds of test cases to test every possible path of that machine based on a given set of inputs. I’m trying to test an FSM using an FSM, but one that would generate Cucumber test cases, so that our business users stay happy and we keep a tight grip over the test cases.
Did you make any further advances on this approach?

Peter says:

September 27, 2016 at 9:13 am

I think the key insight here is finding minimal input sets that lead to any given output… Perhaps something like FIT would be a better option for what you’re trying to achieve? https://en.wikipedia.org/wiki/Framework_for_integrated_test