Electric Cloud recently released version 8.0 of their build and deployment orchestration tool, Electric Flow. This latest release includes many new features, but the focus of this article will be the ability to retry tasks and stages from a failed pipeline and why this is a very useful feature.
If the reader is familiar with continuous delivery best practices, this may sound like heresy. The point of a pipeline is to take an immutable set of artifacts and propagate them through a series of stages in which they are tested for the worthiness of being released to production. At the end, you should be able to say your artifacts passed all the necessary checks, including playing well with production-like configurations. If anything fails, throw it back, make the fix, and start the pipeline again with the goal of all green lights from start to finish for maximum confidence.
Confidence. That’s the point of testing. That’s the point of pipelines.
In Electric Flow 8.0, we now have the ability to mark tasks within a pipeline stage for manual or automatic retry on failure. In the case of manual retries, an approver or assignee can review the error and then retry, skip, or fail the task. An automatic retry reruns the task at a specified interval, up to a specified number of times, after which you can choose whether the pipeline stops or continues. If the task succeeds following the retry, the stage turns from red to green and the process moves on.
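To make the automatic retry behavior concrete, here is a minimal Python sketch of the logic described above: retry at a fixed interval, up to a fixed number of times, then either stop or continue. This is not Electric Flow's API or configuration format. The names `RetryPolicy`, `OnExhausted`, `TaskResult`, and `run_with_retry` are hypothetical and exist only for illustration.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class OnExhausted(Enum):
    """What the pipeline does once all retries have failed (hypothetical)."""
    STOP = "stop"          # halt the pipeline
    CONTINUE = "continue"  # move on to the next task anyway


@dataclass
class RetryPolicy:
    max_retries: int         # retries allowed after the first attempt
    interval_seconds: float  # fixed wait between attempts
    on_exhausted: OnExhausted


@dataclass
class TaskResult:
    succeeded: bool
    attempts: int
    pipeline_continues: bool


def run_with_retry(
    task: Callable[[], bool],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> TaskResult:
    """Run `task` until it returns True or the retry budget is spent."""
    attempts = 0
    while True:
        attempts += 1
        if task():
            # Success: the stage goes green and the pipeline moves on.
            return TaskResult(True, attempts, True)
        if attempts > policy.max_retries:
            # Out of retries: the policy decides whether the pipeline proceeds.
            proceed = policy.on_exhausted is OnExhausted.CONTINUE
            return TaskResult(False, attempts, proceed)
        sleep(policy.interval_seconds)


if __name__ == "__main__":
    # Simulated task that fails twice (say, a full disk) and then passes.
    outcomes = iter([False, False, True])
    policy = RetryPolicy(max_retries=3, interval_seconds=0.1,
                         on_exhausted=OnExhausted.STOP)
    print(run_with_retry(lambda: next(outcomes), policy))
```

The `sleep` parameter is injectable so the wait can be skipped or recorded when exercising the logic. A manual retry would replace the timed loop with a decision from an approver or assignee: retry, skip, or fail.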
Sounds pretty heretical, doesn’t it? Doesn’t this undermine the confidence we’re seeking from knowing that what we put in at the front end is sufficiently constructed to make it through all the tests without any meddling on our part? Doesn’t this sound like a bad practice? The thought of saying, “It got stuck there for a bit, but we had the intern jiggle the handle and now it seems to be moving along fine,” doesn’t scream “professional” to me.
But then there’s reality. Reality says that if your pipeline failed because the drive on the testing machine was full, then the fault lies with neither the new code, the deployment process, nor the environment configuration, so maybe it’s OK to clean up the drive and resume testing. Reality says that sometimes we experience network or power outages, so maybe it would be OK to try again once those resources are restored. Reality says that sometimes we fat-finger the value stored in a property on the Pre-Prod Environment, and if we just make that one character capital instead of lowercase, we could proceed just fine.
The fact of the matter is, a faithfully executed pipeline can take quite a bit of time to progress through each stage. If we have a trivial failure during a late stage, it is costly in both time and resources to rerun the whole pipeline when no change has been made to either the code being evaluated or the deployment process. Furthermore, if we are serious about maintaining an audit trail that shows how a batch of changes progressed through each environment, the existing feature within Flow that executes only specific stages is insufficient for production use, because it doesn’t show a continuous chain from build to production. Only this new retry ability grants us the flexibility to recover from minor errors quickly and efficiently while maintaining our audit path – our confidence.
Sure, you could misuse this feature. You could be a continuous delivery heretic and monkey with your code or process partway through a pipeline, but that’s a really bad idea. Gene Kim would frown at you and Martin Fowler would use you as a cautionary tale during his next talk. Don’t do it.
Make sure to check out our Electric Cloud services page to see how we can help you better automate your development processes.