DataOps in 3 thoughts: our work is never over
Updated: Dec 10, 2019
This is part two in the "DataOps in 3 thoughts" series. Be sure to check out the first part as well.
It is a fundamental truth, sung about in ancient lore. OK, it was Daft Punk, but still: years ago. After reading through the 18 principles of DataOps, we picked up a pattern: doing things the DataOps way is about doing things continuously.
DataOps is a journey, not a destination
First and foremost: make sure your customer is always satisfied. From the perspective of DataKitchen, the 'deliverables' for a customer are insightful analytics. You could interpret this more broadly, also including reports (or reporting data), up-to-date integrated canonical data stores, or real-time event streams.
This satisfaction is obviously linked to the availability of your solution. The general expectation is "always be available", but in a technically complex setup this cannot always be easily guaranteed - and should not be. A good understanding of customer needs on the delivery team's end, and some insight into the actual complexity on the customer's side, should lead to realistic expectations, formalized into SLAs (Service Level Agreements) so response times can be measured. These metrics allow a team to evaluate its performance - both external (how the customer perceives it) and internal (how the team itself perceives it). Using these metrics, changes to procedures, tools and the entire setup can be considered. The team must continuously try to improve its performance.
Once you have agreed on SLAs, you can translate them into rules for your alerting stack. This requires you to continuously monitor the components in your setup - and for an average data platform, that means multiple servers, clusters and components. A prerequisite for monitoring is increasing the overall observability of your system. A common approach is to invest in log aggregation, metrics collection and distributed tracing. Together, these give you a solid foundation for detecting issues and digging all the way down to the root cause. In short: to deliver on your SLAs, you should always know what is happening inside your system!
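Translating an SLA into an alerting rule often boils down to comparing an aggregated metric against an agreed budget. A hedged sketch of that idea; the alert name, the metric and the 1% error budget are invented for illustration:

```python
def evaluate_error_rate_alert(total_requests, failed_requests, max_error_rate=0.01):
    """Return an alert dict when the observed error rate breaches the SLA budget."""
    if total_requests == 0:
        return None  # no traffic in the window, nothing to evaluate
    error_rate = failed_requests / total_requests
    if error_rate > max_error_rate:
        return {
            "alert": "ErrorRateSLABreach",
            "observed": round(error_rate, 4),
            "threshold": max_error_rate,
        }
    return None

# 250 failures out of 10,000 requests is 2.5%, well over a 1% budget
alert = evaluate_error_rate_alert(10_000, 250)
```

In practice you would express the same logic as a rule in your alerting stack (e.g. an alert manager evaluating metrics from your collection pipeline) rather than hand-rolled code, but the shape is the same: metric in, threshold check, alert out.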
The open source technology landscape is evolving at incredible speed, so to identify and use the best-suited technology for a given business case, a delivery team must stay up to date. We do this by keeping tabs on news from the open source community and subscribing to the release feeds of the GitHub projects we use. This lets us track all official releases and keep our customer environments up to par with the latest and greatest the open source world has to offer.
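To make the release-watching concrete: GitHub's REST API exposes the latest release of a repository at `GET https://api.github.com/repos/{owner}/{repo}/releases/latest`. The sketch below only parses a sample payload so it runs offline; the field names (`tag_name`, `published_at`) match the public API, but the version value is made up:

```python
import json

def latest_release_tag(payload: str) -> str:
    """Extract the version tag from a GitHub 'latest release' JSON response."""
    release = json.loads(payload)
    return release["tag_name"]

# A trimmed-down example of what the API returns
sample = '{"tag_name": "v2.4.1", "published_at": "2019-12-01T09:00:00Z"}'
tag = latest_release_tag(sample)
```

Comparing such tags against the versions deployed in customer environments is a simple way to turn "stay up to date" from an intention into a checkable metric.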
And last but certainly not least: be aware that customer requirements will keep evolving. As their insight grows and they come to understand what the technology has to offer, new or updated requirements will follow. It will be hard to put a fixed 'end date' on this kind of project, and it will require a long period of aftercare.
Implementing DataOps is about continuously iterating, improving and delivering. This is an important enabler for a short cycle time. The lessons you learn in repeating your processes over and over again will then enable you to improve and standardize your processes, leading to an even shorter cycle time.
If you commit to delivering continuously using DataOps, it will continuously deliver value for you as well.
Stay tuned for part three in the "DataOps in 3 thoughts" blog series, in which we'll highlight our vision on the human side of DataOps.