DataOps in 3 thoughts: they’re called best practices for a reason
This is the final part in the "DataOps in 3 thoughts" series. Be sure to check out the first, second and third parts as well.
The DataOps methodology borrows heavily from the schools of Agile, DevOps and Lean Manufacturing. Many of its principles are indeed best practices that any software engineer would deem ‘basic’, such as versioning, reproducible results and the use of separate environments.
Many data scientists and other analytics professionals are aware of this and already try to apply these principles to their work. But it is when you combine them with newer movements like ‘infrastructure as code' and 'configuration as code’ that you reach new levels of automation, reusability and overall performance. Technologies like MLflow and Kubeflow are living proof that this wish for a more mature, stable way of getting data science into production is top of mind for many organisations.
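As a minimal sketch of the configuration-as-code idea (the class and field names here are hypothetical, not from any particular tool), a pipeline's settings can live in versioned, testable code rather than in ad-hoc scripts or hand-edited files:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical pipeline settings, versioned alongside the code."""
    name: str
    schedule: str  # cron expression
    retries: int = 3
    environment: str = "acceptance"

    def validate(self) -> None:
        # Fail fast on obviously bad values before anything is deployed.
        if self.retries < 0:
            raise ValueError("retries must be non-negative")
        if self.environment not in ("acceptance", "production"):
            raise ValueError(f"unknown environment: {self.environment}")


# The config is plain code: it can be diffed, reviewed and unit-tested
# just like the pipeline itself.
config = PipelineConfig(name="daily-ingest", schedule="0 2 * * *")
config.validate()
```

Because the configuration is an ordinary object, a broken value fails a unit test or a validation step long before it reaches production.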
To deliver products reliably and with confidence, you should invest appropriate time in testing, preferably automated. Having automated smoke tests and quality checks run every time you hit the deploy button gives you a warm, fuzzy feeling of safety. Careful: you get used to it.
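A post-deploy smoke test can be as small as a handful of assertions. The sketch below checks a hypothetical `/health` response payload; the field names are an assumed convention, not a standard, and in a real pipeline the payload would come from the freshly deployed service:

```python
def check_health(payload: dict) -> list[str]:
    """Return a list of smoke-test failures for a /health response payload.

    The payload shape (status, pending_migrations) is an assumption
    for illustration, not a real API contract.
    """
    failures = []
    if payload.get("status") != "ok":
        failures.append(f"status is {payload.get('status')!r}, expected 'ok'")
    if payload.get("pending_migrations", 0) != 0:
        failures.append("database migrations are pending")
    return failures


# Simulated responses; wired into a deploy pipeline, a non-empty
# result would block the release.
assert check_health({"status": "ok", "pending_migrations": 0}) == []
assert check_health({"status": "degraded"}) != []
```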
This is of course closely related to the whole monitoring setup: we often run tests repeatedly throughout the day, expose their results as metrics and surface them in our operational dashboards. Once again, it's all about understanding the needs of the customer: the golden signals.
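One way to turn recurring test results into dashboard metrics is to render each check as a numeric gauge that a monitoring system can scrape. The sketch below uses Prometheus' text exposition format; the metric and check names are hypothetical:

```python
import time


def run_checks() -> dict[str, bool]:
    """Stand-ins for the real quality checks run throughout the day."""
    return {"ingest_freshness": True, "row_count_within_bounds": True}


def to_prometheus(results: dict[str, bool]) -> str:
    """Render check results as gauges in Prometheus text exposition format."""
    lines = ["# TYPE dataops_check_passed gauge"]
    for name, passed in results.items():
        # 1 means the check passed, 0 means it failed.
        lines.append(f'dataops_check_passed{{check="{name}"}} {int(passed)}')
    lines.append(f"dataops_checks_last_run_timestamp {int(time.time())}")
    return "\n".join(lines)


print(to_prometheus(run_checks()))
```

A failing check then shows up as a gauge dropping to 0, which is trivial to alert on and to plot alongside the golden signals.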
Following industry best practices, we’ve ended up writing almost everything as code. This allows us to version everything, test-drive new releases in customer acceptance environments and automate just about everything we can, from deployment to monitoring and alerting.
Leaning on best-of-breed open-source technologies allows us to deliver a state-of-the-art data platform to our customers, tailored to each and every one of them. We constantly monitor those platforms and keep them up to date, with a team of fewer than five engineers.
I guess they’re called best practices for a reason.
DataOps is more than just DevOps applied to data (science). It is a set of principles, aiming to help your team achieve a major increase in performance. But getting there will take a lot of effort.
It will require time - the time you need to invest in building up skills, learning and developing your own best practices, making sure you keep checking all the boxes we defined above.
It will require the right people - people with the correct mindset, open to improvement, ready to grow. People who can cope with pressure and never-ending projects.
And finally, it will require a fair investment - an investment in reusable assets, components and automation.
We believe firmly in these principles, and we believe in bringing them to our customers without them having to make all the required investments themselves. That’s why we started building kuori. We hope you've enjoyed our series on DataOps. Share your thoughts in the comments below!