Tuesday, May 14, 2019

ITIL 4: Nostalgia Trap or Modernized Framework

itil4 gerripoling 
Today many enterprises are stuck in a rut and tormented by ITIL due to governance and compliance requirements. When you talk to the boots on the ground building the organizations' private, hybrid, and multi-cloud solutions, you will often hear frustrations with bureaucracy, slow or failed launches, and a feeling of disconnect from expectations. All the while they press on building a new ecosystem - virtualized, containerized, and serverless. That ecosystem is fundamentally and markedly different from how things were when ITIL v3 was last updated in 2011.

So I ask myself: why should organizations rely on a framework developed in the 1980s, a time well before the industry’s maturation and evolution, and refreshed roughly once a decade?
The ITIL framework has often been misunderstood as prescriptive and continues to be faulted for failing to meet the demands of modern technology teams. Many an engineer, SysOps practitioner, or business stakeholder has argued the following in a quest for something better.

  • DevOps and Site Reliability Engineering place more emphasis on service-level management - Stakeholders and customers want immediate gratification. Waterfall software development lifecycles focusing on step-by-step design, testing, and maintenance actions don’t offer the speed, agility, or adaptability in service delivery. Traditional IT shops are at a massive competitive disadvantage without automation and continuous delivery.
  • Service oriented - Technology's focus has shifted from component monitoring to the customer experience. For this shift to succeed, IT must dedicate more time and resources to application development and maintenance.
  • Complex ecosystem - The tech industry has radically transformed in the last ten years with the introduction of virtualization, microservices, containerization, cloud-native workloads, and serverless computing. ITIL wasn't designed to support a shared Service Catalog or the ambiguity in ownership within a SaaS or IaaS solution.
  • Shared ownership - Forward-thinking IT organizations have realized that empowering service and line-of-business owners with a management model aimed at maintaining websites, apps, portals, and other customer touch points is the real key to gaining a strategic edge.
  • Fear of change - Change management advisory boards in past iterations of ITIL are seen as blockers to rapid deployments and iterative cycles of software development. Contrary to popular belief, ITIL was never intended to make IT responsible for evaluating or validating every component of a change.

    In this modern IT era we must be agile, efficient, and flexible enough to service planned workloads and cloud-native solutions: solutions built to support continuous change, automated deployments, and immutable components.

    Enter ITIL 4 - Designed to help organizations build more flexible ITSM strategies, ITIL 4 now includes more Lean, Agile, and DevOps-based methodologies. It actually encourages collaboration and communication across organizations and offers guidance for implementing changes rapidly and fluidly. While it is not a "from scratch" rewrite, v4 may become complementary to those other methodologies and integrate more deeply with other service management best practices in the context of newer technologies, such as serverless, containers, microservices, and multi-cloud.

    Let's revisit this topic in 2, 5, and 10 years to explore how the technical culture has evolved with or without ITIL. Maybe a bot will respond to the question...

    Monday, March 25, 2019

    CI for Leaders

    Continuous Improvement

    Continuous Improvement sounds a lot like the overused business term "Raising the Bar." It is one of those coined business terms that can put some people off because of its ambiguity: what it actually means, what makes a "bar-raiser" successful, and how you measure it. In my experience a "bar-raiser" is someone who challenges conventional practices and might even be knowledgeable enough to make things better: someone driven to ensure continuous improvement rather than settling for "good enough" and complacency.

    When I first started working with my last employer two or so years ago, I was fully immersed in a literature-focused environment, one that not only facilitated the production of literature but relied on literature for blueprinting a leadership strategy. There was a clear desire to re-create the Google SRE model and a "No-Ops" culture. The modern IT utopian goal. All the cool people were doing it already. The top-down decision had to be made. Investments were committed. Agile methodologies and training had been delivered. DevOps titles were assigned. But something was missing...

    DevOps practices give teams accountability for their work, including quality of the product or service they deliver. With DevOps, we no longer need people outside the team to test for quality and to take independent ownership of ensuring quality and performance. The same can be said for security and compliance. However, if quality and security (and accessibility, resilience, and user-friendliness) are not binary, yes-or-no metrics, what drives the team to continuously improve on these fronts? A bar-raiser. 

    Consider this tidbit extracted from a tech leader resource on a similar topic:
    Do we not want continuous improvement in security, rather than “secure enough?” Do we not want continuous improvement in availability and quality, rather than choosing a level which we say is adequate (as in Google’s concept of an error budget)? Yes-or-no metrics made sense when we had gatekeepers or we specified requirements in advance (“the system must achieve an SLA of four nines of availability”). But in our agile, DevOps, continuous improvement approaches, is this what we want? Shouldn’t we constantly be striving to improve—if not the availability of our systems—then at least the availability we are able to get from a given level of spending?
    On the other hand, we want to maximize the amount of work not done. We want to build the simplest architecture that is acceptable and the minimum number of features that are acceptable. We encourage developers to stop work when their automated tests pass (although they do refactor to simplify their code after that point). Google’s error budget should be seen in this light. How can we reconcile the idea of continuous improvement with the idea of maximizing work not done?

    Perhaps the bar-raiser concept is the answer. What if we replace the independent QA function with quality bar-raisers, who are not gatekeepers (“your quality is not good enough”) but rather drivers of continuous improvement (“here is where you have an opportunity to raise the bar on quality and some ideas on how you might do it”)? Similarly for security: yes, you meet the security bar, but let’s look at how you can do even better, maybe without adding any significant amount of work?

    Raising the bar sometimes has a cost, but often it does not. I wonder if the idea of bar raising isn’t just the next step in a digital world where moving forward is both discrete (creating new features) and continuous (improving resilience and security).
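For readers less familiar with the error budget idea mentioned in the excerpt above, the arithmetic can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 30-day window, and the downtime figure are hypothetical and not taken from the quoted source or from any particular monitoring tool. A "four nines" target over 30 days leaves roughly 4.32 minutes of allowed unavailability.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability target.

    `slo` is the availability target as a fraction, e.g. 0.9999 for
    "four nines". The 30-day default window is an assumption.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Error budget left after the downtime already observed.

    A negative result means the target has been missed for the window.
    """
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    budget = error_budget_minutes(0.9999)
    left = budget_remaining(0.9999, downtime_minutes=3.0)
    print(f"budget: {budget:.2f} min, remaining: {left:.2f} min")
```

Seen this way, a gatekeeper only asks whether the remaining budget is above zero, while a bar-raiser might ask how the same spending could yield a larger budget next quarter.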