Drive has 700+ articles for digital transformation leaders written by StarCIO Digital Trailblazer, Isaac Sacolick.

What do three multibillion-dollar companies that have been around for over one hundred years have in common? There might be straightforward answers if they were in the same industry, but what if one is in media, another in financial services, and a third a food service distributor?

IT leaders from Wiley, OneMain Financial, and US Foods presented at BigPanda’s recent Resolve ’21 and Pandapalooza event about how they’re modernizing their IT operations with AIOps. I’ve already shared insights from this event, including 3 AIOps secrets that boost quick business impact and seven lessons from IT leaders on operating at digital speeds with AIOps. This post explores how companies that must continually reinvent themselves use data and machine learning to deliver great IT service management experiences.

Keep in mind that information technology wasn’t around when these three companies were founded, and they introduced many of the systems running their businesses over decades. At the event, their leaders presented how they leverage machine learning and automation to improve the mean time to recovery (MTTR) from IT incidents and increase the reliability and performance of their systems.

I was most interested in seeing how these leaders used AIOps and leveraged data in IT Operations.

Use DevOps to Improve Data Quality

Didier Le Tien, VP of Application Development at US Foods, explained how having clean operational data was critical to support production applications. He states, “Changing your process through tools gives you an opportunity to collect the better quality data needed to prove or disprove you are on the right track. It’s one of the key elements to be more data-driven. This data has allowed us to think outside of the box when it comes to our operations, for example, having the visibility to identify production issues faster, use data to improve troubleshooting, and then address potential bugs. Because you have the data, concepts like AIOps became a reality for us.”

I love these comments because they illustrate:

  • The importance of creating and cleansing data when instituting new processes and tools
  • How having cleansed operational data helps teams think outside of the box
  • Their targeted improvement metrics using AIOps and open-box machine learning capabilities

Reduce Alert Fatigue – Automation and Machine Learning

Sam Chatman, VP of IT Ops at OneMain Financial, explains the impact of leveraging AIOps: “Being able to understand what is released, when it’s released, and the potential impacts of that release. We are overcoming alert fatigue, and BigPanda will be our Watson of the Enterprise Monitoring Center (EMC) by automating alerts, opening incident tickets, and identifying those actions to improve our mean time to recovery. This helps us keep our systems up when our users and customers need them to be.”

For other organizations, it might help to visualize what naturally happens to IT operations’ monitoring programs over time. Every time systems go down and IT gets thrown under the bus for a major incident, they add new monitoring systems and alerts to improve their response times. As new multicloud, database, and microservice technologies emerge, they add even more monitoring tools and increase their observability capabilities.

Having more operational data and alerts is a good first step, but then alert fatigue kicks in when tier-one support teams respond and must make sense of dozens to thousands of alerts. OneMain has broken that cycle by establishing an EMC, investing in AIOps, focusing on customer experience, and addressing alert fatigue.

OneMain Financial’s EMC is relatively new, and it has already made a significant business impact. Sam shares one best practice – that overcoming alert fatigue not only requires better data, it also requires tools for automating aspects of the response. The automation improves communications and frees up time so that IT operations can focus on troubleshooting and restoring service. As Sam points out, the shift from tasks to problem-solving helps shift everyone’s focus to improving customer and end-user experience.

Enable Actionable Insights – Improve Signal to Noise Ratios

If automation is part of how IT Operations improves recovery times, then reducing noisy alerts to a correlated and manageable number of incidents is another best practice. Kiran Venkatesan, Architect at Wiley, shares a core practice for improving the signal-to-noise ratio in the data used by IT Ops for incident management.

Kiran says, “If there is a lot of noise, then there is no benefit. We have started measuring compression rates in how much noise is generated by event monitoring tools. How many alerts are duplicated, can be aggregated, or are correlated? How much of an actionable incident is produced based on all of the enrichment that is going in within the context of the particular business service?”
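To make the idea of a compression rate concrete, here is a minimal Python sketch. It is not Wiley’s or BigPanda’s method: the alert fields, the rule that alerts repeating the same host and check are duplicates, the rule that alerts sharing a business service correlate into one incident, and the formula (one minus incidents divided by raw alerts) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # All field names are hypothetical, not taken from any monitoring tool.
    source: str   # monitoring tool that raised the alert
    host: str
    check: str
    service: str  # business service added during enrichment (assumed)


def deduplicate(alerts):
    """Keep the first alert seen for each (host, check) pair."""
    seen = {}
    for alert in alerts:
        seen.setdefault((alert.host, alert.check), alert)
    return list(seen.values())


def correlate(alerts):
    """Group alerts into one incident per business service (assumed rule)."""
    incidents = defaultdict(list)
    for alert in alerts:
        incidents[alert.service].append(alert)
    return dict(incidents)


def compression_rate(alert_count, incident_count):
    """Share of raw alerts that did not become separate incidents."""
    if alert_count == 0:
        return 0.0
    return 1 - incident_count / alert_count


# Made-up sample data for illustration only.
alerts = [
    Alert("apm", "web-1", "latency", "checkout"),
    Alert("infra", "web-1", "latency", "checkout"),
    Alert("infra", "web-2", "cpu", "checkout"),
    Alert("infra", "db-1", "disk", "catalog"),
    Alert("logs", "db-1", "disk", "catalog"),
]
unique = deduplicate(alerts)
incidents = correlate(unique)
rate = compression_rate(len(alerts), len(incidents))
print(f"{len(alerts)} alerts -> {len(unique)} unique -> {len(incidents)} incidents")
print(f"compression rate: {rate:.0%}")
```

In this made-up sample, five raw alerts reduce to three unique alerts and two incidents, a 60% compression rate. A real deployment would likely correlate on time windows and topology as well, which this sketch leaves out.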

So improving IT operations needs more than cleansed and correlated data; the data must lead to actionable, accurate, and at least partially automated responses. One important step is to map incidents to the impacted business services, define service level objectives, and improve communications.

Better Data Enables Automatic Incident Triage

The next step in the journey goes beyond reducing alert noise, correlating monitoring data, and enabling response automations. In the middle of the incident management process are bridge calls, war rooms, and other group efforts between subject matter experts. Their goal is to work collaboratively with all the available data and aim to troubleshoot issues, identify root causes, and prescribe courses of action.

Even as the operational data quality improves, the triage process can be the longest, most painful step in the incident pipeline.

BigPanda customers talked about ways their IT operations take advantage of automatic incident triage. Context is automatically added to each incident, including the impacted business services, the teams who must stay informed, and the type of issue that needs addressing. With this context added to the incident, first-level teams can then route the incident to the appropriate support teams. The approach should eliminate the “all hands on deck” responses prevalent in IT Ops teams that haven’t invested in AIOps. Helping IT operations triage incidents is very promising for IT leaders looking beyond improving MTTR. Proactive leaders also aim to reduce the number of monthly incidents and enhance IT support personnel’s work-life balance.
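The enrichment and routing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BigPanda’s API: the lookup tables, team names, and the Incident fields are hypothetical, and a real system would pull this context from a CMDB or service catalog rather than from hard-coded dictionaries.

```python
from dataclasses import dataclass, field

# Hypothetical lookup tables standing in for a CMDB or service catalog.
SERVICE_BY_HOST = {"web-1": "checkout", "db-1": "catalog"}
STAKEHOLDERS_BY_SERVICE = {
    "checkout": ["ecommerce-leads"],
    "catalog": ["merchandising-leads"],
}
TEAM_BY_ISSUE = {"database": "dba-oncall", "network": "network-oncall"}
DEFAULT_TEAM = "tier-one-support"


@dataclass
class Incident:
    host: str
    issue_type: str
    business_service: str = "unknown"
    notify: list = field(default_factory=list)
    assigned_team: str = DEFAULT_TEAM


def triage(incident):
    """Add business context to an incident, then pick a support team."""
    # Enrichment: impacted business service and teams to keep informed.
    incident.business_service = SERVICE_BY_HOST.get(incident.host, "unknown")
    incident.notify = list(
        STAKEHOLDERS_BY_SERVICE.get(incident.business_service, [])
    )
    # Routing: send to the team that owns this issue type, else tier one.
    incident.assigned_team = TEAM_BY_ISSUE.get(incident.issue_type, DEFAULT_TEAM)
    return incident


routed = triage(Incident(host="db-1", issue_type="database"))
print(routed.business_service, routed.assigned_team, routed.notify)

# An incident with no known context falls back to first-level support.
fallback = triage(Incident(host="unlisted-host", issue_type="power"))
print(fallback.business_service, fallback.assigned_team, fallback.notify)
```

The fallback to a default team is a design choice in this sketch: incidents without enough context stay with first-level support instead of paging every team.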

When you see that hundred-year-old enterprises recognize the importance of high system reliability and enable IT operations with AIOps tools to improve service levels, you sense how important both customer and employee experiences are to these companies. When you listen to their leaders, you get the sense that many IT organizations have much to gain by improving IT operational data and investing in AIOps.

This post is brought to you by BigPanda

The views and opinions expressed herein are those of the author and do not necessarily represent the views and opinions of BigPanda.
