Web Performance is a Journey, Not a Destination

Mehdi Daoudi

Performance Monitoring with an Incident Response Orchestration System

This guest post was written by Karine Margaryan of OpsGenie.

In today’s constantly growing and highly competitive ecommerce market, monitoring website performance, health, and availability is imperative. Every millisecond counts when you run an ecommerce, ebanking, eticketing, or similar website, and user experience principles show that timing is one of the most important factors in avoiding frustration for your visitors and customers. Yet current marketing efforts make websites heavier: audio, video, high-resolution images, A/B tests, and long multi-step transactions can all slow a site down, leading to lost revenue and harm to an organization’s brand image and overall business.

No one is immune to system outages or website downtime. System, network, application, and other failures can affect your digital experience without you even knowing about them. Monitoring your systems helps you limit the impact of such failures by catching them before they cause damage. Most organizations now try to implement effective performance monitoring that triggers actionable alerts when something goes wrong. When you evaluate alerting systems, the main differentiators in the market are the use of dynamic thresholds and the ability to integrate with other industry-leading tools (such as Atlassian HipChat or Slack).

Another important point to consider is alert fatigue. You want to eliminate false positives and make sure that the right people are involved in the issue resolution process. This is why flexibly defined dynamic thresholds matter when you evaluate monitoring or alerting systems: flat, static thresholds produce more false positives and may send notifications at times that do not correspond to actual incidents. Alerting systems that provide dynamic thresholds make it less likely that a critical issue gets lost in the noise.
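
To make the difference between static and dynamic thresholds concrete, here is a minimal Python sketch. It is only an illustration, not how Catchpoint or OpsGenie compute thresholds: the function names, the sample data, and the choice of "mean plus k standard deviations over a sliding window" are all assumptions made for this example.

```python
from statistics import mean, stdev


def static_alerts(samples, limit):
    """Return the indices of samples above one fixed limit."""
    return [i for i, value in enumerate(samples) if value > limit]


def dynamic_alerts(samples, window=5, k=3.0):
    """Return the indices of samples that exceed a threshold derived from
    the preceding `window` samples (their mean plus k standard deviations).

    This is one simple, assumed way to define a dynamic threshold.
    """
    alerts = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        threshold = mean(baseline) + k * stdev(baseline)
        if samples[i] > threshold:
            alerts.append(i)
    return alerts


# Hypothetical response times in milliseconds: a quiet period with one
# spike (index 6), followed by a busier period with a higher normal level.
response_ms = [200, 210, 205, 195, 202, 208, 600, 204, 199,
               450, 460, 455, 465, 458, 462]

print("static :", static_alerts(response_ms, limit=400))
print("dynamic:", dynamic_alerts(response_ms))
```

With this sample data, the fixed 400 ms limit flags the spike and every measurement in the busier period, while the sliding-window threshold flags only the spike. A real system would need a more robust baseline (for example, one that accounts for time of day), since a single outlier inflates a short window like this one.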

To accelerate the issue resolution process, you can also set up cross-team collaboration by concurrently notifying only the people in the different departments who are responsible for a given type of problem.

With incident response orchestration and management services, you can improve your incident resolution process by reducing the mean time to repair (MTTR). What is your response time for critical alerts vs. others? How often do you escalate issues? You can study and answer such questions if your system tracks and reports data from every step of your issue resolution process.
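
As a rough sketch of how those questions could be answered from tracked data, the Python below computes MTTR, mean time to acknowledge per severity, and the escalation rate. The `Incident` record, its field names, and the sample incidents are hypothetical; a real incident response system would export its own fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record; field names are assumptions for this example.
    severity: str          # e.g. "critical" or "warning"
    created: datetime
    acknowledged: datetime
    resolved: datetime
    escalated: bool


def mean_duration(durations):
    """Average a non-empty list of timedeltas."""
    return sum(durations, timedelta()) / len(durations)


def mttr(incidents):
    """Mean time to repair: average of (resolved - created)."""
    return mean_duration([i.resolved - i.created for i in incidents])


def mean_time_to_ack(incidents, severity):
    """Average of (acknowledged - created) for one severity level."""
    matching = [i for i in incidents if i.severity == severity]
    return mean_duration([i.acknowledged - i.created for i in matching])


def escalation_rate(incidents):
    """Fraction of incidents that were escalated."""
    return sum(i.escalated for i in incidents) / len(incidents)


t0 = datetime(2024, 1, 1, 9, 0)
minutes = lambda n: timedelta(minutes=n)
incidents = [
    Incident("critical", t0, t0 + minutes(2), t0 + minutes(30), True),
    Incident("critical", t0, t0 + minutes(4), t0 + minutes(50), False),
    Incident("warning", t0, t0 + minutes(20), t0 + minutes(90), False),
    Incident("warning", t0, t0 + minutes(40), t0 + minutes(70), False),
]

print("MTTR:", mttr(incidents))
print("Ack (critical):", mean_time_to_ack(incidents, "critical"))
print("Ack (warning):", mean_time_to_ack(incidents, "warning"))
print("Escalation rate:", escalation_rate(incidents))
```

The helper functions assume at least one matching incident; an empty list would need explicit handling before dividing.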

OpsGenie, an Incident Response Orchestration platform, has API-level integrations with many monitoring tools, including Catchpoint. This allows Catchpoint to automatically create alerts in OpsGenie and have those alerts routed to the right people at the right moment, based on the severity level of the alert, on-call schedules, escalation policies, available communication channels, and other configurable routing rules.
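
The routing idea can be sketched in a few lines of Python. This is not the OpsGenie API or its rule syntax: the `Alert` type, the `ON_CALL` and `CHANNELS` tables, and the `route` function are hypothetical names that only illustrate how severity, an on-call order, and an escalation delay might combine into a list of notifications.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical alert shape, not a real integration payload.
    source: str
    severity: str   # "critical" or "warning"
    team: str


# Assumed on-call order per team: primary responder first, then escalation.
ON_CALL = {"web": ["alice", "bob"], "network": ["carol"]}

# Assumed channels per severity: urgent alerts use more intrusive channels.
CHANNELS = {"critical": ["phone", "sms", "chat"], "warning": ["email"]}


def route(alert, unacknowledged_minutes=0, escalate_after=10):
    """Return (person, channel) notifications for an alert.

    The primary responder is always notified. For every `escalate_after`
    minutes the alert stays unacknowledged, the next person in the team's
    on-call order is added.
    """
    responders = ON_CALL.get(alert.team, [])
    if not responders:
        return []
    level = min(unacknowledged_minutes // escalate_after, len(responders) - 1)
    targets = responders[:level + 1]
    channels = CHANNELS.get(alert.severity, ["email"])
    return [(person, channel) for person in targets for channel in channels]


critical = Alert(source="Catchpoint", severity="critical", team="web")
print(route(critical))                             # primary only
print(route(critical, unacknowledged_minutes=15))  # escalated to the next person
print(route(Alert("Catchpoint", "warning", "network")))
```

Keeping the schedule and channel tables separate from the routing function mirrors the point above: the rules are configuration, so they can change without touching the monitoring tool that raises the alert.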

Key takeaways:

  • Alerting tools must integrate with your IT monitoring tools, and OpsGenie enhances your monitoring tools by ensuring actionable alerts, helping you analyze data to avoid unnecessary future escalations, and reducing your MTTR.
  • Monitoring tools usually distribute alerts without classifying them based on severity levels such as warnings or critical alerts. With alerting tools, you can prioritize notifications based on the severity level of an alert, so responders receive only the important ones and are not distracted by notifications that can wait.
  • Monitoring tools usually have limited communication methods; most send only email messages or SMS/push notifications when an outage occurs. Alerting tools support these channels, but they can also notify alert recipients through phone calls, mobile applications, and chat/collaboration platform integrations (such as Atlassian HipChat or Slack), so you can instantly reach the people who need to be informed.
  • OpsGenie supports Call Bridge functionality, which eases communication, especially when your teams are working remotely, by providing unified visibility into incidents and facilitating real-time, cross-team collaboration.

To learn more about the synergy between monitoring and alerting tools, watch our webinar: How Overstock Leverages Catchpoint and OpsGenie.

The post Performance Monitoring with an Incident Response Orchestration System appeared first on Catchpoint's Blog - Web Performance Monitoring.

More Stories By Mehdi Daoudi

Catchpoint radically transforms the way businesses manage, monitor, and test the performance of online applications. Truly understand and improve user experience with clear visibility into complex, distributed online systems.

Founded in 2008 by four DoubleClick / Google executives with a passion for speed, reliability and overall better online experiences, Catchpoint has now become the most innovative provider of web performance testing and monitoring solutions. We are a team with expertise in designing, building, operating, scaling and monitoring highly transactional Internet services used by thousands of companies and impacting the experience of millions of users. Catchpoint is funded by top-tier venture capital firm, Battery Ventures, which has invested in category leaders such as Akamai, Omniture (Adobe Systems), Optimizely, Tealium, BazaarVoice, Marketo and many more.