Web Performance is a Journey, Not a Destination

Mehdi Daoudi

Combining Catchpoint with OpenTSDB for Global Visibility and Alerting

The article below was originally written and published by NS1, a Catchpoint customer and integration partner.

At NS1, we’re big fans of Catchpoint, the performance monitoring company that is also one of our awesome customers. On the one hand, we deliver global DNS and traffic management for Catchpoint’s products. On the other hand, we lean heavily on their globally distributed monitoring nodes to understand the behavior of our own systems, especially when it comes to tuning our hyper-optimized anycasted Managed DNS network. The visibility that Catchpoint’s nodes give us into how networks around the world reach our anycasted IP space helps us pinpoint localized BGP issues, identify problem carriers or providers, and plan performance improvements: constantly, as the data flows in.

Catchpoint telemetry for one of our systems, aggregated by continent in Grafana

And “constantly” is a good description for how we like to consume our data. NS1’s entire platform is built on the idea that data drives better decisions in real time. That notion goes beyond the capabilities we offer our customers to drive DNS routing, and into how we operate our own platform. Every service, server, subsystem, tool – every component of NS1’s infrastructure and operation – is instrumented, measured, monitored, and analyzed constantly, and we’re always thinking about what to instrument and measure next.

To keep an eye on things, we use several tools to collect data about our platform’s performance and gain insight from it. We combine external tools like Catchpoint with heaps of internal technology we’ve developed to match the scale and breadth of our infrastructure.

One of the systems we operate that’s critical to our operational mentality is OpenTSDB, a powerful open-source time series database that runs on top of HBase from the Hadoop ecosystem. OpenTSDB is a repository for literally millions of NS1’s metrics, ranging from system, server, and network telemetry to deep DNS traffic analytics. Our stack uses OpenTSDB in several roles:
- a store for customer-facing metrics exposed through our APIs and UIs;
- a source for internal operational dashboards, which we usually build with Grafana;
- the basis for high-frequency pattern matching and alerting, often crafted with Bosun;
- and many other applications.
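
For readers new to OpenTSDB, each data point consists of four things: a metric name, a Unix timestamp, a numeric value, and a set of key/value tags. The HTTP API accepts these as JSON on `/api/put`. Tags are what make grouping possible at query time. A query such as `avg:metric{continent=*}` returns one series per continent, which is the kind of view a per-continent Grafana panel is built on. The sketch below builds both a write payload and a grouped query. The metric name, the tag names, and the host `tsdb.example.com:4242` are hypothetical and are not NS1’s actual schema.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical OpenTSDB endpoint; 4242 is OpenTSDB's default port.
TSDB_URL = "http://tsdb.example.com:4242"


def make_datapoint(metric, value, tags, timestamp=None):
    """Build one data point in the JSON shape that /api/put accepts."""
    if not tags:
        # OpenTSDB 2.x rejects data points that carry no tags.
        raise ValueError("OpenTSDB requires at least one tag per data point")
    return {
        "metric": metric,
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "value": value,
        "tags": dict(tags),
    }


def put_datapoints(points):
    """POST a batch of data points to /api/put (needs a reachable TSDB)."""
    body = json.dumps(points).encode("utf-8")
    req = urllib.request.Request(
        TSDB_URL + "/api/put",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status  # OpenTSDB answers 204 No Content on success


def query_url(metric, group_by, aggregator="avg", start="1h-ago"):
    """Build a /api/query URL returning one series per value of group_by."""
    m = f"{aggregator}:{metric}{{{group_by}=*}}"  # e.g. avg:metric{continent=*}
    params = urllib.parse.urlencode({"start": start, "m": m})
    return f"{TSDB_URL}/api/query?{params}"
```

For example, `query_url("catchpoint.dns.response_ms", "continent")` asks for the average of that hypothetical metric over the last hour, with one series for each continent tag value.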

Alerting in particular is a fascinating area in a system as distributed as NS1’s. Alert too often, and you drown in a deluge of noisy network flaps. At global scale we see lots of false positives, and even when an issue is real, our systems usually route around the hiccup automatically. But we’re in the mission-critical path for our customers and failure is not an option, so we can’t ignore real potential issues. In a system like ours, measuring, monitoring, and alerting on the delivery of our actual service is, of course, what matters most.
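
One common way to keep flaps from paging anyone is to alert only when a check is failing from enough vantage points at once and stays that way for several consecutive intervals. This is shown here as a generic illustration, not as NS1’s actual alerting rule. The class name and both thresholds (`min_failing_fraction`, `consecutive`) are hypothetical.

```python
from collections import deque


class FlapFilter:
    """Fire only when a failure is both widespread and sustained.

    Hypothetical policy: an interval counts as "bad" when at least
    min_failing_fraction of monitoring nodes report failure, and the
    alert fires only after `consecutive` bad intervals in a row.
    """

    def __init__(self, min_failing_fraction=0.2, consecutive=3):
        self.min_failing_fraction = min_failing_fraction
        self.consecutive = consecutive
        # Keep only the most recent `consecutive` interval verdicts.
        self.history = deque(maxlen=consecutive)

    def observe(self, results):
        """results: one boolean per node, True meaning the check succeeded.

        Returns True when the alert condition is met after this interval.
        """
        if not results:
            # No data this interval: leave the history unchanged.
            return False
        failing = sum(1 for ok in results if not ok) / len(results)
        self.history.append(failing >= self.min_failing_fraction)
        return len(self.history) == self.consecutive and all(self.history)
```

A single flaky node (10% of ten nodes) never crosses the 20% threshold, while a regional problem that persists for three intervals does. Raising `consecutive` trades detection latency for a quieter pager.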

The key to good alerting on our services, then, is feeding the best data about how effectively we’re delivering DNS globally into the powerful dashboarding and alerting frameworks we’ve built, giving us real-time visibility and instant notice of potential issues. So, we built a tool for that.

It’s nothing big or complicated, but because it might be useful to others, we’ve open sourced our Catchpoint-to-OpenTSDB bridge: just a simple server that listens for data from Catchpoint’s Push API and shoves it into OpenTSDB. Internally, we’ve hooked the raw data into our Grafana dashboards to get a great global view of performance and reachability issues in our Managed and Dedicated DNS networks. We’ve also plugged in Bosun to quickly generate alerts on aggregate data by continent, by country, and for specific problematic ISPs.
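
The sketch below illustrates the general shape of such a bridge. It is not NS1’s published code. It assumes Catchpoint pushes one JSON object per test result, and the field names it reads (`TestId`, `Node`, `Continent`, `Country`, `Isp`, `ResponseMs`, `Timestamp`) are invented for illustration. Consult Catchpoint’s Push API documentation for the real payload. Each record becomes an OpenTSDB data point tagged by continent, country, and ISP, which is what lets dashboards and alerts aggregate along those dimensions. The endpoint, metric name, and port are also hypothetical.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical OpenTSDB endpoint and metric name.
TSDB_PUT_URL = "http://tsdb.example.com:4242/api/put"
METRIC = "catchpoint.response_ms"
# Hypothetical mapping from push-payload fields to OpenTSDB tag names.
TAG_FIELDS = {"TestId": "test", "Node": "node", "Continent": "continent",
              "Country": "country", "Isp": "isp"}


def sanitize(value):
    """OpenTSDB tag values allow roughly letters, digits, and - _ . /"""
    return "".join(c if c.isalnum() or c in "-_./" else "_" for c in str(value))


def to_datapoints(record):
    """Convert one (hypothetical) Catchpoint push record to OpenTSDB points."""
    tags = {tag: sanitize(record[field])
            for field, tag in TAG_FIELDS.items() if record.get(field)}
    return [{
        "metric": METRIC,
        "timestamp": int(record["Timestamp"]),
        "value": float(record["ResponseMs"]),
        "tags": tags,
    }]


class PushHandler(BaseHTTPRequestHandler):
    """Accept a pushed record and forward it to OpenTSDB's /api/put."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            points = to_datapoints(json.loads(self.rfile.read(length)))
        except (ValueError, KeyError, TypeError, AttributeError):
            self.send_response(400)  # malformed or unexpected payload
            self.end_headers()
            return
        req = urllib.request.Request(
            TSDB_PUT_URL, data=json.dumps(points).encode("utf-8"),
            headers={"Content-Type": "application/json"}, method="POST")
        try:
            urllib.request.urlopen(req, timeout=10).close()
            self.send_response(200)
        except OSError:
            self.send_response(502)  # TSDB unreachable or rejected the write
        self.end_headers()


def serve(port=8080):
    """Run the bridge until interrupted (hypothetical port)."""
    HTTPServer(("0.0.0.0", port), PushHandler).serve_forever()
```

Calling `serve()` starts listening. Keeping the payload-to-data-point mapping in a pure function like `to_datapoints` makes it easy to adapt to the real push format without touching the HTTP plumbing.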

The post Combining Catchpoint with OpenTSDB for Global Visibility and Alerting appeared first on Catchpoint's Blog.

Catchpoint radically transforms the way businesses manage, monitor, and test the performance of online applications. Truly understand and improve user experience with clear visibility into complex, distributed online systems.

Founded in 2008 by four DoubleClick / Google executives with a passion for speed, reliability and overall better online experiences, Catchpoint has now become the most innovative provider of web performance testing and monitoring solutions. We are a team with expertise in designing, building, operating, scaling and monitoring highly transactional Internet services used by thousands of companies and impacting the experience of millions of users. Catchpoint is funded by top-tier venture capital firm, Battery Ventures, which has invested in category leaders such as Akamai, Omniture (Adobe Systems), Optimizely, Tealium, BazaarVoice, Marketo and many more.