Improving the process of debugging JavaScript errors in Production for better end user experience and happier developers

Abstract

JavaScript can be very difficult to debug, especially errors impacting end users in Production.

The way that JavaScript applications are developed and debugged pre-production often means that good processes are not put in place to monitor errors in production.

Most teams only learn about errors when customers call in with a problem.

Reducing the time from when an end user first experiences an error to the point where a developer understands its root cause can transform the experience of your end users and give developers time back to work on new feature development.

Learn how to improve how quickly you can fix errors in Production.

Summary

Real-time feedback into the behavior of your distributed systems, and observing errors in real time, allows you to not only experiment with confidence, but respond instantly to get things working again. Today's session is titled "Improving the process of debugging JavaScript errors in production for better end user experience and happier developers".
Finbar Fleming is the lead customer engineer at Rollbar. Rollbar is a platform to group, respond to, and gain insights from errors. In real time, applications send data to Rollbar. That data is analyzed and the errors are given an identity, or fingerprint.
In a Rollbar survey of about 950 developers, 34% say that losing users is the biggest risk of errors in their software applications. 32% are spending up to 10 hours a week fixing bugs, and 16% spend up to 15 hours per week fixing bugs. Any improvement that you can make in how developers work with existing errors and bugs in existing code dramatically increases their ability to develop new features and new functionality.
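In the typical lifecycle of a production JavaScript error, the first three steps (the customer experiences the error, the customer calls in, and technical support tries to reproduce the issue) take a lot of time and are a source of much unhappiness for end users. This is where error monitoring helps greatly. The immediate benefit of enabling an error monitoring solution in your code base is that you get visibility into all errors.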
If possible, try to add error monitoring to your application. You can make dramatic improvements quite quickly, and you can very quickly make it a part of your team's culture. Your customers and your colleagues will thank you.
I hope you take the next step in your journey to improve your capabilities around debugging production errors in your JavaScript applications. You can try us out directly on rollbar.com, or with a promo code specific to this conference and its attendees.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Real-time feedback into the behavior of your distributed systems, and observing changes, exceptions, and errors in real time, allows you to not only experiment with confidence, but respond instantly to get things working again. So today's session is titled "Improving the process of debugging JavaScript errors in production for better end user experience and happier developers". My agenda for today is as follows. First of all, I'm just going to set the stage, introduce myself, and talk a little bit about what Rollbar is about and what I do for Rollbar. Then I'm going to talk about the impact of production JavaScript errors, and then talk a little bit about some of the particular difficulties that JavaScript developers have that developers of other languages maybe don't have. And then I'm going to talk about real-time error monitoring, which is part of the solution that Rollbar offers, and how that can help you to dramatically improve your error response process. My goal today is that, first of all, you learn about new error monitoring technology that's available today that you might not have been aware of, and that you take a step to improve your current process. So my name is Finbar Fleming. I'm the lead customer engineer at Rollbar. My background is that I was a developer and a development manager for about 15 years, with a particular interest in helping developers to improve their processes. So I spent a lot of time working to develop continuous integration processes and test-driven development processes, and more recently, in my job at Rollbar, helping developers to manage errors more effectively and to build out error response processes in their applications. I've written lots of code, and written lots of code with errors, and I help teams to improve their error response capabilities.
So I work with them so that they get better visibility into their errors and can differentiate between the important ones and the ones that are of lower importance. And then at a more advanced level, which I'm really not going to talk about much today, there's the ability to make either semi-automated or automated response decisions in their SDLC based on this error data. I see a lot of errors. About 30% of the people I work with are JavaScript developers, so I see a lot of JavaScript developers and I talk with a lot of developers, both developers that use Rollbar and developers who don't use Rollbar, and I learn from them about what they do differently. So I'm just going to give a high-level description of what Rollbar is. Rollbar is a platform to group and respond to errors and to give developers insights about those errors: a platform to group, respond to, and gain insights from errors. In real time, applications send data to Rollbar. That data is analyzed and the errors are given an identity, or a fingerprint. So think of it this way: the code flow associated with an error, or the stack trace associated with an error, is given a fingerprint, and it's given a fingerprint in a way that's very accurate and done very quickly. It's instant, and it's done in a way that's designed to allow for many code changes, so that over time, even though your code is changing, and maybe you're upgrading versions of open source libraries, the same error is still given the same fingerprint. Then, based on that, it enables certain workflows. So it might be that an error that's never been seen before is occurring for the first time in an application and you want to notify the team. Or it may be that an error in a release that you just deployed 10 or 15 minutes ago is impacting ten different users every minute.
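As an illustration of the fingerprinting idea described above, here is a minimal sketch. It is not Rollbar's actual grouping algorithm, which the talk does not describe. It assumes V8-style stack traces (`at fn (file:line:col)`), ignores line and column numbers and the error message so that small code changes keep the same identity, and uses a simple FNV-1a style hash. The names `normalizeStack`, `hash32`, `fingerprint`, and `ErrorTracker` are hypothetical.

```javascript
// Hypothetical sketch; real error monitoring products use more robust grouping.

// Reduce a V8-style stack trace to "function@file" frames, dropping the
// line and column numbers that change whenever nearby code is edited.
function normalizeStack(stack) {
  return String(stack || "")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("at "))
    .map((line) => {
      const match = /^at (?:(.+?) \()?(.+?):\d+:\d+\)?$/.exec(line);
      if (!match) return line; // unrecognized frame format: keep it as-is
      return `${match[1] || "<anonymous>"}@${match[2]}`;
    });
}

// FNV-1a style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
function hash32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// The identity is based on the error type and the normalized frames only.
function fingerprint(error) {
  const parts = [error.name || "Error", ...normalizeStack(error.stack)];
  return hash32(parts.join("|"));
}

// Groups occurrences by fingerprint so that a first-ever occurrence and the
// number of distinct affected users can drive notifications.
class ErrorTracker {
  constructor() {
    this.groups = new Map();
  }

  record(error, userId) {
    const id = fingerprint(error);
    const isNew = !this.groups.has(id);
    if (isNew) this.groups.set(id, { occurrences: 0, users: new Set() });
    const group = this.groups.get(id);
    group.occurrences += 1;
    group.users.add(userId);
    return {
      id,
      isNew,
      occurrences: group.occurrences,
      affectedUsers: group.users.size,
    };
  }
}
```

With something like this, `record(error, userId).isNew` could trigger a "never seen before" notification, and `affectedUsers` could feed an alert about how many users a fresh release is impacting. Bundler-generated file names that change on every build would still change the fingerprint; handling that is left out of this sketch.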
Over the longer term, the data that customers send to Rollbar enables them to identify risk in their code bases, so they can identify maybe particularly risky files, modules, or parts of the code base, and then hopefully it gives developers insights into where they can improve their processes. So I'm going to talk about the cost of errors to your business. This data here is taken from the Rollbar State of Software Code report, which interviewed about 950 developers. And these here are some of the stats that I think are important for our topic here today. So first of all, 34% say that losing users is the biggest risk of errors in their software applications. 28% actually reported losing a significant number of users to errors. So errors in your applications are costing you, dramatically increasing the risk that you lose customers. And many of you actually are losing customers. The other statistic that I thought was really, really interesting was that 88% of the people that were interviewed said that bugs and errors are reported by customers first. And that's a little shocking. And I suppose if you take nothing else from this session, it really doesn't have to be that way. In terms of the cost to your developers, again, some more statistics from that report: 32% are spending up to 10 hours a week fixing bugs and 16% are spending up to 15 hours per week fixing bugs. And if you think that on average a developer might actually be writing code 25 to 27 hours a week, any improvement that you can make in how they work with existing errors and bugs in existing code dramatically increases their ability to develop new features and new functionality. So it dramatically improves the throughput of new features and new functionality if you can make these types of improvements in your code bases. A few others, where the people doing the survey were having a bit of fun: 21% of developers would rather go to the dentist than fix bugs, and 26% would rather spend time paying bills.
But I suppose what you want to take from that is that developers really don't want to spend their time fixing bugs in existing code. They'd much prefer to be working on new features and new functionality, which coincidentally is what the business wants also. Developers in JavaScript actually have a particular challenge. First of all, I suppose I should clarify that I'm really focused on client-side errors in this presentation, and there, sort of by definition, end users are always impacted. On back-end projects you may get away with it not being obvious that an error is occurring, or you might be able to run something, or something might be done asynchronously. On a UI, your users are being impacted. The second thing to note is that the tools that developers and people who work with JavaScript have access to are actually a little bit too convenient and allow you to set up bad habits. If we think of the browser developer tools and the console log, developers know how to use them, QA teams know how to use them, technical support teams know how to use them, and it's just too easy to use that as the place to debug your errors. The problem with that is that once you go into production, and the people using the application are no longer your internal developers, your internal QA staff, or your internal technical support teams, you lose that visibility. And there's no formalization of the error response process before you put these applications into production, so when you do deploy to production you really don't have anything. The other thing to note is the question of who owns client-side errors. Many companies will have an SRE team, and they're responsible for ensuring site reliability. Often that doesn't include client-side monitoring. Errors that are occurring in a strange version of a browser that's experiencing slight intermittent network issues aren't really the responsibility of an SRE team.
So what often happens is that JavaScript applications get to the point where they're in production, but no process has been put in place to observe and monitor what's occurring. And then, lastly, there can be a lot of noise in JavaScript errors. There can be network issues, there can be browser-specific issues and third-party code issues, and there can also be a culture of "look, it's just easier to let the customer contact us rather than deal with these noisy errors". And that's a slightly flawed way to look at the problem. Just another convenient number here: I did a conference recently and I spoke with probably a few hundred developers, specifically JavaScript developers, and 75% of the people I spoke with had no visibility into production code errors. Of the ones that did have something, most of them were logging, generally using the logging functionality of an APM solution or else a dedicated logging solution. And the other ones that were doing something were using an error monitoring system like Rollbar or something similar. The other thing of note there was that many of them weren't aware that this technology exists: this ability to analyze a stack trace, give it an identity as soon as the error occurs, and then use that identity to let development teams know if it's new and has never been seen before, or if it's a recurrence of an existing problem. There are two categories of tools that I see developers using primarily. One is logging: they are often using the logging functionality, maybe even of the APM, and they get the error detail. They can often put structure around it, so you can add additional key-value pairs. But really the focus is on being able to consume the logging data rather than on identifying the error in real time. It's really around getting the logs into the system and giving you the ability to query after the fact.
So maybe you still learn about the error from your customer, but at that point you can go into the logs. The error identification functionality tends to be very basic and sort of bolted on, so you don't get the very accurate identification of errors that comes with error monitoring. The other category is error monitoring, which, as I said, is a component of what Rollbar does. There are a number of vendors in this space, and at a high level the process is similar. You're intercepting the onerror event, so for uncaught errors on the page you can capture that data and the stack trace associated with it, send it to some system, and analyze the stack trace. Based on the stack trace, the error is given an identity, a fingerprint. This happens immediately when the error occurs, and the sameness, or two errors being considered the same, from that point forward is based on the fingerprint and not on some query that you configure after the fact. If you want to know a bit more about how we at Rollbar consider ourselves different from other vendors in the error monitoring space, it's really around the accuracy and the speed of this grouping, the fact that it can stand up to code changes, and a focus on giving development teams the data about the errors so that they can use it to improve their SDLC and identify and manage risk better. So this here is the lifecycle of a production JavaScript error, at least for the 88% of people who learn about errors first from their customers. There are five steps I've broken it down into. The thing to note here is that the first three steps take a lot of time and are a source of much unhappiness for end users. So think about it: the customer experiences the error, and hopefully the customer is calling in. At that point, a technical support person is trying to reproduce the issue, maybe on their own or with the help of the customer.
And at that point it's given to a developer who will hopefully have what they need to reproduce the error. And then at that point they will understand the root cause. Just a few notes about this process. Customers obviously aren't happy. They may be on social media telling their friends that they're not happy. At a minimum, maybe you're experiencing service level agreement breaches, and possibly transactions aren't getting completed. This unhappiness has both short-term and long-term impacts for your business. So even though you might resolve the error and move on, the customer will still be hesitant to use you again or to recommend you to somebody else. Once you understand the error, and once you have the stack trace associated with it, you can generally very quickly deduce the root cause and get a fix out there. And this really is where error monitoring helps greatly. By capturing the data in a structured form, and presenting back to you the stack trace and the other metadata associated with the error, like the telemetry data on the page (the events that fired on that page before the error occurred), you have all the information you would want in order to quickly reproduce it, understand the root cause, and get a fix out. The immediate benefit of enabling an error monitoring solution in your code base is that you immediately get visibility into all errors. You can see which errors are occurring right now, you can see the data you need to reproduce the errors, and if this grouping process is done very accurately, you have the data you need to decide which error needs to be fixed first. So you'll have much more accurate information around how many times this error actually occurred and how many people were impacted by this specific error.
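The capture side of this can be sketched roughly as follows. This is an illustration under stated assumptions, not Rollbar's SDK: it listens for the browser's standard `error` and `unhandledrejection` events, keeps a short rolling list of page events as telemetry, and hands a structured payload to a `send` callback. The names `createTelemetry`, `buildPayload`, and `installErrorCapture`, the `context` fields, and the `/errors` endpoint in the usage comment are all hypothetical.

```javascript
// Hypothetical sketch of client-side error capture; not a vendor SDK.

// Keep a short rolling history of page events to send along with an error.
function createTelemetry(limit = 20) {
  const events = [];
  return {
    add(type, detail) {
      events.push({ type, detail, at: Date.now() });
      if (events.length > limit) events.shift(); // drop the oldest event
    },
    snapshot() {
      return events.slice();
    },
  };
}

// Turn whatever was thrown (an Error, a string, ...) into a structured payload.
function buildPayload(thrown, telemetry, context) {
  const isErrorLike = thrown !== null && typeof thrown === "object";
  return {
    name: (isErrorLike && thrown.name) || "Error",
    message: isErrorLike && thrown.message ? String(thrown.message) : String(thrown),
    stack: (isErrorLike && thrown.stack) || null,
    context: { ...context },
    telemetry: telemetry.snapshot(),
    timestamp: new Date().toISOString(),
  };
}

// `target` is `window` in a browser; accepting any EventTarget keeps the
// sketch testable outside a browser.
function installErrorCapture(target, telemetry, send, context = {}) {
  target.addEventListener("error", (event) => {
    const thrown = event.error ?? event.message ?? "Unknown error";
    send(buildPayload(thrown, telemetry, context));
  });
  target.addEventListener("unhandledrejection", (event) => {
    send(buildPayload(event.reason ?? "Unhandled rejection", telemetry, context));
  });
}

// In a browser this might be wired up like so (hypothetical endpoint):
// const telemetry = createTelemetry();
// document.addEventListener("click", (e) => telemetry.add("click", e.target.tagName));
// installErrorCapture(
//   window,
//   telemetry,
//   (payload) => navigator.sendBeacon("/errors", JSON.stringify(payload)),
//   { plan: "free" }
// );
```

Passing `send` in as a callback keeps the transport separate from the capture logic, so the same handler could post to a logging system or to an error monitoring backend.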
Just on the right-hand side there, I have a few images from Rollbar fix, which is the Rollbar product focused on error monitoring. You have immediate visibility into what the pattern of occurrences has been for this error over time and how many people have been impacted over time. And for this specific error, we see here that the vast, vast majority of people experiencing this error are using Google Chrome. So you get immediate visibility into the business impact. Just as a side note, the process of setting up error monitoring is generally very quick. It's a case of importing an SDK and turning on error tracking, and then you have immediate visibility into your errors. What we see most customers doing is that very quickly they realize that there are errors occurring that they didn't know about. They go through a process to reduce those errors and very quickly, dramatically reduce the number of errors that are coming through the system. So last week I worked with someone who, in their first month of working with Rollbar, was sending 40,000 errors a month, and now they're down to 2,000. And that's in six weeks of using the product. That's visibility they didn't previously have. They were able to identify that, actually, they should be fixing these errors, and now they're down to a very, very manageable number of errors. And if any new error occurs, it will be much clearer that it's a new error, and they will then be able to go ahead and get that fixed. Error monitoring is a journey. It's not something you just turn on. What I see with the customers I work with is that there are sort of five phases to the journey, and the first is initial installation. Immediately you're getting real-time visibility and understanding the business impact of errors. Then they go through that process that I mentioned on the previous slide, where they'll fix a bunch of low-hanging errors and low-hanging issues that are causing problems. And then they'll turn on notifications.
So they'll say, we care about this category of error; we don't care as much about this category of error. Then, further along that journey, they will add additional context to the errors. So it might be they want to know what geography an error is coming from. They might want to know what category of user is being impacted by the errors. Is it impacting our free tier customers, or is it impacting our highest paying customers? Then they'll go through a phase where they're doing additional integration: integrating with the source control system, maybe some testing framework tools that they work with, or session recording tools. Now they learn about their errors in real time from Rollbar, and then they can easily link out to replay a session, or maybe to a logging system on the back end, where they're seeing the non-error data associated with the session that caused this error. And then the final step is more advanced automated response. It could be that they're using this data in an automated way in a build pipeline to say "don't allow a build to continue", or it may be that, based on errors in, let's say, a canary deployment, they want to trigger an automatic rollback. So the first step of that journey is very easy. Initial installation generally takes no more than 30 to 60 minutes. You will immediately get additional visibility and much more advanced capabilities to understand the errors that are occurring in production, to debug them, and to get fixes out there quickly. So I'm just going to finish up here now. First of all, thank you very much for taking the time to attend this session. If you take one thing from this, do something: take a step along this journey. Your customers and your colleagues will thank you. One thing I will say, particular to JavaScript developers, is try to make the improvement early in your app's development.
So as soon as you start working on a brand new code base, put something in and start to build that culture around "we have a centralized location to manage errors from all environments, and this is where we're going to process errors", because it will make it much easier. While it's convenient to use the console log and the developer tools in pre-production environments, once you go into production and there are other users experiencing these errors, you lose that visibility into what's going on. If you only have access to logging, or the logging in your APM system, use it. It's much improved visibility. Having some visibility is better than none. If possible, try to add error monitoring to your application. Definitely from the Rollbar side it's easy to set up. You can make dramatic improvements quite quickly, and you can very quickly make it a part of your team's culture. And then, as you become more experienced with error monitoring, build out your solution and add to it as needed, based on feedback from your QA teams, from your SRE teams, and from your customer engineering, customer success, or customer service staff, so that it best fits the needs of your team and your application. So earlier in the session I referenced statistics from the Rollbar State of Software Code report. Here are two links. One is to download the report and the other is to get access to a blog post about it. Definitely check those out. I hope you take the next step in your journey to improve your capabilities around debugging production errors in your JavaScript applications. If that journey takes you to Rollbar, we'd love to have you. We'd love to have your errors. You can try us out directly on rollbar.com, or this here is a specific promo code for this conference and for the attendees of this conference.
And if you sign up with this promo code, we'll turn on a few extra features that you wouldn't normally get in a trial account and also give you early access to some upcoming features. So definitely try that out. I do hope that you take the next step to improve how you work with your errors. On behalf of Rollbar, I'd just like to say that we're delighted to be part of this conference. We love being part of the developer community, where developers help other developers and where they learn from experts in the industry. So thanks again and enjoy the rest of the conference.

Conf42 JavaScript 2022 - Online

November 17 2022

Finbar Fleming

Customer Engineer @ Rollbar
