Transcript
Hello everyone, welcome to our presentation today. I'm Ranjan, a software
developer, and I'm Sylvia, a student at Carleton University.
Today we'll be taking a session on debugging JVM performance issues.
The goal of this presentation is to provide an overview of performance
issues that we have encountered over the past few years in the
context of Java applications and how to go about debugging
them. A few disclaimers before we proceed: this is a fragment of
our experience. It's not meant to be a comprehensive guide, at least not yet.
It covers a brief overview of the memory layout and
the garbage collection process, followed by the triage process for the
performance issues and aspects of the tools that
are used in the triage process. What it does not cover are detailed instructions on how to use those tools (links are provided for that purpose), fine-tuning JVM performance, or fixing the performance issues. Those are topics for another day. The agenda of the presentation can be broadly classified
into three portions. The first part is a JVM internals recap, followed by the performance issues that we'll be talking about and how to go about triaging them, and then we'll wrap it up and conclude. In the JVM internals section, Sylvia will be talking about the memory layout followed by
the garbage collection process, and then she'll
move on to the performance issues section, where she'll be talking about a high CPU usage and a high garbage collection activity scenario.
And then I'll take over and we'll be talking about a high memory usage
and an application crash scenario. Then we'll summarize the triage
process for the aforementioned performance issues, followed by the
future work we have on our mind. There's a question slide
which contains links to wiki pages that talk
about all the topics mentioned in the slides in much more detail,
as well as the open source library that we'll be talking about in this
presentation. So any questions, any concerns at that point of time, feel free
to reach out to us. Our email addresses will be provided in the description of
the video, so we would appreciate questions,
feedback, or any other comments that you may have. Now, I'll hand it over
to Sylvia to start with the JVM internals recap. So thank
you Ranjan for the introduction. I know many of you already know what JVM
is and how its memory is divided. For those who don't know, I'll give you a brief explanation of what it is. JVM stands for Java Virtual Machine. It's a lightweight virtual machine that runs Java applications. Its memory is divided into five sections. The first section is the program counter, or PC register, and this is where we store the address of
the current instruction being executed. Next we have the stack,
and in the stack we have the stack frames. When a method is called, a new stack frame will be created for that method, and each stack frame contains the local variables. The local variables are stored in an ordered set of slots, and depending on its type, a variable is going to be stored in one slot or two slots. So, for example, if it's an integer, it's going to be stored in one slot, but if it's a long or a double, it's going to be stored in two slots.
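As a quick illustration (a hypothetical method; the slot numbering follows the JVM's local-variable indexing for instance methods):

    // slot 0: this (instance method)
    // slot 1: count (int, one slot)
    // slots 2-3: total (long, two slots)
    // slot 4: i (int, one slot)
    // slots 5-6: d (double, two slots)
    void example(int count, long total) {
        int i = count;
        double d = total;
    }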
And inside the stack frame we also have the operand stack. As its name suggests, it's a stack, and this is where the intermediate results of operations are going to be stored. Finally, we have the frame data, and the frame data contains the references to the constant pool and the exception table. The constant pool is a table that contains information about the values of the constants as well as their types. And the exception table is a table that allows the JVM to know where an exception can be thrown and where it can be handled. Then we also have the native method stack, and the native method stack is similar to the stack, but it will behave differently depending on the native language. So if the native language is C, then the native method stack will behave like a C stack. Inside the JVM memory we also have the method area, and the method area contains class-level information, so that includes the static fields as well as the method code. And this is where the constant pool is going to be located. The method area is logically part of the heap,
and the heap is where we store the Java application objects. It is divided
into three sections. The first section is the young generation, and the young generation contains objects that are newly created. It has two parts. The first part is Eden, and Eden is where the newly created objects are going to be located. And then we have the survivor spaces, and the survivor spaces contain objects that have survived at least one GC cycle, or garbage collection cycle. Inside the heap we also have the old generation, and the old generation is where the objects that have survived multiple GC cycles are stored. Then we also have the metaspace, and the metaspace is where we store class-level information. And this is where the method area is located. When the heap runs low on memory, we need to have a process that would deallocate unused objects, and this is where the garbage collector comes into play. The garbage collector is a tool that collects unused Java objects. It works by first marking and then collecting.
When the garbage collector marks, it will try to find all the referenced objects and then mark them as alive. It will start looking from the GC roots, and the GC roots are objects that are referenced by the JVM at that specific moment, so that includes the static objects. Once the garbage collector finds all the referenced objects, it will try to collect all the unreferenced objects. There are three main ways in which the garbage collector can do it. The first way is by sweeping. When the garbage collector sweeps, it just deallocates the memory, and that can cause heap fragmentation. Heap fragmentation occurs when we deallocate memory, and it can lead to small gaps of free memory in between used memory. This is a huge issue for the performance of the application. As a way to prevent this, the garbage collector can also sweep and compact. So after deallocating the unused objects, it can compact the referenced objects into a contiguous block. Another way to prevent heap fragmentation is by copying. When the garbage collector copies, it will copy all the referenced objects into a new region and then it will delete the old region. So the garbage
collector can process different parts of the heap, and depending on where it processes, it can be called by different names. The collection is called a minor GC if it processes the young generation. The minor GC is triggered whenever the young generation runs out of memory, and this is the most frequently triggered GC. That comes from the idea that most objects have a short lifespan, so it is more effective for the application that the minor GC runs more frequently compared to the old generation GCs. Then we have the major GC, and the major GC occurs in the old generation. This can be triggered when the old generation runs out of memory, or it can also be triggered by the minor GC. Then finally we have the full GC, and this occurs when the garbage collector processes the entire heap.
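As a side note, one common way to observe these minor, major, and full collections in practice is to enable GC logging; the flags below are standard, and the application name is just a placeholder:

    java -Xlog:gc MyApplication      # unified GC logging, JDK 9 and later
    java -verbose:gc MyApplication   # basic GC logging on older JDKs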
Before I continue with the different implementations of the garbage collector, I have to talk about what a stop-the-world pause is. A stop-the-world pause is triggered by the GC, and when that occurs, all the Java application threads will be frozen so they won't run. That way the garbage collector can process the heap without worrying about any change made by the Java application. Now we're going to discuss the different implementations of the garbage collector; here, we're going to discuss the four most important ones.
The first one that we're going to look into is the serial garbage collector. This is a single-threaded garbage collector that creates a stop-the-world pause when marking and collecting. Looking at this example, we have three Java application threads that are running, and once the heap runs out of memory the serial garbage collector will start, and this will trigger a stop-the-world pause. After the GC thread terminates, the Java application threads will resume their execution. Then we have the parallel GC, and this is similar to the serial GC, but it is different in the sense that instead of being single-threaded, it's going to be a multi-threaded GC. Looking at this example, we have three Java application threads that are running. Once the heap runs out of memory, the parallel GC is going to start, and this will trigger a stop-the-world pause. And once all the GC threads finish, the Java application threads will resume their execution. Another implementation of the garbage collector is
the concurrent mark and sweep, or CMS for short. This is a multi-threaded, concurrent marking and sweeping GC. Looking at this example, we have three Java application threads that are running, and when the CMS is triggered, it will trigger a stop-the-world pause for the initial marking. During the initial marking, the garbage collector will mark all the GC roots. As I explained before, the GC roots are objects that are referenced by the JVM at a specific moment, and that includes the static objects. Once the initial marking is done, the application threads will resume, while at the same time the GC threads for concurrent marking are also executed. After a period of time, the CMS will trigger a stop-the-world pause for the final marking, and during the final marking the CMS will ensure that all the changes made during the concurrent marking are also detected. Once the final marking is done, the application threads will resume their execution, and at the same time the CMS threads will sweep the heap. One of the main downsides of the CMS is that it's not very good at preventing heap fragmentation. When that occurs, the CMS needs to trigger a stop-the-world pause for compacting, and during compacting the CMS will compact all the referenced objects into a contiguous block. After compacting, the Java application threads will resume their execution. So, as I explained before, the concurrent mark and sweep is not very good at preventing heap fragmentation; the next implementation of the garbage collector will prevent heap fragmentation.
Garbage First, or G1, is a multi-threaded, concurrent marking and copying GC that collects the garbage first. What this means is that Garbage First will process the regions that have more garbage, or unreferenced objects, first. One difference of this GC compared to the previous ones is that it divides the heap into regions of equal sizes while still maintaining the old generation and the young generation. Looking at this picture, this is how Garbage First looks at the heap. In this example, we want to process these three highlighted regions. What will Garbage First do? It will copy all the referenced objects in those regions and then move them to where the arrow is pointing, so to the blue region, and once all the referenced objects are copied, the old regions will be deleted.
Looking at this other example, we have three Java application threads that are running, and when Garbage First starts, it will trigger a stop-the-world pause for the initial marking. After the initial marking, the Java application threads will resume their execution, while at the same time the Garbage First threads will start concurrent marking. After a period of time, Garbage First will trigger a stop-the-world pause for the final marking, and once the final marking is done, Garbage First will copy while at the same time the Java application threads run.
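For reference, the four collectors discussed here are selected with standard JVM flags like these (the CMS flag applies to JDK 8 era VMs; CMS was removed in JDK 14):

    -XX:+UseSerialGC          # serial collector
    -XX:+UseParallelGC        # parallel collector
    -XX:+UseConcMarkSweepGC   # concurrent mark and sweep (CMS)
    -XX:+UseG1GC              # Garbage First (G1)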
Now that I've talked about the different implementations of the garbage collector, as well as how the JVM memory is divided, we're going to look at some performance issues that we think are very common. The first scenario that we're going to look into is when there's a thread that consumes more CPU compared to the other threads in the same application. For this scenario, we have created a multi-threaded program with a thread that consumes a lot of CPU compared to the other threads in the same application. In this code sample, we have the thread that will consume the most CPU, and this thread will increment a counter continuously. In this other sample code we have the normal thread, and its behavior is similar to the CPU hogger thread, but instead of incrementing the counter continuously, it will sleep in between iterations. We will be using the normal threads as a baseline to compare against the CPU hogger thread. For this scenario, we have created three normal threads and one CPU hogger thread.
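The exact code from the slides isn't reproduced in this transcript; a minimal sketch of such a program, with illustrative class names, could look like this:

    public class CpuDemo {
        // Increments a counter in a tight loop and never yields the CPU.
        static class CpuHoggerThread extends Thread {
            private volatile long counter;
            public void run() {
                while (true) { counter++; }
            }
        }
        // Does the same work but sleeps between iterations.
        static class NormalThread extends Thread {
            private volatile long counter;
            public void run() {
                try {
                    while (true) { counter++; Thread.sleep(100); }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
        }
        public static void main(String[] args) {
            for (int i = 0; i < 3; i++) new NormalThread().start();
            new CpuHoggerThread().start();
        }
    }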
One way to debug this scenario is by using jstack. jstack is a command-line tool that gets information about the running threads. It's included in the Oracle JDK and OpenJDK. We have retrieved three samples using jstack, and each sample was retrieved every 30 seconds.
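The exact commands aren't shown in the transcript; a typical invocation, with 12345 standing in for the application's process id, looks like this:

    jstack -l 12345 > sample1.txt   # repeat about every 30 seconds for sample2 and sample3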
The first sample we collected at the 11 seconds mark. Then we calculated the normal threads' average CPU time and the CPU hogger's CPU time. Just by looking at this data, the CPU hogger's CPU time is much higher than the normal threads' average. The second sample we collected at the 42 seconds mark, and looking at the data from the second sample, we calculated the normal threads' average CPU time as well as the CPU hogger's CPU time. The CPU hogger's CPU time is still much higher than the normal threads' average. The third sample we collected at the 72 seconds mark, and just by looking at the CPU time of a normal thread compared to the CPU hogger, the CPU hogger's is still much higher than a normal thread's. So by looking at the data collected throughout all the samples, we can clearly see that the CPU hogger is the thread that consumes the most CPU. Another way to triage this scenario
is by using VisualVM. VisualVM is a visual tool that allows you to get information about the Java applications that are running. Looking at this screenshot, we can see the running time of the CPU hogger thread and the normal threads. The CPU hogger thread has a longer running time than the normal threads, and this can be explained by the state of the threads throughout the sampling. The CPU hogger thread has been running continuously, which is denoted by the green bar, while the normal threads have been sleeping throughout the whole sampling, which is denoted by the purple bar.
Another way to triage this scenario is by using the Java Flight Recorder and Mission Control. Java Flight Recorder is a tool that allows you to get information about the Java applications running in the JVM, and Mission Control is a visual tool that allows you to see the output from the Java Flight Recorder. We have used Java Flight Recorder to record the first 60 seconds of the application, and on the right side of the screenshot we can see the state of the normal threads and also of the CPU hogger throughout the first 60 seconds. While the normal threads have been sleeping throughout most of the sampling, the CPU hogger has been running continuously.
Another way to triage this scenario is by using Jwadia. Jwadia is an open source library that gets information about the JVM, and in this sample code we're getting information from Jwadia about the threads, and then we print that information as shown in the sample output. I also need to mention that Jwadia can get more information about the threads, but for the purposes of this presentation, we're only showing a portion of the data. If you want to know what other thread information Jwadia can get, you can just look at the link at the end of this presentation or in the description of this video.
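The library's own API isn't shown in this transcript; as a rough sketch, the same kind of per-thread data can be pulled from the standard JDK ThreadMXBean, which a library like this typically builds on:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadReport {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            for (long id : threads.getAllThreadIds()) {
                ThreadInfo info = threads.getThreadInfo(id);
                if (info == null) continue; // the thread may have terminated
                System.out.printf("%-20s state=%-13s cpuTime(ns)=%d%n",
                        info.getThreadName(), info.getThreadState(),
                        threads.getThreadCpuTime(id));
            }
        }
    }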
The second scenario that we're going to look into is when there is high GC activity, or high garbage collection activity. For this scenario we have created a program that will create a large number of objects, where a portion of those objects is going to stay referenced and the other portion is going to become unreferenced. The object that we're going to create is called Number and is around 16 bytes in size. In this sample code we're creating a large number of Number objects and then we're only adding one fifth of them to a list. The reason that we're doing this is because we want to see the difference between the minor GC and the major GC. For this scenario we have created 500 million Number objects.
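The slides' exact code isn't reproduced here; a minimal sketch of such a program, with illustrative names, could look like this:

    import java.util.ArrayList;
    import java.util.List;

    public class GcDemo {
        // A small object, roughly 16 bytes including the object header.
        static class MyNumber {
            final int value;
            MyNumber(int value) { this.value = value; }
        }

        public static void main(String[] args) {
            List<MyNumber> retained = new ArrayList<>();
            for (int i = 0; i < 500_000_000; i++) {
                MyNumber n = new MyNumber(i);
                if (i % 5 == 0) {
                    retained.add(n); // one fifth of the objects stay referenced
                }                    // the rest become garbage right away
            }
            System.out.println("Retained: " + retained.size());
        }
    }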
We have also used jstack to get information about the GC activity. We have collected a total of three samples, and each sample was collected every 30 seconds.
The first sample we collected at the 25 seconds mark, and if we only focus on one GC thread, in this case GC thread number zero, we can calculate its CPU time as well as its CPU usage. The second sample we collected at the 57 seconds mark; it has been around 32 seconds since the last retrieval. If we focus on GC thread number zero, we can see that the CPU time has increased as well as the CPU usage. The third sample we collected at the 89 seconds mark; it has been around 31 seconds since the last retrieval. If we focus on GC thread number zero, the CPU time and the CPU usage have increased, but the CPU usage is slowly starting to settle at around 96%.
We can also use VisualVM to get information about the GC activity. On the left side of the screenshot we can see the CPU usage as well as the GC activity, and on the right side we can see the memory allocation. Using these two graphs we can see that as more memory is used, the GC activity will be steady, but it will lower after completing the GC cycle. We also used the Java
Flight Recorder and Mission Control to get information about the GC activity. We have recorded the first 6 seconds of the application, and looking at this screenshot of the Java Flight Recorder data, we can see the memory allocation, which is denoted by the purple lines, and we can see the GC activity, which is denoted by the orange regions. By looking at this data we can see that as more memory is allocated, the longer the GC activity is, but the less frequent it is. We can
also use Jwadia to get information about the GC activity. We have used this code sample to get information about the GC threads using Jwadia, and with that information we have divided the data based on whether it belongs to the young generation versus the old generation. Using this data, we have plotted this graph of collection count versus timestamp. By comparing the minor GC versus the major GC, we can see that while the minor GC count has increased continuously, the major GC count stays at around zero during the first half, but it also increases in the second half.
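The library's API isn't shown in the transcript; the standard JDK GarbageCollectorMXBeans expose the same collection counts and times, roughly like this:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcStats {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // Bean names differ per collector, e.g. "G1 Young Generation".
                    System.out.printf("%d %s count=%d time(ms)=%d%n",
                            System.currentTimeMillis(), gc.getName(),
                            gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(5_000); // sample periodically to plot count and time against timestamps
            }
        }
    }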
We have also plotted the collection time versus timestamp using the data retrieved from Jwadia, and by comparing the minor GC versus the major GC, we can see that the behavior for both of them is very different: the minor GC time has increased continuously until it reaches a maximum of 5 seconds, while the major GC time stays at around 0 seconds during the first half and starts to increase in the second half until reaching a maximum of 43 seconds. The reason for this behavior, for both the previous graph and this graph, for the major GC is mostly because the old generation is running out of memory, so the major GC is triggered in that region. So now I will give the presentation back to Ranjan so he can proceed to the next scenarios. Thank you Sylvia.
So moving on to scenario three, a high memory usage.
Please take a look at this snippet of code. Try to pause and see if you can find out what's wrong, or what is causing the issue, in this piece of code. There are two key aspects here. One is we have a method that generates a random string of a specified length, followed by a main method which calls it and generates a random string of 32 million characters.
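The snippet itself isn't reproduced in this transcript; a minimal sketch of what it might look like, with illustrative names, is:

    import java.util.Random;

    public class StringDemo {
        private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";
        private static final Random RANDOM = new Random();

        // Generates a random string of the requested length.
        static String randomString(int length) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            String s = randomString(32_000_000); // about 32 million characters
            System.out.println("Generated a string of length " + s.length());
        }
    }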
In Java, since we know that each character takes four bytes, we can make an estimation saying that, okay, if we were to generate 32 million characters, it'll take at least 32 million times four bytes, which is 128 megabytes. So that's the minimum amount of memory it would take. We still haven't considered all the local variables, helper variables, or other memory locations used for calculation as part of this. So we can say with confidence that at least 128 MB will be needed. What happens if we run the same program with an -Xmx value of 64 MB? We're saying that the maximum allocatable heap space is 64 MB, but based on our calculation, we know for a fact that at least 128 megabytes are needed. So what do you think will happen? The JVM will just go boom. It'll throw an OutOfMemoryError and it will say, hey, I'm not able to allocate the necessary memory to do what you asked me to do.
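Using the sketch above, the run would look something like this; the error class is the standard one, though the exact message can vary:

    java -Xmx64m StringDemo
    # fails with java.lang.OutOfMemoryError: Java heap space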
There are several ways to go about triaging this. One such way is to use a tool called jmap, which, as mentioned by Sylvia, is included with the JDKs. As long as we know the process id of the Java application that has the high memory usage, we can use the jmap command and generate a heap dump from it, which is nothing but a snapshot of the heap memory at that point in time.
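A typical invocation, again with 12345 standing in for the process id, looks like this:

    jmap -dump:live,format=b,file=heap.hprof 12345   # "live" dumps only reachable objects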
A word of caution: whenever a heap dump is generated, the JVM will be halted, as in all the threads will be suspended so that the state of the heap memory can be copied over to a binary file. So this always causes a slight lapse in application availability, and sometimes it may take a bit longer than anticipated. So heap dump generation, through any means or any tools, should be done with a bit of caution. Once the heap dump is generated, you will see an .hprof file which denotes the generated heap dump; that can be opened with tools like the Eclipse Memory Analyzer to see what it contains and how the heap memory looked at that point in time. You could see what objects occupied most of the memory, what threads are referencing those objects, and then go to those parts of the code and see if you could make any optimizations to make it use less memory. Another approach is to use a tool called
VisualVM. This is a tool that connects to the Java process and gives you live thread and memory usage information. What you could do is, when the memory usage is high, you could try connecting to the process, see what threads are running at that point, and try to correlate. When you know what threads are running at that point, you could look at the stack traces of the threads, what code path each thread is following, and investigate that to see what could have caused the high memory usage over there. And if you want to go a bit deeper when the memory usage is high, you could also generate a heap dump with a click of a button in the tool and analyze that heap dump to look at objects that are hogging the memory, or threads that are referencing the objects that are hogging the memory.
Another approach for triaging this particular scenario is by using the Flight Recorder and Mission Control. As Sylvia had mentioned before, the Flight Recorder can be enabled in the JVM by adding a few VM options, specifying the duration and things like that. Once it finishes recording the additional instrumentation or debug information, it generates a JFR file where it dumps all the information it collected, and that JFR file can be opened in tools like Mission Control, where we can take a look at the memory usage, thread usage, and all the other instrumentation data for that application in that duration.
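The exact options aren't shown in the transcript; with JDK 11 and later, a recording of, say, the first 60 seconds can be started like this, and the resulting file opened in JDK Mission Control:

    java -XX:StartFlightRecording=duration=60s,filename=recording.jfr MyApplication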
So consider a scenario where we generate one string, one random
string of 32 million characters, and we maintain the same string throughout
the lifetime of the program. We don't dispose of it, we don't
create any other objects, but we just have a single string that we keep referencing
till the end of the program. In such a scenario, what do you think the memory usage trend would look like? It would basically be a gradual rise
in memory usage that denotes the portion of the code where
we generate the random string as random characters are being generated.
The peak of that slope indicates the point where
the string has been generated, and then we see a dip over there. That dip basically denotes the time frame when the string has been generated and all the additional memory that has been used to generate the string is no longer referenced. So the garbage collector
kicks in, it cleans up all of that, and then throughout the
rest of the duration of the program, we see the memory usage stagnant.
So that denotes that that's the memory needed for
the program to not only run, but also for the generated string to be
stored throughout the duration of the program. And by looking at the garbage collection
activity in that time frame, it is evident that that dip in
memory usage is signified by an orange block, which denotes
that garbage collection activity took place at that point in time.
Now, a slight variation on that scenario: we're not generating just one string and maintaining it throughout the duration of the program. Instead, we're constantly generating strings and not using them anywhere. In such a scenario, strings are generated and dereferenced, generated and dereferenced. So the garbage collection should kick in and dispose of all the dereferenced, or unreferenced, strings. Let's see how the graph would look. We see a gradual increase denoting the string generation process, and then we see a sharp drop from the peak, which denotes that garbage collection kicked in, realized that the string that was generated recently is not referenced anywhere and can be cleaned up, and cleaned it. That is what we see as a sharp drop. Since we constantly keep generating and then dereferencing, generating and dereferencing, after which the garbage collector sweeps, we see a sawtooth-like waveform. And in this scenario, if you notice, just like the mission control graphs earlier, the peaks aren't of the same size. This is indicative that garbage collection doesn't kick in immediately as soon as an object is dereferenced. That tells us a lot of things, right? It's not necessary that the garbage collection is periodic or acts immediately in all scenarios. There may be a lot of things preventing it from acting immediately when an object is dereferenced.
It could be sheer CPU thread contention, as in a lot of other threads are running, so the garbage collection threads don't have CPU time. Or it's just that the garbage collection mechanism is like that, where it takes a bit of time to kick in in some scenarios. So if you look at the garbage collection activity over here, we can see thick orange lines corresponding to the drops in memory usage. This indicates that garbage collection has kicked in at those points and cleaned up all the dereferenced strings. Now, moving on to triaging with Jwadia for this particular scenario: with just about four or five lines of code, we can generate a heap dump, and this is at the runtime of the program. You could add code to your application to generate a heap dump if the memory usage exceeds a particular threshold, or if the user provides a particular input. This is pretty useful, as you don't need any other application to connect to it, or to basically attach to it, and generate a heap dump through that. You could just use your application as is to generate heap dumps when needed. This will also end up generating an .hprof file, which can be analyzed using the Eclipse Memory Analyzer tool. A word of caution again: generating the heap dump halts the VM. It causes a momentary lapse in the availability of the application, so it should be used with caution; heap dumps should be generated with caution.
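The library's own heap dump API isn't shown in the transcript; for reference, the standard JDK mechanism that a library like this could wrap, the HotSpotDiagnosticMXBean, takes roughly this many lines:

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    public class HeapDumper {
        public static void dump(String path) throws Exception {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, true); // true = only live (reachable) objects
        }
    }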
Another aspect of information that we can get using Jwadia is basically the memory usage trend. We can see how the committed, used, and maximum memory values have fluctuated over time, not only for the heap memory but also for the non-heap memory. This is also useful in observing what kind of fluctuations are happening and at what time it has hit the peaks, and then we could look at other logs in that time frame to see what has been running at that point, in order to see what would have contributed to such a high memory usage.
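Again, the library's API isn't shown; the standard JDK MemoryMXBean exposes the same used, committed, and max values for heap and non-heap memory, roughly like this:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class MemoryTrend {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
                System.out.printf("%d heap used=%d committed=%d max=%d | non-heap used=%d%n",
                        System.currentTimeMillis(), heap.getUsed(), heap.getCommitted(),
                        heap.getMax(), nonHeap.getUsed());
                Thread.sleep(5_000); // sample periodically to build the trend
            }
        }
    }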
For the scenario where we are generating one string and maintaining it throughout the lifetime of the program, as we can see over here, the memory usage is constant, which denotes that the single string is occupying a constant amount of memory throughout. No other memory is allocated or dereferenced for the garbage collector to clean, so the memory usage is constant, compared to the graph in Mission Control where we saw a gradual rise, a small bump, and then a constant memory usage. We don't see that slope or the small bump here simply because the time frame of this is much larger than what we saw in Mission Control. Because of that, the rapid rise in memory and the small drop are not as evident over here. Now moving on to the second variation of that scenario, where we're not generating one string and maintaining it throughout; instead we're generating and dereferencing,
generating and dereferencing. The graph is very, very similar to what we saw in Mission Control. Keep in mind all these graphs are generated from values taken from Jwadia, and in this scenario we can see the memory usage hit peaks and then drop down when garbage collection kicks in. And as I mentioned before, all the peaks aren't of the same height, simply because of the fact that garbage collection doesn't always kick in as soon as the heap memory has an unreferenced object. Sometimes it may take a bit more time to kick in.
Moving on to scenario number four, the application crash. Please take a look at the snippet of code and try to pause and see if you can identify the reason for the crash. It does two main things. One is it tries to access a private variable within the Unsafe class called theUnsafe, and this variable can be accessed through reflection. That is the exact reason why we call f.setAccessible(true), which means that, okay, even if it's a private entity, let me access it through reflection. Once we do, we basically use that and try to write to an invalid or inaccessible memory location. And that is what causes the JVM to go boom. It says, hey, I'm not able to access this memory location and I'm not able to do what you're asking me to do. It doesn't know how to proceed with it, so it just goes and crashes. It produces a segmentation fault and it crashes.
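The snippet itself isn't reproduced in the transcript; a minimal sketch following the same pattern (class and method names here are illustrative) could look like this:

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public class Crash {
        public static void main(String[] args) throws Exception {
            crash();
        }

        static void crash() throws Exception {
            // The shared Unsafe instance lives in a private static field named
            // "theUnsafe", so it can only be reached through reflection.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);
            // Writing to address 0 is an invalid memory access: the JVM dies with SIGSEGV.
            unsafe.putAddress(0L, 42L);
        }
    }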
Whenever a Java application crashes like this, it generates a gist of the crash. That gist is written to stdout or stderr, and if you have redirected those streams to any file, you would find it in those files. A gist of a crash would look like this. It contains four key things: one, the reason for the crash, which in this case is a segmentation fault, or a SIGSEGV; followed by the Java version that the application was running in when it crashed; followed by whether a core dump was generated and, if so, where it can be found; and last but not least, where the crash log, or the error report, can be found. Now, to dig in further to
understand what part of the code might have caused the crash, we can look at
the crash log, and we are interested in a very, very particular
portion of the crash log. The crash log contains a lot of information.
It's not only the JVM or the thread related information,
but also system information, information on the system in
which the application crashed. So we're going to look at
two key aspects over here. The first aspect is basically the stack trace of the thread that caused the crash, followed by what threads were running at the time of the crash. When we look at the stack trace of the thread that caused the crash, we can clearly see that, hey, the main method was called, which ended up calling the crash method, which ended up calling a method within Unsafe, to be specific putAddress, and after which it ended up calling a sequence of native methods which caused the crash.
So considering this, it is easier for us to go and dig into
the code to see hey in this particular call stack, what could
have gone wrong. Let's see what code was there, what would have happened
before that code got executed, and what values could have caused this
crash. So that is a very vital piece of information over here.
Aside from that, we also have the list of running threads during the time of the crash. This is very, very useful because sometimes it's not necessarily the thread that made the call that caused the crash that was responsible for it. Some other threads that might have been running concurrently could
have also caused a state change or some
other value change which would have caused this thread
to crash, to initiate the crash. So it's very useful to also know
what threads were running at the same time or concurrently at the time
of the crash. So using all this information, we investigate the code
base and we see hey, okay, this is the path that triggered the crash,
and what are the values that could have triggered this? And then we go about
failing gracefully or writing code that
would end up avoiding the crash. Or we make a conscious decision
saying hey, okay, we want the application to crash in such scenarios.
Now that we are done with this, let's summarize the triage process for all the previously discussed performance issues. It can be done as follows. There are three primary symptoms: a high memory usage, a high CPU usage, or an application crash where the application abruptly stops.
So in a high memory usage scenario, we are interested in the memory usage trend along with the thread information, that is, information on the threads that have been running and the trend of that. The steps to triage that would be to first look at the -Xmx value for the maximum allocatable heap size. Based on our application design, we also have to understand whether, for the load that the user is putting on it, or the sheer data that we feed into the application, that -Xmx value is a reasonable amount or not. If it is not a reasonable amount, and we know for a fact that, hey, the load is very high and we are expecting this much memory to be used for this kind of input load, then we go ahead and increase the amount of heap that's allocatable. If that's not the case and we say, hey, it shouldn't be occupying so much memory for this load, then we go into investigation mode. We try to correlate to see what threads are running when the memory usage is high, or, if we get a heap dump, even better: we look at what objects are occupying most of the memory and then we try to correlate them to the threads that are referencing those objects. Once we are able to do that, then it's only a matter of code investigation. We have the paths of the code that are key contributors to the memory usage, and then we look at those parts of the code to say, hey, how do we reduce the memory usage of this area, and come up with a possible fix. The other symptom is a high
CPU usage, where we are primarily interested in the thread information and its trend. One of the first things we check is if the high CPU usage is caused by a garbage collection thread, or one or more garbage collection threads. If it is, then we take the high memory usage route, simply because an increase in memory usage is what triggers the garbage collection threads. And if no luck is yielded from that route, then we look at other potential garbage collection mechanisms. For example, if the application is using the concurrent mark sweep, then we could think of possibly moving it to G1 to get a much better garbage collection approach and lower the CPU usage. If it's not a garbage collection thread,
then what we do is we look at the threads that have been causing the high CPU usage and what the stack trace was at the point in time when they were causing the high CPU usage. We investigate that code path and we try to optimize it so that it doesn't take up as much CPU time, or it is bumped to a lower priority if it is not meant to hog the CPU.
Now that's done, let's move to the third and final symptom,
which is an application crash. The symptom of this is that the application abruptly stops and is accompanied by a gist of the crash, a core dump, and/or a crash log. When we take a look in the crash log, we have so much information, not only on the thread that caused the crash, but also on all the threads that were running during the time of the crash, information on the system in which it crashed, and so on. So we have provided a link that you could use in order to understand more as to how to use the crash log to triage this particular symptom. One other important thing I would like to mention over here is that high garbage collection activity often manifests itself as one of two symptoms, either a high memory usage or a high CPU usage, because it's only when the memory usage is increasing that garbage collection kicks in and tries to clean. And if for some reason it's not able to clean, because of CPU contention or because of the fact that all the memory in use is referenced, then it'll try to be a bit more aggressive and it will spawn more threads
and try to garbage collect more frequently. So that will cause a high garbage collection activity, which will show symptoms of high memory usage or high CPU usage. That's the reason why it
doesn't have a separate row over here as it falls into one of those two
rows. Now that we have summarized the triage process for the previously
discussed performance issues, let's talk a bit about the
current work that we have done so far, followed by the future work
that we have planned. Right now we have summarized all the performance issues that we have encountered, identified root causes for them, and also documented the root-causing process. Once that was done, we went ahead and created an open source library that lets us get all the information, or metrics, or diagnostic information we need to make that root-causing process much easier, and that library is called Jwadia. The third milestone which we'll be embarking on is fine-tuning the JVM and looking into actually fixing the performance issues. Not only fine-tuning, to be very specific, but also adding JVM options so that we get additional debug logging or more information that will make milestone one much easier and
quicker. After this, we also want to investigate other reasons for the
application crash. So some people may debate
that application crashes aren't a performance issue,
but it can also be counterargued, saying that it causes a
lapse in availability when the application goes down. So in
a way, you are degrading the performance of your application cluster.
That's the reason why we have included that as a scenario for this presentation.
So investigating other reasons for a crash, such as an out of memory error, or a garbage collection overhead limit exceeded error, or the many other varieties of reasons for which an application could crash, would be our next milestone. Last but not least, we would want to make our
open source library a bit smarter. The sheer amount of diagnostic information
that this library can generate can easily surpass several hundred
megabytes in a few minutes, so continuously transmitting that
information over a network or storing it on the disk can become
very expensive very quickly. So we want to make the library a bit
smarter so that it only generates diagnostic information when
it's needed. For example, when the CPU usage of certain threads exceeds a threshold, say 70%, then we would want to store all the information on those threads into a log, or into a file, or transmit it over a network. Or if the memory usage exceeds a particular threshold, then we want to look at the memory usage trend, or the garbage collection trend, or even generate a heap dump. So by making it a bit smarter in this aspect, we avoid collecting quite a bit of unnecessary data, as some of this detailed information is useful only when we want to triage performance issues.
Now that that's done, we are in the questions section of our presentation. Since this is a recorded video, if you have any questions or feedback, we would greatly appreciate it if you reach out to us using the email addresses provided in the description below. We have also provided links in this presentation to the wiki pages, which talk in much more detail about the topics that we have mentioned in this presentation, along with the open source library Jwadia, which we have created with the intention of collecting diagnostic data. So now that that's done, we are officially finished with the presentation. Thank you so much for your time and patience. We would love to hear from you if you have any questions or queries.
Thank you. Yeah, thank you.