Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
              You. Good morning, good afternoon,
            
            
            
              good evening everyone. Welcome to Conf fourty two or Java.
            
            
            
              Today I'm going to be sharing some really cool stuff about garbarge collection,
            
            
            
              so let's get on to it. These are some important disclaimers
            
            
            
              saying that this prestigious for information purposes. So this
            
            
            
              is the rough outline of what I'm going to be going through with you today.
            
            
            
              IBM going to start with the garbage collection overview. Going to start slowly and
            
            
            
              then go into them the more details with the OpenJ nine GC
            
            
            
              algorithms and I'm going to finish off with advanced structure
            
            
            
              called double mapping. A little bit about me I'm a software developer here at IBM.
            
            
            
              IBM almost for almost two years. I graduated from
            
            
            
              University of Waterloo. My master is in 2019 and
            
            
            
              one of my interests are systems, compilers,
            
            
            
              machine learning and AI. And a fun fact about me is that I'm a tennis
            
            
            
              aggregate. All right, if you want to get the builds for
            
            
            
              either hotspot OPG nine, I recommend going to adopt OpenJDK.
            
            
            
              There you can get both builds and terms.
            
            
            
              So you sure whats you have the best jvm
            
            
            
              and garbage collection which is the main part of the talk. Objects nine is
            
            
            
              a fairly mature project. It went to the open 2017.
            
            
            
              A fun fact about it is that it started as an embedded
            
            
            
              jvm. If familiar with Java Macro edition. That's how OPG
            
            
            
              nine started, which is why it has some really good memory performance
            
            
            
              when it comes to memory and it's open community.
            
            
            
              We welcome anyone who's interested and wants to contribute to the project.
            
            
            
              All right, garbage collection. So what is it? This is a
            
            
            
              more by the book definition. Garbarge collection is a form of automatic
            
            
            
              memory management. The garbage collect attempts to reclaim memory occupied by
            
            
            
              objects that are no longer used by the application. That's opposed to
            
            
            
              other programming languages like C Plus plus where you have to manually allocated
            
            
            
              memory with new and malloc and free those memory with free
            
            
            
              and delete. So what are the responsibilities of garbarge collection? They are allocate
            
            
            
              memory, identify the liveness of data and everything else is going to be considered garbage.
            
            
            
              Therefore the garbage collection is going to reclaim. So these are the most important
            
            
            
              responsibilities of the garbarge collection positives GCs.
            
            
            
              They are automatic managed automatic memory management.
            
            
            
              So it frees the burden of the developer to manually have to
            
            
            
              deal with memory. Also helps reduce some certain categories of
            
            
            
              bugs like dangling pointers and double freeze. On the other hand,
            
            
            
              there is the negative part, right? It requires some additional resources like
            
            
            
              additional cpu cycles. More memory can cause unpredictable pauses.
            
            
            
              You will see that they are known as stop the road pauses.
            
            
            
              It may introduce runtime costs and application. Whats little control
            
            
            
              of when memory is reclaimed. Here are some of the digital algorithms.
            
            
            
              I'm going to be explaining each of these as I go
            
            
            
              through the presentation. One, and that I won't be talking about, is reference counting,
            
            
            
              but in simple terms. Imagine I'm an object and I have
            
            
            
              two objects that's pointed to me. My reference count is going to be two.
            
            
            
              As soon as these objects deletes the references to me,
            
            
            
              my reference counts goes to zero. So pretty much I'm counting how many
            
            
            
              references I have, and once that count reaches zero, I'm considered garbage.
            
            
            
              Therefore I can be reclaimed. That's reference counting in 30 seconds.
            
            
            
              All right, this is what I'll be going through with you today.
            
            
            
              These are the different policies regarding open, unite first one's
            
            
            
              op throughput, the stop the world pilot collector, the second Gencon
            
            
            
              generational copy collector. Gencon Cs is an improvement of
            
            
            
              a Gencon's apostles collector. Metronome is an incremental,
            
            
            
              soft, real time collector and balance, the region based collector balance
            
            
            
              is really close to g one and John cone is really close to
            
            
            
              cs from hotspot optroput. So optroput is a parallel garbarge
            
            
            
              collection. GC operations are completely stop the world pauses,
            
            
            
              meaning all the application threads are paused. Marksweep and
            
            
            
              optional copat collector. I'll explain whats in more detail a little bit.
            
            
            
              And all GC operations are completed in parallel, meaning multiple threads. Garbarge collection
            
            
            
              the heat and GC native memory overhead for macmap and work
            
            
            
              packets. These are metadata needed by the GC
            
            
            
              to collect some garbage. So whenever you start an application,
            
            
            
              we allocate objects in what we call the heat.
            
            
            
              Normally is a contiguous block of memory, but it doesn't necessarily whats
            
            
            
              to be, but over 90% of the time it's going to be a contiguous block
            
            
            
              of memory. So we have a flat heap, but for optuput we further
            
            
            
              divide into two logical spaces, small object area and large
            
            
            
              object area. Most objects are allocated in the small object area,
            
            
            
              so this is how an application behaves under this policy.
            
            
            
              The white arrows represent the application thread and the red arrows represent the
            
            
            
              GC thread. So application starts right, and as soon as we
            
            
            
              hit an allocation failure, we deplete the memory of
            
            
            
              the heap. We start a GC cycle. That's the red arrows there. So GC
            
            
            
              collects the heap while all application threads stop. That's why
            
            
            
              it's called stop the road, and then the application threads can continue from
            
            
            
              there. So the GC is divided into three phases, marking, sweeping,
            
            
            
              collection, so marking where it's where we find all the live objects.
            
            
            
              So just go through the object graph, finding all the live objects.
            
            
            
              Then we sweep the dead objects, meaning all those objects
            
            
            
              that are not marked are dead objects, therefore we sweep them.
            
            
            
              And then there's a compaction phase because the heap might get fragmented. So what does
            
            
            
              it mean, a heap to get fragmented? So imagine through the lifetime of an application
            
            
            
              when we have many DC cycles, our heap might
            
            
            
              get like a swiss cheese, meaning it's going to have very little
            
            
            
              small free spaces, but maybe not enough contiguous free space to allocate
            
            
            
              an object. Therefore we need to compact that heap. This is what this slide
            
            
            
              is trying to explain, where we take all those live objects, put in some
            
            
            
              part of the heap, and now we have all that contiguous memory that
            
            
            
              we can work with. Jenco if IBM familiar with cs
            
            
            
              from hotspot. It's really similar to the generational copy collector,
            
            
            
              where objects have edges provides a significant reduction
            
            
            
              in the assist of the repulse times. Introduces right barrier for a member set and
            
            
            
              we have a cocurrent global marking phase where now we're going to be marking objects
            
            
            
              concurrently while the application track is running. So here, instead of dividing the
            
            
            
              heap into small object area and large object area, we divide the
            
            
            
              heap into nursery and tenure. In other words, nursery is a new space,
            
            
            
              tenure is old space. Furthermore, we divide nursery into allocate
            
            
            
              and survivor, and all objects are actually locating
            
            
            
              allocated space. This is how it behaves though. Don't panic. So again,
            
            
            
              the white arrows represent the application threads and red,
            
            
            
              the GC thread. Scavenge here represents a copy collector.
            
            
            
              So what is a cop collector? Cop collector means it's copying objects from
            
            
            
              one space to the next. So it's copying objects from allocate to survivor.
            
            
            
              So when we hit the limit of the allocate space,
            
            
            
              we start the scavenge, which is top the word copy collector. And now we start
            
            
            
              copying objects from scavenge to survivor. That's what scavenge is
            
            
            
              doing. Global mark face is concurrent
            
            
            
              mark phase, where it's actually marking the tenure space,
            
            
            
              because scavenge is only dealing with a new space. So how do we deal with
            
            
            
              the old space? That's what global mark phase is doing, marking the
            
            
            
              tenure space concurrently. And at the end, at the global end,
            
            
            
              we sweep the tenure space to collect those original tenure space.
            
            
            
              So because of this concurrency, we need what we call write
            
            
            
              barriers. So imagine we are trying to store a reference
            
            
            
              into an object field like that, and imagine
            
            
            
              we have the following object graph. We are in the middle of marking
            
            
            
              objects and applications running. So GC has already processed the roots.
            
            
            
              GC has already processed object a. As you see with the green tick,
            
            
            
              GC is in the middle processing object b and hasn't seen c yet.
            
            
            
              Then this happens. Mutated thread comes along and
            
            
            
              deletes pointer from b to c and adds a reference from
            
            
            
              a to c. And then the GC thread comes along and okay, b doesn't
            
            
            
              have any pointers, so I'm done marking. Therefore this is my resulting
            
            
            
              graph where I found roots. Object a, object b,
            
            
            
              object C is normal, therefore it's considered garbage. So IBM going to collect erroneously.
            
            
            
              So that's a problem. So how do we remember, how can we remember object c?
            
            
            
              We do that through write barriers. So how this is
            
            
            
              implemented. So we have a simply check there if it's concurrent
            
            
            
              disease is active. We remember that object. So we put that in a car
            
            
            
              table so that at the beginning of the GC cycle
            
            
            
              we actually go to the car table to look for these.
            
            
            
              Sorry, it's not the beginning. At the end we go through the car table to
            
            
            
              remember these references that we might have missed.
            
            
            
              But we do need an extra barrier because
            
            
            
              we remember we have new space and old space,
            
            
            
              but we might have references going from old space
            
            
            
              to new space. And scavenger is not aware of that. So in order to
            
            
            
              capture these references coming from old space to new space, we need another right
            
            
            
              barrier. And this right barrier looks like this. All we do is check if a
            
            
            
              is in old space, a is tenured and c is not. Meaning a is an
            
            
            
              old space, c is in a new space. Then we also remember that since
            
            
            
              we have two barriers here, we can put that in the same check to save
            
            
            
              some cpu cycles there. And that's the resulting right barrier for
            
            
            
              Gencon. Cool. Gencon concurrent scavenger,
            
            
            
              that's an improvement upon Gencon. This is a puzzleless collector. If you're familiar with
            
            
            
              Shenandoah and ZGC, those are also puzzles collection.
            
            
            
              They are some differences there, but concurrent scavenger is very similar to Gencon
            
            
            
              here. The generation cup collector introduced a readbearer because now
            
            
            
              we're going to be moving objects concurrently. Not just marking, but also
            
            
            
              moving objects. The heap structure is the same, right? We have
            
            
            
              the new space, old space, allocate, survivor, nothing new there. So the
            
            
            
              biggest difference here is the scavenger, which is instead of being stopped,
            
            
            
              the world now is going to be concurrent. The CS there stands for concurrent scavenger.
            
            
            
              So application starts instead of having a stop the world scavenger. We're going
            
            
            
              to have two small pauses, one at the beginning, one at the
            
            
            
              end for like synchronization, marking the roots. And the bulk of the
            
            
            
              work is going to be done concurrently with the application, which is where
            
            
            
              you see the yellow arrows there which represent application threads running concurrently
            
            
            
              with the application. And global marking phase here is still the
            
            
            
              same, right. The only thing is that the cop collector now is running
            
            
            
              concurrently with the application. Now we have a problem
            
            
            
              here, right, because the GC thread is trying to move objects while mutator
            
            
            
              is trying to access those same objects. So let's step
            
            
            
              back a little bit. Let's think about the parallel scavenger, right?
            
            
            
              Parallel scavenger is a stop toward scavenger. So I have multiple GC threads
            
            
            
              trying to move an object, right? So we can deal with that with an atomic
            
            
            
              operation, cas cop and swap, right? So let's see what happens in this
            
            
            
              case. So we have two DC threads trying to move an object,
            
            
            
              right? The green and brownish thread here. So what would
            
            
            
              happen in this case? So they both try to copy the object,
            
            
            
              right. So they race to cop the object. So in this case
            
            
            
              the brown one wins the race. Copy the object, install the
            
            
            
              foreign pointer, the green thread is going to be. Okay, I lost all good.
            
            
            
              I'll just continue do my work and cop some other objects. So far
            
            
            
              so good. Now what if we had a mutator in a DGC thread trying
            
            
            
              to do that? So let's keep this same Cas
            
            
            
              operation and see what happens. Right.
            
            
            
              So we have the same cas there, but instead of two GC threads we have
            
            
            
              a GC thread which is the green one, and a motel thread. Okay,
            
            
            
              so let's see what happens. So in this case,
            
            
            
              let's assume the GC thread won the race which copies the
            
            
            
              object. And then the motra thread is going to come along say
            
            
            
              okay, I lost the race, but I see a foreign pointer is going to
            
            
            
              follow that to access that object in the two space. But since
            
            
            
              GC thread's not done copying the object,
            
            
            
              the motel thread is going to access some garbage memory
            
            
            
              and that's a problem. So how do we fix that? That's how we fix it.
            
            
            
              So what we do is we unconditionally copy
            
            
            
              the object, right? So we copy object. So both
            
            
            
              is going to copy the object and what they do is they erase to
            
            
            
              install the foreign pointer. So how does this work actually?
            
            
            
              So both is going to copy the object, right? So that's going to be the
            
            
            
              resulting at the beginning. Both is going to have their own copy. And here at
            
            
            
              this point they're going to race to install the foreign pointer in the
            
            
            
              foreign space copy. So in this case, the GC thread won. The race to install
            
            
            
              the foreign porter material thread is going to be okay, I lost, but no matter,
            
            
            
              I'm just going to go to the foreign pointer to access that
            
            
            
              object. But since the G thread have already copied
            
            
            
              the object, the object, everything is fine because the
            
            
            
              object's there. The GC thread is done on copying the object, so everybody's
            
            
            
              happy. So that's what we call a read barrier. You're familiar with Shenandoa
            
            
            
              version two? They use the same barrier as here.
            
            
            
              Okay, metronome. Metronome is an
            
            
            
              incremental, soft real time collector. It provides an upper bound on GC
            
            
            
              post times. What I mean by upper bound is that imagine
            
            
            
              a window of ten milliseconds on that window.
            
            
            
              We guarantee that 7% of that time is going to
            
            
            
              be dedicated to the application, while the other 30% is going to be for
            
            
            
              the GC. So it's up to the GC to do with those 30%. Whatever he
            
            
            
              wants to do can do some GC work or give some time
            
            
            
              back to the application. Here uses a different write barrier called snapshot
            
            
            
              at the beginning. And because of this barrier,
            
            
            
              it has a lot, a high percentage of floating
            
            
            
              garbage, and you're going to see why. So metronome heap is not divided into new
            
            
            
              and old space. We actually divide the heap into regions.
            
            
            
              Here is a fixed size region 64,
            
            
            
              bigger the heap, the more regions we have. Regions are assign a cell size.
            
            
            
              So here it's a very nice characteristic
            
            
            
              is that objects on the same region have the same size. And if
            
            
            
              an object is larger than 2,
            
            
            
              can actually span multiple regions with the exception
            
            
            
              of arrays. So if we have an array that's bigger than
            
            
            
              a region, we actually is going to divide whats
            
            
            
              array into chunks and put a chunk here, another chunk here,
            
            
            
              and have the array header points to these different chunks.
            
            
            
              Because normally if we have an object that's logging region,
            
            
            
              we're going to trigger a stop the world GC, right?
            
            
            
              For arrays, we can just chunk that array and put this
            
            
            
              in different chunks. That's what we call arraylets. I'm going to explain a little bit
            
            
            
              more what arraylets are at the end of the presentation.
            
            
            
              This is how metronome behaves. Very, very simple, if we
            
            
            
              will remember opt throughput. From the beginning of the presentation,
            
            
            
              it had one huge top to roll, right?
            
            
            
              So imagine that huge top, the road pause divided into little
            
            
            
              increments. That's what metronome is, right? It does a little
            
            
            
              bit of work here, a little bit of work here, but never going
            
            
            
              over that 30% threshold, right. Remember those ten milliseconds
            
            
            
              window, only 30% is going to be dedicated to the GC. And that's what these
            
            
            
              increments are. A little bit of GC work here, here and here and here until
            
            
            
              it's done. So the right barrier that I mentioned to you was
            
            
            
              snapshot at the beginning, right? And it causes a lot of
            
            
            
              floating garbage because the barrier is actually performed before the store.
            
            
            
              So imagine we have the following object graph. We are marking object concurrently,
            
            
            
              right? And the GCA has already processed the
            
            
            
              roots, has already processed object a, is in the middle of processing object
            
            
            
              b, it hasn't visited object c, neither object D.
            
            
            
              The mutated thread comes along and does this, right. Deletes the reference
            
            
            
              from b to c and adds a reference from a to d. And then if
            
            
            
              we didn't have a barrier, this will be the resulting object graph and we
            
            
            
              would erroneously collect object D. So how does
            
            
            
              snapshot at the beginning actually work? It is implemented this way,
            
            
            
              meaning we remember the reference before we actually do the
            
            
            
              store. So we put it in a temporary field. Remember that if the berries
            
            
            
              active. So if you come back to the object graph here, the berry
            
            
            
              is actually triggered. Whenever we delete the reference from
            
            
            
              b to c, that's when the berry is triggered. So instead of
            
            
            
              having a resulting object graph like this, we would have something like this.
            
            
            
              In this case we would actually remember object C, which that is
            
            
            
              considered garbage because we don't really need object C.
            
            
            
              It just happens to be in the path of object D. That's why we have
            
            
            
              a lot of floating garbage with this type of barriers. Now, our last
            
            
            
              but not least, DC policy is balanced. If you're familiar with
            
            
            
              g one from hotspot, it's very similar. Both are generational,
            
            
            
              both are region based. One difference that I see is that the g
            
            
            
              one from hotspot has only three ages and we
            
            
            
              have 23 to 27 ages.
            
            
            
              Whats we have a lot more ages. Like I said, they are both region based.
            
            
            
              Balance provides significant rendition. Max stop the world
            
            
            
              poses. We introduce right barriers similar to the one from Gencon,
            
            
            
              because you might have reference from different regions coming to my
            
            
            
              region. And we have an incremental heap defragmentation. So similar to metronome,
            
            
            
              the heap is divided into regions. But here the bigger the heap,
            
            
            
              the larger the region, because we try to have between 1000 2000
            
            
            
              regions in this policy, objects are located in what we call addon regions,
            
            
            
              which is our new regions. And if an object is
            
            
            
              larger than a region, we just throw an out of memory error, with exceptions
            
            
            
              of arrays, same thing as metronome. We divide the arrays into chunks
            
            
            
              and put them into different regions. This is how a balanced
            
            
            
              application behaves. The balanced policy. PGC stands for partial
            
            
            
              garbage collection. It's a copy collector. So what it's doing is copying
            
            
            
              the live objects from one region to the other and tries to
            
            
            
              find those regions with the highest return of investment. So what does
            
            
            
              that mean? So imagine we have a region with only one live object.
            
            
            
              All we have to do is cop that object over. And there we go,
            
            
            
              we have an entire free region to work with.
            
            
            
              So PGC tries to find those regions which has the most
            
            
            
              garbage, which is going to do the least amount
            
            
            
              of work to free that region. And GMP here
            
            
            
              is a global marking phase. It doesn't collect
            
            
            
              objects, it actually helps PGC.
            
            
            
              So one problem about PGC is that it only has
            
            
            
              local knowledge about each region.
            
            
            
              As time goes on, it loses this local knowledge.
            
            
            
              So GMP is there to help update this local knowledge
            
            
            
              of PGC. And at the end of the GMP, it updates PGC
            
            
            
              information so that PGC can do its work more
            
            
            
              effectively. So GMP does not
            
            
            
              reclaim memory, right. It prefers a marking phase only.
            
            
            
              It's scheduled to run between pgcs and
            
            
            
              it gives an accurate mark map of the entire heap.
            
            
            
              And it's going to use this mark map to update the
            
            
            
              PGC. So this is more of a visual
            
            
            
              representation of how PGC actually looks. So on the X axis we
            
            
            
              have time and the Y axis we have free heap. So lines
            
            
            
              going up is the PGC doing its work and lines going down
            
            
            
              is the application consuming memory. So you can see this trend going
            
            
            
              down is the heap getting fragmented because of the
            
            
            
              PGC. And that happens because PGC loses this local
            
            
            
              information. So GMP is smart enough to trigger at some point here so
            
            
            
              that it finished in a timely manner to update PGC so
            
            
            
              that it can defragment the heat. So you can see around there where
            
            
            
              I have affected the fragmentation. That's where we just updated PGC
            
            
            
              information. And you see that trend going up. And that's when PGC has the most
            
            
            
              recent information of every region, so it
            
            
            
              can better collect and defragment. And with time that trend
            
            
            
              going down, it's going to start up again and GMP is going to kick in
            
            
            
              again. So you can update BGC. And that's how a balanced policy
            
            
            
              behaves. So in balance, we do have a right
            
            
            
              barrier. It's very similar to the Gencon barrier because we
            
            
            
              might have multiple references coming from different regions to this region,
            
            
            
              and PGC doesn't really know how to keep track
            
            
            
              of that. So we need a right barrier for that. So actually here we do,
            
            
            
              unconditional barrier if you think about it. So we just study the card
            
            
            
              whenever we set a field of an object and then at the beginning of the
            
            
            
              PGC we go through this card to see if there's an iter region reference.
            
            
            
              So you see here, whenever we think about, read and write barriers,
            
            
            
              it's always a matter of who should we put the burden on.
            
            
            
              Should we put the burden on the motel thread or should we put the burden
            
            
            
              on the GC thread. So here we chose to put the burden on the GC
            
            
            
              thread so that we don't penalize motel threads as much.
            
            
            
              Now, arraylets, so you probably heard me talking about
            
            
            
              arraylets a few times here. So arraylets are a
            
            
            
              cleverer way to store larger rates in
            
            
            
              a region based GC. So these arrays have a spine,
            
            
            
              right? And this spine is going to be pointing to the data allocated
            
            
            
              with this array. Each leaf of this array, which is
            
            
            
              are the different chunks of the array, consumes the entire region. So how does
            
            
            
              it look? Actually, if I were to graph forward to
            
            
            
              visualize this, so we have the heap here, and the bottom we have the
            
            
            
              array header. Arrayoids represent the pointers to
            
            
            
              the array data. So imagine if you want to
            
            
            
              access element one of this array. What we do is we follow reference,
            
            
            
              the first reference here, the first arrayoid, and you can see we go to the
            
            
            
              last blue region and access the element that way.
            
            
            
              So you can see here, it's really nice way to store large
            
            
            
              arrays in a heap as opposed to having this array
            
            
            
              span multiple regions. But there's a problem.
            
            
            
              This array that's divided into regions cannot
            
            
            
              work with APIs that require a contiguous representation
            
            
            
              of this array. So how do we deal with this before? So this
            
            
            
              is the case of Jni APIs, Java native interface
            
            
            
              critical APIs. They actually need a contiguous review
            
            
            
              of the array, which is not the case of array, lets. Right. So whats we
            
            
            
              do? So we actually make a copy of this array, right, and then
            
            
            
              copy element by element to this temporary array, pass that array,
            
            
            
              copy to the J nine API. Then the J API is going to do whatever
            
            
            
              it does with the array, maybe modify some of its elements,
            
            
            
              and then we need to copy everything back. So you can see there
            
            
            
              are like two, imagine we have large arrays like 1050 megabytes
            
            
            
              array. As you can see, that's really expensive. So how can
            
            
            
              we deal with that? That's where double mapping comes in. So we can make
            
            
            
              this large arrays discontinuous arrays look contiguously.
            
            
            
              So we take the idea that physical memory is limited,
            
            
            
              but virtual memory is not. On the contrary, virtual memory address space
            
            
            
              is very very large in 64 bit systems to the power 64
            
            
            
              in fact. So what we do is we map two virtual memory addresses to
            
            
            
              the same physical address. And the nice thing about this is that any
            
            
            
              modifications to any of the virtual address is reflected in the
            
            
            
              other virtual address. So ZGC does a similar trick to keep track
            
            
            
              of the different faces of the GC. So every object is
            
            
            
              going to have actually multiple copies in the vitro address space, but they actually point
            
            
            
              to the same physical address. Here we just do it for arrays.
            
            
            
              How would this look like if you were to visualize? So in the
            
            
            
              bottom part there we have the actual arraylets, right, the different regions of the arrays
            
            
            
              and then we map that to a second virtual address and we
            
            
            
              do in such a way that the second vitro memory is going to look contiguously
            
            
            
              in the vitro space. All we have to do now is pass the address of
            
            
            
              that second virtual memory to the gen nine API and every
            
            
            
              modification to that virtual memory is going to be reflected in the original
            
            
            
              virtual address. That way we don't have to copy everything to the temporary array and
            
            
            
              copy everything back. So there's a really nice trick that we do in our
            
            
            
              region based DC and with that we actually boosted array operations
            
            
            
              by 30 times, which is pretty cool. So what's next? Double map
            
            
            
              has some downsize rights, only available Linux and 64
            
            
            
              bit systems. But can we do better? Some platforms doesn't
            
            
            
              really support this double mapping trick. So what we're actually working
            
            
            
              right now is on a technology called off heap where
            
            
            
              we're actually going to store large objects at a secondary heap
            
            
            
              and that way we can do some more tricks and try to
            
            
            
              optimize this large object even further. So stay tuned.
            
            
            
              This is a quick GC policy guide.
            
            
            
              So on the left we have the GC policy and on the right we
            
            
            
              have some characteristics. For instance, some of them,
            
            
            
              some of these gcs are generational, right Gencon and Gencon
            
            
            
              cs and balanced. Some of them is going to have some concurrent faces.
            
            
            
              The only one that has no concurrent face at all is of throughput but of
            
            
            
              throughput has the highest throughput and that's because it
            
            
            
              doesn't have to deal with read or write barriers. And these different policies
            
            
            
              have different heap layouts, right? Gencon, Genco has the
            
            
            
              new and old separation right balance and metronoma region
            
            
            
              based and opt throughput has a small object area and large object area.
            
            
            
              This is another table where it shows for which domain is
            
            
            
              which policy is better. This is not, you shouldn't follow this by
            
            
            
              the foot, right? You should actually test the policy because
            
            
            
              even though you might have like a desktop application, depending on your workload,
            
            
            
              another policy might work best. So here for such a general tip.
            
            
            
              Gencon are good for web servers, desktop applications,
            
            
            
              balanced. RG one is good for very large heaps, and metronome is
            
            
            
              good for systems that have softer real time
            
            
            
              constraints. And optropod is very good for small heaps.
            
            
            
              Or if you don't care about GCP poses at all. And you can see here
            
            
            
              different policies are going to have different throughput and different gcpuzzles. You can
            
            
            
              see Genco Cs has not the best throughput,
            
            
            
              but whats the fewer GCPU on the other hand of throughput
            
            
            
              has the highest but the largest GCP.
            
            
            
              And that's because of concurrency and how we deal
            
            
            
              with that. That's because read barriers and write barriers has
            
            
            
              overhead on the GC. And you can see here all
            
            
            
              policies are going to have some type of additional memory.
            
            
            
              Maybe they have like a mark map or something to keep
            
            
            
              track of these objects, because we need a way to keep
            
            
            
              the DC needs these extra data structures
            
            
            
              to help collect the object. And that's what
            
            
            
              this middle column is telling you about. This is
            
            
            
              like a diagram showing the similarities between the different
            
            
            
              and the most common GCS policies out there. So on the top there
            
            
            
              in the yellow we see the most common gcs. So Gentcon is very
            
            
            
              similar to cms from hotspot. I think I've been saying cs,
            
            
            
              I apologize. So Gencons is similar to cms from hotspot.
            
            
            
              So both are gencon, both have neo and old space balanced,
            
            
            
              and G one very similar to each other. Right? Both are
            
            
            
              rigid based, both are generational, right? Then we have
            
            
            
              on the blue section there we have the puzzler gcs,
            
            
            
              which their objective is to have the least amount
            
            
            
              of poses and the shorter puzzles, right? So you
            
            
            
              can see the thickness of the lines there are thinner
            
            
            
              because even though they share some similarities, not as similar as the ones
            
            
            
              above, right? So ZGC, Chin and door, they're both region based,
            
            
            
              but they are not generational. Gen conference is not region based, but it
            
            
            
              is generational. Metronome is region based, but it's
            
            
            
              not generational. And remember, ZGC uses
            
            
            
              that multimapping kind of technology, right? Shindo version
            
            
            
              two uses the same read barrier as Gencon CS Shillando
            
            
            
              version one uses Brooks Pointer, which is kind of a different
            
            
            
              technology. And on the green side there we have
            
            
            
              Azul C four, which also puzzle legacy, but you require
            
            
            
              like a special hardware or some special software.
            
            
            
              Now this is the summary of the talk here.
            
            
            
              If there's one thing that I want you to get out of this is that
            
            
            
              there's no perfect fits, no perfect GC that fits everything,
            
            
            
              all workloads, right? And if you want a GC that
            
            
            
              has perfect throughput and you don't care about puzzles at all, you should go with
            
            
            
              the perfect stop, the wood GC, because no matter what, that's always going to have
            
            
            
              a higher throughput as opposed to a perfect puzzleless GC. And that's because the
            
            
            
              read and write barriers overhead, even though they're very important for
            
            
            
              the currency of these puzzles, gcs, there is an overhead of,
            
            
            
              because for every object read or write you
            
            
            
              need to make sure that the GC has that knowledge whenever
            
            
            
              we are changing those references along with the application,
            
            
            
              right? So if you don't care about the pauses, go for perfect
            
            
            
              stopwdc. If you do care about deposits, go for puzzles
            
            
            
              GC. But if you care about both, or you want to balance
            
            
            
              between the two, can go for the balanced GC or G one, right? Or Gencon
            
            
            
              Semas. So you have the options, right? Both extremes,
            
            
            
              perfect stop the road, perfect puzzlers and something somewhere in the middle.
            
            
            
              And these different policies deal with heap fragmentation
            
            
            
              differently, right? Depend they might have
            
            
            
              the heap configuration different, so therefore they're going to collect objects
            
            
            
              differently, right? Remember the barriers, right? Snapshot at the beginning,
            
            
            
              barrier against just a regular write barrier, right? So there
            
            
            
              are a lot of things here to playing when it comes
            
            
            
              to a perfect GC policy. But again,
            
            
            
              there's not going to be a perfect GC for every policy.
            
            
            
              You should, whatever workload you have, you should try three, four different
            
            
            
              policies and to see what's going to work best for that workload.
            
            
            
              So here are some links. OPG nine is a very cool
            
            
            
              project. We do recommend people to come check it out. That second link there,
            
            
            
              CMD line migration is for those people that are familiar
            
            
            
              with hotspot, right. And want to try out OPG. So I really recommend looking
            
            
            
              at that. Adopt opendicates, the website whats will get the binaries.
            
            
            
              And Omar is another project that has many different
            
            
            
              components to build a runtime. And Opgnan uses a lot of
            
            
            
              these components. For instance, uses GC component and compiler
            
            
            
              component. It's really cool. Also open source project and we welcome
            
            
            
              contributors there as well. So that's it for me.
            
            
            
              I'm very happy if you can connect with me LinkedIn,
            
            
            
              Twitter and before I open to
            
            
            
              questions, I want to show this book here. If you want
            
            
            
              to know everything there is about garbage collection, this is the book. Everything you want
            
            
            
              to know about is in there. And these are some
            
            
            
              of the common options that if you're
            
            
            
              interested to go through. If you want to start, like tweaking your
            
            
            
              application, you can start with these ones. They are most,
            
            
            
              let's say simple. The first one there is only available on
            
            
            
              Obajni. And the nice thing about verbose GC
            
            
            
              option is that you can actually visualize how your memory is
            
            
            
              behaving under an application with a tool called GcMV.
            
            
            
              And you can only visualize this if you're running an application with
            
            
            
              the Openj nine VM because with hotspot they have a different variables.
            
            
            
              So if you're running with Openj nine, you can use that verbals GC to
            
            
            
              capture the lock of the GC and visualize that with GcMB, which is really
            
            
            
              cool. If you remember that graph I showed you balanced, I use GCMB
            
            
            
              with that. I put some colors into it, but that's pretty much what you're going
            
            
            
              to get with GcMB, which is really cool. So I think that's it for
            
            
            
              me. I'll stop here and I'll check it out for any questions.
            
            
            
              Thank you so much.