I thought I knew C# : Garbage collection & disposal. Part-I

Yeah, that’s right, time to be a garbage guy. And if this line didn’t make you laugh, you probably know how bad a stand-up comedian I would make.

How things go when anyone asks about these:

Now this is going to be long, so I’m going to jump right into what I want to talk about. Let’s start with a regular Joe who writes C#. Let’s tell him to write a “safe” looking block of code that opens a gzip file, reads it as a byte array, decompresses the byte array using a buffer, writes the result to a memory stream and returns it when he is done with the whole thing.

This is something you might expect in return. This would be the block where the aforementioned byte array gets decompressed:

    using System.IO;
    using System.IO.Compression;

    public class Solution
    {
        public static byte[] Decompress(byte[] gzip)
        {
            // Wrap the compressed bytes so GZipStream has something to read from.
            using (MemoryStream memoryStream = new MemoryStream(gzip))
            using (GZipStream gzipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
            {
                const int size = 4096;
                byte[] buffer = new byte[size];
                using (MemoryStream writeStream = new MemoryStream())
                {
                    int count = 0;
                    do
                    {
                        // Ask for up to a full buffer; Read returns 0 at the end of the stream.
                        count = gzipStream.Read(buffer, 0, size);
                        if (count > 0)
                        {
                            // Append only the bytes that were actually read.
                            writeStream.Write(buffer, 0, count);
                        }
                    } while (count > 0);
                    return writeStream.ToArray();
                }
            }
        }
    }

And this would be a possible segment where you see the file being read:

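    using System.IO;

    class Program
    {
        const int _max = 200000; // not used in this snippet
        static void Main()
        {
            // GZipStream only understands gzip data, so the file must be
            // gzip-compressed whatever its extension says.
            byte[] array = File.ReadAllBytes("Capture.7z");
            Solution.Decompress(array);
        }
    }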

And yes, the code sample credit goes to dotnetperls. The reason I started with an example before any explanation is that I want to build on it. I believe that when you have a place to go, knowing where you will end up sometimes reinforces what you learn along the way. But it is very important to know why you end up here.

Breaking the code bits:

Let’s start breaking the code up into bits. The first question you would ask a regular Joe like me is how you can claim this code block is “safe”, and what you mean when you say it is “safe”. The first answer would be that there is a beautiful using block in there, which disposes the resources when they are done being used.
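
To make that claim a bit more concrete, here is a minimal sketch of what a using block boils down to: a try/finally that calls Dispose() on the way out. NoisyResource and UsingDemo are hypothetical names made up for this illustration, and the second method is only roughly what the compiler generates, not its exact output.

```csharp
using System;

// Hypothetical resource type, used only to make Dispose() calls visible.
public sealed class NoisyResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
        Console.WriteLine("Dispose() called");
    }
}

public static class UsingDemo
{
    // The form you normally write.
    public static NoisyResource WithUsing()
    {
        NoisyResource seen;
        using (NoisyResource resource = new NoisyResource())
        {
            seen = resource;
            Console.WriteLine("working with the resource");
        }
        // Dispose() has already run by the time we get here.
        return seen;
    }

    // Roughly the same thing, written out by hand.
    public static NoisyResource WithTryFinally()
    {
        NoisyResource resource = new NoisyResource();
        try
        {
            Console.WriteLine("working with the resource");
        }
        finally
        {
            // Runs whether the try block finished normally or threw.
            if (resource != null)
            {
                resource.Dispose();
            }
        }
        return resource;
    }

    public static void Main()
    {
        Console.WriteLine(WithUsing().IsDisposed);
        Console.WriteLine(WithTryFinally().IsDisposed);
    }
}
```

Both methods hand back a resource that is already disposed, which is the whole point: the cleanup does not depend on you remembering to call it.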

I emphasized three words there. Throughout this article I will bold words like these, and whenever I do that and you don’t know exactly what I’m talking about, put those words in a dictionary in your brain and I will explain them all in turn. That also means that if you think you know all of these, your journey ends here.

Words of wisdom:

The first word of wisdom here is dispose. I will start with some basics for it. Before going into dispose, we need some proper background on the .NET garbage collection process. If you are totally new to this, the garbage collector is an automatic memory manager: it lets you develop your application without having to free memory yourself every time. If that only drove you into more confusion, it means you need to know what happens when you allocate memory, which happens pretty much every time you declare and initialize a value or reference while writing C#.

Every time you write the new keyword to initialize an object in C#, you allocate that object on the managed heap. And the garbage collector automatically deallocates objects that are no longer being used from the managed heap “some time in the future”. If you are already curious about what a managed heap is, fret not. I will explain that too.

But before that, let’s talk about some fundamentals of memory. When we write C#, we work with a virtual address space. You have a lot of processes on the same computer sharing the same memory (read: your RAM), and you need them not to overlap with one another. So each process gets its own set of addresses to work with, and that is the virtual address space mapped for that process. By default, a process on a 32-bit system gets around 2GB of user-mode virtual address space. When you allocate memory, you allocate it in this virtual address space, not directly in physical memory. So the garbage collector works on this virtual space and frees up this virtual memory for you automatically. Neat, huh?

I need memory:

What actually happens when we write something like the following?

    static void Main()
    {
        Cat cat = new Cat("Nerd");
    }

Looks like we are creating a harmless cat with the name Nerd. When you compile this, the C# compiler generates common intermediate language (IL/CIL) code so the JIT compiler in the CLR can compile it for whatever machine it runs on. You see, I used a lot of jargon there, and I didn’t bold it because I’m not going to talk about it here. The intermediate code that gets generated looks something like this:

    IL_000a: newobj instance void CilNew.Cat::.ctor(string)

It looks about right. We only care about the newobj instruction here. This specific instruction needs to do three things.

  1. Calculate the total amount of memory the object requires.
  2. Look for enough free space in the managed heap to hold it.
  3. When the object is created, return the reference to the caller and advance the next object pointer to the next available slot on the managed heap.

I’m quite sure number 1 is very easy to understand here. But why would we need to look for space in the managed heap? Let’s look at this first.

[Figure: the managed heap]

If you look at the picture, it should now be pretty clear what I meant. If the next object pointer can’t find enough space to fit the next object in, you can expect an OutOfMemoryException. The same can happen when you don’t have enough physical memory. The picture can also mislead you, though. You might think the virtual address space is always contiguous. Well, it’s not. Virtual address space can be fragmented. That means there are free blocks, or holes, among the used blocks, and the virtual memory manager has to find a free block big enough for your allocation. So even if you have 2GB of virtual address space, that does not mean you have 2GB contiguously. If you ask the virtual memory manager for 2GB of space, it could fail simply because you don’t have that much contiguous address space. But for regular explanations the picture will do just fine.
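
If the next object pointer still feels abstract, a toy model may help. The sketch below is not how the CLR is implemented. ToyManagedHeap and its members are hypothetical names, sizes are plain integers, and a "reference" is just an offset into an imaginary block of memory. It only mirrors steps 2 and 3 from the list above.

```csharp
using System;

// Toy model only: the real managed heap is far more involved than this.
public sealed class ToyManagedHeap
{
    private readonly int _capacity;

    // Offset of the next free slot, i.e. the "next object pointer".
    public int NextObjectPointer { get; private set; }

    public ToyManagedHeap(int capacity)
    {
        _capacity = capacity;
    }

    // Returns the offset (our stand-in for a reference) of the new object.
    public int Allocate(int size)
    {
        if (size <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(size));
        }

        // Step 2: is there enough room left after the pointer?
        if (size > _capacity - NextObjectPointer)
        {
            // A real runtime would try a garbage collection before giving up.
            throw new OutOfMemoryException("toy heap exhausted");
        }

        // Step 3: hand back the current slot and advance the pointer past it.
        int reference = NextObjectPointer;
        NextObjectPointer += size;
        return reference;
    }
}

public static class ToyHeapDemo
{
    public static void Main()
    {
        ToyManagedHeap heap = new ToyManagedHeap(64);
        Console.WriteLine(heap.Allocate(24));        // first object lands at offset 0
        Console.WriteLine(heap.Allocate(40));        // second object lands right after it
        Console.WriteLine(heap.NextObjectPointer);   // pointer now sits at the end of the heap

        try
        {
            heap.Allocate(1);
        }
        catch (OutOfMemoryException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

Notice that allocation is nothing more than a bounds check and an addition, which is why handing out new objects is so cheap as long as there is room after the pointer.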

Now we know what the managed heap is and how objects are allocated on it. The reason we discussed this is to help you understand why you need garbage collection and when it is triggered.

States of virtual memory:

There are three states of virtual memory. The Free state says the block of memory is available for allocation. When you request an allocation, the block goes to the Reserved state. Much like booking a hotel: the memory block is reserved for you but not used yet, and no one else can use it either, because you reserved it. When you finally use it, it goes to the Committed state. In this state, the block of memory has physical storage associated with it.

The garbage collector kicks in:

There are definitely multiple conditions that trigger a garbage collection, and you already know the very first one: you run out of space for a new allocation in the virtual address space. We are going to jump in and see what the garbage collector actually does at a very basic level.

Garbage collection happens in two stages: mark and sweep. The mark stage searches for managed objects that are still referenced from managed code and marks them as reachable. The sweep stage deals with everything that is left unreachable: first it attempts to finalize those objects, and last it reclaims their memory.

I know you are wondering what managed code is. We will come back to it later in the journey, don’t worry. For now, keep in mind that the garbage collector can only deal with managed code.

So, the technique is to mark the objects the program might still be using and clean up the rest. But how does the garbage collector know which objects it needs to clean up? How does it decide which objects are unreachable? It does so using something called the object graph, which is outside the scope of this article. But I do have a nice representation to go with.

[Figure: object graph before garbage collection]

Let’s assume this is the situation in the heap. You have a managed heap like this, and the garbage collector kicks in because memory is running low. After the collection, it would look like the following.

[Figure: object graph after garbage collection]

Now it should be evident what basically happens in a garbage collection from a bird’s-eye view. The objects singled out for collection get finalized, and then they leave.
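
The mark and sweep idea above can be sketched in a few lines. This is a toy model, not the CLR’s collector: ToyObject and ToyCollector are hypothetical names, the "heap" is just a list, the roots are passed in by hand, and finalization is left out entirely.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a heap object: a name plus outgoing references.
public sealed class ToyObject
{
    public string Name { get; }
    public List<ToyObject> References { get; } = new List<ToyObject>();
    public bool Marked { get; set; }

    public ToyObject(string name)
    {
        Name = name;
    }
}

public static class ToyCollector
{
    // Mark: walk the object graph from the roots and flag everything reachable.
    private static void Mark(IEnumerable<ToyObject> roots)
    {
        Stack<ToyObject> pending = new Stack<ToyObject>(roots);
        while (pending.Count > 0)
        {
            ToyObject current = pending.Pop();
            if (current.Marked)
            {
                continue; // already visited, which also keeps cycles from looping forever
            }
            current.Marked = true;
            foreach (ToyObject child in current.References)
            {
                pending.Push(child);
            }
        }
    }

    // Sweep: keep the marked objects, drop the rest.
    // Returns the names of the objects that were collected.
    public static List<string> Collect(List<ToyObject> heap, IEnumerable<ToyObject> roots)
    {
        Mark(roots);

        List<string> collected = new List<string>();
        List<ToyObject> survivors = new List<ToyObject>();
        foreach (ToyObject obj in heap)
        {
            if (obj.Marked)
            {
                obj.Marked = false; // reset the flag for the next collection
                survivors.Add(obj);
            }
            else
            {
                collected.Add(obj.Name);
            }
        }

        heap.Clear();
        heap.AddRange(survivors);
        return collected;
    }

    public static void Main()
    {
        ToyObject a = new ToyObject("A");
        ToyObject b = new ToyObject("B");
        ToyObject c = new ToyObject("C");
        ToyObject d = new ToyObject("D");
        a.References.Add(b); // A -> B, reachable from the root
        c.References.Add(d); // C -> D -> C, a cycle nothing else points to
        d.References.Add(c);

        List<ToyObject> heap = new List<ToyObject> { a, b, c, d };
        List<string> collected = Collect(heap, new[] { a });
        Console.WriteLine("Collected: " + string.Join(", ", collected));
        Console.WriteLine("Left on the heap: " + heap.Count);
    }
}
```

C and D point at each other, yet both get collected, because nothing reachable from the root points at them. That is the practical difference between reachability and simply counting references.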

I still haven’t properly explained how you get these marked objects. To understand that, we need to understand generations.

Generations inside the heap:

Generations divide the heap by how long an object is expected to be needed, separating long-lived objects from short-lived ones. There are three generations, and the indexing starts from zero:

  1. Generation 0: This is the youngest generation and contains short-lived objects. Temporary and newly allocated objects live here. This is the part of the heap where garbage collection happens most frequently.
  2. Generation 1: This is essentially a buffer between generation 0 and generation 2. It holds objects that survived a generation 0 collection but are still expected to be short-lived.
  3. Generation 2: This is the generation of long-lived objects, the ones that stay around for a long time in the process. Statics come first to mind. A really big object, like a large array (85,000 bytes or more), skips generation 0 altogether: it goes to the large object heap, which is collected together with generation 2.

Garbage collections are generation specific, but collecting a generation also collects every generation younger than it. So if the GC clears generation 1, it also clears generation 0. If the GC clears generation 2, it also goes down and clears generation 1 and generation 0.

I used the word survive a moment ago. What I wanted to say is that if an object doesn’t get reclaimed during a sweep over its generation, it gets promoted to the next generation. If the survival rate in a generation is high, the GC tries to increase the allocation threshold of that specific generation, so that the next cleanup frees a bigger chunk of memory for the application.
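
You can watch promotion happen with GC.GetGeneration and GC.Collect, both of which are real .NET APIs. Treat the sketch below as an experiment rather than a guarantee: GenerationDemo is a hypothetical name, and the exact numbers depend on the runtime and its GC settings. An object kept alive across two forced collections typically climbs from generation 0 to 1 to 2, and a 100,000-byte array typically reports generation 2 straight away because it lives on the large object heap.

```csharp
using System;

public static class GenerationDemo
{
    // Records the generation of one object at birth and after two forced collections.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();
        int atBirth = GC.GetGeneration(survivor);

        GC.Collect(); // the object is still referenced, so it survives
        int afterFirst = GC.GetGeneration(survivor);

        GC.Collect();
        int afterSecond = GC.GetGeneration(survivor);

        // Keep the reference alive until this point so the object cannot be collected early.
        GC.KeepAlive(survivor);
        return new[] { atBirth, afterFirst, afterSecond };
    }

    // A big array is placed on the large object heap.
    public static int LargeArrayGeneration()
    {
        byte[] large = new byte[100000];
        return GC.GetGeneration(large);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" -> ", ObserveGenerations()));
        Console.WriteLine("Large array generation: " + LargeArrayGeneration());
        Console.WriteLine("Highest generation: " + GC.MaxGeneration);
    }
}
```

Calling GC.Collect() yourself is fine for an experiment like this one, but it is rarely a good idea in production code, since the collector tunes its own thresholds as described above.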

One more thing to remember here. While a collection runs, the garbage collector suspends the managed threads. So it has to be quick and efficient, otherwise you pay a performance penalty.

Back to the code bits:

If you have survived up until now, you deserve to go back to the code bit at the beginning. The first thing I’m going to clear up is managed vs unmanaged resources. Managed resources are directly under the control of the garbage collector. They are the result of managed code, which is what gets compiled to intermediate language. Unmanaged resources are resources your garbage collector doesn’t really know about. That includes open files, open network connections and, of course, unmanaged memory. Now, if you are using C# classes to work with these, most of the time they are almost managed. That means the managed code does the “dirty work” inside and you don’t have to clean these up yourself. The garbage collector cleans up the managed wrapper, and the managed wrapper cleans up the unmanaged resource in the disposal process.

Now, let’s go back to the word dispose. How would I dispose of something in my code? Is there a method somewhere, something I could use? Indeed there is. A dispose method implementation has essentially two variations. Since garbage collection can’t handle unmanaged resources, you need to wrap them. The first technique is to wrap them in a class derived from the SafeHandle class and implement the IDisposable interface to make it properly disposable. This very interface exposes the Dispose() method you need, and you use that to dispose of the resources yourself.
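
As a first taste, here is a minimal sketch of that first technique. It makes a few assumptions: the unmanaged resource is a block of memory from Marshal.AllocHGlobal, and UnmanagedBufferHandle and NativeBuffer are hypothetical names made up for this illustration. SafeHandle, IDisposable and the Marshal calls are real .NET APIs.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical handle type that owns a block of unmanaged memory.
public sealed class UnmanagedBufferHandle : SafeHandle
{
    public UnmanagedBufferHandle(int size)
        : base(IntPtr.Zero, true) // IntPtr.Zero is the invalid value; we own the handle
    {
        SetHandle(Marshal.AllocHGlobal(size));
    }

    public override bool IsInvalid
    {
        get { return handle == IntPtr.Zero; }
    }

    // Called by SafeHandle when the handle is released, either through
    // Dispose() or, as a safety net, through finalization.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}

// The managed wrapper that callers actually use.
public sealed class NativeBuffer : IDisposable
{
    private readonly UnmanagedBufferHandle _handle;

    public NativeBuffer(int size)
    {
        _handle = new UnmanagedBufferHandle(size);
    }

    public bool IsDisposed
    {
        get { return _handle.IsClosed; }
    }

    // Disposing the wrapper disposes the handle, which frees the unmanaged memory.
    public void Dispose()
    {
        _handle.Dispose();
    }
}

public static class NativeBufferDemo
{
    public static void Main()
    {
        NativeBuffer buffer = new NativeBuffer(64);
        using (buffer)
        {
            Console.WriteLine(buffer.IsDisposed);
        }
        Console.WriteLine(buffer.IsDisposed);
    }
}
```

The wrapper itself needs no finalizer here, because the SafeHandle it holds already takes care of releasing the memory if someone forgets to call Dispose().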

I explain in detail how to do that in the next part.
