Creating a nano scripting language using ANTLR and Roslyn

For every one of us who wakes up and codes every day somewhere in this world, it is easy to get pretty attached to the programming language we love most. All of us feel that pinch of discomfort when we try to break our boundaries and learn something new, since this field is always expanding and evolving like a universe on steroids.

For those who are still newbies and trying to find ways to understand how a programming language works, what could be better than making one of your own? 🙂 This post is essentially a small proof of concept where we will build a small, fun-looking esoteric programming language.

Before we jump into the tidbits, allow me to explain what we will do here. We will write a small programming language that is lexed and parsed by ANTLR, transpiled to C#, and whose transpiled code is fed into the Roslyn C# scripting API.

If the aforementioned words are looking too wordy for you, let me clear it right up. Like every language we use to talk every day, a programming language comes with a grammar of its own. That should be clear as daylight to you if you have written a single line of code in your life. Every programming language follows a pretty well-defined structure and a dancing parade of words following it. So, since we are creating one, the first thing we need is the grammar for our language. Instead of doing it from scratch, our tool of choice is ANTLR, which stands for "ANother Tool for Language Recognition". ANTLR will help us define the grammar and generate the lexer and parser for it, and we will eventually reuse those components to transpile our code to C#. Remember, ANTLR also has JavaScript, Java and Python targets. So, if you want your lexer and parser in those languages, don't refrain from using it.

Now, even before we start talking about languages and everything else, we need to understand a bit about compiler theory. I could go into the gory details of lexers and parsers, but since this is the age of Google, I will let that responsibility fall into your hands. We will take a somewhat exploratory way to understand how everything works and take it from there.

Let’s assume that we have a language like this:

derp a = 20 ':)' #Initialization

# basic if-else
a > 2 ???
yep ->
    a = 5 ':)'
kbye

# print
dump a ':)'

The first thing we need is a grammar. To understand the programming language we need something that understands all the words here. And that guy is the lexer. A lexer eats up the whole code block we send to the compiler and chops it into little lexemes (read: strings/words here), so we can tell when we have hit a specific keyword. For our dummy language here, we used the popular internet lingo derp to initialize a variable.
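
Just to make this concrete, here is a tiny sketch of what the lexer's output looks like. It assumes the ANTLR4 C# runtime and the generated ProfaneLexer class that shows up later in this post, so treat it as an illustration rather than part of the sample project:

using System;
using Antlr4.Runtime;

class TokenDump
{
    static void Main()
    {
        // Feed one statement of our language to the generated lexer
        var input = new AntlrInputStream("derp a = 20 ':)'");
        var lexer = new ProfaneLexer(input);

        foreach (var token in lexer.GetAllTokens())
        {
            // Prints each lexeme with the token it was classified as,
            // e.g. 'derp' -> derp, ID -> a, ASSIGN -> =, NUMBER -> 20, SMILEY -> :)
            Console.WriteLine($"{lexer.Vocabulary.GetDisplayName(token.Type)} -> {token.Text}");
        }
    }
}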

The next thing in line is making some meaningful excerpt out of those words. When we write an if block, the compiler needs to know which strings of characters or words should stand together to construct a meaningful statement or expression. That is the job of the parser. If you look closely, you will see that every block of code can be expressed as a tree. For example, the initialization block here looks like the following:

[Railroad diagram: the assignstmt rule]

This is definitely a single-branch tree. It says you need to write the word derp to start the statement and you should end it with a smiley ( :) ) to finish it. In the middle you use an identifier (the name of your variable) and an ASSIGN operator (=) to define the flow of assignment from right to left. That tree is actually generated straight from the grammar we are going to use today.

So, why wait? Let’s have a look at the grammar we are going to be using today:

grammar Profane;

compilationUnit: statement* EOF;

statement:
        printstmt
        | assignstmt
        | ifstmt
        | setstmt;

printstmt	: 'dump' expr? SMILEY;
assignstmt	: 'derp' ID ASSIGN expr SMILEY;
setstmt		:  ID ASSIGN expr SMILEY;

ifstmt  :
        conditionExpr '???'
        'yep ->'
            statement*
        'kbye';

conditionExpr: expr relop expr;
expr: term | opExpression;
opExpression: term op term;
op: PLUS | ASSIGN | MINUS;
relop: EQUAL | NOTEQUAL | GT | LT | GTEQ | LTEQ;

term: ID | number | STRING;

number: NUMBER;

// Keywords
ID: [a-zA-Z_] [a-zA-Z0-9_]*;
SMILEY: ':)';
WS: [ \n\t\r]+ -> skip;

PLUS    :'+';
EQUAL   : '====';
ASSIGN  : '=';
NOTEQUAL: '!!==';
MINUS   : '-';
GT      : '>';
LT      : '<';
GTEQ    : '>=';
LTEQ    : '<=';

fragment INT: [0-9]+;
NUMBER: INT ('.'(INT)?)?;
STRING: '"' (~('\n' | '"'))* '"';

Now, my first advice here will be to not get confused by the size of the grammar and to start taking out the bits we do understand first. If you refer to the first image in this post and have a look at the grammar, you will see that the rule assignstmt is defined exactly the way depicted in the picture. If you find the words ID, ASSIGN and SMILEY, you will also see that all of them are defined in the grammar below, where they have their literal form. These tokens help the parser understand what you have written. And assignstmt is called a rule, which defines relations among different tokens or rules. See? This is how we build rules for our grammar, because a grammar is nothing but a set of well-defined rules.

By the very first rule, named compilationUnit, our language is nothing but a set of statements ending with an EOF. Every statement can be a printstmt (for printing), an ifstmt (if-else), a setstmt (re-assignment) or the aforementioned assignstmt. Neat, huh? You can even go down and find out how the rest of the tree is built.

I strongly suggest using Visual Studio Code and its ANTLR4 extension for writing ANTLR grammar files. You will get a nice-looking railroad diagram, like the one I posted, for every rule you write!

Time to get our hands dirty. Fret not, since the whole project is publicly available on GitHub, so I will only point out excerpts of the code that we need to visit. And I'm going back to the same assignstmt. The first thing we need to do is download ANTLR. Go to the download page and download the latest .jar. ANTLR is written in Java, so you will need Java installed on your machine. We will use .NET Core for the rest of the work, so you can download that for your OS too. When you are done downloading ANTLR you can generate the target code for C# from your grammar file. In our case the grammar file name was Profane.g4, and to generate the C# lexer and parser based on it, all we had to do was invoke:

java -jar antlr-4.7-complete.jar -Dlanguage=CSharp Profane.g4

It will generate the C# classes you need for walking through the grammar. You will see a lexer and a listener class being created. You don't really need to touch the lexer class. The listener class is used to handle the events that are fired when ANTLR encounters a rule. So, to make use of our assignment statement we need to handle the event that is fired when ANTLR encounters our assignstmt rule. But how will we ever know which event that is? Here comes the ANTLR magic.

ANTLR will generate the events you need for all your rules every time you create a lexer and parser from your grammar. That means every time you change your grammar you need to regenerate the C# targets, i.e. the lexer, parser and listener classes. I added the command to the build target of the sample project so it runs automatically on build.

So, let's create a class named ProfaneListener that inherits from the ProfaneBaseListener class which was auto-generated by ANTLR. Since our rule was named assignstmt, ANTLR will generate a method called EnterAssignstmt inside it which will be invoked every time ANTLR encounters that type of statement, or rather that rule. Our target here is to generate equivalent C# code from it, so

derp some = 10 ':)'

will be transpiled into equivalent C# code like the following:

dynamic some = 10;

And we are going to do that in the method EnterAssignstmt which ANTLR has built out for us. Let's have a look at it.

        public override void EnterAssignstmt([NotNull] ProfaneParser.AssignstmtContext context)
        {
            string target = context.ID().GetText();
            dynamic value = this.ResolveExpression(context.expr());
            this.Output += "dynamic " + target + " = " + value + ";\n";
        }

This is really simple work here. We are asking the context class to give us the ID and the expression so we can generate the equivalent C# code string. The nice thing to notice here is that ANTLR has generated a class for our assignment statement context itself. But what is that ResolveExpression method? If we recall, the rule was:

[Railroad diagram: rule assignstmt]

Here expr is another rule, which looks like:

[Railroad diagram: rule expr]

That means expr can either be a rule named term or a rule named opExpression. Let's have a look at both of them.

First is the rule called term. A term can be an identifier (the name for your variable), a number (int or float) or a string. Either one of these.

[Railroad diagram: rule term]

That means a term is also a valid value for the rule expr, since expr is either a term or an opExpression.

The opExpression, on the other hand, is a tad more complex. This is an example where you can reuse multiple rules to create a complex rule.

[Railroad diagram: rule opExpression]

The only thing we don't know here is the op rule. The opExpression rule says it is structured as a term, followed by a rule named op, followed by another term at the end.

[Railroad diagram: rule op]

The op rule defines an OR relationship between PLUS, MINUS and ASSIGN, which stand respectively for '+', '-' and '='. That means this rule says we can write things like

derp some = 10 + 2 + someOtherDerp ':)'

Amazing! Isn’t it?
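
Based on the EnterAssignstmt handler we saw earlier and the expression resolution we are about to look at, a simpler two-term statement such as derp some = 10 + 2 ':)' would come out the other side roughly as:

// Roughly what the listener appends to Output for: derp some = 10 + 2 ':)'
// (the resolver just concatenates the pieces, so spacing may differ)
dynamic some = 10+2;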

Before we look into the ResolveExpression method, let's see a nice looking tree on how that previous statement actually gets parsed by all these rules.

[Parse tree for the assignment statement above]

This will hopefully help you understand how things are happening here. Now, let's see the ResolveExpression method which transpiles the rule expr.

        private dynamic ResolveExpression(ProfaneParser.ExprContext exprContext)
        {
            var opExpression = exprContext.opExpression();
            if (opExpression != null)
            {
                return ResolveOpExpression(opExpression);
            }
            else
            {
                return ResolveTerm(exprContext.term());
            }
        }

        private dynamic ResolveOpExpression(ProfaneParser.OpExpressionContext plusContext)
        {
            var leftTerm = plusContext.term().First();
            var rightTerm = plusContext.term().Last();

            var left = ResolveTerm(leftTerm);
            var right = ResolveTerm(rightTerm);

            return left + plusContext.op().GetText() + right;
        }

        private dynamic ResolveTerm(ProfaneParser.TermContext termContext)
        {
            if (termContext.number() != null)
            {
                return termContext.number().GetText();
            }
            else if (termContext.ID() != null)
            {
                return termContext.ID().GetText();
            }
            else if (termContext.STRING() != null)
            {
                // Strips the ${ } wrapper around interpolated names inside string literals
                Regex regex = new Regex(@"\$\{([^\}]+)\}");
                var contextText = termContext.GetText();
                var replacedString = regex.Replace(contextText, "$1");
                return replacedString;
            }
            else return default(dynamic);
        }

We also see two new methods that the ResolveExpression method uses, named ResolveTerm and ResolveOpExpression. They generate the transpiled C# code for the term rule and the opExpression rule. We keep adding the generated code to a string named Output, and when it's done we have our transpiled C# code ready to be executed.
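
The print rule is handled the same way. Here is a hedged sketch of what an EnterPrintstmt handler could look like (not necessarily the exact code in the repository), mapping dump expr ':)' to Console.WriteLine:

        public override void EnterPrintstmt([NotNull] ProfaneParser.PrintstmtContext context)
        {
            // expr is optional in the grammar ('dump' expr? SMILEY), so guard against null
            var expression = context.expr() != null
                ? this.ResolveExpression(context.expr())
                : "\"\"";
            this.Output += "Console.WriteLine(" + expression + ");\n";
        }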

Executing C# as a script using Roslyn:

Now that we have our C# code to be executed, we will use another tool called Roslyn. It is a compiler platform for .NET that gives you a rich set of features for code analysis and compilation. We will specifically be using the C# scripting API.

First we need to generate the AST for our code. If you are scratching your head over what an abstract syntax tree is, you already saw something like it in the tree we posted before. To generate the tree, you need your lexer. Your lexer will generate tokens. Your parser will use those tokens to produce the compilation unit, the root node of your tree. This is still done using ANTLR.

When you have your tree, you need to walk it. When you walk the AST, the listener fires the events we need to generate the transpiled code, like we did minutes ago. So the listener output is our C# code, which we need to feed to Roslyn.

The Roslyn C# script engine comes with a nifty class named CSharpScript which will do the trick for us. All we have to do is feed it the C# code and load the assemblies we will need. Then it will do its magic and we will have our own scripting language talking to us.
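
Just to see the scripting API in isolation before we wire it into the transpiler, here is a minimal sketch (it assumes the Microsoft.CodeAnalysis.CSharp.Scripting NuGet package):

using System;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;

class ScriptDemo
{
    static async Task Main()
    {
        // Hand Roslyn a piece of C# as a string and get the evaluated result back
        var result = await CSharpScript.EvaluateAsync<int>("1 + 2");
        Console.WriteLine(result); // prints 3
    }
}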

So, our transpiler class looks like

    public class ProfaneTranspiler
    {
        private ProfaneListener listener;
        private static readonly MetadataReference[] References =
        {
            MetadataReference.CreateFromFile(typeof(object).GetTypeInfo().Assembly.Location),
            MetadataReference.CreateFromFile(typeof(RuntimeBinderException).GetTypeInfo().Assembly.Location),
            MetadataReference.CreateFromFile(typeof(System.Runtime.CompilerServices.DynamicAttribute).GetTypeInfo().Assembly.Location),
            MetadataReference.CreateFromFile(typeof(ExpressionType).GetTypeInfo().Assembly.Location),
            MetadataReference.CreateFromFile(Assembly.Load(new AssemblyName("mscorlib")).Location),
            MetadataReference.CreateFromFile(Assembly.Load(new AssemblyName("System.Runtime")).Location)
        };

        public ProfaneTranspiler()
        {
            this.listener = new ProfaneListener();
        }

        public ProfaneParser.CompilationUnitContext GenerateAST(string input)
        {
            var inputStream = new AntlrInputStream(input);
            var lexer = new ProfaneLexer(inputStream);
            var tokens = new CommonTokenStream(lexer);
            var parser = new ProfaneParser(tokens);
            parser.ErrorHandler = new BailErrorStrategy();

            return parser.compilationUnit();
        }

        public string GenerateTranspiledCode(string inputText)
        {
            var astree = this.GenerateAST(inputText);
            ParseTreeWalker.Default.Walk(listener, astree);
            return listener.Output;
        }

        public async Task<TranspileResult> RunAsync(string code)
        {
            var result = new TranspileResult();

            if (string.IsNullOrEmpty(code))
                return result;

            Stopwatch watch = new Stopwatch();
            watch.Start();

            try
            {
                ScriptOptions scriptOptions = ScriptOptions.Default;
                scriptOptions = scriptOptions.AddReferences(References);
                scriptOptions = scriptOptions.AddImports("System");

                var resultCode = this.GenerateTranspiledCode(code);
                if (resultCode == null)
                {
                    watch.Stop();
                    result.TimeElapsed = watch.Elapsed.ToString();
                    return result;
                }

                var outputStrBuilder = new StringBuilder();
                using (var writer = new StringWriter(outputStrBuilder))
                {
                    Console.SetOut(writer);
                    var scriptState = await CSharpScript.RunAsync(resultCode, scriptOptions);
                    result.output = outputStrBuilder.ToString();
                }
            }
            catch (Exception ex)
            {
                result.output = ex.Message;
            }
            finally
            {
                watch.Stop();
                result.TimeElapsed = watch.Elapsed.ToString();
            }
            return result;
        }
    }

I uploaded the full sample code to GitHub here. You need to build and run the Profane project, which is a console app. It will host a small web API on port 5000. If you POST your code to the endpoint as plain text in the POST body, you will get back the output of your code. Postman can be a nice client for doing so.
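
If you'd rather script it than use Postman, a small client like the following works too. This is only a sketch; it assumes the API listens at the root of http://localhost:5000, so adjust the route to whatever the sample project actually exposes:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class ProfaneClient
{
    static async Task Main()
    {
        var code = "derp a = 20 ':)'\ndump a ':)'";

        using var http = new HttpClient();
        // POST the Profane source as plain text; the response body is the program output
        var response = await http.PostAsync(
            "http://localhost:5000/",
            new StringContent(code, Encoding.UTF8, "text/plain"));

        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}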

Hope this was fun. Knowing the internals of your daily programming language really boosts your confidence while you write it. So make your own esoteric language if you have time. It's always fun to make 'em.

Confessions

Before I start writing this, I must say this is probably the first time I'm treating this blog like I want it to be treated. Most of the time I wrote things people only care about when they want their problem fixed. For the last couple of months, or a year, I didn't even write much. It seems that social media has taken a big chunk of my time and now I'm trying to get out of that addiction.

Money, papers with vengeance

Around 4 years ago, I gave in to the concept of working for money. Money talks, doesn't it? It talks louder than anything else. So, that was it for me. I went out with a little help from someone I know, went into an interview, practically said I would do anything to get some money, and started working.

I started working in Perl. It's a nice little language (believe me, it took a long time for me to come to this verdict). Nice and amusing as it is, it was indeed mischievous. For a guy who didn't know much about Linux or any Unix per se, it was a hard time coping and getting going. Boy, I hated it in the beginning. I used to mail my superior almost every day to make sure that someday he'd move me somewhere I could write some C#.

The fact that I liked Windows so much wasn't a fun party either. The internet right now is full of sages. Sages who are so opinionated about everything that unless you come to terms with what they believe, they'd literally treat it as blasphemy. Boy, I even had a Facebook page dedicated to trolling me and one of my very close mentors. Fun times, I still don't know who was behind that. Wish I knew. But that's another story for another day.

Present is a gift, a confusing gift indeed

Right at this point of my life, I'm probably the most confused I've ever been. When I was a kid, I saw my mom and dad work, work like hell, just trying their best to put everything in place and make ends meet. And the worst part of this is that now I understand the necessity of it. I really didn't at that time. Now I don't have an excuse for being this person. Now, I don't know what I am supposed to be.

It's really funny. When I was a preteen or a teen, all I ever wanted to become was a musician. That is of course before life came in and hit me in the nuts. That is when your dreams go down the drain and you start doing the thing you're good at, the one that makes money. I was no different at all. I'm lucky that I'm passionate about it. Otherwise I don't know what would have happened.

Know thyself

When you start to understand who you are, two things will happen. You'll know your limits, and you'll be a good friend to yourself. The worst part of all of this is that it makes you aware of everything. It takes away the spontaneous bursts of emotion, since now you can pretty much predict what your brain will do.

 Love and its baggage

Boy, this shit confused me more than anything else ever did in my life. I lost and gained a lot through love. For me, I think it had a lot to do with finding your own purpose. When one person becomes a lot more important than any goal you ever chased, when that very person becomes your prime directive, your brain goes into hyperdrive. It starts to calculate all the possible memories you could make with her. It's a tree your brain loves to parse. But beware, it also comes with attachments. For me it was even scarier; my childhood was insanely lonely and I basically grabbed anyone I saw who I imagined could play a part in my life. Scary, huh? Well, you haven't heard the scariest part yet. The moment you drop your guard, all the insecurities show their ugly heads. A person you want around would see you as you truly are. That is sometimes perceived wrong on the other side. At some point you get scared to open up, because you don't know what it will cost you. It would only make you feel no one wants to be around you just because it is you. And that thought itself would isolate you from the rest of the world, just like that.

To obligations and beyond

Obligations are the shittiest thing you'd ever encounter. They'd make you pay, I tell you. You look at your family, you see obligations. You look at the person you love most, you feel a drive to make things better for her. Fuck me, I can't find a single pair of eyes I don't have to pay anything to. I don't owe anything. As sure as hell, I don't owe anything to this world. I don't. People talk about priorities. Trust me, priority is a two-way street. It only meets when both sides start to work towards each other. Much like a bidirectional A* search.

Freedom from life itself

Rules never worked on me. Probably never will. My brain is a child, always asking "Why". And I hate the fact that it does that. Not that it helps in any way, it still does that. You'd think Kurt Cobain was wrong. I don't see it that way. I don't know why I have a twisted sense of judgement. The pursuit of happiness is said to start when you leave your expectations behind. What if I expect to find something that will unbind me from expectations?

Paradox indeed.

I thought I knew C# : Garbage collection & disposal. Part-II

Okay, we are back to collect "garbage" properly. If you haven't read the first part of this post you might want to read it here. We are following the same norms and conventions from the first post here too.

 Starting things from where we left off:

This write-up continues from the first one, where we ended things with the SafeHandle class and the IDisposable interface. I do see how intriguing this is, of course. But before that, like an unexpected, miserable commercial break before the actual stuff comes on your TV, let's go have a look at something called finalization.

What you create, you “might” need to destroy:

If you know a little bit of C++ you'd know that there is a thing called a destructor, and as the name implies it does exactly the opposite of what a constructor does. C++ kind of needs it since it doesn't really have a proper garbage collection paradigm. But that also raises the question of whether C# has a destructor, and if it does, what does it do and why do we even have one, since we said managed resources would be collected automatically.

Let's jump into some code, shall we?

class Cat
{
    ~Cat()
    {
        // cleanup things
    }
}

Wait a minute, now this is getting confusing. I have IDisposable::Dispose() and I have a destructor. Both look like they have the same responsibility. Not exactly. Before we confuse ourselves more, let's dig a bit deeper into the code. To be honest, C# doesn't really have a destructor; it's basically syntactic sugar, and inside, the C# compiler translates this segment as:

protected override void Finalize()
{
    try
    {
        // Clean things here
    }
    finally
    {
        base.Finalize();
    }
}

We can pick up two things here. First, the destructor essentially translates to an overridden Finalize() method. And it also calls the base class Finalize() in the finally block. So, it's essentially a recursive Finalize(). Wait, we have no clue what Finalize() is.

Finalize, who art thou?

Finalize is somebody who is blessed to talk with the garbage collector. If you have been questioning where the topics we discussed in the last post come in handy, this is the segment. Let's start getting answers to all the confusions from the last segment. First, why is Finalize overridden and where does it get its signature? Finalize gets its signature from the Object class. Okay, understandable. What would be the default implementation then? Well, it doesn't have a default implementation. You might ask why. A little secret to slip in here: remember the mark step of the garbage collector. It marks a type instance for finalization if and only if it has overridden the Finalize() method. It's your way of telling the garbage collector that you need to do some more work before it can reclaim your memory. The garbage collector marks this guy and puts him in a finalization queue, which is essentially a list of objects whose finalization code must be run before the GC can reclaim their memory. So, you are possibly left with one more confusion now. You understand that the garbage collector uses this method to finalize things, i.e. do the necessary cleanup. Then why do we need IDisposable, when we can just override Finalize and be done with it, right?
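
If you want to see the finalization queue in action, here is a tiny assumed example (not from the original post). The finalizer only runs on a separate pass after the collection, which is exactly the double visit we are about to complain about:

using System;

class Noisy
{
    // Having a finalizer is what gets instances of this type put on the finalization queue
    ~Noisy() => Console.WriteLine("Finalize ran");
}

class FinalizationDemo
{
    static void Main()
    {
        void Allocate() => new Noisy();   // the instance becomes unreachable right away
        Allocate();

        GC.Collect();                     // marks the instance and queues it for finalization
        GC.WaitForPendingFinalizers();    // blocks until the finalizer thread has processed it
        Console.WriteLine("Done");        // "Finalize ran" should appear before this (release build)
    }
}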

Dispose, you confusing scum!

Turns out Dispose points to a pattern in code that you can use to release unmanaged resources. What I didn't tell you about the garbage collector before is that you don't really know when it will actually do its work; you practically have no clue. That also means if you write an app that uses unmanaged resources like crazy, like reading 50-60 files at a time, you are holding a lot of scarce resources at the same time. And you are waiting for the GC to do its job, but that guy has no timetable. So, hoarding these resources in the meantime is not a good idea. Since releasing unmanaged resources is the developer's duty, putting that in a Finalize method and waiting for the GC to come and invoke it is a stupid way to go. Moreover, if you send an instance to the finalization queue, the GC will only invoke Finalize this round and do the actual memory cleanup in the next round. That also means the GC has to visit you twice to clear off the unused memory which you kind of need now. And the fact that you might want to release the resources NOW is a pretty good reason in itself not to wait for the GC to do your dirty work. And when you actually shut down your application, Mr. GC visits you unconditionally and takes out EVERYONE he finds. I hope the need for Dispose() is getting clearer to you now. We need a method, SomeMethod(), that we can call to clean up the unmanaged resources. In case we fail to call it, we will also use that same method inside Finalize(), just to make sure the garbage collector calls it anyhow. If I have not made a fool out of myself at this point, you have figured out that SomeMethod() is Dispose(). Okay, so we know what we are going to do. Now we need to act on it. We will implement the dispose pattern we have been talking about. The first thing we will do is write code that reads a couple of lines from a file. We will do it the unmanaged way first, then move slowly to the managed way, and in the process we will see how we can use IDisposable there too.

Doing things the unsafe way:

I stole some code from a very old MSDN doc which describes a FileReader class like the following:

using System;
using System.Runtime.InteropServices;
public class FileReader
{
    const uint GENERIC_READ = 0x80000000;
    const uint OPEN_EXISTING = 3;
    IntPtr handle;
    [DllImport("kernel32", SetLastError = true)]
    static extern unsafe IntPtr CreateFile(
            string FileName,                    // file name
            uint DesiredAccess,                 // access mode
            uint ShareMode,                     // share mode
            uint SecurityAttributes,            // Security Attributes
            uint CreationDisposition,           // how to create
            uint FlagsAndAttributes,            // file attributes
            int hTemplateFile                   // handle to template file
            );
    [DllImport("kernel32", SetLastError = true)]
    static extern unsafe bool ReadFile(
            IntPtr hFile,                       // handle to file
            void* pBuffer,                      // data buffer
            int NumberOfBytesToRead,            // number of bytes to read
            int* pNumberOfBytesRead,            // number of bytes read
            int Overlapped                      // overlapped buffer
            );
    [DllImport("kernel32", SetLastError = true)]
    static extern unsafe bool CloseHandle(
            IntPtr hObject   // handle to object
            );
    public bool Open(string FileName)
    {
        // open the existing file for reading
        handle = CreateFile(
                FileName,
                GENERIC_READ,
                0,
                0,
                OPEN_EXISTING,
                0,
                0);
        if (handle != IntPtr.Zero)
            return true;
        else
            return false;
    }
    public unsafe int Read(byte[] buffer, int index, int count)
    {
        int n = 0;
        fixed (byte* p = buffer)
        {
            if (!ReadFile(handle, p + index, count, &n, 0))
                return 0;
        }
        return n;
    }
    public bool Close()
    {
        // close file handle
        return CloseHandle(handle);
    }
}

Please remember you need to check the Allow Unsafe Code checkbox in your build properties before you start using this class. Let's have a quick run over the code pasted here. I don't intend to explain everything in detail here because that is not the scope of this article, but we will build up on it, so we need to know a little bit. The DllImport attribute here is essentially something you need to use an external DLL (thus unmanaged) and map the functions inside it onto your own managed class. You can also see that's why we have used the extern keyword here. The implementations of these methods don't live in your code, and thus your garbage collector can't take responsibility for cleanup here. 🙂 The next thing you will notice is the fixed statement. A fixed statement essentially pins a managed object while unmanaged code holds a pointer into it, and thus makes sure the GC doesn't move the managed object when it collects. So, the managed one stays in one place and the pointer to it stays valid. So, what are we waiting for? Let's read a file.

static int Main(string[] args)
{
    if (args.Length != 1)
    {
        Console.WriteLine("Usage : ReadFile &lt;FileName&gt;");
        return 1;
    }
    if (!System.IO.File.Exists(args[0]))
    {
        Console.WriteLine("File " + args[0] + " not found.");
        return 1;
    }
    byte[] buffer = new byte[128];
    FileReader fr = new FileReader();
    if (fr.Open(args[0]))
    {
        // Assume that an ASCII file is being read
        ASCIIEncoding Encoding = new ASCIIEncoding();
        int bytesRead;
        do
        {
            bytesRead = fr.Read(buffer, 0, buffer.Length);
            string content = Encoding.GetString(buffer, 0, bytesRead);
            Console.Write("{0}", content);
        }
        while (bytesRead > 0);
        fr.Close();
        return 0;
    }
    else
    {
        Console.WriteLine("Failed to open requested file");
        return 1;
    }
}

So, this is essentially a very basic console app and it looks somewhat okay. I have created a byte array of size 128 which I use as a buffer while reading. FileReader.Read returns 0 when it can't read any more. Don't get confused seeing this:

while (bytesRead > 0)

It's all nice and dandy to be honest. And it works too. Invoke the application (in this case the name is TestFileReading.exe) like the following:

TestFileReading.exe somefile.txt

And it works like a charm. But notice that we closed the file after use. What if something happens in the middle, something like the file not being available? Or I throw an exception in the middle? What will happen is that the file will not be closed until my process exits. And the GC will not take care of it because there is nothing in the Finalize() method.

Making it safe:

public class FileReader: IDisposable
{
    const uint GENERIC_READ = 0x80000000;
    const uint OPEN_EXISTING = 3;
    IntPtr handle = IntPtr.Zero;
    [DllImport("kernel32", SetLastError = true)]
    static extern unsafe IntPtr CreateFile(
            string FileName,                 // file name
            uint DesiredAccess,             // access mode
            uint ShareMode,                // share mode
            uint SecurityAttributes,      // Security Attributes
            uint CreationDisposition,    // how to create
            uint FlagsAndAttributes,    // file attributes
            int hTemplateFile          // handle to template file
            );
    [DllImport("kernel32", SetLastError = true)]
    static extern unsafe bool ReadFile(
            IntPtr hFile,                   // handle to file
            void* pBuffer,                 // data buffer
            int NumberOfBytesToRead,      // number of bytes to read
            int* pNumberOfBytesRead,     // number of bytes read
            int Overlapped              // overlapped buffer
            );
    [DllImport("kernel32", SetLastError = true)]
    static extern unsafe bool CloseHandle(
            IntPtr hObject   // handle to object
            );
    public bool Open(string FileName)
    {
        // open the existing file for reading
        handle = CreateFile(
                FileName,
                GENERIC_READ,
                0,
                0,
                OPEN_EXISTING,
                0,
                0);
        if (handle != IntPtr.Zero)
            return true;
        else
            return false;
    }
    public unsafe int Read(byte[] buffer, int index, int count)
    {
        int n = 0;
        fixed (byte* p = buffer)
        {
            if (!ReadFile(handle, p + index, count, &n, 0))
                return 0;
        }
        return n;
    }
    public bool Close()
    {
        // close file handle
        return CloseHandle(handle);
    }
    public void Dispose()
    {
        if (handle != IntPtr.Zero)
            Close();
    }
}

Now, on our way towards making things safe, we implemented IDisposable here. That exposed Dispose(), and the first thing I did here is check whether the handle is IntPtr.Zero; if it's not, we invoke Close(). Dispose() is written this way because it should be invokable at any time, and it shouldn't throw any exception if it is invoked multiple times. But is it the solution we want? Look closely. We wanted a Finalize() implementation that will essentially do the same thing if somehow Dispose() is not called. Right?

Enter the Dispose(bool) overload. We want the parameterless Dispose() to be used only by external consumers. We introduce a second Dispose(bool) overload where the boolean parameter indicates whether the method call comes from the Dispose method or from the finalizer: it will be true if it is invoked from the parameterless Dispose() method.

With that in mind our code would eventually be this:

    public class FileReader: IDisposable
    {
        const uint GENERIC_READ = 0x80000000;
        const uint OPEN_EXISTING = 3;
        IntPtr handle = IntPtr.Zero;
        private bool isDisposed;

        SafeHandle safeHandle = new SafeFileHandle(IntPtr.Zero, true);

        [DllImport("kernel32", SetLastError = true)]
        static extern unsafe IntPtr CreateFile(
              string FileName,                  // file name
              uint DesiredAccess,              // access mode
              uint ShareMode,                 // share mode
              uint SecurityAttributes,       // Security Attributes
              uint CreationDisposition,     // how to create
              uint FlagsAndAttributes,     // file attributes
              int hTemplateFile           // handle to template file
              );

        [DllImport("kernel32", SetLastError = true)]
        static extern unsafe bool ReadFile(
             IntPtr hFile,                // handle to file
             void* pBuffer,              // data buffer
             int NumberOfBytesToRead,   // number of bytes to read
             int* pNumberOfBytesRead,  // number of bytes read
             int Overlapped           // overlapped buffer
             );

        [DllImport("kernel32", SetLastError = true)]
        static extern unsafe bool CloseHandle(
              IntPtr hObject   // handle to object
              );

        public bool Open(string FileName)
        {
            // open the existing file for reading
            handle = CreateFile(
                  FileName,
                  GENERIC_READ,
                  0,
                  0,
                  OPEN_EXISTING,
                  0,
                  0);

            if (handle != IntPtr.Zero)
                return true;
            else
                return false;
        }

        public unsafe int Read(byte[] buffer, int index, int count)
        {
            int n = 0;
            fixed (byte* p = buffer)
            {
                if (!ReadFile(handle, p + index, count, &n, 0))
                    return 0;
            }
            return n;
        }

        public bool Close()
        {
            // close file handle
            return CloseHandle(handle);
        }

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool isDisposing)
        {
            if (isDisposed)
                return;

            if (isDisposing)
            {
                safeHandle.Dispose();
            }

            if (handle != IntPtr.Zero)
                Close();

            isDisposed = true;
        }
    }

Now, the key change we made here is introducing the following method:

protected virtual void Dispose(bool isDisposing)

Now, this method envisions what we discussed a moment earlier. You can invoke it multiple times without any issue. There are two prominent blocks here.

  • The conditional block is supposed to free managed resources (read: invoking the Dispose() methods of other IDisposable members/properties inside the class, if we have any).
  • The non-conditional block frees the unmanaged resources.

You might ask why the conditional block tries to dispose managed resources. The GC takes care of that anyway, right? Yes, you're right. Since the garbage collector is going to take care of the managed resources anyway, we are just making sure the managed resources are disposed on demand, but only if someone calls the parameterless Dispose().

Are we forgetting something again? Remember, you have unmanaged resources, and if somehow Dispose() is not invoked you still have to make sure this gets finalized by the garbage collector. Let's write up a one-line destructor here.

 ~FileReader()
 {
    Dispose(false);
 }

It’s pretty straightforward and it complies with everything we said before. Kudos! We are done with FileReader.

Words of Experience:

Although we are safe already, we did forget one thing. If we invoke Dispose() now, it will dispose both unmanaged and managed resources. That also means when the garbage collector comes to collect, it will see there is a destructor, ergo a Finalize() override here. So, it will still put this instance into the finalization queue. That kind of hurts our purpose, because we wanted to release memory as soon as possible; if the garbage collector has to come back again, that doesn't really make much sense. So, we would like to stop the garbage collector from invoking Finalize() if we know we have disposed the object ourselves. And a single-line modification to the Dispose() method allows us to do so.

public void Dispose()
{
    Dispose(true);
    GC.SuppressFinalize(this);
}

We added the following statement to make sure that if we have done the disposing ourselves, the garbage collector will not invoke Finalize() anymore.

GC.SuppressFinalize(this);

Now, please keep in mind that you shouldn't call GC.SuppressFinalize() in a derived class. You would override Dispose(bool), follow the same pattern, and call base.Dispose(isDisposing) in the following way:

class DerivedReader : FileReader
{
   // Flag: Has Dispose already been called?
   bool disposed = false;

   // Protected implementation of Dispose pattern.
   protected override void Dispose(bool disposing)
   {
      if (disposed)
         return; 

      if (disposing) {
         // Free any other managed objects here.
         //
      }

      // Free any unmanaged objects here.
      //
      disposed = true;

      // Call the base class implementation.
      base.Dispose(disposing);
   }

   ~DerivedReader()
   {
      Dispose(false);
   }
}

It should be fairly clear by now why we are doing it this way. We want disposal to go recursively to the base class. So, when we dispose the derived class resources, the base class disposes its own resources too.

To use or not to use a Finalizer:

We are almost done, really, we are. This is the section where I'm supposed to tell you why you really shouldn't use unmanaged resources unless you need to. It's always a good idea not to write a finalizer if you really, really don't need it. Currently we need it because we are using an unsafe file handle and we have to close it manually. To stay destructor-free and as managed as possible, we should wrap our handles in a SafeHandle-derived class and dispose the SafeHandle as a managed resource, thus eliminating the need for cleaning up unmanaged resources and the overridden Finalize(). You will find more about that here.
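
To make that concrete, here is a minimal sketch of that direction (my own illustration, not the article's sample): if the P/Invoke signature returns a SafeFileHandle instead of a raw IntPtr, the SafeHandle owns the native handle, disposing it becomes a purely managed affair, and the class no longer needs a destructor of its own.

using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public class SafeHandleFileReader : IDisposable
{
    const uint GENERIC_READ = 0x80000000;
    const uint OPEN_EXISTING = 3;

    // The SafeFileHandle wraps the unmanaged handle and closes it when disposed or finalized
    SafeFileHandle handle;

    [DllImport("kernel32", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle CreateFile(
        string fileName, uint desiredAccess, uint shareMode,
        IntPtr securityAttributes, uint creationDisposition,
        uint flagsAndAttributes, IntPtr templateFile);

    public bool Open(string fileName)
    {
        handle = CreateFile(fileName, GENERIC_READ, 0, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero);
        return !handle.IsInvalid;
    }

    public void Dispose()
    {
        // Disposing the SafeFileHandle is a managed operation; no ~SafeHandleFileReader() needed
        handle?.Dispose();
    }
}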

“Using” it right:

Before you figure out why I quoted the word using here, let's finally wrap our work up. We have made our FileReader class disposable and we would like to invoke Dispose() after we are done using it. We will opt for a try-finally block to do it and will dispose the resources in the finally block.

FileReader fr = new FileReader();
try
{
    if (fr.Open(args[0]))
    {
        // Assume that an ASCII file is being read
        ASCIIEncoding Encoding = new ASCIIEncoding();
        int bytesRead;
        do
        {
            bytesRead = fr.Read(buffer, 0, buffer.Length);
            string content = Encoding.GetString(buffer, 0, bytesRead);
            Console.Write("{0}", content);
        }
        while (bytesRead > 0);
        return 0;
    }
    else
    {
        Console.WriteLine("Failed to open requested file");
        return 1;
    }
}
finally
{
    if (fr != null)
        fr.Dispose();
}

The only difference you see here is that we don't explicitly call Close() anymore, because that is already handled when we dispose the FileReader instance.

The good thing for you is that C# has essentially made things even easier than this. Remember the using statements we used in Part-I? A using statement is basically syntactic sugar over a try-finally block with a call to Dispose() in the finally block, just like we wrote here. Now, with that in mind, our code block changes to:

using (FileReader fr = new FileReader())
{
    if (fr.Open(args[0]))
    {
        // Assume that an ASCII file is being read
        ASCIIEncoding Encoding = new ASCIIEncoding();
        int bytesRead;
        do
        {
            bytesRead = fr.Read(buffer, 0, buffer.Length);
            string content = Encoding.GetString(buffer, 0, bytesRead);
            Console.Write("{0}", content);
        }
        while (bytesRead > 0);
        return 0;
    }
    else
    {
        Console.WriteLine("Failed to open requested file");
        return 1;
    }
}

Now you can go back to Part-I and try to understand the first bits of code we saw. I hope they make a bit more sense to you now. Hope you liked it. Hopefully, if there's a next part, I will talk about garbage collection algorithms.

You can find the code sample on GitHub here.

I thought I knew C# : Garbage collection & disposal. Part-I

Yeah, that's right, time to be a garbage guy. And if this line didn't make you laugh, you probably know how bad of a stand-up comedian I would make.

How things go when anyone asks about these:

Now this is going to be long. And I'm going to jump into what I want to talk about right away. Let's start with a regular Joe who writes C#. Let's tell him to write a "safe"-looking block of code that would essentially open a gzip file, read it as a byte array, decompress the byte array using a buffer, write it to a memory stream and return the result when the whole thing is done.

This is something you might expect in return. This would be the block where the aforementioned byte array gets decompressed:

    public class Solution
    {
        public static byte[] Decompress(byte[] gzip)
        {
            // The compressed bytes back the MemoryStream that the GZipStream decompresses from
            using (MemoryStream memoryStream = new MemoryStream(gzip))
            using (GZipStream gzipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
            {
                const int size = 4096;
                byte[] buffer = new byte[size];
                using (MemoryStream writeStream = new MemoryStream())
                {
                    int count = 0;
                    do
                    {
                        count = gzipStream.Read(buffer, 0, size);
                        if (count > 0)
                        {
                            writeStream.Write(buffer, 0, count);
                        }
                    } while (count > 0);
                    return writeStream.ToArray();
                }
            }
        }
    }

And this would be a possible segment where you see the file being read:

    class Program
    {
        const int _max = 200000;
        static void Main()
        {
            byte[] array = File.ReadAllBytes("Capture.7z");
            Solution.Decompress(array);
        }
    }

And yes, the code sample credit goes to dotnetperls. The reason I started with an example before any explanation is that I want to build up on this. I believe when you have a place to go, the fact that you know where you will end up sometimes reinforces what you learn in the process. But it is very important here to know why you end up here.

Breaking the code bits:

Let's start breaking the code up into bits. Now, the first question you would ask a regular Joe like me is how do you claim this code block is "safe", and what do you mean when you say it is "safe"? The first answer would essentially be that there is a beautiful using block there which will dispose the resources when they are done being used.

I highlighted three words here. Throughout this article I will bold out words like this, and whenever I do that, if you don't know exactly what I'm talking about, put those words in a dictionary in your brain and in turn I will explain them all. That also means if you think you know all of these, your journey ends here.

Words of wisdom:

The first word of wisdom here is dispose, and I will start with some basics for it. Before going into dispose, we need to dive back into some proper background on the .NET garbage collection process. If you are totally new to this, the garbage collector is an automatic memory manager; it lets you develop your application without needing to free the memory yourself every time. If that drove you into more confusion, it means you need to know what happens when you allocate some memory, which essentially happens every time you declare and initialize any value or reference while writing C#.

Every time you write the new keyword to initialize an object in C#, you essentially allocate the object on a managed heap. And the garbage collector automatically deallocates the objects that are not being used anymore from the managed heap "some time in the future". If you are already curious what a managed heap is, fret not; I will explain that too. But before that, let's talk about some fundamentals of memory. When we write C#, we use a virtual address space. Since you have a lot of processes on the same computer sharing the same memory (read: your RAM here), you need them not to overlap with one another. Each process therefore gets a specific range of addresses to itself, and thus you have a virtual address space mapped for each process. By default, a 32-bit process gets around 2GB of user-mode virtual address space. When you are actually allocating memory, you allocate memory in this virtual address space, not in physical memory. The garbage collector works on this virtual space and frees up this virtual memory for you automatically. Neat, huh?

I need memory:

What actually happens when we write something like the following?

    static void Main()
    {
        Cat cat = new Cat("Nerd");
    }

Looks like we are instantiating a harmless cat with the name Nerd. When you compile this, the C# compiler will generate common intermediate language (IL/CIL) code so the JIT compiler in the CLR can compile it for any possible machine configuration. You see, I used a lot of jargon there; I didn't bold it out because I'm not going to talk about it here. Now, the intermediate code that is generated here kind of looks like this:

IL_000a:        newobj instance void CilNew.Cat::.ctor (string)

It looks about right, we only care about the newobj instruction here. This specific instruction needs to do three things.

  1. Calculate the total amount of memory you require for the object.
  2. Look for enough space in the managed heap.
  3. When the object is created, return the reference to the caller and advance the next object pointer to the next available slot on the managed heap.

I'm quite sure number 1 is very easy to understand here. Why would we need to look for space in the managed heap then? Let's look at this first.

[Diagram: the managed heap, with the next object pointer advancing past allocated objects]

If you look at the example here, it should now be pretty clear what I meant. If the next object pointer doesn't find enough space to fit the next object in, you would expect an OutOfMemoryException. This can also happen when you don't have enough physical memory. The picture can also mislead you; I will come to that now. You might think the virtual address space is always contiguous. Well, it's not. Virtual address space can be fragmented. This means that there are free blocks, or holes, among used blocks, and the virtual memory manager has to find a big enough free block to allocate so you can instantiate your object. So, even if you have 2GB of virtual address space, that does not mean you have 2GB contiguously. If you ask the virtual memory manager for 2GB of space, it could fail due to the fact that you don't have that amount of contiguous address space. But for regular explanations, that picture will suffice well.

Now we know how objects are allocated, and we spent some time on what the managed heap is and how objects are allocated on it. The reason we discussed this is to help you understand why you need garbage collection and when it is triggered.

States of virtual memory:

There are three states of virtual memory. The Free state says this block of memory is available for allocation. When you request an allocation, it goes to the Reserved state, much like booking a hotel room. Now the memory block is reserved for you but not used yet, and no one else can use this block either because you reserved it. When you finally use it, it goes to the Committed state. In this state, the block of memory has a physical storage association.

The garbage collector kicks in:

There are definitely multiple conditions that can trigger garbage collection, and you already know the very first one now: when you run out of space for a new allocation in the virtual address space. We are going to jump in and see what the garbage collector actually does, at a very basic level.

Garbage collection happens in two stages: mark and sweep. The mark stage essentially searches for managed objects that are still referenced in managed code. It will attempt to finalize the objects that are unreachable; that is the first thing done in the sweep stage. The last piece of work in the sweep stage is to reclaim the memory of those unreachable objects.

I know you are wondering what managed code is. We will come back to this on the journey, don't worry. For now, keep in mind that the garbage collector can only deal with managed code.

So, the technique is essentially to mark the objects the program might be using and just clean off the rest. But how would the garbage collector know which objects it needs to clean? How would it decide which objects are unreachable? It does that using something called an object graph, which is not really within the scope of this article. But I do have a nice representation to go with it.

[Diagram: object graph on the managed heap before garbage collection]

Let's assume this is the situation in the heap. You have a managed heap like this, and let's assume the garbage collector kicks in due to low memory. It would look like the following after the collection.

[Diagram: object graph on the managed heap after garbage collection]

Now it should be evident what basically happens in a garbage collection, from a bird's eye view. A marked man gets finalized, and then he leaves.

I still didn't properly explain how you get these marked objects. To understand that properly, we need to understand generations.

Generations inside the heap:

Generations inside the heap essentially dictate how long an object is expected to be needed, and thus objects are divided into long-lived and short-lived ones. There are three generations, and the indexing starts from zero:

  1. Generation 0: This is the youngest generation and contains short-lived objects. Temporary and newly allocated objects live here. This is the part of the heap where garbage collection happens very frequently.
  2. Generation 1: This is essentially a buffer between generation 0 and generation 2. Generation 2 contains long-lived objects; generation 1 holds the objects that still look short-lived but survived generation 0.
  3. Generation 2: This is the generation of long-lived objects. These are usually objects that stay in the process for a long time; statics come to mind first. A new object can also be allocated straight to generation 2 instead of generation 0 if it's really big, like a big array with a lot of allocated space.

Garbage collections are generation specific, but a collection also includes all the younger generations. So if the GC collects generation 1, it also collects generation 0. If it collects generation 2, it also goes down and collects generation 1 and generation 0.

I used the word survive a moment ago. What I wanted to say is that if an object doesn't get reclaimed/cleaned up during a sweep over a generation, it gets promoted to the next generation. If the survival rate in a generation is high, the GC tries to increase the allocation threshold of that specific generation, so the next cleanup frees a bigger chunk of memory for the application.
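
You can actually watch a promotion happen with GC.GetGeneration. A small illustrative sketch (assumed example, not from the article):

using System;

class GenerationDemo
{
    static void Main()
    {
        var obj = new object();
        Console.WriteLine(GC.GetGeneration(obj)); // 0: newly allocated

        GC.Collect();                             // obj is still referenced, so it survives
        Console.WriteLine(GC.GetGeneration(obj)); // typically 1 after surviving a collection

        GC.Collect();                             // survives again
        Console.WriteLine(GC.GetGeneration(obj)); // typically 2: the long-lived generation

        GC.KeepAlive(obj);                        // keep the reference alive until here
    }
}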

One more thing to remember here: the garbage collector stops managed threads while it works. So, it has to be quick and efficient, or you are looking at performance penalties.

Back to the code bits:

If you have survived up until now, you deserve to go back to the code bit at the beginning. The first thing I'm going to clear up is managed vs unmanaged resources. Managed resources are directly under the control of the garbage collector; they are the result of managed code, which eventually compiles to intermediate language. Unmanaged resources are resources your garbage collector doesn't really know about. That includes open files, open network connections and, of course, unmanaged memory. Now, if you are using C# classes for these, most of the time they are almost managed. That means the managed code does the "dirty work" inside and you don't have to clean these up yourself. The garbage collector cleans up the managed wrapper, and the managed wrapper cleans up the unmanaged resource in the disposal process.

Now, let's go back to the word dispose here. How would I dispose of something in my code? Is there a method somewhere, something I could use? Indeed there is. A dispose method implementation has essentially two variations. Since your garbage collector can't handle unmanaged resources, you need to wrap them. The first technique is to wrap them in a class derived from SafeHandle and use the IDisposable interface to make it properly disposable. This very interface exposes the Dispose() method you need, and you use that to dispose of the resources yourself.

I explain in detail how to do that in the next part.
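In the meantime, here is a minimal sketch of the idea (the class name and the raw handle are made up for illustration): the unmanaged handle lives inside a SafeHandle-derived type such as SafeFileHandle, and the wrapper implements IDisposable so callers can release it deterministically.

    using System;
    using Microsoft.Win32.SafeHandles;

    class UnmanagedResourceHolder : IDisposable
    {
        // SafeFileHandle derives from SafeHandle and owns the raw OS handle.
        private readonly SafeFileHandle handle;

        public UnmanagedResourceHolder(IntPtr rawHandle)
        {
            handle = new SafeFileHandle(rawHandle, ownsHandle: true);
        }

        public void Dispose()
        {
            // Disposing the SafeHandle releases the unmanaged handle;
            // SuppressFinalize tells the GC the finalizer path is no longer needed.
            handle.Dispose();
            GC.SuppressFinalize(this);
        }
    }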

To switch or not to switch

Now, the first thing that comes to mind when you read the title of this post is a very "insightful" question that potentially betrays my reluctance to learn switch properly in C# even after all these years. Fret not, I'm the same old uncool dude who learns stuff late and gets his ass kicked in the process.

Today I want to go through the age-old switch statement we have all been using. There's a good number among us who prefer switch over an annoying if-else block, and I claim no crime there, not at all. Up until a couple of days ago I was a happy chump using switch whenever I handled compile-time constants like enums, and writing if-else blocks for my logical checks. Simple, happy as ever. All of that changed when my Skyrim-ridden dull brain asked "why are there two prominent branching paradigms in this statically typed language" and kind of made a good point about finding that out. I don't remember what point my brain made at that time, sorry. But it's my brain and I can't deny it, much like Deadpool can't.


Let's stay focused, shall we? Let's go ahead and write a simple switch snippet like the following:

    public class Program
    {
        enum WhoGivesACrap
        {
            I_do,
            Nope_I_dont,
            Maybe_NotSure
        }
        static void Main(string[] args)
        {
            WhoGivesACrap whoGivesACrap = WhoGivesACrap.I_do;

            switch(whoGivesACrap)
            {
                case WhoGivesACrap.I_do:
                    System.Console.WriteLine($"{whoGivesACrap}");
                    break;
                case WhoGivesACrap.Maybe_NotSure:
                    System.Console.WriteLine($"{whoGivesACrap}");
                    break;
                case WhoGivesACrap.Nope_I_dont:
                    System.Console.WriteLine($"{whoGivesACrap}");
                    break;
            }
        }
    }

Now, this is definitely first-day-of-C#-programming grade code. Dull, blasphemous and quite frankly not worthy of any attention. Same thoughts as mine, to be honest. Thus I kept looking, booted up ildasm and asked what we can see inside. Now, I could post the full disassembled MSIL code here, but that wouldn't make much sense. Let's look at the parts we really want to look at.

//000014:
//000015:             switch(whoGivesACrap)
    IL_0003:  ldloc.0
    IL_0004:  stloc.1
    .line 16707566,16707566 : 0,0 ''
//000016:             {
//000017:                 case WhoGivesACrap.I_do:
//000018:                     System.Console.WriteLine($"{whoGivesACrap}");
//000019:                     break;
//000020:                 case WhoGivesACrap.Maybe_NotSure:
//000021:                     System.Console.WriteLine($"{whoGivesACrap}");
//000022:                     break;
//000023:                 case WhoGivesACrap.Nope_I_dont:
//000024:                     System.Console.WriteLine($"{whoGivesACrap}");
//000025:                     break;
//000026:             }
//000027:         }
//000028:     }
//000029: }
    IL_0005:  ldloc.1
    IL_0006:  switch     (
                          IL_0019,
                          IL_0049,
                          IL_0031)
    IL_0017:  br.s       IL_0061

    .line 18,18 : 21,66 ''
//000018:                     System.Console.WriteLine($"{whoGivesACrap}");

I opted for dumping along with my C# source code, and the thing that catches my eye is the switch instruction with the three jump targets. Since my enum values were adjacent, it makes sense: the CIL switch instruction essentially builds a jump table, and the value on top of the stack is used as a zero-based index into that list of jump targets. Cool, at least now I have an answer for why it is different from an if-else-if chain. Notice I didn't say if-else block, because for adjacent values a jump table makes more sense than checking one condition after another for no apparent reason.

If you are still not bored enough, why don't you go and have a look here?

Now, I also thought this would be the end of it. But the voices in my head reminded me to do one more thing, and that's using non-adjacent values. Thus I modified my enum in the following fashion:

        enum WhoGivesACrap
        {
            I_do = 17,
            Nope_I_dont = 57,
            Maybe_NotSure = 945
        }

And I hooked up ildasm again to figure out what happened this time. Now my values are non-adjacent, and it doesn't really make sense anymore to create a jump table with around 900+ entries.

//000014:
//000015:             switch(whoGivesACrap)
    IL_0004:  ldloc.0
    IL_0005:  stloc.1
    .line 16707566,16707566 : 0,0 ''
//000016:             {
//000017:                 case WhoGivesACrap.I_do:
//000018:                     System.Console.WriteLine($"{whoGivesACrap}");
//000019:                     break;
//000020:                 case WhoGivesACrap.Maybe_NotSure:
//000021:                     System.Console.WriteLine($"{whoGivesACrap}");
//000022:                     break;
//000023:                 case WhoGivesACrap.Nope_I_dont:
//000024:                     System.Console.WriteLine($"{whoGivesACrap}");
//000025:                     break;
//000026:             }
//000027:         }
//000028:     }
//000029: }
    IL_0006:  ldloc.1
    IL_0007:  ldc.i4.s   17
    IL_0009:  beq.s      IL_001e

    IL_000b:  br.s       IL_000d

    IL_000d:  ldloc.1
    IL_000e:  ldc.i4.s   57
    IL_0010:  beq.s      IL_004e

    IL_0012:  br.s       IL_0014

    IL_0014:  ldloc.1
    IL_0015:  ldc.i4     0x3b1
    IL_001a:  beq.s      IL_0036

    IL_001c:  br.s       IL_0066

    .line 18,18 : 21,66 ''
//000018:                     System.Console.WriteLine($"{whoGivesACrap}");

And I wasn't wrong. Since the values are non-adjacent now, CSC opted for loading the enum values separately and comparing them with beq.s, which stands for branch-on-equal, short form. Technically this looks like a vanilla if-else-if block, although I can't promise CSC will always generate opcodes in this exact shape.
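To make that concrete, this is roughly the shape the non-adjacent case lowers to; a sketch for illustration, not the compiler's literal output:

    static void PrintEquivalent(WhoGivesACrap whoGivesACrap)
    {
        if (whoGivesACrap == WhoGivesACrap.I_do)                // ldc.i4.s 17  + beq.s
            System.Console.WriteLine($"{whoGivesACrap}");
        else if (whoGivesACrap == WhoGivesACrap.Nope_I_dont)    // ldc.i4.s 57  + beq.s
            System.Console.WriteLine($"{whoGivesACrap}");
        else if (whoGivesACrap == WhoGivesACrap.Maybe_NotSure)  // ldc.i4 0x3b1 + beq.s
            System.Console.WriteLine($"{whoGivesACrap}");
    }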

I know this whole thing might sound insanely boring when performance-wise it would only make a minuscule difference. Still, it's always fun to answer the voices inside my head, and they are much happier when they have something to hold on to when they ask why.

Until next time!

Porting your WebApi 2.2 app to Azure Service Fabric

Now, to start talking about this, you gotta know why this post spawned off a porting story instead of spawning off a "hello world" on Service Fabric. I'm back to writing after a long time, and it's only fitting that I share what I have been doing in the meantime.

Usually, in a production scenario you end up with your applications lying around in several tiers. To be honest, I'm referring to a production environment that is microservices driven. If you're reading this and wondering why, you probably want to head over here to get started with the basic concepts of Azure Service Fabric.

Now, I hope you have familiarized yourself with Service Fabric by now, so I can start talking about how you can port your existing IIS-hosted ASP.NET Web API 2.2 application to a self-hosted stateless Web API service in Azure Service Fabric.

To get yourself started, please install the Azure Service Fabric local development environment on your machine from here and go through the set of instructions to get started. When you're done with these, you'd see Visual Studio come up with a new set of project types for Service Fabric.

And it would look pretty much like the following:

[Screenshot: Service Fabric project types in the Visual Studio New Project dialog]

Now, before we click that elusive OK button, we need to understand the tasks at hand. We have to do two things here:

1. Make sure our Web API 2.2 app is compatible with self-hosting, as it was previously hosted on IIS.

2. Make sure we can salvage the same Visual Studio project we used for our web project.

Making sure the existing Web API 2.2 app is capable of self-hosting:

  1. For number one, one might think it should be fairly easy to port an IIS-driven Web API app to a self-hosted one, but the reality might not be that simple. First, make sure you have an OWIN startup class initiated; usually you would have one already if you are using OWIN. You'd also need a Program.cs file, the standard one for a console application, since your web app is now going to be a self-hosted application. So go ahead and add these two files to your web project; if you need a reference, both of them are shown here, and a minimal sketch of both files also follows this list.
  2. Now, although you have a Startup.cs and a Program.cs initiated already, you still haven't converted your web project into a console application. To do so, go to your project properties and, in the Application section, select Console Application as the Output Type and set the Program class as the Startup object. You can only pick Program as the startup object after you have created that file with a static void Main(string[] args) in it; and even if you forget to set it, as long as the class is named Program the project will invoke it automatically.
  3. If you are using anything from the Microsoft.Owin.Host.SystemWeb library, be aware that it won't work for you anymore. For example, if you are using code like the following to resolve file paths to your traditional web app deployment and its subsidiary folders like App_Data, it won't work anymore:
     string path = System.Web.Hosting.HostingEnvironment.MapPath(@"~/App_Data/EmailTemplates/");
    

    So for things like these you might have to resort to solutions that resolve the deployment path in a console application, for example building the path off AppDomain.CurrentDomain.BaseDirectory.

  4. Now, there are some other quirks too. If you are familiar with ASP.NET Identity and use it as your default identity provider, you're in for a treat. Usually when you build an expirable token generation scheme for emails and passwords, you need an IDataProtectionProvider, which is usually the MachineKeyDataProtectionProvider under the Microsoft.Owin.Host.SystemWeb.DataProtection namespace; you'd probably ditch that library because you are using the OWIN self host/Katana now. So expect a null where you try something like app.GetDataProtectionProvider(), where app is your IAppBuilder. I ran into this myself and, thanks to Katana being open source, you can just pick up the MachineKeyDataProtectionProvider class from here. Just make sure you use it as an IDataProtectionProvider like the following:
    using System;
    using System.Web.Security;
    using Microsoft.Owin.Security.DataProtection;
    
    namespace TaskCat.Lib.DataProtection
    {
        using DataProtectionProviderDelegate = Func<string[], Tuple<Func<byte[], byte[]>, Func<byte[], byte[]>>>;
        using DataProtectionTuple = Tuple<Func<byte[], byte[]>, Func<byte[], byte[]>>;
    
        /// <summary>
        /// Used to provide the data protection services that are derived from the MachineKey API. It is the best choice of
        /// data protection when your application is hosted by ASP.NET and all servers in the farm are running with the same Machine Key values.
        /// </summary>
        internal class MachineKeyDataProtectionProvider: IDataProtectionProvider
        {
            /// <summary>
            /// Returns a new instance of IDataProtection for the provider.
            /// </summary>
            /// <param name="purposes">Additional entropy used to ensure protected data may only be unprotected for the correct purposes.</param>
            /// <returns>An instance of a data protection service</returns>
            public virtual MachineKeyDataProtector Create(params string[] purposes)
            {
                return new MachineKeyDataProtector(purposes);
            }
    
            public virtual DataProtectionProviderDelegate ToOwinFunction()
            {
                return purposes =>
                {
                    MachineKeyDataProtector dataProtecter = Create(purposes);
                    return new DataProtectionTuple(dataProtecter.Protect, dataProtecter.Unprotect);
                };
            }
    
            IDataProtector IDataProtectionProvider.Create(params string[] purposes)
            {
                return this.Create(purposes);
            }
        }
    }
    

    And you'd need to do the same with the MachineKeyDataProtector class and use it as an IDataProtector, like the following:

    using System.Web.Security;
    using Microsoft.Owin.Security.DataProtection;
    
    namespace TaskCat.Lib.DataProtection
    {
        internal class MachineKeyDataProtector: IDataProtector
        {
            private readonly string[] _purposes;
    
            public MachineKeyDataProtector(params string[] purposes)
            {
                _purposes = purposes;
            }
    
            public virtual byte[] Protect(byte[] userData)
            {
                return MachineKey.Protect(userData, _purposes);
            }
    
            public virtual byte[] Unprotect(byte[] protectedData)
            {
                return MachineKey.Unprotect(protectedData, _purposes);
            }
        }
    }
    
    

    Then you can use it as a replacement for the old one from Microsoft.Owin.Host.SystemWeb.DataProtection. This is not really needed if you don't use ASP.NET Identity, but it's good to have if you land in the same problem. Now, hopefully your current Web API app will self-host just fine if you try something like the following:

    // Start OWIN host
    using (WebApp.Start<Startup>(url: baseAddress))
    {
      Console.WriteLine("Press ENTER to stop the server and close app...");
      Console.ReadLine();
    }
    

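For reference, here is a minimal sketch of the Startup and Program pair from step 1; the base address is a placeholder, and it assumes the Microsoft.AspNet.WebApi.OwinSelfHost package is installed:

    using System;
    using System.Web.Http;
    using Microsoft.Owin.Hosting;
    using Owin;

    public class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            // Wire up Web API on top of the OWIN pipeline.
            var config = new HttpConfiguration();
            config.MapHttpAttributeRoutes();
            app.UseWebApi(config);
        }
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            // Self-host the OWIN pipeline; the URL is a placeholder.
            using (WebApp.Start<Startup>("http://localhost:9000/"))
            {
                Console.WriteLine("Self-hosted Web API running. Press ENTER to stop.");
                Console.ReadLine();
            }
        }
    }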
Now, let's try salvaging that Visual Studio project as much as we can.

Project changes when it comes to Visual Studio

  1. Now, you can use the converted project, but from my personal experience it would be easier for you to just click the new Service Fabric project button now and create a new Stateless Web API project. You can port/reuse most of your code, and as the boilerplate is already written, you don't really have to rewrite it.
  2. If you want to keep the old project intact, you can of course reference it from the new stateless Web API service project and use your old Startup class to hook up the OwinCommunicationListener instead of the one it comes with. But beware: NuGet might make it a nightmare for you if dependencies are missing or mismatched.

Hope you guys have fun with Azure Service Fabric, and if you really want to see how you can build your own stateless Web API service from scratch, take a look here. If I find time, I'd write one too.

Asynchronous Programming in C# Cheat Sheet

Tip: Don't go for async void

There are three possible return types for async methods:

  • Task
  • Task<T>
  • void

It's always better if you can avoid the void.

Reason:

Async void methods were made possible so one can have async event handlers. Exceptions thrown out of these are actually handled in the SynchronizationContext that was active at the time, meaning you won't be able to catch them with a regular catch block. You can definitely catch them in App.UnhandledException or similar catch-all unhandled-exception scenarios. Plus, they aren't testable either.

If you go for returning Task, your async method returns a Task object which essentially carries any exceptions inside it.

Workaround:

In the case of things like async event handlers, the best thing to do is to put the actual logic in a separate awaitable method so you can test it whenever you want.
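For example, a sketch like this (the handler and method names are made up) keeps the event handler as a thin async void wrapper while the actual logic stays awaitable and testable:

public class SaveHandlerDemo
{
  // Thin async void wrapper, only because the event signature demands it.
  private async void SaveButton_Click(object sender, EventArgs e)
  {
    await SaveAsync();
  }
  // The real logic returns Task, so tests can await it and observe exceptions.
  public async Task SaveAsync()
  {
    await Task.Delay(500); // stand-in for real work
  }
}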

Tip: Once async, always go async

Async code works best when it is async from top to bottom, meaning it's always better to wrap an async method in another async method all the way up. If you wrap a little bit of async code inside a sync context, you might end up with a deadlock.

public static class DeadlockDemo
{
  private static async Task DelayAsync()
  {
    await Task.Delay(1000);
  }
  // This method causes a deadlock when called in a GUI or ASP.NET context.
  public static void Test()
  {
    // Start the delay.
    var delayTask = DelayAsync();
    // Wait for the delay to complete.
    delayTask.Wait();
  }
}

This piece of code would end up in a deadlock in GUI or ASP.NET apps, because when you await a Task the current context is captured so the rest of the method can resume on it after the await completes. That context is usually the current SynchronizationContext (or the current TaskScheduler if there is none). In an ASP.NET or GUI app you're only supposed to run one chunk of code on that context at a time. When the DelayAsync await completes, it tries to run the rest of the method on the captured context, but that context already has a thread blocked in Wait() waiting for the async call to finish. They are waiting on each other = deadlock.

In the case of a console app, this would execute just fine: there is no SynchronizationContext there, so the rest of the async method simply runs on a thread pool thread.
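For completeness, the deadlock-free version of the snippet above simply stays async all the way up instead of blocking with Wait():

public static class AsyncAllTheWayDemo
{
  private static async Task DelayAsync()
  {
    await Task.Delay(1000);
  }
  // Await the task instead of calling Wait(), so no context is blocked while waiting.
  public static async Task TestAsync()
  {
    await DelayAsync();
  }
}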

A good thing to remember here: if you await an async method, only the first exception that occurred is rethrown. If you block on it synchronously, you'd get an AggregateException with all the exceptions in it.
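A quick sketch of that difference (method names and messages are made up):

public static class ExceptionDemo
{
  private static async Task FailAsync(string message)
  {
    await Task.Delay(10);
    throw new InvalidOperationException(message);
  }
  public static async Task RunAsync()
  {
    Task both = Task.WhenAll(FailAsync("first"), FailAsync("second"));
    try
    {
      await both; // rethrows only the first InvalidOperationException
    }
    catch (InvalidOperationException)
    {
      // The task itself still carries everything as an AggregateException,
      // which is also what a blocking both.Wait() would have thrown.
      Console.WriteLine(both.Exception.InnerExceptions.Count); // 2
    }
  }
}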


Tip: Configuring await context

When you have a lot of incy wincy bits of async code, your GUI app might end up spending most of its time resuming all those little awaits on the UI thread. This can lead to performance issues.

Solution and Example:

Using ConfigureAwait on the awaited call might help you in this case. Consider the following example:

async Task MyMethodAsync()
{
  // Code here runs in the original context.
  await Task.Delay(1000);
  // Code here runs in the original context.
  await Task.Delay(1000).ConfigureAwait(
    continueOnCapturedContext: false);
  // Code here runs without the original
  // context (in this case, on the thread pool).
}

You can see here that we are declining to resume the rest of the async method on the original context after the second await. From that point on it runs on a thread pool context, which essentially makes your code deadlock-free and a tad smoother. Very handy for UWP app people.

Warning

For GUI people, don't use ConfigureAwait everywhere! Please only use it when the rest of the method doesn't touch the GUI (data-bound updates, UI updates); and for ASP.NET people, don't use it if the rest of the code uses the current HttpContext.
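To make that warning concrete, here's a sketch (the class, control and helper are made up) of what goes wrong if you touch the UI after ConfigureAwait(false):

public class ReportLoader
{
  // Assume resultLabel is a UI control owned by the surrounding form (hypothetical).
  private async Task LoadAsync()
  {
    string data = await FetchReportAsync().ConfigureAwait(false);
    // After ConfigureAwait(false) we are on a thread pool thread, not the UI thread,
    // so updating a control here would throw a cross-thread exception:
    // resultLabel.Text = data;
  }
  private Task<string> FetchReportAsync()
  {
    return Task.FromResult("report"); // stand-in for real I/O
  }
}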

And finally, if you're not a TL;DR guy and need more, please read this.