C# 8.0 feature proposals: Quick Review

This post goes over the features proposed for C# 8.0. The basis for this write-up is the following video, where Mads Torgersen explains the newest proposed C# 8.0 features to Seth Juarez. Please bear in mind that none of these features are released or finalized yet.

You can watch the video here.

Nullable Reference Types

I felt like this is somewhat of a misnomer, since reference types are nullable by default; they can be null, and a lot of developers would expect so. To be technically correct though, nullability (is that a word? who knows!) is not a mandatory trait of reference types. We are just used to seeing it that way, and thus the naming is correct.

This specific feature is about being able to declare reference type instances that are not supposed to be null. Even for C# 7.0 we had a proposal to declare explicitly non-nullable reference types with the damnit (postfix !) operator (I know, I know). By that definition a string! instance is explicitly non-nullable, and if you attempt to assign null to it, the compiler will throw an error. The same was suggested for the case where you explicitly use a nullable reference type's value to assign to something else. The obligatory Jon Skeet proposal from 2008 (!!) is here.

The problem along the way is making sure no existing codebase ends up being broken, and in the video Mads Torgersen points out the same fact. The suggestion is that we could explicitly say a reference type can be null:

public string? SomeMethod()
{
    return null;
}

One might argue that we can do that right now. What is the darn difference? Usually, when a method can return null, our only saviour was luck or the XML documentation for that method. This syntax expresses that intent explicitly: now you know the method can, and at times will, return null. And instead of a strict compiler error, C# 8.0 will possibly opt for a warning when you use a nullable reference type. For example:

string test = SomeMethod();
Console.WriteLine(test.Length);

Here the Console.WriteLine(test.Length) will end up in a warning since we are accessing a possible null without a check. And the compiler is even able to do some fun flow analysis too. Take the following example:

string test = SomeMethod();
if(test != null)
{
    Console.WriteLine(test.Length);
}

The compiler will be able to understand that the developer actually cared to check for possible nullability, and it will gracefully take the aforementioned warning off! And of course one can also opt to use the Elvis operator:

string test = SomeMethod();
Console.WriteLine(test?.Length);

Mads also addresses the concern of existing codebases being broken, which I like. He suggests a warning when we do things like:

string test = null;

The latent intention is to move to a world where regular reference type declarations are considered non-nullable by default.

What about forcefully telling the compiler that you know better when you are accessing a nullable reference type instance? The damnit operator comes in handy here.

string? test = null;
if (!string.IsNullOrWhiteSpace(test))
{
    Console.WriteLine(test!.Length);
}

Here, we are explicitly telling the compiler that we know better and we can expect a run time failure if we end up being wrong.

Although I love all of these propositions, I am still a tad skeptical about the Nullable<T> struct used for value types. Sometimes it gets really confusing, because a nullable value type instance takes null as a value but cannot act like an actual null natively.
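As a quick illustration of that confusion (nothing C# 8.0 specific here, just today's Nullable<T>):

int? maybe = null;

Console.WriteLine(maybe == null);    // True, thanks to the lifted == operator
Console.WriteLine(maybe.HasValue);   // False, yet maybe is still a struct instance, not a real null
int value = maybe.Value;             // throws InvalidOperationException at run time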

Default Interface Implementation

This one made me a tad confused too. Mads said this is definitely influenced by Java's default methods, and I somewhat agree, but to me it more closely resembles Scala's traits. The biggest use I see here is definitely being able to define a default implementation of a method in an interface. But I think I would have been happier if we had something fully like anonymous classes in Java.

Just for the sake of a code sample, I'm stealing a section from one of the articles I have already referenced in this write-up. The sample itself is self-explanatory for any C# developer, so I won't pile on it. You can create default implementations of your methods in the interface itself.

And if you do, you don't have to explicitly implement that method in the classes that implement that interface.

public interface ILogger
{
    void Log(LogLevel level, string message);

    void Log(LogLevel level, string format, params object[] arguments)
    {
        Log(level, string.Format(format, arguments));
    }

    void Debug(string message)
    {
        Log(LogLevel.Debug, message);
    }

    void Debug(string format, params object[] arguments)
    {
        Log(LogLevel.Debug, string.Format(format, arguments));
    }
}
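For instance, a class implementing ILogger only needs to provide the first Log overload. The ConsoleLogger below is a hypothetical sketch of mine (it assumes the same LogLevel enum used above), and callers still reach the default members through the interface type:

public class ConsoleLogger : ILogger
{
    // the only member we are forced to implement
    public void Log(LogLevel level, string message) =>
        Console.WriteLine($"[{level}] {message}");
}

// the default implementations are available through the interface type
ILogger logger = new ConsoleLogger();
logger.Debug("Cache refreshed in {0} ms", 42);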

Nevertheless, this is a good approach since it doesn't break any previously written code. Good job there, C# team. And of course the runtime team too, since this actually needed some runtime modifications after a long, long time.

Asynchronous Enumerables

I think this has been tried before by developers themselves. C# 5.0 pioneered bringing async and await into the world, and the beauty of that was being able to write true asynchronous code with the least hassle and without any callbacks. What was lacking, though, were asynchronous enumerations or generators. To get back a sequence of results, your only hope was to get them as a list from the async method. Once again, stealing a snippet:

IAsyncEnumerable<SearchResult> results =
    searchEngine.GetAllResults(query);

foreach await (var result in results) { // ... }
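On the producer side, the proposal pairs this with asynchronous iterators. Purely as a speculative sketch (GetResultPageAsync is a made-up helper, and the syntax may well change before release), GetAllResults could look something like this:

public async IAsyncEnumerable<SearchResult> GetAllResults(string query)
{
    int page = 0;
    while (true)
    {
        // hypothetical helper that fetches one page of results asynchronously
        List<SearchResult> batch = await GetResultPageAsync(query, page++);
        if (batch.Count == 0)
            yield break;

        foreach (var result in batch)
            yield return result;
    }
}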

At first look, this looks a lot like Reactive Extensions, which started as a Microsoft Research project, and this was actually brought up in the video too. Apparently the biggest difference between Rx and this is the model it works on: this is a pull model instead of the push model Rx uses. In Rx, the observable is responsible for letting the consumers/observers know when there's something new. Interesting.

I really liked it, to be honest. And even more to my liking, the video described a nifty little use case of combining push and pull based models, where you are pulling data asynchronously while also consuming/buffering the data you are getting from a push based source. Let's assume we are enumerating a set of tweets. If the data source is an observable and we are enumerating the list of tweets we already have, then while we are processing a bunch through our pull based enumeration, we could also be gathering the rest we got from the observable in a buffer. Amazing use case in my opinion.

Extension Everything

Currently C# only supports extension methods, and not to forget, those extensions are themselves static methods. I think for small decorations this works well since we don't have to go through the full decorator pattern ourselves. This time the C# team tried to do something entirely different. Allow me to steal a snippet again before we go into this.

extension Enrollee extends Person
{
    // static field
    static Dictionary<Person, Professor> enrollees =
        new Dictionary<Person, Professor>();

    // instance method
    public void Enroll(Professor supervisor) =>
        enrollees[this] = supervisor;

    // instance property
    public Professor Supervisor =>
        enrollees.TryGetValue(this, out var supervisor)
            ? supervisor
            : null;

    // static property
    public static ICollection<Person> Students => enrollees.Keys;

    // instance constructor
    public Person(string name, Professor supervisor)
        : this(name)
    {
        this.Enroll(supervisor);
    }
}

We can clearly see C# finally took the extends keyword from Java, but used it in a more literal context. The extension keyword defines a class as an actual extension of the class mentioned after the extends keyword. It is basically a simplified, sugary decorator. It looks a lot like partial classes, and that comparison is brought up in the video too, where Mads nicely differentiates the two. Partial classes need to be in the same assembly, since the compiler merges them before execution. An extension class, on the other hand, will truly work like an extension: it doesn't need to be in the same assembly, and it bears member methods. Mads carefully warned not to expect stateful changes to the original instance from extension classes, since that might end up in problematic scenarios, and I kind of agree with him. I already like the syntax proposal, and note that this actually generates a new type.

As a bonus, Mads also talked about being able to extend constructors and overloaded operators. Since the extension is a type itself, we could even implement interfaces on that extension type. For example, if there is a Person class and we wrote an extension type PersonExtension for it, it might be possible to implement an interface IEmployee on PersonExtension, since Person doesn't do that. That way we can improvise on how Person would behave if it were an IEmployee. Amazing indeed. These features are still in planning and might not make it into C# 8.0, but they are definitely on the roadmap.
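None of this syntax is settled, so the following is pure speculation on my part, just to visualize that IEmployee idea; treat every keyword here as a guess:

public interface IEmployee
{
    string EmployerName { get; }
}

// speculative syntax: an extension type that also implements an interface
extension PersonExtension extends Person : IEmployee
{
    public string EmployerName => "Contoso";   // made-up member, for illustration only
}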

I didn't cover asynchronous dispose since I need a bit more time to understand it properly. You can find the gist of it in the following section. And the records proposal was already among the C# 7.0 proposals, thus I didn't cover it either.

More To Read:

Building your own REST query language: An experiment with Azure Cosmos Db

For all the REST resources we deploy every day, one of the most common scenarios a developer ends up handling is designing a nice and effective search/filter/query endpoint for those resources. Conventions play a big part here, ranging from filter parameters in the query string to using a full blown search engine like ElasticSearch, Lucene or Algolia. Vinay Sahni did a phenomenal job of putting out the gist of basic REST practices here. If you are new to REST and want to get started, it's worth a look.

Today, we will do a simple yet fun experiment. Our goal is to create a simple, nifty looking query language for a sample REST resource. We will try to mimic a couple of features the Twitter Search API provides, and our sample data storage will be Azure Cosmos DB: Document DB, which was previously known as just Azure Document DB. Recently it moved into the Azure Cosmos DB family, and if you have missed it, here are the full details.

The first thing we need is a set of sample data we can use as our REST resource. Since we are going to mimic the Twitter public search API, we definitely need some sample data, and of course github comes to the rescue. Download the sample data and put it into your own Document DB instance. If you don't know how to create a Document DB instance, follow the quick start here. I'm referencing the .NET quick start, but there are Node.js, Python, Java and Xamarin resources right beside it. For those of you who do not want to test against an Azure hosted Document DB and want to test locally instead, you can always opt for the Azure Document DB emulator.

For the sake of this experiment I ported the github Twitter sample data to an Azure Document DB instance. That small data set has 10 entries and is hosted here. If you want to browse the data, the Azure portal does allow it, but I suggest looking at the open source app named Azure DocumentDB Studio. You can use the endpoint and the key written in the sample code on github to connect. The database is public just for you guys so you can test the sample code I have hosted on github and will link at the end of this article. TL;DR people, this is your cue to go to the end of the article, but I highly suggest you stick around.

Features to be built:

Out of all the features Twitter's search API provides, I will port the to:UserAccount, from:UserAccount and "exact text" search capabilities. That means we will be able to search tweets sent from one account to another, and we will also be able to search tweets by a string we want to be present in them. For example, if we want to find tweets

  1. from an user account named @terminator
  2. to another user account named @robocop 
  3. or any tweet where the words “I’m back”
  4. or the hashtag #HastaLaVista is present,

our sample query language excerpt will be

http://our-awesome-api/search/q?to:robocop AND from:terminator OR "I'm back" OR #HastaLaVista

If you have an eye for detail, you will notice that this is not exactly like the Twitter search API, since that one doesn't use an explicit AND operator. But for the sake of simplicity in this example, this should be enough.

Tools we are going to use:

Our API stack will be written in ASP.NET Core. We will use ANTLR as our query language lexer-parser. If you need an ANTLR primer, have a look at another experiment I did with it, where I tried to cobble together a simple scripting language. If you are not really accustomed to any of the .NET stacks, fret not. All of this is totally doable in any other tech stack you prefer. Since ANTLR has targets for multiple languages, the API can be built in basically any language, and even for our sample storage solution, Azure Document DB, you have access to multiple client SDKs.

The data:

As I mentioned above, we have a little data set of 10 entries. I’m posting a sample entry here so the rest of the tutorial makes sense. One entry points to a single tweet made by a sample user.

{
  "entities": {
    "user_mentions": [
      {
        "indices": [
          3,
          15
        ],
        "id_str": "178253493",
        "screen_name": "mikalabrags",
        "name": "Mika Labrague",
        "id": 178253493
      }
    ],
    "urls": [],
    "hashtags": [ "#Confused" ]
  },
  "in_reply_to_screen_name": null,
  "text": "RT @mikalabrags: Bipolar weather #Confused",
  "id_str": "210621130703245313",
  "place": null,
  "retweeted_status": {
    "entities": {
      "user_mentions": [],
      "urls": [],
      "hashtags": []
    },
    "in_reply_to_screen_name": null,
    "text": "Bipolar weather",
    "id_str": "210619512855343105",
    "place": null,
    "in_reply_to_status_id": null,
    "contributors": null,
    "retweet_count": 0,
    "favorited": false,
    "truncated": false,
    "source": "http://ubersocial.com",
    "in_reply_to_status_id_str": null,
    "created_at": "Thu Jun 07 06:29:39 +0000 2012",
    "in_reply_to_user_id_str": null,
    "in_reply_to_user_id": null,
    "user": {
      "lang": "en",
      "profile_background_image_url": "http://a0.twimg.com/profile_background_images/503549271/tumblr_m25lrjIjgT1qb6nmgo1_500.jpg",
      "id_str": "178253493",
      "default_profile_image": false,
      "statuses_count": 13635,
      "profile_link_color": "06544a",
      "favourites_count": 819,
      "profile_image_url_https": "https://si0.twimg.com/profile_images/2240536982/AtRKA77CIAAJRHT_normal.jpg",
      "following": null,
      "profile_background_color": "373d3a",
      "description": "No fate but what we make",
      "notifications": null,
      "profile_background_tile": true,
      "time_zone": "Alaska",
      "profile_sidebar_fill_color": "1c1c21",
      "listed_count": 1,
      "contributors_enabled": false,
      "geo_enabled": true,
      "created_at": "Sat Aug 14 07:31:28 +0000 2010",
      "screen_name": "mikalabrags",
      "follow_request_sent": null,
      "profile_sidebar_border_color": "08080a",
      "protected": false,
      "url": null,
      "default_profile": false,
      "name": "Mika Labrague",
      "is_translator": false,
      "show_all_inline_media": true,
      "verified": false,
      "profile_use_background_image": true,
      "followers_count": 214,
      "profile_image_url": "http://a0.twimg.com/profile_images/2240536982/AtRKA77CIAAJRHT_normal.jpg",
      "id": 178253493,
      "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/503549271/tumblr_m25lrjIjgT1qb6nmgo1_500.jpg",
      "utc_offset": -32400,
      "friends_count": 224,
      "profile_text_color": "352e4d",
      "location": "Mnl"
    },
    "retweeted": false,
    "id": 2.106195128553431E+17,
    "coordinates": null,
    "geo": null
  },
  "in_reply_to_status_id": null,
  "contributors": null,
  "retweet_count": 0,
  "favorited": false,
  "truncated": false,
  "source": "&amp;amp;amp;lt;a href=\"http://blackberry.com/twitter\" rel=\"nofollow\"&amp;amp;amp;gt;Twitter for BlackBerry®&amp;amp;amp;lt;/a&amp;amp;amp;gt;",
  "in_reply_to_status_id_str": null,
  "created_at": "Thu Jun 07 06:36:05 +0000 2012",
  "in_reply_to_user_id_str": null,
  "in_reply_to_user_id": null,
  "user": {
    "lang": "en",
    "profile_background_image_url": "http://a0.twimg.com/profile_background_images/542537222/534075_10150809727636812_541871811_10087628_844237475_n_large.jpg",
    "id_str": "37200018",
    "default_profile_image": false,
    "statuses_count": 5715,
    "profile_link_color": "CC3366",
    "favourites_count": 46,
    "profile_image_url_https": "https://si0.twimg.com/profile_images/2276155427/photo_201_1_normal.jpg",
    "following": null,
    "profile_background_color": "dbe9ed",
    "description": "protège-moi de mes désirs  23107961 ☍",
    "notifications": null,
    "profile_background_tile": true,
    "time_zone": "Singapore",
    "profile_sidebar_fill_color": "ffffff",
    "listed_count": 2,
    "contributors_enabled": false,
    "geo_enabled": true,
    "created_at": "Sat May 02 13:55:49 +0000 2009",
    "screen_name": "yoursweetiethea",
    "follow_request_sent": null,
    "profile_sidebar_border_color": "91f50e",
    "protected": false,
    "url": "http://yoursweetiethea.tumblr.com",
    "default_profile": false,
    "name": "Althea Arellano",
    "is_translator": false,
    "show_all_inline_media": false,
    "verified": false,
    "profile_use_background_image": true,
    "followers_count": 306,
    "profile_image_url": "http://a0.twimg.com/profile_images/2276155427/photo_201_1_normal.jpg",
    "id": "37200018",
    "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/542537222/534075_10150809727636812_541871811_10087628_844237475_n_large.jpg",
    "utc_offset": 28800,
    "friends_count": 297,
    "profile_text_color": "fa3c6b",
    "location": "Christian's Heart"
  },
  "retweeted": false,
  "id": "210621130703245300",
  "coordinates": null,
  "geo": null
}

The ANTLR grammar:

The ANTLR grammar we are going to use here is:

grammar Search;

expr: term (op term)*;
term: exactText | hashText | toText | fromText;
op: AND | OR;

toText: 'to:'ID;
fromText: 'from:'ID;
hashText: '#'ID;
exactText: EXACTTEXT;

// lexer rule
EXACTTEXT: '"' ~'"'* '"';
OR: 'OR';
AND: 'AND';
ID: [a-zA-Z_] [a-zA-Z0-9_]*;
WS: [ \n\t\r]+ -> skip;

It’s a small and simple grammar like the one we used in our scripting language tutorial. Since our REST resource query is essentially an expression, our entry rule will be expr. The railroad diagram for expr looks like:

[Railroad diagram: rule expr]

This means we can have a single search term or multiple search terms chained by the op rule. The op rule is nothing but the two logical operators we support: AND and OR.

[Railroad diagram: rule op]

Alongside the op rule we also need to know what the term rule stands for. The term rule stands for the search term formats we are allowed to use. In our sample query above, we have terms like from:terminator, to:robocop, "I'm back" or a hashtag like #HastaLaVista. That's why we have 4 rules defining all of these cases, and the term rule is an OR relationship between them.

[Railroad diagram: rule term]

I’m not going to post the railroad diagram for toText, fromText, hashText and exactText rules since they are pretty self-explanatory if you have a cursory look at the grammar.

So, what are we waiting for? Let's start writing the little codebase that will parse this query string and translate it to an Azure Document DB SQL query we can use to fetch the tweets. For that, we need a small repository that connects to our desired database and collection in Document DB and lets us fetch some items. I only added methods that allow us to connect to the database and read the tweets. I ignored the rest, since you can always have a look at those in the quick start for Azure Document DB.

Here's our small, rough database repository. Please remember to keep your endpoint and key strings somewhere secret and safe in a production environment; since this is a tutorial, I went with what is easy and fast for a proof of concept.

namespace TweetQuery.Lib
{
    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents.Client;
    using Microsoft.Azure.Documents.Linq;

    public class CosmosDBRepository<T> where T : class
    {
        private readonly string Endpoint = "https://tweet.documents.azure.com:443/";
        private readonly string Key = "fjp9Z3qKPxSOfE0KS1aaKvUY27B8IoL347sdtMBMjkCQqPmoaKjGXoyltrItNXNN6h4QjAYLSY5nyb2djWWUOQ==";
        private readonly string DatabaseId = "tweetdb";
        private readonly string CollectionId = "tweets";
        private DocumentClient client;

        public CosmosDBRepository()
        {
            client = new DocumentClient(new Uri(Endpoint), Key);
        }

        public async Task<IEnumerable<T>> GetItemsAsync(string sql)
        {
            if (string.IsNullOrEmpty(sql))
                throw new ArgumentNullException(nameof(sql));

            FeedOptions queryOptions = new FeedOptions { MaxItemCount = -1 };

            var query = this.client.CreateDocumentQuery<T>(
                 UriFactory.CreateDocumentCollectionUri(DatabaseId, CollectionId),
                 sql, queryOptions)
                 .AsDocumentQuery();

            List<T> results = new List<T>();
            while (query.HasMoreResults)
            {
                results.AddRange(await query.ExecuteNextAsync<T>());
            }

            return results;
        }
    }
}
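Using the repository is straightforward; assuming the Tweet model class from the sample code, fetching documents for any SQL we hand it looks roughly like this:

var repository = new CosmosDBRepository<Tweet>();
var tweets = await repository.GetItemsAsync(
    "SELECT * FROM twt WHERE twt.user.screen_name = \"mikalabrags\"");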

I opted for executing SQL instead of a LINQ expression since constructing SQL is easier and simpler for a tutorial. Plus it decouples the query structure from the compile-time POCOs we use as our models.

I created a DocumentDbListener class based off the SearchBaseListener which was auto-generated from our ANTLR grammar. The sole purpose of this class is to generate simple SQL from our search expression. To search inside nested arrays, I used a user defined function for Azure Document DB. All of this is very crudely written, so forgive my indecency. Since this is just a tutorial, I tried to keep it as simple as possible.

function matchArrayElement(array, match) {
    for (var index = 0; index < array.length; index++) {
        var element = array[index];

        if (typeof match === "object") {
            for (var key in match) {
                if (match.hasOwnProperty(key) && element.hasOwnProperty(key)) {
                    // only succeed when the property values actually match;
                    // otherwise keep scanning the rest of the array
                    if (match[key] == element[key]) {
                        return true;
                    }
                }
            }
        }
        else if (element == match) {
            return true;
        }
    }

    return false;
}

All this function does is try to find nested array elements matching the criteria we pass in. You can achieve the same result through JOINs in Azure Document DB or the array method ARRAY_CONTAINS, but I preferred a user defined function since it serves my purpose easily.
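Once the function is registered on the collection under the name matchArrayElement, it can be invoked from a query like any other UDF; for example, something along these lines should match tweets carrying a particular hashtag:

SELECT * FROM twt
WHERE udf.matchArrayElement(twt.entities.hashtags, "#Confused")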

Constructing SQL from the query expression:

To understand how the SQL is generated from the query expression, let’s begin with the to:UserAccount expression. Since we start with the rule expr, let’s override the SearchBaseListener method EnterExpr first.

namespace TweetSearch.CosmosDb.DocumentDb
{
    using Antlr4.Runtime.Misc;
    using TweetSearch.CosmosDb.Util;

    public class DocumentDbListener : SearchBaseListener
    {
        private string projectionClause = "SELECT * FROM twt";
        private string whereClause;

        public string Output
        {
            get { return projectionClause + " " + whereClause; }
        }

        public override void EnterExpr([NotNull] SearchParser.ExprContext context)
        {
            this.whereClause = "WHERE";
        }
    }
}

The approach I took here is essentially the simplest: I handle the events fired the moment ANTLR enters a specific rule, and I keep appending to the SQL string in whereClause. Since entering the expr rule means I will need a WHERE clause, I initialize it with "WHERE". The thing to notice here is that instead of concatenating, I chose to initialize it, because I expect this event to be fired exactly once; that is how the grammar is designed.

Following the same trail, the next thing to handle would be the EnterTerm event. But term is nothing but an OR relationship between 4 other rules, and handling those specifically gives me the edge of simpler, smaller, more readable methods. For example, if we want to handle the to:UserAccount expression, a simple method like the following should be sufficient for our use case.

        public override void EnterToText([NotNull] SearchParser.ToTextContext context)
        {
            var screenName = context.GetText().Substring(3).Enquote();
            this.whereClause = string.Concat(whereClause, " ", $"udf.matchArrayElement(twt.entities.user_mentions, {{ \"screen_name\" : {screenName} }} )");
        }

This is where our user defined function comes into play. I'm trying to find any tweet that has a user mention of the account name I parsed out of the query.
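So for an input like to:robocop, and assuming the Enquote helper simply wraps its input in double quotes, the fragment appended to the WHERE clause would look roughly like:

udf.matchArrayElement(twt.entities.user_mentions, { "screen_name" : "robocop" } )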

Following the same pattern I completed the rest of the four rules, and my full listener class looks like:

namespace TweetSearch.CosmosDb.DocumentDb
{
    using Antlr4.Runtime.Misc;
    using TweetSearch.CosmosDb.Util;

    public class DocumentDbListener : SearchBaseListener
    {
        private string projectionClause = "SELECT * FROM twt";
        private string whereClause;

        public string Output
        {
            get { return projectionClause + " " + whereClause; }
        }

        public override void EnterExpr([NotNull] SearchParser.ExprContext context)
        {
            this.whereClause = "WHERE";
        }

        public override void EnterFromText([NotNull] SearchParser.FromTextContext context)
        {
            var screenName = context.GetText().Substring(5).Enquote();
            this.whereClause = string.Concat(whereClause, " ", "twt.user.screen_name = ", screenName);
        }

        public override void EnterOp([NotNull] SearchParser.OpContext context)
        {
            var text = context.GetText();
            this.whereClause = string.Concat(this.whereClause, " ", text.ToUpper());
        }

        public override void EnterToText([NotNull] SearchParser.ToTextContext context)
        {
            var screenName = context.GetText().Substring(3).Enquote();
            this.whereClause = string.Concat(whereClause, " ", $"udf.matchArrayElement(twt.entities.user_mentions, {{ \"screen_name\" : {screenName} }} )");
        }

        public override void EnterHashText([NotNull] SearchParser.HashTextContext context)
        {
            var hashtag = context.GetText().Enquote();
            this.whereClause = string.Concat(whereClause, " ", $"udf.matchArrayElement(twt.entities.hashtags, {hashtag})");
        }

        public override void EnterExactText([NotNull] SearchParser.ExactTextContext context)
        {
            var text = context.GetText();
            this.whereClause = string.Concat(whereClause, " ", $"CONTAINS(twt.text, {text})");
        }
    }
}

We got our listener ready! Now, all we need is a context class that will bootstrap the lexer and parser and tokens so the input expression is transpiled and the output SQL is generated. Just like our last work on ANTLR, the TweetQueryContext class will look like the following:

namespace TweetSearch.CosmosDb.DocumentDb
{
    using Antlr4.Runtime;
    using Antlr4.Runtime.Tree;

    public class TweetQueryContext
    {
        private DocumentDbListener listener;

        public TweetQueryContext()
        {
            this.listener = new DocumentDbListener();
        }

        public SearchParser.ExprContext GenerateAST(string input)
        {
            var inputStream = new AntlrInputStream(input);
            var lexer = new SearchLexer(inputStream);
            var tokens = new CommonTokenStream(lexer);
            var parser = new SearchParser(tokens);
            parser.ErrorHandler = new BailErrorStrategy();

            return parser.expr();
        }

        public string GenerateQuery(string inputText)
        {
            var astree = this.GenerateAST(inputText);
            ParseTreeWalker.Default.Walk(listener, astree);
            return listener.Output;
        }
    }
}

Whew! That was easy, right?
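To see the whole pipeline in action, here is roughly what feeding our earlier sample query through the context produces (again assuming Enquote double-quotes its input):

var context = new TweetQueryContext();
var sql = context.GenerateQuery(
    "to:robocop AND from:terminator OR \"I'm back\" OR #HastaLaVista");

// sql now reads roughly:
// SELECT * FROM twt WHERE
//   udf.matchArrayElement(twt.entities.user_mentions, { "screen_name" : "robocop" } )
//   AND twt.user.screen_name = "terminator"
//   OR CONTAINS(twt.text, "I'm back")
//   OR udf.matchArrayElement(twt.entities.hashtags, "#HastaLaVista")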

Bootstrapping the api layer:

We have all we need except the API. Thanks to ASP.NET Core, that is two clicks away. Open Visual Studio and create a .NET Core API project. Our TweetsController class looks like the following:

namespace TweetQuery.Controllers
{
    using Microsoft.AspNetCore.Mvc;
    using System.Threading.Tasks;
    using TweetQuery.Lib;
    using TweetQuery.Lib.Model;
    using TweetSearch.CosmosDb.DocumentDb;

    [Route("api/[controller]")]
    public class TweetsController : Controller
    {
        private CosmosDBRepository<Tweet> repository;
        private TweetQueryContext context;

        public TweetsController(CosmosDBRepository<Tweet> repository)
        {
            this.repository = repository;
            this.context = new TweetQueryContext();
        }

        [HttpGet("search")]
        public async Task<IActionResult> Search([FromQuery] string q)
        {
            if (string.IsNullOrEmpty(q))
                return BadRequest();

            var querySql = this.context.GenerateQuery(q).Trim();
            var result = await repository.GetItemsAsync(querySql);
            return Ok(result);
        }
    }
}

I reused the DB repository we created earlier, and as you can see, it is dependency-injected into the controller, which you have to configure in the ConfigureServices method of your Startup class. I'm not adding that specific code here since it is already in the sample code and doesn't belong to the scope of this tutorial. The same goes for the model class Tweet and the classes it uses inside.
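For the record, a minimal sketch of what that registration might look like, assuming the default ASP.NET Core service container:

public void ConfigureServices(IServiceCollection services)
{
    services.AddMvc();

    // make the repository available to TweetsController through constructor injection
    services.AddSingleton<CosmosDBRepository<Tweet>>();
}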

Time to test!

The project is hosted here on github. Clone the code, then build and run it from Visual Studio. As a sample query, try the following:

http://localhost:5000/api/tweets/search?q=to:hatena_sugoi AND from:maeta_tw OR %23HopesUp OR "Surely June is a summer"

I URL-encoded the hashtag here just to be nice to the REST client you might use. I highly suggest Postman if you don't want anything heavy.

I only took the minimalist way of using ANTLR here; you can build your own expression tree based on the auto-generated listener and do so much more if you want. A perfect example is LinqToQuerystring written by Pete Smith. It generates the necessary LINQ expression for any IQueryable and thus allows you to write database agnostic OData driven queries, and it's tons faster and lighter than the one Microsoft ships.

I hope this was fun. Happy RESTing!

Creating a nano scripting language using ANTLR and Roslyn

For every one of us who wakes up and codes every day somewhere in this world, we often find ourselves pretty attached to the programming language we love most. All of us feel that point of idiosyncrasy when we try to break our boundaries and learn something new, since this field is always expanding and evolving like a universe on steroids.

For those who are still newbies and trying to find ways to understand how a programming language works, what could be better than trying to make one of your own? 🙂 This post is essentially a small proof of concept where we will go for a small, nice, fun looking esoteric programming language.

Before we jump into the tidbits inside, allow me to explain what we will do here. We will write a small programming language that is lexed and parsed by ANTLR, transpiles to C#, and has the transpiled code fed into the Roslyn C# scripting API.

If the aforementioned words look too wordy to you, let me clear them right up. Like every language we speak every day, a programming language comes with a grammar of its own. That should be clear as daylight to you if you have written a single line of code in your life: every programming language follows a well defined structure and a dancing parade of words following that. So, since we are creating one, the first thing we need is the grammar for our language. Instead of doing it from scratch, our tool of choice is ANTLR, which stands for "ANother Tool for Language Recognition". ANTLR will help us define the grammar and generate the lexer and parser for it, and we will eventually be able to reuse those components to transpile our code to C#. Remember, ANTLR has targets for JavaScript, Java and Python too, so if you want to have your lexer and parser generated in those languages, please don't refrain from using it.

Even before we start talking about languages and everything else, we need to understand a bit of compiler theory. I could go into the gory details of lexers and parsers, but since this is the age of Google, I will let that responsibility fall into your hands. We will take a somewhat exploratory way of understanding how everything works and take it from there.

Let’s assume that we have a language like this:

derp a = 20 ':)' #Initialization

# basic if-else
a > 2 ???
yep ->
    a = 5 ':)'
kbye

# print
dump a ':)'

The first thing we need is a grammar. To understand the programming language, we need something that understands all the words here, and that guy is the lexer. A lexer eats up the whole code block we send to the compiler and chops it into little lexemes (read: strings/words here), so we can tell when we have hit a specific keyword. For our dummy language here, we used the popular internet lingo derp to initialize a variable.
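For instance, the first line of our sample would get chopped up roughly into these lexemes:

derp   -> keyword that starts a declaration
a      -> ID
=      -> ASSIGN
20     -> NUMBER
':)'   -> SMILEY (the statement terminator)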

The next thing in line is making something meaningful out of the words. When we write an if block, the compiler needs to know which strings of characters or words should stand together to construct a meaningful statement or expression. That is the job of the parser. If you look closely, you will see that every block of code can be expressed as a tree. For example, the initialization block here looks like the following:

[Railroad diagram: rule assignstmt]

This is definitely a single branch tree. It says you need to write the word derp to start the statement, and you should end it with a smiley ( :) ) to finish it. In the middle you use an identifier (the name for your variable) and an ASSIGN operator (=) to define the flow of assignment from right to left. That tree is actually generated straight from the grammar we are going to use today.

So, why wait? Let’s have a look at the grammar we are going to be using today:

grammar Profane;

compilationUnit: statement* EOF;

statement:
        printstmt
        | assignstmt
        | ifstmt
		| setstmt;

printstmt	: 'dump' expr? SMILEY;
assignstmt	: 'derp' ID ASSIGN expr SMILEY;
setstmt		:  ID ASSIGN expr SMILEY;

ifstmt  :
        conditionExpr '???'
        'yep ->'
            statement*
        'kbye';

conditionExpr: expr relop expr;
expr: term | opExpression;
opExpression: term op term;
op: PLUS | ASSIGN | MINUS;
relop: EQUAL | NOTEQUAL | GT | LT | GTEQ | LTEQ;

term: ID | number | STRING;

number: NUMBER;

// Keywords
ID: [a-zA-Z_] [a-zA-Z0-9_]*;
SMILEY: ':)';
WS: [ \n\t\r]+ -> skip;

PLUS    :'+';
EQUAL   : '====';
ASSIGN  : '=';
NOTEQUAL: '!!==';
MINUS   : '-';
GT      : '>';
LT      : '<';
GTEQ    : '>=';
LTEQ    : '<=';

fragment INT: [0-9]+;
NUMBER: INT ('.'(INT)?)?;
STRING: '"' (~('\n' | '"'))* '"';

My first advice here is to not get confused by the size of the grammar and to start by taking out the bits we do understand. If you refer to the first diagram in this post and have a look at the grammar, you will see that the rule assignstmt is defined exactly the way depicted in the picture. If you find the words ID, ASSIGN and SMILEY, you will also see that all of them are defined lower in the grammar, where they have their literal form. These tokens help the parser understand what you have written. And assignstmt is called a rule, which defines relations among different tokens or rules. See? This is how we build rules for our grammar, because a grammar is nothing but a set of well defined rules.

By the very first rule, named compilationUnit, our language is nothing but a set of statements ending with an EOF. Every statement can be a printstmt (for printing), an ifstmt (if-else), a setstmt (reassignment) or the aforementioned assignstmt. Neat, huh? You can even go down and find out how the rest of the tree is built.

I strongly suggest using Visual Studio Code and its ANTLR4 extension for writing ANTLR grammar files. You will get a nice looking railroad diagram like the one I posted for every rule you write!

Time to get our hands dirty. Fret not, the whole project is publicly available on github, so I will only point out the excerpts of code we need to visit. And I'm going back to the same assignstmt. The first thing we need to do is download ANTLR. Go to the download page and download the latest .jar. ANTLR is written in Java, so you will need to have Java installed on your machine. We will use .NET Core for the rest of the work, so you can download that for your OS too. When you are done downloading ANTLR, you can generate the target code for C# from your grammar file. In our case the grammar file name was Profane.g4, and to generate the C# lexer and parser based off the grammar, all we had to do was invoke:

java -jar antlr-4.7-complete.jar -Dlanguage=CSharp Profane.g4

It will generate the C# classes you need for walking through the grammar. You will see a lexer and a listener class being created. You don't really need to touch the lexer class. The listener class is used to handle the events that are fired when ANTLR encounters a rule. So, to make use of our assignment statement, we need to handle the event that is fired when ANTLR encounters our assignstmt rule. But how will we ever know which event that is? Here comes the ANTLR magic.

ANTLR will generate the events you need for all your rules every time you create a lexer and parser from your grammar. That means every time you change your grammar, you need to regenerate the C# targets, the lexer and listener classes. I added the command to the build target of the sample project so it automatically does this on build.

So, let's create a class inherited from the ProfaneBaseListener class which was auto-generated by ANTLR, and name it ProfaneListener. Since our rule was named assignstmt, ANTLR will generate a method called EnterAssignstmt inside the base listener which will be invoked every time ANTLR encounters that rule. Our target here is to generate equivalent C# code from it, so

derp some = 10 ':)'

will be transpiled in C# equivalent code like the following.

dynamic some = 10;

And we are going to do that in the method EnterAssignstmt which ANTLR has built out for us. Let’s have a look at it.

        public override void EnterAssignstmt([NotNull] ProfaneParser.AssignstmtContext context)
        {
            string target = context.ID().GetText();
            dynamic value = this.ResolveExpression(context.expr());
            this.Output += "dynamic " + target + " = " + value + ";\n";
        }

This is really simple work. We are asking the context class to give us the ID and the expression so we can generate the equivalent C# code string. The nice thing to notice here is that ANTLR has generated a class for our assignment statement context itself. But what is that ResolveExpression method? If we remember the rule, it was:

[Railroad diagram: rule assignstmt]

Here expr is another rule, which looks like:

[Railroad diagram: rule expr]

That means, expr can either be a rule named term or a rule named opExpression. Let’s have a look at both of them.

First is the rule called term. A term can be an identifier (the name for your variable), a number (int or float) or a string. Either one of these.

[Railroad diagram: rule term]

That means a term can also be a valid value for the rule expr, since expr is either a term or an opExpression.

The opExpression, on the other hand, is a tad more complex. This is an example where you can reuse multiple rules to create a complex rule.

[Railroad diagram: rule opExpression]

The only thing we don't know here is the op rule. The opExpression rule says it is structured as a term followed by a rule named op, and there has to be another term at the end.

[Railroad diagram: rule op]

The op rule defines an OR relationship between PLUS, MINUS and ASSIGN, which stand respectively for '+', '-' and '='. That means this rule says we can write things like

derp some = 10 + 2 + someOtherDerp ':)'

Amazing! Isn’t it?

Before we look into the ResolveExpression method, let’s see a nice looking tree on how that previous statement actually gets parsed by all these rules.

assigngraph

This will hopefully help you understand how things are happening here. Now, let's see the ResolveExpression method, which transpiles the rule expr.

        private dynamic ResolveExpression(ProfaneParser.ExprContext exprContext)
        {
            var opExpression = exprContext.opExpression();
            if (opExpression != null)
            {
                return ResolveOpExpression(opExpression);
            }
            else
            {
                return ResolveTerm(exprContext.term());
            }
        }

        private dynamic ResolveOpExpression(ProfaneParser.OpExpressionContext plusContext)
        {
            var leftTerm = plusContext.term().First();
            var rightTerm = plusContext.term().Last();

            var left = ResolveTerm(leftTerm);
            var right = ResolveTerm(rightTerm);

            return left + plusContext.op().GetText() + right;
        }

        private dynamic ResolveTerm(ProfaneParser.TermContext termContext)
        {
            if (termContext.number() != null)
            {
                return termContext.number().GetText();
            }
            else if (termContext.ID() != null)
            {
                return termContext.ID().GetText();
            }
            else if (termContext.STRING() != null)
            {
                // strip any ${...} interpolation wrapper from the string value
                // (note: .NET regex patterns take no /.../g delimiters like JavaScript)
                Regex regex = new Regex(@"\$\{([^\}]+)\}");
                var contextText = termContext.GetText();
                var replacedString = regex.Replace(contextText, "$1");
                return replacedString;
            }
            else return default(dynamic);
        }

We also see two new methods the ResolveExpression method uses, named ResolveTerm and ResolveOpExpression. They generate the transpiled C# code for the term and opExpression rules. We keep appending the generated code to a string property named Output, and when the walk is done, we have our transpiled C# code ready to be executed.

Executing C# as a script using Roslyn:

Now that we have our C# code to be executed, we will use another tool called Roslyn. It is the compiler platform for .NET that gives you a rich set of features around code analysis and compilation. We will specifically be using the C# scripting API.

First we need to generate the AST for our code. If you are scratching your head over what an abstract syntax tree is, you already saw something like it in the tree we posted before. To generate the tree, you need your lexer. The lexer generates tokens, and the parser uses those tokens to produce the compilation unit, which is the root node of your tree. This is still done using ANTLR.

When you have your tree, you need to walk it. When you walk the AST, the listener fires the events we need to generate the transpiled code, like we did minutes ago. So the listener output is our C# code, which we need to feed to Roslyn.

The Roslyn C# script engine comes with a nifty class named CSharpScript which will do the trick for us. All we have to do is feed it the C# code and load the assemblies we need. Then it will do its magic and we will have our own scripting language talking to us.

So, our transpiler class looks like

    public class ProfaneTranspiler
    {
        private ProfaneListener listener;
        private static readonly MetadataReference[] References =
        {
            MetadataReference.CreateFromFile(typeof(object).GetTypeInfo().Assembly.Location),
            MetadataReference.CreateFromFile(typeof(RuntimeBinderException).GetTypeInfo().Assembly.Location),
            MetadataReference.CreateFromFile(typeof(System.Runtime.CompilerServices.DynamicAttribute).GetTypeInfo().Assembly.Location),
            MetadataReference.CreateFromFile(typeof(ExpressionType).GetTypeInfo().Assembly.Location),
            MetadataReference.CreateFromFile(Assembly.Load(new AssemblyName("mscorlib")).Location),
            MetadataReference.CreateFromFile(Assembly.Load(new AssemblyName("System.Runtime")).Location)
        };

        public ProfaneTranspiler()
        {
            this.listener = new ProfaneListener();
        }

        public ProfaneParser.CompilationUnitContext GenerateAST(string input)
        {
            var inputStream = new AntlrInputStream(input);
            var lexer = new ProfaneLexer(inputStream);
            var tokens = new CommonTokenStream(lexer);
            var parser = new ProfaneParser(tokens);
            parser.ErrorHandler = new BailErrorStrategy();

            return parser.compilationUnit();
        }

        public string GenerateTranspiledCode(string inputText)
        {
            var astree = this.GenerateAST(inputText);
            ParseTreeWalker.Default.Walk(listener, astree);
            return listener.Output;
        }

        public async Task<TranspileResult> RunAsync(string code)
        {
            var result = new TranspileResult();

            if (string.IsNullOrEmpty(code))
                return result;

            Stopwatch watch = new Stopwatch();
            watch.Start();

            try
            {
                ScriptOptions scriptOptions = ScriptOptions.Default;
                scriptOptions = scriptOptions.AddReferences(References);
                scriptOptions = scriptOptions.AddImports("System");

                var resultCode = this.GenerateTranspiledCode(code);
                if (resultCode == null)
                {
                    watch.Stop();
                    result.TimeElapsed = watch.Elapsed.ToString();
                    return result;
                }

                var outputStrBuilder = new StringBuilder();
                using (var writer = new StringWriter(outputStrBuilder))
                {
                    Console.SetOut(writer);
                    var scriptState = await CSharpScript.RunAsync(resultCode, scriptOptions);
                    result.output = outputStrBuilder.ToString();
                }
            }
            catch (Exception ex)
            {
                result.output = ex.Message;
            }
            finally
            {
                watch.Stop();
                result.TimeElapsed = watch.Elapsed.ToString();
            }
            return result;
        }
    }

I uploaded the full sample code to github here. You need to build and run the Profane project, which is a console app. It will host a small web API on port 5000. If you POST your code to the endpoint as plain text in the POST body, you will get back the output of your code. Postman can be a nice client for doing so.
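If you prefer the command line over Postman, a curl call shaped like this should do the trick; note that /api/execute is just a placeholder route, the real one is in the sample project:

# /api/execute is a placeholder route; check the sample project for the real one
curl -X POST http://localhost:5000/api/execute \
     -H "Content-Type: text/plain" \
     --data-binary @sample.derp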

Hope this was fun. Knowing the internals of your daily programming language boosts your confidence while you write it. So make your own esoteric language if you have time. It's always fun to make 'em.

I thought I knew C# : Garbage collection & disposal. Part-I

Yeah, that's right, time to be a garbage guy. And if this line didn't make you laugh, you probably know how bad a stand-up comedian I would make.

How things go when anyone asks about these:

Now, this is going to be long, and I'm going to jump into what I want to talk about right away. Let's start with a regular Joe who writes C#. Let's tell him to write a "safe" looking block of code that opens a gzip file, reads it as a byte array, decompresses the byte array using a buffer, writes it to a memory stream and returns the result when the whole thing is done.

This is something you might expect in return. This would be the block where the aforementioned byte array gets decompressed:

    public class Solution
    {
        public static byte[] Decompress(byte[] gzip)
        {
            // wrap the compressed bytes in a stream so GZipStream can read from it
            using (MemoryStream memoryStream = new MemoryStream(gzip))
            using (GZipStream gzipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
            {
                const int size = 4096;
                byte[] buffer = new byte[size];
                using (MemoryStream writeStream = new MemoryStream())
                {
                    int count = 0;
                    do
                    {
                        count = gzipStream.Read(buffer, 0, size);
                        if (count > 0)
                        {
                            writeStream.Write(buffer, 0, count);
                        }
                    } while (count > 0);
                    return writeStream.ToArray();
                }
            }
        }
    }

And this would be a possible segment where you see the file being read:

    class Program
    {
        const int _max = 200000;
        static void Main()
        {
            byte[] array = File.ReadAllBytes("Capture.7z");
            Solution.Decompress(array);
        }
    }

And yes, the code sample credit goes to dotnetperls. The reason I started with an example before any explanation is that I want to build up on this. I believe that when you have a place to go, knowing where you will end up reinforces what you learn in the process. But it is very important here to know why you would end up there.

Breaking the code bits:

Let's start breaking the code up into bits. The first question you would ask a regular Joe like me is: how do you claim this code block is "safe", and what do you mean when you say "safe"? The first answer would be that there is a beautiful using block there which will dispose of the resources when they are done being used.

I highlighted a few words here. Throughout this article I will bold out words like this, and whenever I do that, if you don't know exactly what I'm talking about, put those words in a dictionary in your brain; in turn, I will explain all of them. That also means if you think you know all of these, your journey ends here.

Words of wisdom:

The first word of wisdom here is dispose, and I will start with some basics for it. Before going into dispose, we need to dive back into some proper background on the .NET garbage collection process. If you are totally new to this, the garbage collector is an automatic memory manager; it lets you develop your application without needing to free memory yourself every time. If that drove you into more confusion, it means you need to know what happens when you allocate some memory, which happens every time you declare and initialize any value or reference while writing C#.

Every time you write the new keyword to initialize an object in C#, you allocate the object on the managed heap, and the garbage collector automatically deallocates objects that are not being used anymore from the managed heap "some time in the future". If you are already curious about what a managed heap is, fret not, I will explain that too. But before that, let's talk about some memory fundamentals. When we write C#, we use a virtual address space. Since you have a lot of processes on the same computer sharing the same memory (read: your RAM here), you need them not to overlap with one another. Each process therefore needs to address a specific set of memory for itself, and thus you have a virtual address space mapped for each process. By default, a 32-bit process has around 2GB of user-mode virtual address space. When you allocate memory, you allocate it in this virtual address space, not in physical memory. The garbage collector works on this virtual space and frees up the virtual memory for you automatically. Neat, huh?

I need memory:

What actually happens when we write something like the following?

    static void Main()
    {
        Cat cat = new Cat("Nerd");
    }

Looks like we are instantiating a harmless cat with the name Nerd. When you compile this, the C# compiler will generate common intermediate language (IL/CIL) code so the JIT compiler in the CLR can compile it for any possible machine configuration. You see, I used a lot of jargon; I didn't bold those out because I'm not going to talk about them here. Now, the intermediate code generated here kind of looks like this:

IL_000a:        newobj instance void CilNew.Cat::.ctor (string)

It looks about right, we only care about the newobj instruction here. This specific instruction needs to do three things.

  1. Calculate the total amount of memory you require for the object.
  2. Look for space in the managed heap.
  3. When the object is created, return the reference to the caller and advance the next object pointer to the next available slot on the managed heap.

I'm quite sure number 1 is very, very easy to understand. But why would we need to look for space in the managed heap? Let's look at this first.

[Diagram: the managed heap and the next object pointer]

If you look at the example here, it should now be pretty clear what I meant. If the next object pointer doesn't find enough space to fit the next object in, you can expect an OutOfMemoryException. This can also happen when you don't have enough physical memory. The picture can also mislead you, and I will come to that now. You might think the virtual address space is always contiguous. Well, it's not; virtual address space can be fragmented. This means there are free blocks or holes among used blocks, and the virtual memory manager has to find a big enough free block so you can instantiate your object. So even if you have 2GB of virtual address space, that does not mean you have 2GB contiguously. If you ask the virtual memory manager for 2GB of space, it could fail because you don't have that amount of contiguous address space. But for a regular explanation, that picture will suffice.

Now we know how objects are allocated, and we spent some time on what the managed heap is and how objects are allocated on it. The reason we discussed this is to make you understand why you need garbage collection and when it is triggered.

States of virtual memory:

There are three states of virtual memory. The Free state says the block of memory is available for allocation. When you request an allocation, it goes into the Reserved state, much like booking a hotel room: now the memory block is reserved for you but not used yet, and no one else can use it either because you reserved it. When you finally use it, it goes into the Committed state. In this state, the block of memory has a physical storage association.

The garbage collector kicks in:

There are multiple conditions that can trigger garbage collection, and you already know the very first one now: when you run out of space for a new allocation in the virtual address space. We are going to jump in and see what the garbage collector actually does at a very basic level.

Garbage collection happens in two stages: mark and sweep. The mark stage searches for managed objects that are still referenced from managed code and marks them. Attempting to finalize the unreachable objects is the first thing done in the sweep stage; the last thing done in the sweep stage is reclaiming the memory of those unreachable objects.

I know you are wondering what managed code is. We will come back to that later in the journey, don't worry. For now, keep in mind that the garbage collector can only deal with managed code.

So the technique is essentially to mark the objects the program might still be using and clean up the rest. But how would the garbage collector know which objects it needs to clean? How would it decide which objects are unreachable? It does this using something called an object graph, which isn't really within the scope of this article. But I do have a nice representation to go with it.

object-graph-pr-before-gc

Let's assume this is the situation in the heap. You have a managed heap like this, and let's assume the garbage collector kicks in due to memory pressure. It would look like the following after the collection.

object-graph-pr-after-gc

Now it should be evident what basically happens in a garbage collection from a bird's-eye view. The objects flagged as unreachable get finalized, and then they are gone.

I still didn’t properly explain how you essentially get these marked objects.  To understand that properly, we need to understand about generations.

Generations inside the heap:

Generations inside the heap essentially dictate how long an object is expected to be needed, and thus the heap is divided into long-lived and short-lived objects. There are three generations, and the indexing starts from zero:

  1. Generation 0: This is the youngest generation and contains short-lived objects. Temporary and newly allocated objects live here. This is the part of the heap where garbage collection happens most frequently.
  2. Generation 1: This is essentially a buffer between generation 0 and generation 2 (generation 2 contains long-lived objects). Generation 1 holds objects that still look short-lived but survived generation 0.
  3. Generation 2: This is the generation of long-lived objects, the ones that usually stay around for a long time in the process; statics come to mind first. A new object can also be allocated straight into generation 2 instead of generation 0 if it's really big, like a large array, which ends up on the large object heap.

Garbage collections are generation specific, but collecting a generation also collects the younger ones. So if the GC collects generation 1, it also collects generation 0; if it collects generation 2, it goes down and collects generation 1 and generation 0 too.

I used the word survive a moment ago. What I meant is that if an object doesn't get reclaimed during a sweep over a generation, it gets promoted to the next generation. If the survival rate in a generation is high, the GC raises the allocation threshold for that generation, so the next collection of it frees a bigger chunk of memory at once.
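
You can actually watch promotion happen with GC.GetGeneration. A quick sketch; the exact numbers can vary by runtime and build settings, but the idea holds:

    var survivor = new object();
    Console.WriteLine(GC.GetGeneration(survivor)); // 0: freshly allocated, youngest generation

    GC.Collect(0);                                 // collect generation 0 only
    Console.WriteLine(GC.GetGeneration(survivor)); // typically 1: it survived, so it got promoted

    GC.Collect();                                  // full collection (generation 2, which includes 1 and 0)
    Console.WriteLine(GC.GetGeneration(survivor)); // typically 2 by now

    var bigArray = new byte[100_000];              // big enough to land on the large object heap
    Console.WriteLine(GC.GetGeneration(bigArray)); // 2: large objects start out as "old"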

One more thing to remember here: the garbage collector suspends the managed threads while it works. So it has to be quick and efficient, otherwise you are looking at performance penalties.

Back to the code bits:

If you have survived up until now, you deserve to go back to the code bit at the beginning. The first thing I'm going to clear up is managed versus unmanaged resources. Managed resources are directly under the control of the garbage collector; they are the result of managed code, which eventually compiles down to intermediate language. Unmanaged resources are resources your garbage collector doesn't really know about: open files, open network connections and, of course, unmanaged memory. Now, if you are using C# classes to handle these, most of the time they are effectively managed. That means the managed code does the “dirty work” inside and you don't have to clean these up yourself: the garbage collector cleans up the managed wrapper, and the managed wrapper cleans up the unmanaged resource in its disposal process.

Now, let's go back to the word dispose. How would I dispose of something in my code? Is there a method somewhere, something I could use? Indeed there is. A Dispose method implementation has essentially two variations. Since your garbage collector can't handle unmanaged resources, you need to wrap them. The first technique is to wrap them in a class derived from SafeHandle and use the IDisposable interface to make the wrapper properly disposable. This interface exposes the Dispose() method you need, and you use that to dispose of the resources yourself.
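
Just to show the shape of it before the detailed walkthrough in the next part, here is a minimal sketch. The wrapper class name is made up for the example; SafeFileHandle is one of the SafeHandle-derived types that ship with the framework:

    using System;
    using Microsoft.Win32.SafeHandles;

    // A managed wrapper around an unmanaged resource (a raw OS file handle here)
    class UnmanagedResourceHolder : IDisposable
    {
        // The SafeHandle guards the raw handle and knows how to release it
        private readonly SafeFileHandle handle;

        public UnmanagedResourceHolder(SafeFileHandle handle)
        {
            this.handle = handle;
        }

        public void Dispose()
        {
            handle?.Dispose();          // deterministically releases the unmanaged handle
            GC.SuppressFinalize(this);  // nothing left for the finalizer to clean up
        }
    }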

I explain in detail how to do that in the next part.

To switch or not to switch

Now, the first thing that comes to mind when you read the title of this blog is a very “insightful” question that probably hints at my reluctance to learn switch properly in C# even after all these years. Fret not, I'm the same old uncool dude who learns stuff late and gets his ass kicked in the process.

Today I want to go through the age-old switch statement we have all been using. There's a good number among us who prefer switch over an annoying if-else block, and I claim no crime there, not at all. Up until a couple of days ago I was a happy chump, using switch whenever I was handling compile-time constants like enums and writing if-else blocks for my logical checks. Simple, happy as ever. All of that changed when my Skyrim-ridden dull brain asked “Why are there two prominent branching paradigms in this statically typed language?” and kind of made a good point about finding that out. I don't remember what point my brain made at the time, sorry. But it's my brain and I can't deny it, much like Deadpool can't.

deadpool

Let's stay on focus, shall we? Let's go ahead and write a simple switch snippet like the following:

    public class Program
    {
        enum WhoGivesACrap
        {
            I_do,
            Nope_I_dont,
            Maybe_NotSure
        }
        static void Main(string[] args)
        {
            WhoGivesACrap whoGivesACrap = WhoGivesACrap.I_do;

            switch(whoGivesACrap)
            {
                case WhoGivesACrap.I_do:
                    System.Console.WriteLine($"{whoGivesACrap}");
                    break;
                case WhoGivesACrap.Maybe_NotSure:
                    System.Console.WriteLine($"{whoGivesACrap}");
                    break;
                case WhoGivesACrap.Nope_I_dont:
                    System.Console.WriteLine($"{whoGivesACrap}");
                    break;
            }
        }
    }

Now, this is definitely first-day-at-C#-programming grade code. Dull, blasphemous and quite frankly not worthy of any attention. Same thoughts as mine, to be honest. So I kept looking, booted up ildasm and asked what we can see inside. Now, I could post the full disassembled MSIL code here, but that wouldn't make much sense. Let's look at the parts we really want to look at.

//000014:
//000015:             switch(whoGivesACrap)
    IL_0003:  ldloc.0
    IL_0004:  stloc.1
    .line 16707566,16707566 : 0,0 ''
//000016:             {
//000017:                 case WhoGivesACrap.I_do:
//000018:                     System.Console.WriteLine($"{whoGivesACrap}");
//000019:                     break;
//000020:                 case WhoGivesACrap.Maybe_NotSure:
//000021:                     System.Console.WriteLine($"{whoGivesACrap}");
//000022:                     break;
//000023:                 case WhoGivesACrap.Nope_I_dont:
//000024:                     System.Console.WriteLine($"{whoGivesACrap}");
//000025:                     break;
//000026:             }
//000027:         }
//000028:     }
//000029: }
    IL_0005:  ldloc.1
    IL_0006:  switch     (
                          IL_0019,
                          IL_0049,
                          IL_0031)
    IL_0017:  br.s       IL_0061

    .line 18,18 : 21,66 ''
//000018:                     System.Console.WriteLine($"{whoGivesACrap}");

I opted for dumping the IL along with my C# source code. The thing that catches my eye is the switch instruction with the three jump targets, and since my enum values are adjacent it makes sense: the CIL switch instruction essentially creates a jump table. The three arguments it takes are jump targets, indexed by the value being switched on, here my enum. Cool, at least now I have an answer to why it is different from if-else-if-else. Notice I didn't say if-else block, because it makes more sense to chain the checks in this fashion than to fall into an else for no apparent reason.
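
If you want a mental model for that jump table, it behaves roughly like indexing into an array of targets. This is just an analogy in C#, not what CSC literally emits:

    // Conceptually: compute the index once, then jump straight to the matching target
    System.Action[] jumpTable =
    {
        () => System.Console.WriteLine("I_do"),           // index 0
        () => System.Console.WriteLine("Nope_I_dont"),    // index 1
        () => System.Console.WriteLine("Maybe_NotSure")   // index 2
    };
    jumpTable[(int)whoGivesACrap]();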

If you are still not bored enough why don’t you go and have a look here?

Now, I also thought this would be the end of it. But the voices in my head reminded me to do one more thing. And that’s using non-adjacent values. Thus I modified my enum in the following fashion.

        enum WhoGivesACrap
        {
            I_do = 17,
            Nope_I_dont = 57,
            Maybe_NotSure = 945
        }

And I hooked up ildasm again to figure out what happened this time. Now that my values are non-adjacent, it doesn't really make sense to create a 900+ entry jump table anymore.

//000014:
//000015:             switch(whoGivesACrap)
    IL_0004:  ldloc.0
    IL_0005:  stloc.1
    .line 16707566,16707566 : 0,0 ''
//000016:             {
//000017:                 case WhoGivesACrap.I_do:
//000018:                     System.Console.WriteLine($"{whoGivesACrap}");
//000019:                     break;
//000020:                 case WhoGivesACrap.Maybe_NotSure:
//000021:                     System.Console.WriteLine($"{whoGivesACrap}");
//000022:                     break;
//000023:                 case WhoGivesACrap.Nope_I_dont:
//000024:                     System.Console.WriteLine($"{whoGivesACrap}");
//000025:                     break;
//000026:             }
//000027:         }
//000028:     }
//000029: }
    IL_0006:  ldloc.1
    IL_0007:  ldc.i4.s   17
    IL_0009:  beq.s      IL_001e

    IL_000b:  br.s       IL_000d

    IL_000d:  ldloc.1
    IL_000e:  ldc.i4.s   57
    IL_0010:  beq.s      IL_004e

    IL_0012:  br.s       IL_0014

    IL_0014:  ldloc.1
    IL_0015:  ldc.i4     0x3b1
    IL_001a:  beq.s      IL_0036

    IL_001c:  br.s       IL_0066

    .line 18,18 : 21,66 ''
//000018:                     System.Console.WriteLine($"{whoGivesACrap}");

And I wasn't wrong. Since the values are non-adjacent now, CSC opted for loading the enum values separately and used beq.s, the short form of branch-if-equal. Technically this looks like a vanilla if-else-if chain, although I can't promise CSC will generate opcodes this way and this way only.
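
Back-translated to C#, the IL above has roughly the shape of a chained comparison like this (a rough equivalent, not the literal compiler output):

    if (whoGivesACrap == WhoGivesACrap.I_do)                   // ldc.i4.s 17 + beq.s
        System.Console.WriteLine($"{whoGivesACrap}");
    else if (whoGivesACrap == WhoGivesACrap.Nope_I_dont)       // ldc.i4.s 57 + beq.s
        System.Console.WriteLine($"{whoGivesACrap}");
    else if (whoGivesACrap == WhoGivesACrap.Maybe_NotSure)     // ldc.i4 0x3b1 + beq.s
        System.Console.WriteLine($"{whoGivesACrap}");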

I know this whole thing might sound insanely boring, and performance wise it would only make a minuscule difference. Still, it's always fun to answer the voices inside my head; they are much happier when they have something to hold on to when they ask why.

Until next time!

Porting your WebApi 2.2 app to Azure Service Fabric

Now, to start talking about this, you should know why this post spawned off instead of a plain “hello world” on Service Fabric. I'm back to writing after a long time, and it's only fitting that I share what I have been doing in the meantime.

Usually in a production scenario you'd end up with a case where your applications are spread across several tiers. To be honest, I'm referring to a production environment that is microservices driven. If you're reading this and wondering why I did that, you probably want to go down here first to get yourself started with the basic concepts of Azure Service Fabric.

Now, I hope you have familiarized yourself with Service Fabric by now, so I can start talking about how you can port your existing IIS-hosted ASP.NET Web API 2.2 application to a self-hosted stateless Web API service in Azure Service Fabric.

To get started, please install the Azure Service Fabric local development environment on your machine from here and go through the set of instructions. When you're done with these, you'll see Visual Studio come up with a new set of project types for Service Fabric.

And it would look pretty much like the following:

servicefabricprojecttype

Now, before we click that elusive OK button, we need to understand the tasks at hand. We have to do two things here.

1. Make sure our Web API 2.2 app is compatible with self-hosting the way it was with IIS hosting.

2. Make sure we can salvage the same Visual Studio project we used for our web project.

Making sure the existing Web API 2.2 app is capable of self-hosting:

  1. For number one, one might think it should be fairly easy to port an IIS-driven Web API app to a self-hosted app, but the reality might not be that simple. First, make sure you have an OWIN startup class in place; usually you would have one if you are using OWIN. You'd also need a Program.cs file, the standard one for a console application, since your web app will now be a self-hosted application. So go ahead and add these two files to your web project, and if you need a reference, both of them are shown here.
  2. Now, although you have Startup.cs and Program.cs in place, you still haven't converted your web app project to a console application. To do so, go to your project properties and, in the Application section, set the Output Type to Console Application and set the Program class as the Startup object. If you forget to set Program.cs as the startup object (which you can only do after you have created that file with a static void Main(string[] args) in it), the project will still invoke it automatically as long as it is named Program.cs.
  3. If you are using anything from the Microsoft.Owin.Host.SystemWeb library, be aware that it won't work for you anymore. For example, if you are using code like the following to resolve file paths inside your traditional web app deployment and subsidiary folders like App_Data, it won't work anymore:
     string path = System.Web.Hosting.HostingEnvironment.MapPath(@"~/App_Data/EmailTemplates/");
    

    So for things like these you might have to resort to solutions that resolve the deployment path in a console application; see the path-resolution sketch right after this list.

  4. Now, there are some other pranks too. If you are familiar with ASP.NET Identity and use it as your default identity provider, you're in for a treat. Usually when you build an expirable token generation flow for emails and passwords, you need an IDataProtectionProvider, which is usually the MachineKeyDataProtectionProvider under the Microsoft.Owin.Host.SystemWeb.DataProtection namespace, a library you'd probably ditch because you will be using the OWIN self-host/Katana now. So expect a null when you try something like app.GetDataProtectionProvider(), where app is your IAppBuilder. I ran into this myself, and thanks to Katana being open source, you can just pick up the MachineKeyDataProtectionProvider class from here. Just make sure you use it as an IDataProtectionProvider, like the following:
    using System;
    using System.Web.Security;
    using Microsoft.Owin.Security.DataProtection;
    
    namespace TaskCat.Lib.DataProtection
    {
        using DataProtectionProviderDelegate = Func<string[], Tuple<Func<byte[], byte[]>, Func<byte[], byte[]>>>;
        using DataProtectionTuple = Tuple<Func<byte[], byte[]>, Func<byte[], byte[]>>;
    
        /// <summary>
        /// Used to provide the data protection services that are derived from the MachineKey API. It is the best choice of
        /// data protection when your application is hosted by ASP.NET and all servers in the farm are running with the same machine key values.
        /// </summary>
    
        internal class MachineKeyDataProtectionProvider: IDataProtectionProvider
        {
            /// <summary>
            /// Returns a new instance of IDataProtection for the provider.
            /// </summary>
    
            /// <param name="purposes">Additional entropy used to ensure protected data may only be unprotected for the correct purposes.</param>
            /// <returns>An instance of a data protection service</returns>
            public virtual MachineKeyDataProtector Create(params string[] purposes)
            {
                return new MachineKeyDataProtector(purposes);
            }
    
            public virtual DataProtectionProviderDelegate ToOwinFunction()
            {
                return purposes =>
                {
                    MachineKeyDataProtector dataProtecter = Create(purposes);
                    return new DataProtectionTuple(dataProtecter.Protect, dataProtecter.Unprotect);
                };
            }
    
            IDataProtector IDataProtectionProvider.Create(params string[] purposes)
            {
                return this.Create(purposes);
            }
        }
    }
    

    And you'd need to do the same with the MachineKeyDataProtector class and use it as an IDataProtector, like the following:

    using System.Web.Security;
    using Microsoft.Owin.Security.DataProtection;
    
    namespace TaskCat.Lib.DataProtection
    {
        internal class MachineKeyDataProtector: IDataProtector
        {
            private readonly string[] _purposes;
    
            public MachineKeyDataProtector(params string[] purposes)
            {
                _purposes = purposes;
            }
    
            public virtual byte[] Protect(byte[] userData)
            {
                return MachineKey.Protect(userData, _purposes);
            }
    
            public virtual byte[] Unprotect(byte[] protectedData)
            {
                return MachineKey.Unprotect(protectedData, _purposes);
            }
        }
    }
    
    

    Then you can use it as a replacement for your old one from Microsoft.Owin.Host.SystemWeb.DataProtection. This is not really needed if you don't use ASP.NET Identity, but it's good to have if you land in the same problem. Now, hopefully your current Web API app will self-host just fine if you try something like the following:

    // Start the OWIN self-host (baseAddress is whatever URL you want to listen on)
    string baseAddress = "http://localhost:9000/";
    using (WebApp.Start<Startup>(url: baseAddress))
    {
        Console.WriteLine("Press ENTER to stop the server and close app...");
        Console.ReadLine();
    }
    

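Here is the kind of path-resolution fallback I meant for the App_Data example in step 3 of the list above. A small sketch, with the folder names obviously being whatever your app actually uses:

    using System;
    using System.IO;

    // In a self-hosted console app there is no HostingEnvironment.MapPath,
    // so resolve content folders relative to the executable's base directory instead
    string templatesPath = Path.Combine(
        AppDomain.CurrentDomain.BaseDirectory,   // where the self-hosted exe runs from
        "App_Data", "EmailTemplates");
    Console.WriteLine(templatesPath);
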
Now, let's try salvaging that Visual Studio project as much as we can.

Project changes when it comes to Visual Studio

  1. Now, you can use the converted project, but from my personal experience it would be easier for you to just click that new Service Fabric project button and create a new Stateless Web API project. You can port/reuse most of your code, and since the boilerplate is already written, you don't really have to rewrite it.
  2. If you want to keep the old project intact, you can of course reference the old project from the new stateless Web API service project and use your old startup class to hook up the OwinCommunicationListener instead of the one it comes with. But beware: NuGet might make it a nightmare for you if dependencies are mismatched or not met.

Hope you guys have fun with Azure Service Fabric, and if you really want to see how to build your own stateless Web API from scratch, take a look here. If I find time, I'll write one too.