Syntax highlighting with MGrammar


Since I started exploring the various bits of codename “Oslo”, one thing has really annoyed me (and this is not Oslo’s fault): the lack of a decent tool for syntax highlighting M, MGrammar and custom DSLs. Such a tool is vital for communicating the intent of a piece of source code when you blog about it.

Since I’ve been using the bits in System.Dataflow in a couple of projects now, I knew that the assembly contains a Lexer and related classes. I investigated further with .NET Reflector and found one class that seemed quite relevant for the tool I wanted to write: System.Dataflow.LexerReader. You initialize the LexerReader with a ParserContext and a stream of input data (typically the source code) and then iterate over the tokens that the Lexer discovers.

So, the basic requirements for the utility I wanted to create were:

  • Take a compiled MGrammar (Mgx) as input.
  • Take a piece of source code that conforms to the MGrammar as input.
  • Output an HTML fragment with syntax-highlighted source code.
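
To make the last requirement a bit more concrete, here is a minimal sketch of the output step only, written in Python rather than .NET. It assumes the tokens have already been paired with a classification name (or None); the function name render_html and the CSS class naming are hypothetical and are not taken from the actual utility.

```python
from html import escape


def render_html(tokens):
    """Render (text, classification) pairs as an HTML fragment.

    `classification` is a string such as "Keyword", or None for tokens
    that carry no classification. Classified tokens are wrapped in a
    <span> whose CSS class is the lower-cased classification name.
    """
    parts = []
    for text, classification in tokens:
        safe_text = escape(text)  # never emit raw source text into HTML
        if classification:
            css_class = escape(classification.lower(), quote=True)
            parts.append('<span class="%s">%s</span>' % (css_class, safe_text))
        else:
            parts.append(safe_text)
    return '<pre class="code">' + "".join(parts) + "</pre>"
```

Wrapping the result in a single pre element keeps the whitespace of the original source intact, so the styling can be left entirely to a stylesheet on the blog.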

Since the MGrammar language has a notion of attributes and, more specifically, supports the @{Classification} attribute, which lets the language developer classify/group the different tokens into Keywords, Literals, Strings, Numerics etc., I started digging into System.Dataflow, hoping to find a mechanism to retrieve this metadata during the lexing phase.

After some hours of intensive searching with .NET Reflector and the Visual Studio debugger, I found the solution. When you iterate over the LexerReader instance, you get ParseTokenReference instances that describe both the token and its content. A ParseTokenReference doesn’t contain the classification information directly, and that was the big puzzle I had to solve. It turned out that the DynamicParser instance that I used to load the Mgx file and build the ParserContext has a GetTokenInfo() method that takes an integer, tokenTag, as its only parameter, and the ParseTokenReference instance has a .Tag property. Bingo!
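
The idea of joining each token to its metadata through an integer tag can be illustrated with a small, self-contained Python sketch. This is not the System.Dataflow API: the Token and TokenInfo classes, the TOKEN_TABLE dictionary and the regex-based lex() function are all hypothetical stand-ins for ParseTokenReference, GetTokenInfo() and LexerReader, and the tag numbers are made up.

```python
import re
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class TokenInfo:
    """Metadata about a kind of token (stand-in for what a tag lookup returns)."""
    name: str
    classification: Optional[str]


@dataclass
class Token:
    """A lexed token: only a numeric tag and its text, no classification."""
    tag: int
    text: str


# Hypothetical tag table; in the real tool this information comes from the
# compiled grammar, not from a hand-written dictionary.
TOKEN_TABLE = {
    1: TokenInfo("tAllow", "Keyword"),
    2: TokenInfo("tDeny", "Keyword"),
    3: TokenInfo("tText", None),
    4: TokenInfo("Other", None),
}

# Toy lexer: each named group "tN" corresponds to tag N.
_PATTERN = re.compile(
    r"(?P<t1>\bAllow\b)|(?P<t2>\bDeny\b)|(?P<t3>[A-Za-z]+)|(?P<t4>.)",
    re.DOTALL,
)


def lex(source: str) -> Iterator[Token]:
    """Yield tokens carrying only a tag and the matched text."""
    for match in _PATTERN.finditer(source):
        yield Token(tag=int(match.lastgroup[1:]), text=match.group())


def classify(source: str) -> Iterator[Tuple[str, Optional[str]]]:
    """Resolve each token's classification through its tag."""
    for token in lex(source):
        info = TOKEN_TABLE[token.tag]
        yield token.text, info.classification
```

The point of the sketch is the indirection: the token itself stays small, and anything the grammar author declared about it is looked up on demand through the tag.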

So, I’ve put together a small spike that I intend to clean up. It’s located here at the moment and will be licensed under the Apache License.

Below is sample output from the utility; the input is an MGrammar that I wrote as an answer to a thread in the Oslo/MSDN forum.

The first version will probably be a command-line tool, but it would be a good idea to create both an ASP.NET frontend and a Windows Live Writer add-in for it.

module LarsW.Languages
{
    language nnnAuthLang
    {
        syntax Main = ar:AuthRule* => Rules { valuesof(ar) };
        syntax AuthRule = ad:AllowDeny av:AuthVerb
            tOpenParen rl:RoleList tCloseParen tSemiColon
                          => AuthRule { Type {ad}, AuthType{av}, Roles
                          { valuesof(rl)} };
        syntax RoleList = ri:RoleItem  => List { ri }
                        | ri:RoleItem tComma rl:RoleList
                          => List { ri, valuesof(rl) };
        syntax RoleItem = tRoleName;
        syntax AllowDeny = a:tAllow => a
                         | d:tDeny => d;
        syntax AuthVerb = tText;
        token tText = ("a".."z"|"A".."Z")+;
        @{Classification["Keyword"]} token tAllow = "Allow";
        @{Classification["Keyword"]} token tDeny = "Deny";
        token tOpenParen = "(";
        token tCloseParen = ")";
        token tSemiColon = ";";
        token tComma = ",";
        token Whitespace = " "|"\t"|"\r"|"\n";
        token tRoleName = Language.Grammar.TextLiteral;
        interleave Skippable = Whitespace;
    }
}