Archive for March, 2009

Syntax highlighting with MGrammar

4 Comments »

Since I started exploring the possibilities of the various bits of codename “Oslo”, there has been one thing that has really annoyed me (and this is not Oslo’s fault): the lack of a decent tool for syntax highlighting of M, MGrammar & custom DSLs. Such a tool is vital for communicating the intent of a piece of source code when you blog about it.

Since I’ve been using the bits in System.Dataflow in a couple of projects now, I knew of the existence of the Lexer etc. in the assembly. I started to investigate further with .NET Reflector and found one class that seemed quite relevant for the tool I wanted to write: System.Dataflow.LexerReader. You initialize the LexerReader with a ParserContext and a stream of input data (typically the source code) and iterate over the tokens that the Lexer discovers.

So, the basic requirements for the utility I wanted to create were:

  • Take a compiled MGrammar (Mgx) as input.
  • Take a piece of source code that conforms to the MGrammar as input.
  • Output an HTML fragment with syntax-highlighted source code.

Since the MGrammar language has a notion of attributes, and more specifically supports the @{Classification} attribute that lets the language developer classify/group the different tokens into Keywords, Literals, Strings, Numerics etc., I started digging into System.Dataflow, hoping to find a mechanism for retrieving that metadata during the lexing phase.

After some hours of intensive searching with .NET Reflector and the Visual Studio debugger, I found the solution: when you iterate over the LexerReader instance, you end up with ParseTokenReference instances that describe both the token and its content. A ParseTokenReference doesn’t contain the classification information directly, and that was the big puzzle I had to solve. It turned out that the DynamicParser instance that I used to load up the Mgx file and build the ParserContext has a GetTokenInfo() method that takes an integer as its only parameter, tokenTag – and the ParseTokenReference instance has a .Tag property. Bingo!
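
To make that concrete, here is a rough C# sketch of how the pieces fit together. The exact shapes of the LexerReader constructor, the token enumeration and the token info type are my approximations of what .NET Reflector shows rather than a documented API, so treat it as an illustration of the idea and not as finished code:

    // Sketch only - members marked "assumed" are my approximations, not a verified API.
    using System.Dataflow;   // Oslo SDK (January 2009 CTP)
    using System.IO;
    using System.Text;
    using System.Web;

    public static class HighlightSketch
    {
        public static string Highlight(DynamicParser parser, ParserContext context, string source)
        {
            var html = new StringBuilder();
            var reader = new LexerReader(context, new StringReader(source)); // constructor shape assumed

            foreach (ParseTokenReference token in reader)                    // assumed: reader is enumerable
            {
                // The classification isn't on the token itself; it is looked up on the
                // DynamicParser via the token's Tag - the key insight described above.
                var info = parser.GetTokenInfo(token.Tag);

                html.AppendFormat("<span class=\"{0}\">{1}</span>",
                    info.Classification,                          // assumed property name
                    HttpUtility.HtmlEncode(token.ToString()));    // token text; exact accessor unknown
            }

            return html.ToString();
        }
    }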

So, I’ve put together a small spike that I’m intending to clean up – it’s located here at the moment and will be licensed under the Apache License.

Below is a sample output from the utility – the input is an MGrammar that I wrote for an answer to a thread in the Oslo/MSDN forum.

The first version will probably be a command line tool – but it would probably be a good idea to create both an ASP.NET frontend and a Windows Live Writer add-in for it.

module LarsW.Languages
{
    language nnnAuthLang
    {
        syntax Main = ar:AuthRule* => Rules { valuesof(ar) };
        syntax AuthRule = ad:AllowDeny av:AuthVerb
            tOpenParen rl:RoleList tCloseParen tSemiColon
                          => AuthRule { Type {ad}, AuthType{av}, Roles
                          { valuesof(rl)} };
        syntax RoleList = ri:RoleItem  => List { ri }
                        | ri:RoleItem tComma rl:RoleList
                          => List { ri, valuesof(rl) };
        syntax RoleItem = tRoleName;
        syntax AllowDeny = a:tAllow => a
                         | d:tDeny => d;
        syntax AuthVerb = tText;
        token tText = ("a".."z"|"A".."Z")+;
        @{Classification["Keyword"]}token tAllow = "Allow";
        @{Classification["Keyword"]}token tDeny = "Deny";
        token tOpenParen = "(";
        token tCloseParen = ")";
        token tSemiColon = ";";
        token tComma = ",";
        token Whitespace = " "|"\t"|"\r"|"\n";
        token tRoleName = Language.Grammar.TextLiteral;
        interleave Skippable = Whitespace;
    }
}



Celebrate Ada Lovelace Day!

No Comments »

20 minutes ago, this tweet appeared in my TweetDeck. I’m currently watching the re-run of the first keynote of Mix’09 on one monitor, and idly checking Twitter on another one.

[Screenshot of the tweet]

So, for those of you who don’t know Jennifer: she’s a Developer Evangelist working out of Ann Arbor, Michigan for Microsoft. I had the chance to meet her again during the Global MVP Summit, and the thing that amazes me about her is how she treats the people around her. That, plus her exceptional technical skills and her work for Women in Technology, makes her a woman in technology I admire :-)


Parsing the command line with MGrammar – part 2

2 Comments »

In the first installment of this series we took a look at the basic grammar for parsing the command line with MGrammar. In this part I’ll show you how we can load a compiled version of the MGrammar and parse the input (i.e. the command line) to produce a valid MGraph that we can then process in the backend code.

A quick reminder from part 1; the code is located here:
http://github.com/larsw/larsw.commandlineparser

You can download the code either by using git or downloading it as an archive. Once you’ve done that, open the solution LarsW.CommandLineParser.sln in Visual Studio 2008.

[Screenshot of the Visual Studio security warning dialog]

Most likely you will be presented with a dialog box informing you that opening the solution (or, more correctly, the LarsW.CommandLineParser C# project inside it) can pose a security risk. The reason is that I’ve included the MSBuild task for compiling MGrammar files (.mg) into .mgx that is included in the Oslo SDK. Select “Load project normally” and press OK.

First, let’s take a look at the extra plumbing I’ve added to the project to get the .mg file to compile. Right-click the LarsW.CommandLineParser project in the Solution Explorer and choose Unload Project. Next, right-click it again and choose Edit LarsW.CommandLineParser.csproj. This will bring up the project file as raw XML in the editor window.

In the first <PropertyGroup> I’ve added seven lines that I borrowed from a project created with the “M” template. They basically set up the paths to various M-specific tools and auxiliary files.

The only one of these lines that really matters, and that I had to tweak in order to get this right, is the <MgTarget> element. Out of the box this is set to Mgx, which instructs the Mg compiler to spit out the result of the compilation as a .mgx file. As we will see later, the value needs to be set to MgResource in order for the DynamicParser to be able to load the .mgx as a resource.

If you navigate to the end of the project file, you’ll see I’ve also added an <Import> element that pulls in some MGrammar-specific MSBuild tasks, and, most importantly, in the last <ItemGroup> section I’ve changed the item type from <None> to <MgCompile> for the cmd.mg file.
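
Pieced together from the description above, the relevant additions to the .csproj look roughly like the fragment below. Only the elements discussed here are shown; the other M-specific properties and the actual path in the <Import> element come from the Oslo SDK’s “M” project template, so treat this as an illustration rather than a copy of the real project file:

    <PropertyGroup>
      <!-- ...the other M-specific paths borrowed from the "M" template go here... -->
      <MgTarget>MgResource</MgTarget>  <!-- Mgx by default; MgResource embeds the .mgx as a resource -->
    </PropertyGroup>

    <ItemGroup>
      <MgCompile Include="cmd.mg" />   <!-- was <None Include="cmd.mg" /> -->
    </ItemGroup>

    <!-- Near the end of the project file: -->
    <Import Project="..." />           <!-- the MGrammar .targets file that ships with the Oslo SDK -->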

Well, we’ve been mucking around in the MSBuild plumbing too long now, haven’t we? Right-click the project again and choose Reload Project. When the project has loaded up again, build it to ensure that everything is fine and dandy. Even though I haven’t stated it before, it should be obvious that the project depends on the latest Oslo SDK (as of now, the January 2009 CTP Refresh).

The core component is the CommandLineProcessor class.

It loads up the language (the compiled version of cmd.mg) with DynamicParser.LoadFromResource(). The reason we had to specify MgResource as the MgTarget earlier is that if we don’t, and add the compiled .mgx file as a plain resource, the .LoadFromResource() method won’t find it. As of now, it seems that it will only look for resources with the .resource extension.

We then pass the command line, wrapped in a StringReader instance, to the .Parse<T>() method on the DynamicParser instance. Even though it isn’t spelled out anywhere, T has to be object or a type that implements System.Dataflow.ISourceInfo. The internal Node classes in GraphBuilder are what you will be handed by default, but you can also create your own GraphBuilder and produce nodes from your own domain model.

So, by calling parser.Parse<object>(null, commandLineReader, ErrorReporter.Standard) we get back a reference to the root of the Abstract Syntax Tree (AST) if the input matches the grammar. The AST is basically a representation of the MGraph.
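
In code, the load-and-parse step boils down to something like the sketch below. It is simplified: the argument list of LoadFromResource() and the resource name are assumptions on my part, so check the CommandLineProcessor source for the real thing:

    // Simplified sketch of what CommandLineProcessor does; the LoadFromResource()
    // arguments and the resource name below are assumptions, not copied from the source.
    using System.Dataflow;    // Oslo SDK (January 2009 CTP)
    using System.IO;
    using System.Reflection;

    public static class ParseSketch
    {
        public static object ParseCommandLine(string commandLine)
        {
            var parser = DynamicParser.LoadFromResource(
                Assembly.GetExecutingAssembly().Location,    // assembly that embeds the compiled grammar
                "cmd");                                      // hypothetical resource name

            using (var commandLineReader = new StringReader(commandLine))
            {
                // Returns the root of the AST (an MGraph representation) if the input matches the grammar.
                return parser.Parse<object>(null, commandLineReader, ErrorReporter.Standard);
            }
        }
    }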

The next step is to traverse the AST and act upon the different node types. The grammar for this project is quite trivial, and the traversal is mostly handled by the private ProcessParameter() method in the CommandLineProcessor class. I suggest you take a look at it if you’re interested in doing something similar.

So, just create an instance of the CommandLineProcessor and pass in an instance of an arbitrary class that contains methods that will handle the command line arguments. To specify that a method is an argument handler, decorate it with the CommandLineArgumentHandler attribute. The attribute takes three parameters: the short form and long form of the argument keyword, and a description. For now the description isn’t used for anything, but the idea is that the command line processor can auto-generate a usage screen for you (typically shown with -?).
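
A handler class might look something like the sketch below. The attribute parameters follow the order described above (short form, long form, description), but the method signatures and the way the instance is handed to the CommandLineProcessor are assumptions:

    // Hypothetical usage sketch - the attribute parameters follow the order described above
    // (short form, long form, description); everything else is illustrative.
    public class MyCommandLineHandlers
    {
        [CommandLineArgumentHandler("v", "verbose", "Enable verbose output")]
        public void HandleVerbose()
        {
            // set a flag, raise the log level, etc.
        }

        [CommandLineArgumentHandler("o", "output", "Path to the output file")]
        public void HandleOutput(string path)   // whether handlers can take values this way is an assumption
        {
            // remember the output path
        }
    }

    // Hand the instance to the processor (constructor shape assumed):
    // var processor = new CommandLineProcessor(new MyCommandLineHandlers());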

That’s about it – if you find it useful or modify the code, please let me know. With git you can push me a change set and I will try to merge it if you’ve come up with a cool feature.


Parsing the command line with MGrammar – part 1

1 Comment »

Let’s take a look at how we can use MGrammar to create a mini-DSL for a language most developers know quite well: command line arguments. Most applications that accept arguments on the command line in Windows (or in Linux/Un*x for that matter) use the form:

Application.exe /a /b 123 /c “some input string goes here”

Some applications use / as the “marker” that an argument is following, while others use - or --. It is also quite common to allow both a verbose and an abbreviated version of the same command.

Well, that was Command Line 101. Here’s a brief explanation and some code showing how we can do this with MGrammar + C#.

Here’s a screenshot of Intellipad where the MGrammar for the command line parsing DSL is displayed in the second pane (Click the image to show the picture in full size):

[Screenshot of Intellipad with the command line grammar in the second pane]

If anyone has a Windows Live Writer plugin that does syntax highlighting of M & MGrammar – please send me an email :-)

[Side note: Since the grammar for M & MGrammar ships as part of the samples in the Oslo SDK, it should be quite easy to put together a basic HTML syntax highlighter for both languages by loading up the compiled grammar and using the Lexer in System.Dataflow. Note to self: investigate this further.]

If you’re not familiar with MGrammar, I’ll walk through cmd.mg for you. The general idea is that MGrammar helps you transform text (the input) into MGraph, a directed labeled graph that can contain ordered and unordered nodes. The MGraph can then be traversed and acted upon.

The language CommandLineLang resides in a module named LarsW.Languages. The module keyword works pretty much like namespace NNN in C# and is used to divide the world into smaller pieces. Things that live inside a module can be exposed to the outside by using the export keyword (not shown in the example), and things from the outside can be welcomed in by using the import keyword.

The same way void Main(string[] args) is the default entry point in a C# application, syntax Main = …; is the entry point in an MGrammar-based language.

In general, there are two things we need to work with in an MGrammar: syntax and token statements. Last thing first: tokens are regular languages (regular expressions) where you define the set of characters that will make a match using Unicode characters, sequences of these and the normal Kleene operators: ? for optional elements, * for zero-to-many and + for one-to-many. Parentheses – () – can be used for grouping sets, and | is used for choosing between two options. If you are familiar with regular expressions, writing tokens should be quite easy. Note that you can, and will, write the tokens in a hierarchical fashion, since your grammar would turn into a complete mess if you had to expand a lot of the regular expressions inline.

Syntax rules describe context-free languages and can be made up of tokens and other syntax elements. You also have the possibility to project the matched tokens differently with the => operator. Without this you would have to do a lot more coding in your backend, so you will definitely want to exercise your grammar in Intellipad with some samples until you’re satisfied with the MGraph it outputs. Here is the same rule without and with a custom projection:

   syntax Rule = tToken tString tStatementTerminator;
   syntax Rule = tToken string:tString tStatementTerminator
             => Rule { Value { string }};

While not used in this sample, recursive rules are an essential building block for grammars that need to consume things like a comma-separated list (or repeating elements in general).

A repeating rule can look something like this:

   syntax Items = item:Item => Items {item}
                | items:Items item:Item
                 => Items {items, item};
   syntax Item = ...;

As you probably noticed, the Items rule is used inside itself – so this rule is recursive. The “problem” with this type of syntax rule is that it produces nested nodes in the MGraph. This isn’t really a problem, but it makes the traversal in the backend more tedious. To mitigate this, the Oslo team came up with the valuesof() construct, which will “flatten” out a set of hierarchical nodes for you:

   syntax Items = item:Item => Items {item}
                | items:Items item:Item
                  => Items {valuesof(items), item};

The interleave keyword basically tells the lexer which tokens it can ignore. This will typically be whitespace and comments.

So now that we know some of the basics, let’s take a look at cmd.mg again. It basically consists of four syntax rules and four token rules. I’ve applied custom projections to most of the rules so that the MGraph production looks reasonably sane.

In the next installment of this series I will discuss how we can create a backend that will consume the MGraph and take action on the command line parameters.

The source is released under the Apache License 2.0 and can be found here: http://github.com/larsw/larsw.commandlineparser . This is the first project I’ve released on GitHub, and if the experience is good, I believe I’ll continue to use it.

There’s a Download button that you can use to grab either a zip or tarball of the source tree if you haven’t installed git.



Awakening from the Winter Hibernation

No Comments »

First of all, I would like to apologize for not producing any new content on this blog the last couple of months. There are many reasons for that: a lot of work for my current customer, being together with my son as much as possible – and probably the most important one: it’s been a really dark, cold and snowy winter here in Norway :-( I’m no bear – but I feel that I’ve been hibernating this winter. Since the days are getting longer, I need to do as the bear does: wake up from hibernation and get “online” again. For those of you who follow me on Twitter, you’ve probably noticed that it is almost the only social networking platform I’ve used for a couple of months.

Another reason why I haven’t been around much, either here or in the WCF forum, is that I’ve gone into learning mode – and for the time being I’m focused on Codename “Oslo”.

I had the chance to attend the Global MVP Summit in Seattle last week, and it was a blast! The whole conference was under an NDA, as most people who use Twitter probably noticed, so I can’t go into a lot of the details that were presented during keynotes and sessions.

But it’s not a secret that I’m a Connected Systems Developer MVP – and because of that, I “belong” to the Connected Systems Division. CSD, as it is also called, owns some great existing products like BizTalk, WCF & WF, but the currently most hyped project is Codename “Oslo”, revealed at the Professional Developers Conference in LA last October as Microsoft’s new platform for model-driven development. They state on the Oslo Developer Center that we should expect a ten-fold productivity increase – something they have yet to prove to us.

A lot of people I’ve talked to are really confused about what it is – and what technologies sit under the “Oslo” umbrella.

For a period, I will try to write some blog posts explaining what “Oslo” is, what it can be used for, and what I see as its strong and weak points (at the moment). No good explanation without code, you might say – and that is totally correct.

First off, I’ll start with MGrammar – a language in the M* family for creating Domain-Specific Languages. This might not be a top-down approach to the Oslo platform, but I want to show some working code – and MGrammar is the technology I’ve been focusing on most over the last few weeks.