Monday, December 3, 2012

Extending C# using the Roslyn project

Microsoft's Roslyn project offers remarkable possibilities for looking deeper into the C# compiler. You can not only examine the intermediate steps of compilation, but also extend compilation with your own syntax constructs or semantic rules. And that is not the full list of features: you can also work with the current Visual Studio workspace, which makes writing your own IntelliSense almost a trivial task ;). Microsoft has published a good overview of the project.
Using Roslyn, I've implemented an F# language feature that I miss in C#: short collection initialization. In F# it is possible to initialize a list in the following way:
let myList = [1;2;3]
which is, in my opinion, much more convenient than the C# construct:
var myList = new List<int> { 1, 2, 3 };

So I decided to introduce the following declaration for C#:
var myList = [<1, 2, 3>];

In order to extend C# with this construct we have to implement two steps:
1) Modify the syntax tree created by Roslyn, replacing [<expr1, ..., exprN>] with new List<_> { expr1, ..., exprN }.
2) Infer the type parameter of the list and replace _ with that type.
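
To make the goal of the two steps concrete, here is a small sketch of the plain C# that the rewrite is meant to end up with for a list such as [<1, 2, 3>] and for the empty form [<>]. The class and method names (RewriteTarget, Expanded, ExpandedEmpty) are hypothetical and exist only for illustration; they are not part of the project.

```csharp
using System.Collections.Generic;

// Hypothetical illustration only: the ordinary C# that the rewritten
// syntax tree should be equivalent to. Not part of the Roslyn project code.
public static class RewriteTarget
{
    // [<1, 2, 3>] : step 1 produces new List<_> { 1, 2, 3 },
    // step 2 replaces _ with the type of the first element (int).
    public static List<int> Expanded()
    {
        return new System.Collections.Generic.List<int> { 1, 2, 3 };
    }

    // [<>] : there is no first element to infer a type from,
    // so the rewrite falls back to a list of objects.
    public static List<object> ExpandedEmpty()
    {
        return new System.Collections.Generic.List<object>();
    }
}
```

Calling RewriteTarget.Expanded() gives a list holding 1, 2 and 3, and RewriteTarget.ExpandedEmpty() gives an empty list.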

A SyntaxTree is a read-only structure; in effect it is an extended AST. The Roslyn API lets you obtain an updated instance of it using the Visitor design pattern: by overriding a specific method of the base SyntaxRewriter class you can replace a specific kind of node in the SyntaxTree. In our case we have to override the VisitElementAccessExpression method, because to the parser our declaration is in fact an indexer (element access) expression. In this override we need to determine whether the expression is an ordinary element access or our list declaration. We rely on the fact that the first argument is parsed as a LessThanExpression and the last one as a GreaterThanExpression. (When there is only one element, the single argument is a GreaterThanExpression whose left operand is the LessThanExpression.) The extraction looks as follows:
// Assumes (Roslyn CTP): using System.Collections.Generic; using System.Linq; using Roslyn.Compilers.CSharp;
// Returns the element expressions of a [<...>] list declaration,
// or null if the node is an ordinary element access.
private static List<ExpressionSyntax> GetListCollectionInitializerElements(ElementAccessExpressionSyntax node)
{
    var arguments = node.ArgumentList.Arguments;
    if (arguments.Count == 0)
        return null;
    if (arguments.Count == 1)
    {
        var arg = arguments[0];
        // should be a greater-than expression containing a less-than expression
        var greaterThenBinaryExpression = arg.Expression as BinaryExpressionSyntax;
        if (greaterThenBinaryExpression == null || greaterThenBinaryExpression.OperatorToken.Kind != SyntaxKind.GreaterThanToken)
            return null;
        var lessThenBinaryExpression = greaterThenBinaryExpression.ChildNodes().OfType<BinaryExpressionSyntax>().FirstOrDefault();
        if (lessThenBinaryExpression == null || lessThenBinaryExpression.OperatorToken.Kind != SyntaxKind.LessThanToken)
            return null;
        // the only operand that is actually present is the list element
        var result = lessThenBinaryExpression.ChildNodes().OfType<ExpressionSyntax>().SingleOrDefault(child => !child.IsMissing);
        if (result == null)
        {
            // we are dealing with the [<>] construct - returning an empty list
            return new List<ExpressionSyntax>();
        }
        return new List<ExpressionSyntax> { result };
    }
    else
    {
        // first argument is "<expr1", last argument is "exprN>"
        var first = arguments[0].Expression as BinaryExpressionSyntax;
        var last = arguments.Last().Expression as BinaryExpressionSyntax;
        if (first == null || first.Kind != SyntaxKind.LessThanExpression
            || last == null || last.Kind != SyntaxKind.GreaterThanExpression)
        {
            return null;
        }
        var result = new List<ExpressionSyntax> { first.Right, last.Left };
        var totalArgs = arguments.Count;
        // the arguments between the first and the last are plain element expressions
        result.InsertRange(1, arguments.Skip(1).Take(totalArgs - 2).Select(arg => arg.Expression));
        return result;
    }
}
Once we have the expressions that correspond to the elements of our list, we have to infer the type parameter of the list. I used a simple rule: take the type of the first expression and let the C# compiler do the dirty work of checking the types of the remaining elements. To get the type of the first element I used Roslyn's semantic services. The great thing is that the code does not have to be fully compilable at this point; it could not be, since it still contains the new syntax. After the type is extracted we build and return the replacement list initialization node:
public override SyntaxNode VisitElementAccessExpression(ElementAccessExpressionSyntax node)
{
    var elements = GetListCollectionInitializerElements(node);
    if (elements != null)
    {
        if (elements.Count > 0)
        {
            // the type of the first element becomes the list's type parameter
            var type = GetArgumentType(elements[0]);
            var syntaxList = new SeparatedSyntaxList<ExpressionSyntax>();
            var initializerExpr = Syntax.InitializerExpression(SyntaxKind.CollectionInitializerExpression, syntaxList.Add(elements.ToArray()));
            return Syntax.ParseExpression(string.Format("new System.Collections.Generic.List<{1}>{0}", initializerExpr, type));
        }
        else
        {
            // no elements in the list - returning an empty list of objects
            // ("object" is used so that the result does not depend on a "using System;" directive)
            return Syntax.ParseExpression("new System.Collections.Generic.List<object>()");
        }
    }
    // not our construct: fall back to the default behaviour
    return base.VisitElementAccessExpression(node);
}
private TypeSymbol GetArgumentType(ExpressionSyntax expression)
{
    // semanticModel is the SemanticModel passed to the rewriter's constructor
    var info = semanticModel.GetTypeInfo(expression);
    var resultantType = info.Type;
    return resultantType;
}
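
The override above never mutates the tree; it returns a replacement node and the rewriter assembles a new tree around it. As a rough illustration of that idea outside Roslyn, here is a minimal sketch of a rewriting visitor over a tiny immutable expression tree. Every type in it (Node, Number, Sum, Rewriter, DoublingRewriter, Demo) is hypothetical and made up for this example; none of them are Roslyn types, and Roslyn's own SyntaxRewriter is considerably more elaborate.

```csharp
// Hypothetical miniature AST, for illustration only (not Roslyn types).
public abstract class Node { }

public sealed class Number : Node
{
    public int Value { get; }
    public Number(int value) { Value = value; }
}

public sealed class Sum : Node
{
    public Node Left { get; }
    public Node Right { get; }
    public Sum(Node left, Node right) { Left = left; Right = right; }
}

// Base rewriter: visits every node and rebuilds only the parts that changed.
public class Rewriter
{
    public virtual Node Visit(Node node)
    {
        var sum = node as Sum;
        if (sum != null)
            return VisitSum(sum);
        return VisitNumber((Number)node);
    }

    protected virtual Node VisitNumber(Number node)
    {
        return node; // default: keep the node as it is
    }

    protected virtual Node VisitSum(Sum node)
    {
        var left = Visit(node.Left);
        var right = Visit(node.Right);
        // reuse the existing node when neither child was replaced
        if (ReferenceEquals(left, node.Left) && ReferenceEquals(right, node.Right))
            return node;
        return new Sum(left, right);
    }
}

// Overrides one Visit method to replace one kind of node,
// the same way the list rewriter overrides VisitElementAccessExpression.
public sealed class DoublingRewriter : Rewriter
{
    protected override Node VisitNumber(Number node)
    {
        return new Number(node.Value * 2);
    }
}

public static class Demo
{
    // Evaluates the tree so the effect of a rewrite can be observed.
    public static int Evaluate(Node node)
    {
        var sum = node as Sum;
        if (sum != null)
            return Evaluate(sum.Left) + Evaluate(sum.Right);
        return ((Number)node).Value;
    }
}
```

Visiting a tree with DoublingRewriter yields a new tree in which every number is doubled, while the original tree still evaluates to its old value. That is the same reason the Roslyn rewriter's result has to be captured in a new root variable.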
After modifying the syntax tree, the only thing left is to build a demo exe file from it:
static void Main(string[] args)
{
    // parse the source that uses the new [<...>] syntax
    var syntaxTree = SyntaxTree.ParseFile("code.txt");
    var root = syntaxTree.GetRoot();
    // rewrite list declarations into ordinary List<T> initializers
    var newRoot = (CompilationUnitSyntax)(new ListsInitializerRewriter(GetSemanticsModel(syntaxTree)).Visit(root));
    // note: syntax nodes are immutable, so this call does not change newRoot itself
    newRoot.Format(new FormattingOptions(false, 4, 4));
    BuildExe(SyntaxTree.Create(newRoot));
}
static void BuildExe(SyntaxTree tree)
{
    var result = GetCompilation(tree).Emit("test.exe");
    Console.WriteLine("built into test.exe with success: {0}", result.Success);
}
private static SemanticModel GetSemanticsModel(SyntaxTree tree)
{
    return GetCompilation(tree).GetSemanticModel(tree);
}
private static Compilation GetCompilation(SyntaxTree tree)
{
    var mscorlib = MetadataReference.CreateAssemblyReference("mscorlib");
    var compilation = Compilation.Create(
        outputName: "HelloWorld",
        syntaxTrees: new[] { tree },
        references: new[] { mscorlib });
    return compilation;
}
A possible next step is integrating our new C# compiler into Visual Studio, which can be done by modifying the BuildAction or CustomTool property of the code.txt file in Solution Explorer.
That is all; the full project is on GitHub.