Monday, October 27, 2014

Using Stanford NLP to run a sentiment analysis with F#

Stanford NLP is a great tool for text analysis and Sergey Tihon did a great job demonstrating how it can be called from .NET code with C# and F#.

The purpose of this post is to show how Stanford NLP sentiment analysis can be called from an F# application. The code in this example reports a sentiment value, from very negative to very positive, for every sentence of the specified text.

Prerequisites:

    - The NuGet package Stanford.NLP.CoreNLP needs to be installed (this code works with 3.4.0.0).
    - The Java binaries should be downloaded from http://nlp.stanford.edu/software/stanford-corenlp-full-2014-06-16.zip and unzipped. After that you need to extract the contents of stanford-corenlp-3.4-models.jar (it is part of the zip file) to some directory.

Source code:

    F# code is elegant as usual :)


    To call this method you can use the following code, where the modelsDir variable should be set to the models location:


    Enjoy!

Monday, June 16, 2014

F# - getting function name

Getting an F# function's name and method info object is tricky but possible:


Unfortunately, this code only works when the F# compiler does not inline the function body into the calling method.

Tuesday, February 26, 2013

Advanced constructor injection with Autofac

Autofac is a great dependency injection framework that combines a rich feature set with ease of use. However, configuring constructor parameters is sometimes not trivial.

Let's imagine the following requirement: we need to inject an NLog logger into our component and set the logger's name according to the component type. A simple but arguable solution is to inject a service locator, namely NLog's LogFactory class, but that introduces repetitive code and burdens our components with the extra responsibility of creating loggers for themselves.

The good news is that it is possible to group the logger-creation logic in a single place and inject the right logger straight into the constructor. Consider the following component class:

It is marked with a custom attribute of type TypedLoggerAttribute, which means that a logger should be injected into its constructor. Configuration of the injected logger is implemented as an Autofac module:

This module will prepare preconfigured loggers for all constructors of the target components. Use the following code to see how it works:

Monday, December 3, 2012

Extending C# using Roslyn project

The Microsoft Roslyn project provides amazing possibilities to take a deeper look into the C# compiler. You can not only examine intermediate steps of the compilation, but also extend it with your own syntax constructs or semantic rules. And even this is not the full list of available features: you can also work with the current Visual Studio workspace, which makes writing your own IntelliSense an almost trivial task ;). Here is a good Microsoft overview of this project.
Using the Roslyn project I've implemented an F# language feature that I miss in C#: short collection initialization. In F# it is possible to initialize a list in the following way:
let myList = [1;2;3]
which is in my opinion much more convenient than the C# construct:
var myList = new List<int> { 1, 2, 3 };

So I decided to introduce the following declaration for C#:
var myList = [<1>]

In order to extend C# with this construct we have to implement two steps:
1) Modify the syntax tree created by Roslyn, replacing [<expr1...exprN>] with new List<_>{expr1...exprN}.

2) Infer the type parameter for the list and replace _ with this type.

SyntaxTree is a read-only construct and is in fact an extended AST. The Roslyn API lets you obtain an updated instance of it using the Visitor design pattern: by overriding a specific method of the base SyntaxRewriter class you can replace the corresponding kind of node in the SyntaxTree. In our case we have to override the VisitElementAccessExpression method, because to the C# parser our declaration is in fact an indexer expression. In this override we need to determine whether the expression is an ordinary element access expression or our list declaration. We use the knowledge that the first argument is a LessThanExpression and the last one is a GreaterThanExpression. So the extraction will be the following:

Once we have the expressions that correspond to the elements of our list, we have to infer the type parameter for the list. I used a simple rule: take the type of the first expression and let the C# compiler do all the dirty work of checking the types of the remaining elements. To get the type of the first element I used Roslyn's semantic services. The great thing is that the rest of my code does not need to be compilable at this moment; otherwise this would simply not be possible. After the type is extracted we form and return the updated list initialization node:

After modifying the syntax tree, the only thing left is to build a demo exe file from it:

The next step could be integrating our new C# compiler into Visual Studio, which can be done by modifying the BuildAction or CustomTool property of the code.txt file in Solution Explorer.
That is all; the full project is on GitHub.

Tuesday, November 27, 2012

NLog and logging abstractions

Consider the following example: your project uses a third-party logging library and your developers log everything, so the code may end up with thousands of calls to that library. Everything works well until something unexpected happens. For example, your company sets new standards under which only a different logging library may be used. Or a critical issue is found in your library and forces you to switch to another implementation. It does not matter what causes the change requirement (changes like these are common in software development); you have to cope with them with minimal effort.

I reckon it is good practice to introduce an abstraction layer on top of widely used third-party components such as logging. In fact this is an application of the letter "D" of the SOLID principles: "Depend upon abstractions. Do not depend upon concretions." Following this principle ensures that switching between third-party components, or to our own, will be fast and painless.

Let's consider an abstraction for my favorite .NET logging library, NLog. The first implementation that comes to mind is:
This abstraction makes it possible to create log messages with different priorities and to attach exception information.
However, if you implement this interface using NLog, one important logger property will remain uninitialized: Name. This property is vital for identifying loggers and is often used for routing log messages and for locating the place in code where a log message was created. A logger is usually initialized by calling LogManager.GetCurrentClassLogger(), which gives the logger a Name referring to the class where the logger instance was created. Calling this method inside the ILogger implementation is of no use, though: all loggers will get the same name.

The most straightforward approach is to add a logger name parameter to the abstraction and set it every time a call to the logger is made. This requires extra coding and adds complexity to your code, which is not the best option for a utility service like a logger.

Another solution is to set the name with a dependency injection framework. This is a good example of how it can be done with Ninject.

The third approach became available with the release of .NET 4.5, which introduced a set of caller information attributes. It does not require a lot of coding or any third-party libraries: you simply add a string parameter marked with the CallerFilePath attribute to every method of the interface and of its implementation, and initialize the logger using the processed value of this parameter. The resulting abstraction will be:
And the implementation:
When you create a log message by calling, for example, the Info() method, you don't have to pass a value for the sourceFilePath parameter: the compiler substitutes the path of the current source file at compile time. All you need is to define a strategy for mapping the source file path to a logger name; in my example I simply use the short file name as the name of the logger.
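
The caller-attribute approach can be sketched roughly as follows. This is a minimal illustration, not the original listing from this post: the ILogger shape, the InMemoryLogger class and the "name|level|message" line format are assumptions made up for the example, and the sketch collects lines in memory instead of calling NLog, where the derived name would typically be passed to LogManager.GetLogger(name).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;

// Hypothetical abstraction: every method takes an optional, compiler-filled source path.
public interface ILogger
{
    void Info(string message, [CallerFilePath] string sourceFilePath = "");
    void Error(string message, Exception exception = null, [CallerFilePath] string sourceFilePath = "");
}

// Hypothetical implementation that records lines in memory instead of calling NLog.
public sealed class InMemoryLogger : ILogger
{
    public List<string> Lines { get; } = new List<string>();

    // Strategy described in the text: the short file name becomes the logger name.
    public static string LoggerNameFor(string sourceFilePath)
    {
        return Path.GetFileNameWithoutExtension(sourceFilePath);
    }

    public void Info(string message, [CallerFilePath] string sourceFilePath = "")
    {
        Write("INFO", message, null, sourceFilePath);
    }

    public void Error(string message, Exception exception = null, [CallerFilePath] string sourceFilePath = "")
    {
        Write("ERROR", message, exception, sourceFilePath);
    }

    private void Write(string level, string message, Exception exception, string sourceFilePath)
    {
        // A real implementation would look up an NLog logger by this name here.
        string name = LoggerNameFor(sourceFilePath);
        string suffix = exception == null ? "" : "|" + exception.GetType().Name;
        Lines.Add(name + "|" + level + "|" + message + suffix);
    }
}
```

A caller normally writes just logger.Info("started") and lets the compiler supply the path. Note that the attribute has to be present on the interface method as well, because the compiler fills the argument based on the static type at the call site.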

Tuesday, October 9, 2012

Strongly-typed id for MongoDB C# driver

The MongoDB C# driver already has a lot of features; however, it can still be improved.
One particular idea is to use a strongly typed id field instead of the generic ObjectId type, so that the id type is checked at compile time rather than at run time. For example, with an id of this type you can ensure that only a product id is passed to a GetProductById method. Here is how it can be implemented (requires the MongoDB C# driver):

MongoDB requires id fields to be initialized when a document is saved to the database. To make the developer's life easier and the code cleaner, the driver provides a mechanism for generating an id when it does not exist. Here is its implementation for the id:

After that the id field can be declared in the following way. Note that once the generator attribute has been set there is no need to set the product id in the product constructor: it will be created when the entity is saved to the database.
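
The compile-time check itself can be illustrated without the driver. The sketch below is an assumption-laden stand-in, not the implementation from this post: Id<TEntity>, Product, Order and ProductRepository are hypothetical names, the id wraps a plain string rather than ObjectId, and the serialization and id-generator integration with the MongoDB driver is left out entirely.

```csharp
using System;

// Hypothetical strongly typed id: a raw value tagged with the entity type it belongs to.
public readonly struct Id<TEntity> : IEquatable<Id<TEntity>>
{
    public string Value { get; }

    public Id(string value)
    {
        Value = value ?? throw new ArgumentNullException(nameof(value));
    }

    // Stand-in for id generation; the real post relies on the driver's generator mechanism.
    public static Id<TEntity> New()
    {
        return new Id<TEntity>(Guid.NewGuid().ToString("N"));
    }

    public bool Equals(Id<TEntity> other)
    {
        return string.Equals(Value, other.Value, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return obj is Id<TEntity> other && Equals(other);
    }

    public override int GetHashCode()
    {
        return Value == null ? 0 : Value.GetHashCode();
    }

    public override string ToString()
    {
        return Value ?? string.Empty;
    }
}

public sealed class Product
{
    public Id<Product> Id { get; set; }
    public string Name { get; set; }
}

public sealed class Order
{
    public Id<Order> Id { get; set; }
}

public static class ProductRepository
{
    // Accepts only product ids; passing an Id<Order> here does not compile.
    public static string GetProductById(Id<Product> id)
    {
        return "product:" + id;
    }
}
```

With these types, ProductRepository.GetProductById(order.Id) is rejected by the compiler because Id<Order> and Id<Product> are distinct types, which is the point of the approach.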

Tuesday, October 2, 2012

Writing multiline log messages into single line in NLog

I prefer to avoid splitting a single log message across several lines. That allows me to analyze my logs more efficiently with console tools like grep.
While it is easy to follow this convention in custom log messages, you still have to solve the problem for text produced by some NLog layout renderers, for example the exception stack trace renderer. To solve it I use the replace layout wrapper. Take a look at the following NLog config:

In this example I use my custom verbose_inline layout, which replaces all newlines with the "->" string. After that I can easily gather statistics for, e.g., a specific exception using a grep command like:
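
Separately from the config and the grep usage, the effect of the replacement on a rendered message can be sketched in plain C#. This is only an illustration of the transformation under assumed names (LogLineFlattener and Flatten are hypothetical); it is not the NLog configuration from this post and does not use NLog at all.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogLineFlattener
{
    // Matches Windows, old Mac and Unix line breaks; "\r\n" is tried first so it counts once.
    private static readonly Regex NewLine = new Regex(@"\r\n|\r|\n", RegexOptions.Compiled);

    // Replaces every line break with "->" so the whole message stays on a single line.
    public static string Flatten(string text)
    {
        return NewLine.Replace(text ?? string.Empty, "->");
    }
}
```

A multi-line stack trace passed through Flatten ends up on one line with "->" between frames, so line-oriented tools see each log message as a single record.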