Monday, October 27, 2014

Using Stanford NLP to run a sentiment analysis with F#

Stanford NLP is a great tool for text analysis, and Sergey Tihon did an excellent job demonstrating how it can be called from .NET code in C# and F#.

The purpose of this post is to show how Stanford NLP sentiment analysis can be called from an F# application. The code in this example returns a sentiment value, from very negative to very positive, for every sentence of the given text.

Prerequisites:

    - The Stanford.NLP.CoreNLP NuGet package needs to be installed (this code works with version 3.4.0.0).
    - The Java binaries should be downloaded from http://nlp.stanford.edu/software/stanford-corenlp-full-2014-06-16.zip and unzipped. After that, extract the contents of stanford-corenlp-3.4-models.jar (it is part of the zip file) to some directory.

Source code:

    F# code is elegant as usual :)

open System
open System.IO
open edu.stanford.nlp.ling
open edu.stanford.nlp.neural.rnn
open edu.stanford.nlp.sentiment
open edu.stanford.nlp.trees
open edu.stanford.nlp.util
open java.util
open edu.stanford.nlp.pipeline

// Converts a .NET type into the java.lang.Class object that CoreMap.get expects
let classForType<'t> =
    java.lang.Class.op_Implicit typeof<'t>

type SentimentPrediction =
    | VeryNegative
    | Negative
    | Neutral
    | Positive
    | VeryPositive

// The sentiment model predicts one of five classes: 0 (very negative) to 4 (very positive)
let classToSentiment = function
    | 0 -> VeryNegative
    | 1 -> Negative
    | 2 -> Neutral
    | 3 -> Positive
    | 4 -> VeryPositive
    | _ -> failwith "unknown class"

let makeSentimentAnalyzer modelsDir =
    let props = Properties()
    props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment") |> ignore
    // The default model paths are relative, so load the pipeline from inside modelsDir
    // and restore the previous current directory even if loading fails
    let currDir = Environment.CurrentDirectory
    Directory.SetCurrentDirectory modelsDir
    let pipeline =
        try
            StanfordCoreNLP(props)
        finally
            Directory.SetCurrentDirectory currDir
    // Returns one SentimentPrediction per sentence of the text
    fun text ->
        ((pipeline.``process`` text).get classForType<CoreAnnotations.SentencesAnnotation> :?> ArrayList)
        |> Seq.cast<CoreMap>
        |> Seq.map (fun cm -> cm.get classForType<SentimentCoreAnnotations.AnnotatedTree>)
        |> Seq.cast<Tree>
        |> Seq.map (RNNCoreAnnotations.getPredictedClass >> classToSentiment)
        |> Seq.toList

    To call this function, you can use the following code; set the modelsDir variable to the location of the extracted models:

[<EntryPoint>]
let main argv =
    let text = "awesome great this text is so exciting! this is disgusting sentence number two."
    let modelsDir = @"C:\tmp\stanford-corenlp-full-2014-06-16\models"
    let analyzer = makeSentimentAnalyzer modelsDir
    printfn "%A" (analyzer text)
    0 // return an integer exit code
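The makeSentimentAnalyzer function switches the process's current directory only while the pipeline loads. The same pattern can be packaged as a reusable helper; below is a minimal C# sketch of it. DirectoryScope and WithCurrentDirectory are hypothetical names, not part of CoreNLP or its NuGet package, and the try/finally guarantees that the original directory is restored even if loading fails.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of CoreNLP: runs code with a temporary current directory
public static class DirectoryScope
{
    // Runs 'action' with 'dir' as the process's current directory and restores
    // the previous current directory afterwards, even if 'action' throws.
    public static T WithCurrentDirectory<T>(string dir, Func<T> action)
    {
        var previous = Directory.GetCurrentDirectory();
        Directory.SetCurrentDirectory(dir);
        try
        {
            return action();
        }
        finally
        {
            Directory.SetCurrentDirectory(previous);
        }
    }
}
```

From C#, the pipeline could then be created with something like DirectoryScope.WithCurrentDirectory(modelsDir, () => new StanfordCoreNLP(props)), assuming the same Properties object as in the F# code. Because the current directory is process-wide state, this approach is not safe if other threads depend on it while the pipeline loads.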

    Enjoy!

Monday, June 16, 2014

F# - getting function name

Getting an F# function's name and its MethodInfo object is tricky but possible:

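open System
open System.Reflection.Emit

// An F# function value is compiled into a closure class whose Invoke method calls the
// underlying function, so scan Invoke's IL for the first call/callvirt opcode and
// resolve the metadata token that follows it.
let getFunctionName (f: 'a -> 'b) =
    let type' = f.GetType()
    let method' = type'.GetMethods() |> Array.find (fun m -> m.Name = "Invoke")
    let il = method'.GetMethodBody().GetILAsByteArray()
    let methodCodes = [ byte OpCodes.Call.Value; byte OpCodes.Callvirt.Value ]
    let position = il |> Array.findIndex (fun x -> methodCodes |> List.exists ((=) x))
    // The 4-byte metadata token immediately follows the 1-byte opcode
    let metadataToken = BitConverter.ToInt32(il, position + 1)
    let actualMethod = type'.Module.ResolveMethod metadataToken
    sprintf "%s.%s" actualMethod.DeclaringType.FullName actualMethod.Name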

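Unfortunately, this code only works when the F# compiler does not inline the function body into the closure's Invoke method. It is also a heuristic: it takes the first byte equal to a call or callvirt opcode, so an operand byte of an earlier instruction with the same value would be misread.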
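In C#, a delegate already exposes its target method through Delegate.Method, so looking up an Invoke method is unnecessary. The IL-scanning part of the trick still applies when the delegate is a lambda that merely forwards to another method. The following is a minimal C# sketch of that idea; FunctionNames and GetCalledMethodName are hypothetical names, and like the F# version it assumes that the first call or callvirt byte in the body is a real opcode.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class FunctionNames
{
    // Returns "Namespace.Type.Method" for the first method called by the delegate's target method.
    // Heuristic: scans the raw IL bytes for the first call/callvirt opcode value, which can
    // misfire if the same byte value appears inside an earlier instruction's operand.
    public static string GetCalledMethodName(Delegate f)
    {
        MethodInfo target = f.Method;
        byte[] il = target.GetMethodBody().GetILAsByteArray();
        byte[] callCodes = { (byte)OpCodes.Call.Value, (byte)OpCodes.Callvirt.Value };
        int position = Array.FindIndex(il, b => Array.IndexOf(callCodes, b) >= 0);
        if (position < 0)
            throw new InvalidOperationException("No call instruction found; the body may have been inlined.");
        // A 4-byte metadata token follows the 1-byte opcode
        int token = BitConverter.ToInt32(il, position + 1);
        MethodBase called = target.Module.ResolveMethod(token);
        return called.DeclaringType.FullName + "." + called.Name;
    }
}
```

For example, for Func<int, int> f = x => Math.Abs(x); the sketch should return "System.Math.Abs", because the lambda body consists of loading x, calling Math.Abs and returning.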

Tuesday, February 26, 2013

Advanced constructor injection with Autofac

Autofac is a great dependency injection framework that combines a rich feature set with ease of use. However, configuring constructor parameters is sometimes not trivial.

Let's imagine the following requirement: we need to inject an NLog logger into our component and set the logger's name according to the component type. A simple but debatable solution is to inject an implementation of the service locator pattern, namely NLog's LogFactory class, but it introduces repetitive code and clogs our components with the extra responsibility of creating loggers for themselves.

The good news is that it is possible to keep the logger-creation functionality in a single place and inject the right logger straight into the constructor. Consider the following component class:
using System;
using NLog;

[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
public class TypedLoggerAttribute : Attribute
{
}

// Marked components get a logger named after their own type
[TypedLogger]
public class Worker
{
    private readonly Logger logger_;

    public Worker(Logger logger)
    {
        logger_ = logger;
    }

    public void Start()
    {
        logger_.Info("Starting work");
    }
}
It is marked with the custom TypedLoggerAttribute, which means that a logger should be injected into its constructor. Configuration of the injected logger is implemented as an Autofac module:
using System.Linq;
using Autofac;
using Autofac.Core;
using Autofac.Core.Activators.Reflection;
using NLog;

public class TypedLoggersInjector : Module
{
    private readonly LogFactory logFactory_;

    public TypedLoggersInjector(LogFactory logFactory)
    {
        logFactory_ = logFactory;
    }

    protected override void AttachToComponentRegistration(IComponentRegistry componentRegistry, IComponentRegistration registration)
    {
        // Only reflection-activated components marked with [TypedLogger] are affected
        var reflectionActivator = registration.Activator as ReflectionActivator;
        if (reflectionActivator != null && reflectionActivator.LimitType.GetCustomAttributes(typeof(TypedLoggerAttribute), false).Length > 0)
        {
            var targetType = reflectionActivator.LimitType;
            // One NamedParameter per Logger constructor parameter, with the logger named after the component type
            var namedParameters = reflectionActivator.ConstructorFinder.FindConstructors(targetType)
                .SelectMany(c => c.GetParameters().Where(p => p.ParameterType == typeof(Logger)))
                .Select(p => new NamedParameter(p.Name, logFactory_.GetLogger(targetType.FullName)))
                .ToList();
            if (namedParameters.Count > 0)
            {
                // Append the prepared loggers to the parameters used when the component is resolved
                registration.Preparing += (sender, args) =>
                {
                    args.Parameters = args.Parameters.Concat(namedParameters);
                };
            }
        }
        base.AttachToComponentRegistration(componentRegistry, registration);
    }
}
This module prepares a logger, named after the component's full type name, for every Logger parameter in the constructors of target components. Use the following code to see how it works:
using Autofac;
using NLog;

public static IContainer ConfigureContainer()
{
    var builder = new ContainerBuilder();
    builder.RegisterType<Worker>();
    // Worker's Logger parameter is supplied by the module, so no Logger registration is needed
    builder.RegisterModule(new TypedLoggersInjector(new LogFactory()));
    return builder.Build();
}

public static void TestInjection()
{
    var container = ConfigureContainer();
    var worker = container.Resolve<Worker>();
    worker.Start();
}
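The core of the module is plain reflection: find every constructor parameter of the logger type and pair its name with a logger named after the component. The dependency-free C# sketch below isolates that step so it can be run without Autofac or NLog. FakeLogger, LoggerParameters and SampleWorker are hypothetical stand-ins, and Type.GetConstructors() (public instance constructors) is used only as an approximation of Autofac's constructor finder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for NLog.Logger so the sketch has no dependencies
public sealed class FakeLogger
{
    public string Name { get; }
    public FakeLogger(string name) { Name = name; }
}

public static class LoggerParameters
{
    // For every public constructor of targetType, finds parameters of type FakeLogger and pairs
    // each parameter name with a logger named after targetType's full name, mirroring the
    // NamedParameter list that TypedLoggersInjector builds.
    public static List<KeyValuePair<string, FakeLogger>> For(Type targetType, Func<string, FakeLogger> createLogger)
    {
        return targetType.GetConstructors()
            .SelectMany(c => c.GetParameters())
            .Where(p => p.ParameterType == typeof(FakeLogger))
            .Select(p => new KeyValuePair<string, FakeLogger>(p.Name, createLogger(targetType.FullName)))
            .ToList();
    }
}

// Hypothetical component with a single logger dependency
public class SampleWorker
{
    public FakeLogger Logger { get; }
    public SampleWorker(FakeLogger logger) { Logger = logger; }
}
```

A container only adds the final step: matching these name/value pairs to constructor parameters when the component is created.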

Monday, December 3, 2012

Extending C# using Roslyn project

The Microsoft Roslyn project provides amazing possibilities for taking a deeper look into the C# compiler. You can not only examine the intermediate steps of compilation, but also extend it with your own syntax constructs or semantic rules. And that is not the full list of features: you can also work with the current Visual Studio workspace, which makes writing your own IntelliSense almost a trivial task ;). Here is a good Microsoft overview of this project.
Using the Roslyn project, I've implemented an F# language feature that I miss in C#: short collection initialization. In F# it is possible to initialize a list in the following way:
let myList = [1;2;3]
which, in my opinion, is much more convenient than the C# construct:
var myList = new List<int>{1, 2, 3};

So I decided to introduce the following declaration for C#:
var myList = [<1, 2, 3>];

In order to extend C# with this construct, we have to implement two important steps:
1) Modify the syntax tree created by Roslyn, replacing [<expr1, ..., exprN>] with new List<_>{expr1, ..., exprN}.

2) Infer the type parameter of the list and replace _ with this type.

A SyntaxTree is immutable and is in fact an extended AST. The Roslyn API lets you obtain an updated copy of it using the Visitor design pattern: by overriding a specific method of the base SyntaxRewriter class, you can rewrite that specific kind of node. In our case we override the VisitElementAccessExpression method, because to the parser our declaration is in fact an element access (indexer) expression. In this override we need to determine whether the expression is an ordinary element access or our list declaration. We rely on the fact that the first argument is a LessThanExpression and the last one is a GreaterThanExpression; when the list has a single element, the only argument is a GreaterThanExpression that wraps a LessThanExpression. The extraction looks like this:
private static List<ExpressionSyntax> GetListCollectionInitializerElements(ElementAccessExpressionSyntax node)
{
    var arguments = node.ArgumentList.Arguments;
    if (arguments.Count == 0)
        return null;
    if (arguments.Count == 1)
    {
        var arg = arguments[0];
        // should be a greater-than expression containing a less-than expression
        var greaterThenBinaryExpression = arg.Expression as BinaryExpressionSyntax;
        if (greaterThenBinaryExpression == null || greaterThenBinaryExpression.OperatorToken.Kind != SyntaxKind.GreaterThanToken)
            return null;
        var lessThenBinaryExpression = greaterThenBinaryExpression.ChildNodes().OfType<BinaryExpressionSyntax>().FirstOrDefault();
        if (lessThenBinaryExpression == null || lessThenBinaryExpression.OperatorToken.Kind != SyntaxKind.LessThanToken)
            return null;
        // the only operand that is not missing is the list element
        var result = lessThenBinaryExpression.ChildNodes().OfType<ExpressionSyntax>().SingleOrDefault(child => !child.IsMissing);
        if (result == null)
        {
            // we are dealing with the [<>] construct - returning an empty list
            return new List<ExpressionSyntax>();
        }
        return new List<ExpressionSyntax> { result };
    }
    else
    {
        // [<a, b, ..., z>] arrives as "<a", b, ..., "z>": the first element is the right operand
        // of a less-than expression, the last one the left operand of a greater-than expression
        var first = arguments[0].Expression as BinaryExpressionSyntax;
        var last = arguments.Last().Expression as BinaryExpressionSyntax;
        if (first == null || first.Kind != SyntaxKind.LessThanExpression
            || last == null || last.Kind != SyntaxKind.GreaterThanExpression)
        {
            return null;
        }
        var result = new List<ExpressionSyntax> { first.Right, last.Left };
        var totalArgs = arguments.Count;
        // insert the middle arguments between the first and the last element
        result.InsertRange(1, arguments.Skip(1).Take(totalArgs - 2).Select(arg => arg.Expression));
        return result;
    }
}
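To make the expected argument shapes concrete, here is a small C# model of the multi-element branch. The Expr, Missing, Literal and Binary records are hypothetical simplifications, not the Roslyn API; they only mirror the shape the code above relies on, where [<1, 2, 3>] arrives as a less-than expression with a missing left operand, a plain 2, and a greater-than expression with a missing right operand.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for syntax nodes (not the Roslyn API)
public abstract record Expr;
public sealed record Missing : Expr;
public sealed record Literal(int Value) : Expr;
public sealed record Binary(string Op, Expr Left, Expr Right) : Expr;

public static class ListSyntaxModel
{
    // Mirrors the multi-argument branch of GetListCollectionInitializerElements:
    // returns null unless the arguments look like (? < a), b, ..., (z > ?).
    public static List<Expr> Elements(IReadOnlyList<Expr> args)
    {
        if (args.Count < 2) return null;
        if (!(args[0] is Binary first) || first.Op != "<") return null;
        if (!(args[args.Count - 1] is Binary last) || last.Op != ">") return null;
        var result = new List<Expr> { first.Right };
        result.AddRange(args.Skip(1).Take(args.Count - 2)); // middle elements unchanged
        result.Add(last.Left);
        return result;
    }
}
```

An ordinary indexer call such as items[1, 2] has no less-than and greater-than operands at its edges, so the model returns null for it, just as the rewriter leaves such nodes untouched.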
Once we have the expressions that correspond to the list elements, we have to infer the list's type parameter. I used a simple rule: take the type of the first expression, so that the C# compiler does all the dirty work of checking the types of the remaining elements. To get the type of the first element, I used Roslyn's semantic services. The great thing is that the code does not need to be compilable at this point; otherwise this would simply not be possible. After the type is extracted, we build and return the updated list initialization node:
public override SyntaxNode VisitElementAccessExpression(ElementAccessExpressionSyntax node)
{
    var elements = GetListCollectionInitializerElements(node);
    if (elements != null)
    {
        if (elements.Count > 0)
        {
            // the type of the first element becomes the list's type parameter;
            // the compiler then checks the remaining elements against it
            var type = GetArgumentType(elements[0]);
            var syntaxList = new SeparatedSyntaxList<ExpressionSyntax>();
            var initializerExpr = Syntax.InitializerExpression(SyntaxKind.CollectionInitializerExpression, syntaxList.Add(elements.ToArray()));
            return Syntax.ParseExpression(string.Format("new System.Collections.Generic.List<{1}>{0}", initializerExpr, type));
        }
        else
        {
            // no elements in the list - returning an empty list of objects
            return Syntax.ParseExpression("new System.Collections.Generic.List<object>()");
        }
    }
    return base.VisitElementAccessExpression(node);
}

private TypeSymbol GetArgumentType(ExpressionSyntax expression)
{
    // semanticModel is the SemanticModel passed to the rewriter's constructor
    var info = semanticModel.GetTypeInfo(expression);
    var resultantType = info.Type;
    return resultantType;
}
After modifying the syntax tree, the only thing left is to build a demo exe file from it:
static void Main(string[] args)
{
    // parse the source that uses the [<...>] syntax and rewrite it into plain C#
    var syntaxTree = SyntaxTree.ParseFile("code.txt");
    var root = syntaxTree.GetRoot();
    var newRoot = (CompilationUnitSyntax)(new ListsInitializerRewriter(GetSemanticsModel(syntaxTree)).Visit(root));
    newRoot.Format(new FormattingOptions(false, 4, 4));
    BuildExe(SyntaxTree.Create(newRoot));
}

static void BuildExe(SyntaxTree tree)
{
    var result = GetCompilation(tree).Emit("test.exe");
    Console.WriteLine("built into test.exe with success: {0}", result.Success);
}

private static SemanticModel GetSemanticsModel(SyntaxTree tree)
{
    return GetCompilation(tree).GetSemanticModel(tree);
}

// a compilation of the single tree that references only mscorlib
private static Compilation GetCompilation(SyntaxTree tree)
{
    var mscorlib = MetadataReference.CreateAssemblyReference("mscorlib");
    var compilation = Compilation.Create(
        outputName: "HelloWorld",
        syntaxTrees: new[] { tree },
        references: new[] { mscorlib });
    return compilation;
}
A possible next step is integrating our new C# compiler into Visual Studio, which could be done by changing the Build Action or Custom Tool property of the code.txt file in Solution Explorer.
That is all; the full project is on GitHub.
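For comparison, C# 12, released long after this post, added collection expressions, a built-in syntax close to the one implemented here. The following sketch needs a C# 12 compiler and does not use the Roslyn rewriter at all; CollectionExpressionDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class CollectionExpressionDemo
{
    // The declared type List<int> tells the compiler which collection to build
    // and what the element type is.
    public static List<int> MakeList()
    {
        List<int> myList = [1, 2, 3];
        return myList;
    }
}
```

Unlike [<...>], a collection expression takes its element type from the target type rather than from the first element, so in C# 12 var myList = [1, 2, 3]; does not compile.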

Tuesday, November 27, 2012

NLog and logging abstractions

Consider the following example: your project uses a third-party logging library and your developers try to log everything, so the code may end up with thousands of calls to that library. Everything works well until something unexpected happens. For example, your company adopts new standards that allow only a different logging library. Or a critical issue is found in your library, forcing you to switch to another implementation. Whatever causes the change (and changes like these are routine in software development), you usually have to cope with it with minimal effort.

I reckon it is good practice to introduce an abstraction layer on top of widely used third-party components such as logging. In fact, this is an application of the letter "D" of the SOLID principles, the Dependency Inversion Principle: "Depend upon abstractions. Do not depend upon concretions." Following this principle ensures that switching between third-party components, or to our own, will be fast and painless.

Let's consider an abstraction for my favorite .NET logging library, NLog. The first implementation that comes to mind is:
using System;

public interface ILogger
{
    void Info(string message);
    void Info(string message, Exception exc);
    void Debug(string message);
    void Debug(string message, Exception exc);
    void Warn(string message);
    void Warn(string message, Exception exc);
    void Error(string message);
    void Error(string message, Exception exc);
    void Fatal(string message);
    void Fatal(string message, Exception exc);
    void Trace(string message);
    void Trace(string message, Exception exc);
}

This abstraction makes it possible to create log messages with different priorities and to attach exception information.
However, if you implement this interface using NLog, one important logger property remains uninitialized: Name. This property is vital for identifying loggers and is often used for routing log messages and for locating the place in code where a log message was created. A logger is usually initialized by calling LogManager.GetCurrentClassLogger(), which sets its Name to the class in which the logger instance was created. Inside an ILogger implementation, however, this method is of no use: all loggers would get the same name, that of the implementing class.

The most straightforward approach is to add a logger name parameter to the abstraction and set it on every call. This requires extra coding and adds complexity to your code, which is not the best option for a utility service like a logger.

Another solution is to set the name with a dependency injection framework. This is a good example of how it can be done with Ninject.

A third approach became available with .NET 4.5, which introduced a set of caller information attributes. It does not require much coding or any third-party libraries: you simply add a string parameter marked with the CallerFilePath attribute to all methods of the interface and its implementation, and initialize the logger from the processed value of this parameter. The resulting abstraction is:
using System;
using System.Runtime.CompilerServices;

public interface ILogger
{
    void Info(string message, [CallerFilePath] string sourceFilePath = null);
    void Info(string message, Exception exc, [CallerFilePath] string sourceFilePath = null);
    void Debug(string message, [CallerFilePath] string sourceFilePath = null);
    void Debug(string message, Exception exc, [CallerFilePath] string sourceFilePath = null);
    void Warn(string message, [CallerFilePath] string sourceFilePath = null);
    void Warn(string message, Exception exc, [CallerFilePath] string sourceFilePath = null);
    void Error(string message, [CallerFilePath] string sourceFilePath = null);
    void Error(string message, Exception exc, [CallerFilePath] string sourceFilePath = null);
    void Fatal(string message, [CallerFilePath] string sourceFilePath = null);
    void Fatal(string message, Exception exc, [CallerFilePath] string sourceFilePath = null);
    void Trace(string message, [CallerFilePath] string sourceFilePath = null);
    void Trace(string message, Exception exc, [CallerFilePath] string sourceFilePath = null);
}
And the implementation:
using System;
using System.IO;
using System.Runtime.CompilerServices;
using NLog;

public class Logger : ILogger
{
    // Names the NLog logger after the caller's source file (e.g. "Worker.cs");
    // falls back to a logger named after this class when no path is available
    private static NLog.Logger GetInnerLogger(string sourceFilePath)
    {
        var logger = sourceFilePath == null ? LogManager.GetCurrentClassLogger() : LogManager.GetLogger(Path.GetFileName(sourceFilePath));
        return logger;
    }

    public void Info(string message, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).Info(message);
    }

    // InfoException and the other *Exception methods come from older NLog versions;
    // later releases replace them with overloads such as Info(Exception, string)
    public void Info(string message, Exception exc, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).InfoException(message, exc);
    }

    public void Debug(string message, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).Debug(message);
    }

    public void Debug(string message, Exception exc, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).DebugException(message, exc);
    }

    public void Warn(string message, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).Warn(message);
    }

    public void Warn(string message, Exception exc, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).WarnException(message, exc);
    }

    public void Error(string message, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).Error(message);
    }

    public void Error(string message, Exception exc, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).ErrorException(message, exc);
    }

    public void Fatal(string message, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).Fatal(message);
    }

    public void Fatal(string message, Exception exc, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).FatalException(message, exc);
    }

    public void Trace(string message, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).Trace(message);
    }

    public void Trace(string message, Exception exc, [CallerFilePath] string sourceFilePath = null)
    {
        GetInnerLogger(sourceFilePath).TraceException(message, exc);
    }
}
When you create a log message by calling, for example, the Info() method, you don't have to supply a value for the sourceFilePath parameter: the compiler fills in the path of the calling source file at compile time. All you need is a strategy for mapping the source file path to the logger's name; in my example, I simply use the short file name as the logger name.
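The compiler substitution described above can be observed without NLog. The minimal C# sketch below uses a hypothetical CallerInfoDemo.LoggerNameFor method that applies the same short-file-name strategy as GetInnerLogger; unlike the post's code, it defaults sourceFilePath to an empty string rather than null.

```csharp
using System.IO;
using System.Runtime.CompilerServices;

public static class CallerInfoDemo
{
    // When the caller omits sourceFilePath, the compiler passes the caller's
    // source file path as a constant; the short file name becomes the logger name.
    public static string LoggerNameFor([CallerFilePath] string sourceFilePath = "")
    {
        return string.IsNullOrEmpty(sourceFilePath) ? "<unknown>" : Path.GetFileName(sourceFilePath);
    }
}
```

Passing a value explicitly overrides the compiler-supplied path, so LoggerNameFor("/src/Worker.cs") returns "Worker.cs".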

Tuesday, October 9, 2012

Strongly-typed id for MongoDb C# driver

The MongoDb C# driver already has a lot of features; however, it can still be improved.
One particular idea is to use a strongly typed id field instead of the generic ObjectId type, so that the id type is checked at compile time rather than at run time. For example, with an id of this type you can ensure that you pass a product's id to a GetProductById method. Here is how it can be implemented (requires the MongoDb C# driver):
using System;
using MongoDB.Bson;

// Strongly typed wrapper around ObjectId: Id<Product> and Id<Order> are different types
public class Id<T> : IEquatable<Id<T>>
{
    public ObjectId Value { get; private set; }

    public override string ToString()
    {
        return Value.ToString();
    }

    public Id(ObjectId value)
    {
        Value = value;
    }

    public static Id<T> GetNew()
    {
        return new Id<T>(ObjectId.GenerateNewId());
    }

    #region generated
    public bool Equals(Id<T> other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return other.Value.Equals(Value);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != typeof(Id<T>)) return false;
        return Equals((Id<T>)obj);
    }

    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }

    public static bool operator ==(Id<T> left, Id<T> right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Id<T> left, Id<T> right)
    {
        return !Equals(left, right);
    }
    #endregion
}
MongoDb requires id fields to be initialized when a document is saved to the database. To make developers' lives easier and the code cleaner, the driver provides a mechanism for generating an id when it does not exist. Here is an implementation of it for Id<T>:
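using MongoDB.Bson;
using MongoDB.Bson.Serialization;

// Lets the driver create an Id<T> for documents that are saved without one
public class IdGenerator<T> : IIdGenerator
{
    public object GenerateId(object container, object document)
    {
        return Id<T>.GetNew();
    }

    // An id counts as empty when it is missing or wraps ObjectId.Empty
    public bool IsEmpty(object id)
    {
        var casted = id as Id<T>;
        return casted == null || casted.Value == ObjectId.Empty;
    }
}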
After that, the id field can be declared as follows. Note that once the generator attribute is set, there is no need to set the product id in the constructor: it will be generated when the entity is saved to the database.
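using System;
using MongoDB.Bson.Serialization.Attributes;

public class Product
{
    // The generator fills in Id when the document is first saved
    [BsonId(IdGenerator = typeof(IdGenerator<Product>))]
    public Id<Product> Id { get; private set; }
    public string Name { get; set; }
    public Decimal Price { get; set; }
}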
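The compile-time check mentioned at the start of this post can be shown without MongoDb. In the sketch below, TypedId<T> is a simplified, hypothetical stand-in for Id<T> that wraps a Guid instead of an ObjectId, and DemoProduct, DemoOrder and Catalog are illustrative names only.

```csharp
using System;

// Simplified stand-in for Id<T>: wraps a Guid so it compiles without the MongoDb driver
public sealed class TypedId<T> : IEquatable<TypedId<T>>
{
    public Guid Value { get; }
    public TypedId(Guid value) { Value = value; }
    public static TypedId<T> GetNew() => new TypedId<T>(Guid.NewGuid());
    public bool Equals(TypedId<T> other) => other is not null && other.Value == Value;
    public override bool Equals(object obj) => Equals(obj as TypedId<T>);
    public override int GetHashCode() => Value.GetHashCode();
    public override string ToString() => Value.ToString();
}

public class DemoProduct { public TypedId<DemoProduct> Id { get; } = TypedId<DemoProduct>.GetNew(); }
public class DemoOrder { public TypedId<DemoOrder> Id { get; } = TypedId<DemoOrder>.GetNew(); }

public static class Catalog
{
    // Only a product id is accepted here
    public static string DescribeProduct(TypedId<DemoProduct> id) => "product " + id;

    // Catalog.DescribeProduct(new DemoOrder().Id);  // does not compile: TypedId<DemoOrder> is not TypedId<DemoProduct>
}
```

With a plain ObjectId (or Guid) parameter, passing an order's id to DescribeProduct would compile and only fail at run time, typically as a "not found" result.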

Tuesday, October 2, 2012

Writing multiline log messages into single line in NLog

I prefer to avoid splitting a single log message across several lines. Keeping each message on one line lets me analyze logs more efficiently with console tools like grep.
While it is easy to follow this convention in custom log messages, you still have to deal with text produced by some NLog layout renderers, for example the exception stack trace renderer. To solve this problem I use the replace layout wrapper. Take a look at the following NLog config:
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <variable name="brief" value="${longdate} | ${level} | ${logger} | ${message} | ${exception:format=ToString}"/>
  <variable name="verbose" value="${longdate} | ${processid} | ${processname} | ${threadid} | ${level} | ${logger} | ${message} | ${exception:format=ToString}"/>
  <variable name="verbose_inline" value="${replace:inner=${verbose}:searchFor=\\r\\n|\\n:replaceWith=->:regex=true}"/>
  <targets>
    <target name="file" xsi:type="File" layout="${verbose_inline}" fileName="${basedir}/logs/mcserver_${shortdate}.log" />
    <target name="console" xsi:type="ColoredConsole" layout="${brief}" />
  </targets>
  <rules>
    <logger name="*" minlevel="Trace" writeTo="file" />
    <logger name="*" minlevel="Trace" writeTo="console" />
  </rules>
</nlog>
In this example, my custom verbose_inline layout replaces all newlines with the "->" string. After that, I can easily gather statistics for, e.g., a specific exception using a grep command like:
grep -i 'System.NullReferenceException' *.log >nullref.txt
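The effect of the verbose_inline layout can be reproduced with .NET's Regex class. The sketch below assumes that NLog's layout escaping turns searchFor=\\r\\n|\\n into the regular expression \r\n|\n; InlineLayoutDemo is a hypothetical name, and NLog itself is not involved.

```csharp
using System.Text.RegularExpressions;

public static class InlineLayoutDemo
{
    // Same pattern and replacement as verbose_inline: Windows (\r\n) and Unix (\n)
    // line breaks become "->". \r\n is listed first so it is replaced as one unit.
    public static string ToSingleLine(string text) => Regex.Replace(text, @"\r\n|\n", "->");
}
```

Applied to a message followed by a stack trace, each trace frame ends up on the same line, separated by "->", which is what makes the grep command above report one line per log entry.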