Tuesday, July 12, 2011

Interactive WinForm Tag Cloud Control (Think “Cool, I can add a Word/Tag Cloud thing to my WinForm app!”)

CodePlex - Word Cloud (Tag Cloud) Generator Control for .NET Windows.Forms in c#

“Generate a word cloud from some input text. A word cloud is a randomly arranged set of the words used in your text. The size and the color of each word express its usage frequency. Rarely used words are small and pale. The control is clickable and lets you identify the word under the mouse.

Background

This control is inspired by the free web based word cloud generator called Wordle.

In fact, the control is a spin-off of my project at http://sourcecodecloud.codeplex.com.

I really loved the visualizations produced by Wordle, but my goal was to write a local, non-web-based solution for processing large amounts of sensitive data. There were a number of components I found on the web, but most of them either performed very poorly when processing text, or their visualization or layout was not what I expected.

…”
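
The sizing rule in the quoted description (frequent words large and strong, rare words small and pale) can be illustrated with a short sketch. This is not the control's actual code, which is not shown here. It is a minimal Python illustration that assumes simple linear scaling, and the function names, the point-size range, and the gray-level range are all made up for the example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs (case-insensitive)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return Counter(words)


def style_for(count, max_count, min_pt=8.0, max_pt=48.0):
    """Map a word's count to a (font size, RGB gray) pair.

    Assumption: linear scaling. The most frequent word gets max_pt and
    black; rarer words shrink toward min_pt and fade toward light gray.
    """
    weight = count / max_count
    size = min_pt + (max_pt - min_pt) * weight
    gray = round(200 * (1 - weight))  # 0 = black, 200 = pale gray
    return size, (gray, gray, gray)


def cloud_styles(text):
    """Return {word: (font size, color)} for every word in the text."""
    freqs = word_frequencies(text)
    if not freqs:
        return {}
    top = max(freqs.values())
    return {word: style_for(n, top) for word, n in freqs.items()}
```

Calling `cloud_styles(some_text)` gives one style per word. Placing the words without overlap is a separate layout problem that this sketch does not attempt.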

CodePlex - Source Code Word Cloud Generator

“Brief Description

Generate a word cloud from your code to see what your code is about and what it does. A word cloud is a randomly arranged set of the keywords, variable names, class names, etc. used in your code. The size and the color of each word express its usage frequency. Rarely used words are small and pale. It might give you a hint about how good or bad your code base is and how to improve it.

Motivation

The idea behind this project is quite simple and comes from Phillip Calçado's blog post Tag Clouds: See How Noisy Your Code Is http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is.

A tag cloud (word cloud) is a visual representation for text data. Words are usually placed on some rectangular area and the importance of each tag is shown with font size and/or color. This format is useful for quickly perceiving the most prominent terms in analyzed text. Wordle http://www.wordle.net/ is one of the free tools to build such clouds. You can paste any text or a website URL and in a few seconds you get an idea what the website or text is about. And what is your code about?

One more very interesting article on this topic can be found at http://programmer.97things.oreilly.com/wiki/index.php/Code_in_the_Language_of_the_Domain in the book 97 Things Every Programmer Should Know.

So if you take your code, remove comments and literals, block some very common words (like the company name), and generate a word cloud from what is left, you will get an interesting picture to discuss with your colleagues in a coffee corner.

  • If the words "if", "then", "else", "switch", "case" are the first thing you see, your code is sprinkled with conditionals!
  • Is "string" in your top 10 words? Congratulations if you write text-processing software; otherwise it might be a bad smell.
  • Are you writing an API or a library? Then you should see the word "public" in the front rows. If you are not working on a library or an API, the word "public" might be a signal to think about better protection.
  • Do you see your classes at first glance, or are they far away in the background, behind "int", "byte", "array", etc.? Is your code in your domain language? By the way, there is a very interesting article on this topic, http://programmer.97things.oreilly.com/wiki/index.php/Code_in_the_Language_of_the_Domain, from the book 97 Things Every Programmer Should Know.

We have tried this with the code bases of multiple projects. Every time, besides learning interesting new facts about our own code, we had a lot of fun comparing different code clouds with each other. But do not forget: it is not a replacement for static code analysis, nor is it a code metric calculator. "Like most visualisation tools it is not a scientific proof of any kind but it gives you a hint about how good or bad your code base is." (Phillip Calçado)

…”
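
The preprocessing described in the quoted motivation (remove comments and literals, block some very common words, then count what is left) might look roughly like the sketch below. It is not taken from the CodePlex project. It is a minimal Python illustration that assumes C-style comment and literal syntax, uses regular expressions rather than a real lexer, and uses hypothetical names (`code_word_counts`, `blocked`).

```python
import re
from collections import Counter

# One combined pattern, scanned left to right, so that "//" inside a
# string is not mistaken for a comment and a quote inside a comment
# does not start a string. Assumes C-style syntax (C#, Java, C++ ...).
NOISE = re.compile(
    r'"(?:\\.|[^"\\\n])*"'     # double-quoted string literal
    r"|'(?:\\.|[^'\\\n])*'"    # character literal
    r"|//[^\n]*"               # line comment
    r"|/\*.*?\*/",             # block comment
    re.DOTALL,
)
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def code_word_counts(source, blocked=()):
    """Count keywords and identifiers after stripping comments/literals.

    `blocked` is a collection of words to ignore (for example a company
    name); it is compared case-insensitively.
    """
    stripped = NOISE.sub(" ", source)
    blocked_lower = {word.lower() for word in blocked}
    words = IDENTIFIER.findall(stripped)
    return Counter(w for w in words if w.lower() not in blocked_lower)


def top_words(source, n=10, blocked=()):
    """Return the n most frequent words as (word, count) pairs."""
    return code_word_counts(source, blocked).most_common(n)
```

Running `top_words` over a concatenated code base and checking whether "if", "string", or "public" land in the first rows is a quick text-only version of the questions in the list above.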

I can see where something like this could be very useful for data analysis and exploration… hum…
