Showing posts with label NLP. Show all posts

Monday, January 12, 2015

The WordNet Language List to rule them all...

A Complete Multilingual WordNet List by Language

What is WordNet?

WordNet is a lexical database that groups words into sets of synonyms called synsets, providing short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. Both the lexicographic data (lexicographer files) and the compiler (called grind) for producing the distributed database are available.
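To make the "dictionary plus thesaurus" idea a little more concrete, here is a minimal Python sketch of what a synset might look like as a data structure. Everything in it is an assumption for illustration: the `Synset` class, the `SYNSETS` table, the entry names, and the hand-written definitions are made up, and this is not WordNet's file format or any library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous words sharing one meaning (toy model only)."""
    name: str
    lemmas: list[str]
    definition: str
    examples: list[str] = field(default_factory=list)
    hypernyms: list[str] = field(default_factory=list)  # names of more general synsets


# Hand-written toy entries for illustration; not copied from WordNet.
SYNSETS = {
    "car.n": Synset(
        "car.n",
        ["car", "auto", "automobile"],
        "a motor vehicle with four wheels",
        ["he needs a car to get to work"],
        hypernyms=["vehicle.n"],
    ),
    "vehicle.n": Synset(
        "vehicle.n",
        ["vehicle"],
        "a conveyance that transports people or objects",
    ),
}


def synonyms(word: str) -> set[str]:
    """Thesaurus-style lookup: other lemmas that share a synset with `word`."""
    found = set()
    for synset in SYNSETS.values():
        if word in synset.lemmas:
            found.update(lemma for lemma in synset.lemmas if lemma != word)
    return found


def definitions(word: str) -> list[str]:
    """Dictionary-style lookup: the definition of each synset containing `word`."""
    return [s.definition for s in SYNSETS.values() if word in s.lemmas]


def hypernym_chain(name: str) -> list[str]:
    """Follow the first 'is a kind of' relation upward from a synset."""
    chain = []
    current = SYNSETS.get(name)
    while current and current.hypernyms:
        parent = current.hypernyms[0]
        chain.append(parent)
        current = SYNSETS.get(parent)
    return chain
```

Calling `synonyms("car")` plays the thesaurus role, `definitions("car")` plays the dictionary role, and `hypernym_chain("car.n")` walks one of the relations recorded between synsets.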

Multilingual WordNet by Language and Their Licenses

Below is a table of multilingual WordNet by language and their licenses, as well as other pertinent information.


It's been a bit since I've blogged about WordNet, but still Samuel hunted me down and sent me an email about his project, compiling the uber WordNet Language list. And since he's from a SoCal College, (and it has been a while since I've blogged about WordNet... oh wait, I already said that... ;) here you go!


Related Past Post XRef:
Mix OpenNLP, IKVM.Net and C# and you get some noun phrase and contextual relevance goodness
SharpEntropy - Maximum Entropy Modeling
"Statistical parsing of English sentences"

Java for .Net? Yep, the IKVM.NET way...
Java for .Net? Ja!
Java Implementation for Mono/.Net (IVKM.Net)

NLP is Hard... But with AboditNLP it's not as...

Thursday, October 02, 2014

Getting started with the free (for 1000 calls) Text Analysis API from AYLIEN

Text Analysis blog | Aylien - How to Get Started with AYLIEN Text Analysis API


Getting up and running with AYLIEN’s Text Analysis APIs couldn’t be easier. It’s a simple 3 part process from signing up to calling the API. This blog will take you through the complete process of creating an account, retrieving your API Key and Application ID, and making your first call to the API.

Part 1: Signing up for a free account

Navigate to and click on the “Subscribe For free” button. This will bring you to a sign-up form which will ask for your details in order to set up your account and generate your credentials.

By signing up, you will get access to our basic plan which will allow you to make 1,000 API calls per day for free. Note: There is no credit card needed to get access to our basic plan. ;)


Part 3: Creating your first application
Our getting started guide is designed to get you up and running with the API and making calls as quickly and as easily as possible. Here you will find information on the API Documentation, Features, Links to a demo and some code snippets.

We have included sample code snippets for you to use in the following languages.

  • Java
  • Node.js
  • Python
  • Go
  • PHP
  • C#
  • Ruby

To start making calls, while you’re on the getting started page, scroll down to the “Calling the API” section. Choose which language you wish to use and take a copy of the code snippet. In this example, we are going to use Node.js.



Okay, 1,000 calls is not enough to build a biz on (not that you would) but it is more than enough to play with and still do some cool things. Imagine using this in your blogging, where you gather some cool text analysis info automagically from your post. Or spread out over time, analysis of all your posts. Or maybe a means to help you filter down your news stream. Or... or... or... There's a ton of stuff you can do with an API like this and being free'ish, you can play for, well, free.
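If you do spread the analysis of all your posts out over time, something has to keep count of the daily allowance. Below is a rough Python sketch of that bookkeeping, assuming the 1,000 calls per day mentioned above. The `DailyCallBudget` class and `analyze_posts` helper are hypothetical names, and `analyze` is a stand-in for whatever function actually calls the API; nothing here is AYLIEN's client code.

```python
from datetime import date


class DailyCallBudget:
    """Counts calls per calendar day and refuses once the limit is reached."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_spend(self, today=None):
        """Return True and count one call if today's budget allows it."""
        today = today or date.today()
        if today != self.day:  # a new day starts a fresh count
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def analyze_posts(posts, analyze, budget, today=None):
    """Analyze posts until the budget runs out.

    `analyze` is a hypothetical callable standing in for the real API call.
    Returns (results, leftover), where leftover holds the posts to retry tomorrow.
    """
    results = {}
    for index, post in enumerate(posts):
        if not budget.try_spend(today):
            return results, posts[index:]
        results[post] = analyze(post)
    return results, []
```

Run it once a day, feed the leftover list back in the next day, and the whole archive eventually gets analyzed without going over the free tier.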

Thursday, September 25, 2014

Understanding a Sentiment Analysis Engine

Microsoft Lystavlen - the Online display board - Understanding the Sentiment Engine in Microsoft Social Listening

Sentiment Analysis

If you want to see how the public perceives your company or product, you can use sentiment analysis, which determines people’s attitudes toward a topic. Sentiment analysis reflects the public perception of a post’s content in relation to the keywords that were used to find the post (a post is, e.g., a Twitter post or a Facebook comment).

Each post that results from your defined search queries is processed by the sentiment engine in the original language and annotated with a calculated sentiment value. Sentiment values are provided for the following languages:

  1. English
  2. German
  3. French
  4. Spanish
  5. Portuguese
  6. Italian

The sentiment value results in a positive, negative, or neutral sentiment for a post. Occasionally, the algorithm identifies positive and negative parts of a sentence and still rates the post as neutral. This happens because the amounts of a post’s text identified as positive and negative cancel each other out. A post is also classified as neutral if there are no positive or negative statements detected in it.

Note that the sentiment algorithm is not a self-learning system, even if you can edit any post’s sentiment value in the post list.
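The "cancel each other out" behavior is easier to see in code. Here is a toy Python sketch, assuming a simple word-list approach: the `POSITIVE` and `NEGATIVE` sets and the `classify` function are invented for illustration and say nothing about how the Microsoft Social Listening engine actually works internally.

```python
import re

# Tiny made-up word lists; a real engine would use far richer resources.
POSITIVE = {"great", "love", "fast", "good"}
NEGATIVE = {"bad", "slow", "hate", "broken"}


def classify(post: str) -> str:
    """Label a post positive, negative, or neutral by counting word hits."""
    words = re.findall(r"[a-z']+", post.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    # Either nothing was detected, or the two sides cancelled each other out.
    return "neutral"
```

A post like "The camera is great but the battery is bad" scores one hit on each side and lands on neutral, which is the same outcome as a post with no sentiment words at all.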

Understanding the Sentiment Engine

Let's look closer at the sentiment engine using the example post below, in the context of the search topic "Windows Phone".



I thought this post a great explanation of Sentiment Analysis, which comes up in my day job now and then. While we don't use Social Listening or a related product, we do use a library that lets us integrate like functionality into our LOB apps, but explaining how it works quickly has always been "fun". This post and explanation will come in handy next time...


Related Past Post XRef:
Comparing Sentiment Analysis REST API's
10 Professionals, 10 views on the coming trends in text analytics

Tuesday, May 20, 2014

Perfect for your next marketing-ware page - The New Age BS Generator

New Age Bullshit Generator

Namaste. Do you want to sell a New Age product and/or service? Tired of coming up with meaningless copy for your starry-eyed customers? Want to join the ranks of bestselling self-help authors? We can help.

Just click and the truth will manifest

Click the Reionize electrons button at the top of the page to generate a full page of New Age poppycock.

The inspiration for this idea came from watching philosophy debates involving Deepak Chopra. I wrote a blog post about it if you're interested.

After sitting through hours of New Age rhetoric, I decided to have a crack at writing code to generate it automatically and speed things up a bit. I cobbled together a list of New Age buzzwords and cliché sentence patterns and this is the result.

...You’ll get some profound-sounding nonsense here, too.

So, what is this for? Put it on your website as placeholder text. Print it out as a speech for your yoga class and see if anyone can guess a computer wrote it. Use it to write the hottest new bestseller in the self-help section, or give false hope to depressed friends and family members.
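The recipe described above, a list of buzzwords plus a handful of cliché sentence patterns, can be sketched in a few lines of Python. The word lists and patterns below are made up for this post and are not taken from Seb Pearce's generator; `generate_sentence` and `generate_paragraph` are hypothetical names.

```python
import random

# Made-up vocabulary and patterns, for illustration only.
NOUNS = ["consciousness", "the universe", "inner peace", "energy"]
ADJECTIVES = ["quantum", "infinite", "sacred", "cosmic"]
VERBS = ["awakens", "transcends", "illuminates", "heals"]
PATTERNS = [
    "{noun_a} {verb} {adjective} {noun_b}.",
    "To embrace {noun_a} is to become one with {adjective} {noun_b}.",
    "{noun_a} is the gateway to {adjective} {noun_b}.",
]


def generate_sentence(rng: random.Random) -> str:
    """Fill one randomly chosen pattern with randomly chosen buzzwords."""
    pattern = rng.choice(PATTERNS)
    sentence = pattern.format(
        noun_a=rng.choice(NOUNS),
        noun_b=rng.choice(NOUNS),
        verb=rng.choice(VERBS),  # ignored by patterns that have no {verb} slot
        adjective=rng.choice(ADJECTIVES),
    )
    return sentence[0].upper() + sentence[1:]


def generate_paragraph(sentence_count: int, seed=None) -> str:
    """Join several generated sentences; pass a seed for repeatable output."""
    rng = random.Random(seed)
    return " ".join(generate_sentence(rng) for _ in range(sentence_count))


if __name__ == "__main__":
    print(generate_paragraph(3))
```

More patterns and bigger word lists give more variety; the structure stays the same.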


Seb Pearce - On the New Age Bullshit Generator and parodying woo

Link to the Generator:
(Yes, it’s open source: GitHub link)

A while back, I was on a philosophy debate binge. Watching Sam Harris and the late Christopher Hitchens annihilate their opponents with precision and wit is my idea of a good night in, and YouTube’s “related videos” are a deep, dark rabbithole.



Finally an awesome example of NLP. Some of these items are pretty cool sounding too. Best of all it's open source... :)

(via Beyond Search - Natural Language Processing Used to Serve Up Cynicism)

Tuesday, April 29, 2014

Nine to Mine - Nine free Data Mining/Analysis eBooks

CodeCondo - 9 Free Books for Learning Data Mining & Data Analysis

Data mining and data analysis are two terms that very often give the impression of being complex and hard to understand, and of requiring the highest grade of education to grasp.


By learning from these books, you will quickly uncover the ‘secrets’ of data mining and data analysis, and hopefully be able to make better judgement of what they do, and how they can help you in your working projects, both now and in the future.

I just want to say that, in order to learn these complex subjects, you need to have a completely open mind, be open to every possibility, because that is usually where all the learning happens, and no doubt your brain is going to set itself on fire; multiple times.



Learn Data Science from Free Books

There is no better way to learn than from books, and then going out in the world and putting that newly found knowledge to the test, or otherwise we’re bound to forget what we actually had learned. This is a beautiful list of books that every aspiring data scientist should take note of, and add to his list of learning materials.

What books have you read in order to help you begin your own journey in data mining and analysis? I’m sure that the community would love to hear more, and I’m eager to see what I potentially let slip through my fingers myself.

Some light reading for the week...

(via KDNuggets - 9 Free Books for Learning Data Mining and Data Analysis)


Related Past Post XRef:
"Theory and Applications for Advanced Text Mining" Open eBook...
Free Big Data eBook of the Day, "Mining of Massive Datasets"

Tuesday, November 19, 2013

A word or two or 10 about Word Clouds

Beyond Search - Easily Generate Your Own Word Clouds

Word clouds have become inescapable, and it is easy to see why: many people find such a blending of text and visual information easy to understand. But how, exactly, can you generate one of these content confections? Smashing Apps shares its collection of “10 Amazing Word Cloud Generators.”


VocabGrabber is different. It doesn’t even make a particularly pretty picture. As the name implies, VocabGrabber uses your text to build a list of vocabulary words, complete with examples of usage pulled directly from the content. This could be a useful tool for students, or anyone learning something new that comes with specialized terminology. If your learning materials are digital, a simple cut-and-paste can generate a handy list of terms and in-context examples. A valuable find in a list full of fun and useful tools.

Smashing Apps - 10 Amazing Word Cloud Generators


In this session, we are presenting 10 amazing word cloud generators for you. Word cloud can be defined as a graphical representation of word frequency, whereas word cloud generators simply are the tools to map data, such as words and tags in a visual and engaging way. These generators come with different features that include different fonts, shapes, layouts and editing capabilities.

Without any further ado, here we are presenting a fine collection of 10 amazing and useful word cloud generators for you. Leave us a comment and let us know what you think of the proliferation of design inspiration in general on the web. Your comments are always more than welcome. Let us have a look. Enjoy!
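Since a word cloud is defined above as a graphical representation of word frequency, the core of any generator is a frequency count mapped to font sizes. Here is a minimal Python sketch of that step, leaving out layout and drawing. The stop-word list, the 12 to 48 point range, and the function names are assumptions chosen for illustration, not how any of the listed generators work.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def word_frequencies(text, top_n=10):
    """Return the top_n (word, count) pairs, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


def font_sizes(frequencies, min_size=12, max_size=48):
    """Scale counts linearly so the rarest word gets min_size, the commonest max_size."""
    if not frequencies:
        return {}
    highest = max(count for _, count in frequencies)
    lowest = min(count for _, count in frequencies)
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match
    return {
        word: round(min_size + (count - lowest) * (max_size - min_size) / span)
        for word, count in frequencies
    }
```

Feeding `font_sizes(word_frequencies(post_text))` into whatever draws the text is then a rendering problem, which is where the different fonts, shapes, and layouts come in.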



Make sure you click through, as SmashingApps has done a great job with a blurb and a snap for each one.


Related Past Post XRef:
Wordle’ing Terms of Service Agreements – How a ToS would look as a word/tag cloud
Bipin shows us that creating a tag cloud doesn't have to be hard to do (in ASP.Net)
Interactive WinForm Tag Cloud Control (Think “Cool, I can add a Word/Tag Cloud thing to my WinForm app!”)
"WordCloud - A Squarified Treemap of Word Frequency" - Something like this would be cool in a Feed Reader...
Feed Stream Analysis - Web Feed/Post Analysis to Group Like/Related Posts
"Statistical parsing of English sentences"
"A Model for Weblog Research"

Monday, November 18, 2013

10 Professionals, 10 views on the coming trends in text analytics

KDNuggets - Top 10 trends in text analytics

Data Driven Business recently interviewed forward-thinking text analytics professionals from leading companies like Bank of America, Home Depot and PayPal on the challenges they face, how they are overcoming them, and the industry as a whole.
Alesia Siuchykava, Data Driven Business/Text Analytics News, Nov 13, 2013.


Data Driven Business recently conducted interviews with text analytics professionals from a number of leading companies and identified 10 trends in text analytics that can be observed over the next 6-12 months.

1. Fusion of text (unstructured) data with structured data ...

2. Increase in interest in multilingual text analytics. ...

3. Algorithmic understanding of social media comments. ...

4. Commercialization of sentiment detection. ...

5. Finding trends and trending events in news streams. ...

6. More built-in visualization capabilities. ...

7. Streaming real-time text analytics. ...

8. Use of text analytics for getting insights from unstructured Big Data. ...

9. Advances in machine learning. ...

10. Integration of different capabilities. ...

Some interesting thoughts in, and on, these trends. Note the common themes? Big Data, real-time, social...

Monday, October 28, 2013

"Theory and Applications for Advanced Text Mining" Open eBook...

Intech - Computer and Information Science - Information and Knowledge Engineering - Theory and Applications for Advanced Text Mining

Edited by Shigeaki Sakurai, ISBN 978-953-51-0852-8, 218 pages, Publisher: InTech, Chapters published November 21, 2012 under CC BY 3.0 license, DOI: 10.5772/3115

Due to the growth of computer technologies and web technologies, we can easily collect and store large amounts of text data. We can expect that the data include useful knowledge. Text mining techniques have been studied aggressively since the late 1990s in order to extract that knowledge from the data. Even though many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques. They cover various techniques, from relation extraction to under-resourced or less-resourced languages. I believe that this book will give new knowledge in the text mining field and help many readers open their new research fields.


Just published last year, this free eBook looks interesting (well to me anyway). Most of it is way over my head, but there's enough here that it looks like a good set of reads... Also the WordNet chapter, Ontology Learning Using Word Net Lexical Expansion and Text Mining, caught my eye.

(via KDNuggets - Free Book: Theory and Applications for Advanced Text Mining)


Related Past Post XRef:
"Statistical parsing of English sentences"
Feed Stream Analysis - Web Feed/Post Analysis to Group Like/Related Posts
SharpEntropy - Maximum Entropy Modeling

Mix OpenNLP, IKVM.Net and C# and you get some noun phrase and contextual relevance goodness

Friday, October 04, 2013

Comparing Sentiment Analysis REST API's

Skyttle Blog - A tool for evaluating Sentiment Analysis REST APIs

There is a growing number of Sentiment Analysis REST APIs out there, and the potential user is faced with a lot of choice. Accuracy of analysis is the most important factor, and the best way to see if an analyzer will perform well in the intended task, is to run different analyzers on a sample of your data, and compare their output with manually assigned sentiment labels.

To make it easier for potential users to run such experiments, we’ve released a small open-source project. The project implements clients to several Sentiment Analyzers: Alchemy, Bitext, Chatterbox, Datumbox, Repustate, Semantria, Skyttle, and Viralheat. As input, it takes a text file with short texts, each annotated as positive, negative or neutral, and outputs a spreadsheet where responses of each API are recorded, as well as an accuracy rate and an error rate calculated against the manual labels.
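The comparison described above boils down to checking each analyzer's labels against the manual ones. Here is a rough Python sketch of that calculation. The `evaluate` function, the sample labels, and the `analyzer_a` / `analyzer_b` names are hypothetical placeholders; this is not code from the Skyttle project.

```python
def evaluate(manual_labels, api_labels):
    """Return the accuracy and error rate of one analyzer against manual labels."""
    if not manual_labels:
        raise ValueError("need at least one labeled text")
    if len(manual_labels) != len(api_labels):
        raise ValueError("every text needs exactly one label from each source")
    correct = sum(m == a for m, a in zip(manual_labels, api_labels))
    accuracy = correct / len(manual_labels)
    return {"accuracy": accuracy, "error_rate": 1 - accuracy}


# Made-up sample: four short texts labeled by hand and by two analyzers.
manual = ["positive", "negative", "neutral", "positive"]
responses = {
    "analyzer_a": ["positive", "negative", "positive", "positive"],
    "analyzer_b": ["neutral", "negative", "neutral", "negative"],
}

report = {name: evaluate(manual, labels) for name, labels in responses.items()}
```

With a sample drawn from your own data, `report` gives one accuracy figure per analyzer, which is the number the excerpt says matters most.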

The project is available on github: Once you clone/unpack it, you will need to install requirements:


SemantAPI - Semantapi.Robot

SemantAPI is a free, open source toolkit intended for a quick and easy comparison of the most popular NLP and sentiment analysis solutions on the market. The toolkit offers 2 independent analysis applications: SemantAPI.Robot and SemantAPI.Human. Both applications are written in C# and based on Microsoft’s .Net framework 3.5 platform.

Redistributable package of SemantAPI toolkit can be downloaded here.
The source code is available on GitHub here.

SemantAPI.Robot is an application that takes the specified source file and runs an analysis of every line therein, using the selected services.

The results are generated in a regular CSV file, with two columns per selected service:

  • The “sentiment score” column contains float sentiment values provided by the target service, which can be used for precise sentiment analysis.
  • The “sentiment polarity” column contains a verbal representation of the sentiment score, making it easy to read and understand at a glance.
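The two-columns-per-service layout can be sketched in Python like this. Note the assumptions: scores are taken to run from -1 to 1, the 0.1 neutral band is an arbitrary threshold, and `polarity`, `build_report`, and the `service_a` name are hypothetical. SemantAPI.Robot itself is a C# application and may well map scores differently.

```python
import csv
import io


def polarity(score, threshold=0.1):
    """Turn a float score into a word; the threshold is an assumed cut-off."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def build_report(lines, scores_by_service):
    """Build CSV text with a score column and a polarity column per service.

    `scores_by_service` maps a service name to one float score per input line.
    """
    output = io.StringIO()
    writer = csv.writer(output)
    header = ["text"]
    for service in scores_by_service:
        header += [f"{service} score", f"{service} polarity"]
    writer.writerow(header)
    for index, line in enumerate(lines):
        row = [line]
        for scores in scores_by_service.values():
            row += [scores[index], polarity(scores[index])]
        writer.writerow(row)
    return output.getvalue()
```

Keeping the raw score next to the label means you can re-tune the threshold later without calling the services again.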

The current version of the SemantAPI.Robot application supports the following NLP solutions:

  • Semantria. Modern, fast-growing NLP solution based on Lexalytics’ Salience engine.
  • AlchemyAPI. One of the world’s most popular NLP solutions.
  • Chatterbox. Social technology engine that uses machine learning for sentiment analysis.
  • Viralheat. Social media monitoring solution that offers a sentiment analysis API for 3rd-party integrators.
  • Bitext. Semantic technologies solution with a sentiment analysis API that claims to have the highest accuracy on the market.

This is a day job kind of thing, one that I'm seeing more chatter and discussion about. In house we've licensed one library, and I've built a couple of proof-of-concept apps with it. But I was doing so kind of in a vacuum, not being able to compare the results against another platform. Well, with this, I now can!

That and I just like the idea of these services and having the C# to access them all... :)

Wednesday, August 29, 2012

Mix OpenNLP, IKVM.Net and C# and you get some noun phrase and contextual relevance goodness

randonom - Extracting noun phrases with contextual relevance in .NET using OpenNLP

A few months ago I was working on a project that had a word cloud-like feature. A word cloud is an interesting way to visually represent a popular theme or topic. I had a dataset of user reviews from another project that we wanted to parse and use. This began my first exposure to Natural Language Processing (NLP) and other advanced text analytics tools.


A viable .NET implementation

Eventually I came across a wiki article entitled “A quick guide to using OpenNLP from .NET” that introduced me to a remarkable project called IKVM.NET. After generating a shiny new .NET OpenNLP assembly with the steps provided, I was able to use the OpenNLP namespaces with ease in my project.

The first step in using the parsers in OpenNLP was to instantiate a model using Java streams. I created a base class for my NounPhraseParser with a utility method to help load these models.



I think this project worked out remarkably well. I don’t know if I’ll attempt to use something like this in a production environment, but if nothing else it was a very enlightening foray into the interesting world of Natural Language Processing. There are many other subjects in this area that I would like to explore, such as Sentiment Analysis and ways to identify subjects of significance in large bodies of text. As the IBM Watson project demonstrated to us not too long ago, this is a young field with staggering potential. The current trajectory of research, along with significant advances in computation capability, suggests it won’t be long before we can communicate with computers/information systems as easily as if you were talking to your best friend.



I can't believe it's been 6 years since I've blogged about OpenNLP (sigh, and I've still not worked on the project I had meant to when watching for it then... It's on the list still... but...). Anyway... If you've wanted to do natural language processing (NLP) and are looking for options, then check out Sean's post...

(via DotnetKicks - Extracting noun phrases with context in .NET using OpenNLP)


Related Past Post XRef:
SharpEntropy - Maximum Entropy Modeling
"Statistical parsing of English sentences"

Java for .Net? Yep, the IKVM.NET way...
Java for .Net? Ja!
Java Implementation for Mono/.Net (IVKM.Net)

Thursday, February 09, 2012

NLP is Hard... But with AboditNLP it's not as...

Elegant Code - NuGet Project Uncovered: AboditNLP

"If you are coming to this series of posts for the first time you might check out my introductory post for a little context.

AboditNLP is a Natural Language Processor library. This kind of stuff is interesting, but not something I have chosen to spend my time on.

It has a demo which gives you sample things to type to the library. ..."


Natural Language Conversations

A conversational natural language engine allows humans and computers to converse in a natural way using typed messages. These messages may be exchanged using SMS, XMPP chat, email, or web-based chat.

Conversational interfaces are an improvement over form-filling interfaces for many applications. One of the best known examples of this is the input box on Google calendar where you can type something like '10:20 meeting at Larry's' and it will create a meeting starting at 10:20 and will set the location to 'Larry's'. This particular example is a hybrid interface where a conversational element has been integrated into a traditional form-based interface. A pure conversation interface is often called a 'chatbot'.
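To show what sits behind an input box like that, here is a small Python sketch that pulls a time, a title, and an optional location out of a line such as '10:20 meeting at Larry's'. It is a single regular expression written for this example; the `Event` class, the `QUICK_ADD` pattern, and `parse_quick_add` are hypothetical names, and this is neither Google's nor AboditNLP's implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    hour: int
    minute: int
    title: str
    location: Optional[str]


# "<hh:mm> <title>" optionally followed by " at <location>".
QUICK_ADD = re.compile(
    r"^\s*(?P<hour>\d{1,2}):(?P<minute>\d{2})\s+"
    r"(?P<title>.+?)"
    r"(?:\s+at\s+(?P<location>.+))?\s*$"
)


def parse_quick_add(text: str) -> Optional[Event]:
    """Return an Event for recognized input, or None when the text does not fit."""
    match = QUICK_ADD.match(text)
    if not match:
        return None
    hour, minute = int(match["hour"]), int(match["minute"])
    if hour > 23 or minute > 59:  # reject impossible clock times
        return None
    location = match["location"]
    return Event(hour, minute, match["title"], location.strip() if location else None)
```

One pattern only covers one phrasing, of course, which is a decent hint at why the general problem is hard.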

Conversational interfaces are particularly well suited to mobile applications where the small screen, tiny keyboard and lack of a mouse make traditional form-filling a tedious and error-prone experience.

Consider for example the simple request 'what orders did we receive last month on a Friday after 4pm'. What would the dialog look like to specify a query like that? What if, instead, your users could simply enter what they are looking for? That's what a conversational natural language engine can do for your business.

NLP is hard

General purpose NLP is a really hard problem but for a specific application domain (like CRM integration, product support, home automation, ...) it's possible to define a sufficiently large recognition base that you can provide a good experience to your users. This library is focused on providing you the tools you need to create such domain specific chatbots or to add natural language capable input boxes to your traditional forms-based applications. ...
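The "recognition base for a specific domain" idea can be illustrated with a short Python sketch: a list of patterns for one narrow domain (home automation here), each mapped to an intent name. The rules, the intent names, and the `interpret` function are all invented for this example and do not reflect AboditNLP's actual API or rule format.

```python
import re

# Hypothetical rules for one narrow domain; each pattern maps to an intent name.
RULES = [
    (re.compile(r"^turn (?P<state>on|off) the (?P<device>[\w ]+?) lights?$"), "set_light"),
    (re.compile(r"^what is the temperature in the (?P<room>[\w ]+)$"), "get_temperature"),
]


def interpret(message: str):
    """Return (intent, parameters) for a recognized message, else None."""
    normalized = message.strip().lower().rstrip("?.!")
    for pattern, intent in RULES:
        match = pattern.match(normalized)
        if match:
            return intent, match.groupdict()
    return None
```

Every phrasing you want understood needs a rule, so the approach only stays manageable when the domain is small, which matches the point made above.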


Software and Licensing

This Natural Language engine will soon be available for download and integration in your .NET projects. For personal, non-commercial projects there is no charge. For use in commercial applications and for any consulting requirements please ..."

AboditNLP - Natural Language Interface to Home Automation

"Rather than hunting through a multi-layer web-page-by-web-page interface all aspects of a home can be controlled and/or queried directly using single line commands issued from a smartphone or computer using SMS or a chat client like Google Talk.

Examples of things you can do when you connect a home automation system to this natural language engine can be seen in this actual dialog with my own home automation system:-


Interesting... I can see the applications for this already. Imagine speech to text and then mixing this in? Hmm... (of course that could lead to some pretty funny results)


Related Past Post XRef:
Feed Stream Analysis - Web Feed/Post Analysis to Group Like/Related Posts
SharpEntropy - Maximum Entropy Modeling
"WordCloud - A Squarified Treemap of Word Frequency" - Something like this would be cool in a Feed Reader...
"Statistical parsing of English sentences"
"A Model for Weblog Research" - MS Research TreeMap.Net