Text contains a wealth of information, but it mostly comes in unstructured form.
In this age of unprecedented data growth, more and more companies are turning to Natural Language Processing to make sense of their textual data or to interact with their clients in new and exciting ways.
NLP helps them determine the topics covered in a text, the people and places mentioned, and the sentiment expressed.
NLP Town combines a decade of research and development in Natural Language Processing with over ten years of experience in software engineering.
Named Entity Recognition and the Road to Deep Learning
Perplexed by Game of Thrones. A Song of N-Grams and Language Models
Anything2Vec, or How Word2Vec Conquered NLP
Understanding Deep Learning Models in NLP
DIY methods for sentiment analysis
Off-the-shelf methods for sentiment analysis
Text Classification Made Simple
NLP in the Cloud: Measuring the Quality of NLP APIs
NLP People: the 2016 NLP job market analysis
Thesauri are the foundation for many Information Retrieval systems. Search engines, for example, extend the queries of their users by adding synonyms and related keywords. In my PhD dissertation, I explored novel ways of automatically discovering such words with similar meanings, even across dialects and different languages. The output of these methods can make Information Retrieval systems more robust to language variation in their text collections.
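As a rough illustration of the query extension described above, here is a minimal Python sketch. The `THESAURUS` dictionary and the `expand_query` function are hypothetical names invented for this example; in a real system the thesaurus would be hand-built or discovered automatically rather than hard-coded.

```python
# Hypothetical toy thesaurus; a real one would be far larger and
# possibly learned automatically from text.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "film": ["movie"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the query terms followed by their synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        # Keep the original term first, then add any known synonyms.
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("cheap car rental"))
# ['cheap', 'car', 'automobile', 'vehicle', 'rental']
```

A search engine could then match documents against any of the expanded terms, so that a page mentioning only "automobile" is still found for the query "car".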
If you have a spam filter, you’re already using text categorization every day. More advanced applications are able to recognize topics such as politics, football or music with very high precision. For example, if you’re a media company, you can use these methods to alert users about new content on topics they’re really interested in, or you can customize the content of your website and search results to your customers’ personal profiles.
Thanks to the popularity of blogs, social media, and review sites, more and more people are making themselves heard. Your users and customers write reviews of your products on their blogs, or share their opinion about your company on Twitter. Sentiment Analysis identifies the feelings behind these messages: it classifies them as positive or negative, or detects emotions such as anger and joy. It allows you to know your customers better and to respond to their needs more successfully.
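To make the idea of classifying messages as positive or negative concrete, here is a deliberately simple word-list sketch in Python. The word lists, the `classify_sentiment` function and the extra "neutral" label are assumptions made for this example; production systems typically rely on trained models rather than fixed lists.

```python
import re

# Hypothetical, tiny sentiment lexicons used only for illustration.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "angry"}


def classify_sentiment(text):
    """Label a text by counting positive and negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # Assumption: texts with no clear signal are labelled neutral.
    return "neutral"


print(classify_sentiment("I love this phone, the camera is excellent"))  # positive
print(classify_sentiment("Terrible service, I hate waiting"))            # negative
```

A word-counting approach like this misses negation and sarcasm ("not great"), which is one reason more advanced methods are used in practice.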
These days users expect more from a search engine than a system that simply returns a list of documents with the words they typed in. Through a careful application of Natural Language Processing, modern search engines aim to truly understand the needs of their users. They scan their text collection for synonyms, learn the topics of a text, or identify entities such as people and places and discover the relations between them. In these ways, they extract the information contained in their documents and present it to their users in novel ways.
There is more to Machine Translation than Google Translate. I have extensive experience training and optimizing state-of-the-art MT systems such as Moses and Phrasal, which allow companies to build customized translation solutions. By training these systems on their own texts or those of their customers, companies can achieve far superior performance and bring down translation effort and cost.
Information Extraction is the task of identifying useful information in unstructured documents. In the past I have developed methods for finding definitions of terms in legislation, for identifying entities (people, organizations, locations) and the relations between them, and for transforming job ads and CVs to structured tables whose values can be matched. In this way, Natural Language Processing helps turn large collections of unstructured text into actionable insights.
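As a small illustration of turning unstructured text into a structured record, the following Python sketch pulls two fields out of a job ad with regular expressions. The field names, the patterns and the `extract_job_fields` function are hypothetical and chosen only for this example; they are not the methods referred to above, which would need to handle far more variation.

```python
import re


def extract_job_fields(ad_text):
    """Extract a minimum experience requirement and a contact email, if present."""
    # Matches phrases such as "3 years" or "3+ years".
    years = re.search(r"(\d+)\+?\s+years", ad_text)
    # Simple email pattern; it does not cover every valid address.
    email = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", ad_text)
    return {
        "min_years_experience": int(years.group(1)) if years else None,
        "contact_email": email.group(0) if email else None,
    }


ad = "We are hiring a data engineer with 3+ years of experience. Apply via jobs@example.com."
print(extract_job_fields(ad))
# {'min_years_experience': 3, 'contact_email': 'jobs@example.com'}
```

Once job ads and CVs are reduced to records like this, their values can be compared field by field.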