Natural Language Processing

What is natural language processing and what are its applications?

Natural language processing (NLP) brings together two seemingly distant disciplines: linguistics and artificial intelligence. Today, this field of computer science, which consists of transforming natural language into a formal language that computers can process, such as a programming language, is constantly evolving and its applications keep growing.

NLP allows a machine to process natural language and generate answers automatically.

If you have ever asked Alexa or Siri for the time, you will have realised that you do not always have to ask the question in the same way. You can ask "what time is it?" or "can you tell me the time?" and in both cases receive an appropriate response. The same is true of Google's automatic translator, which detects the nuances between different words depending on the context. These examples, and many more, have something called natural language processing (NLP) behind them.

What is natural language processing (NLP)?

According to IBM's definition, natural language processing (NLP) refers to the branch of computer science, and more specifically the branch of artificial intelligence, concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. The technology has become highly advanced thanks to the application of machine learning, big data, the internet of things and neural networks.

Some of the most important applications focus on business intelligence, which automatically analyses customer reactions through their comments on the internet or the questions they ask in search of information. Then there are chatbots, another application which, although there is still much room for improvement, streamlines interaction with customers through chats or telephone answering services by offering quick, automatic answers based on natural language processing.

Natural language processing has its roots in the 1950s, when Alan Turing published the paper "Computing Machinery and Intelligence", in which he proposed what is now known as the Turing Test. The test examines the ability of a machine to exhibit intelligent behaviour similar to that of a human being. Since then, the evolution of the algorithms associated with this technology has enabled the current progress.

The evolution of natural language processing and its algorithms

  • 1949: IBM sponsors the Index Thomisticus, a compilation of the works of St. Thomas Aquinas created by the Italian Jesuit Roberto Busa, a pioneer of computational linguistics.
  • 1950: Alan Turing publishes "Computing Machinery and Intelligence", in which he proposes the Turing Test to determine whether a machine can think.
  • 1954: The Georgetown-IBM experiment achieves the automatic translation of more than sixty sentences from Russian into English, giving a boost to computational linguistics.
  • 1956: John McCarthy, Marvin Minsky and Claude Shannon coin the term "artificial intelligence" at the Dartmouth Conference.
  • 1960s: Pattern recognition and "nearest neighbour" algorithms are introduced.
  • 1980s: Machine learning algorithms are introduced and natural language generation takes off.
  • 1990s: Advanced speech recognition and topic modelling technologies are introduced.
  • 2000s: More advanced statistical and topic models, such as LDA, are introduced, and the term "deep learning" emerges.
  • 2010s: Neural machine translation, carried out without human intervention, is implemented and conversational artificial intelligence takes a leap forward.
  • 2020s: More and more business sectors apply this technology and, together with machine vision, it enables the new challenges of Industry 4.0 to be met.

Source: Deloitte.

SEE INFOGRAPHIC: The evolution of natural language processing and its algorithms [PDF]

How does natural language processing work?

The first natural language analysis models were symbolic and based on manually encoded rules of the language. This made it possible, for example, to distinguish the tenses and conjugations of verbs and to extract the meaning of the root. The 1980s and 1990s saw the statistical revolution: instead of writing sets of rules (and exceptions), NLP systems began to use statistical inference algorithms that analyse large collections of text in search of patterns.

The advantage of statistical models is that they cope better with new words and with errors such as misspelled or accidentally omitted words. Most current systems use a combination of symbolic and statistical models. In particular, natural language processing systems perform several types of analyses (a short code sketch follows the list):

  • Morphological: distinguishes the different types of words (verbs, nouns, prepositions, etc.) and their variations (gender, number, tense, etc.).
  • Syntactic: separates sentences from each other and analyses their constituent parts (subject, verb, predicate) in order to determine their structure.
  • Semantic: analyses the meaning not only of individual words, but also of the sentences of which they are part and of the discourse as a whole.
  • Pragmatic: extracts the intention of the text from its context, making it possible to detect factors such as irony, ambiguity or mood.
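
As an illustration, here is a minimal sketch of the morphological and syntactic levels using the open-source spaCy library; it assumes the en_core_web_sm English model has been downloaded:

```python
# Minimal sketch of morphological and syntactic analysis with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats were sleeping on the warm windowsill.")

for token in doc:
    # lemma_ -> root form (morphology), pos_ -> word class,
    # dep_   -> syntactic role within the sentence
    print(f"{token.text:12} {token.lemma_:12} {token.pos_:6} {token.dep_}")
```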

Applications of natural language processing (examples)

The spellchecker in your word processor and the autocorrect on your phone use natural language processing techniques, but the applications go much further:

Virtual assistants and intelligent chatbots

Virtual assistants such as Siri, Alexa and Google Assistant use natural language processing to interpret users' questions and commands and provide accurate, consistent responses. They are increasingly used on business websites to guide users.

Document classification

The task of classifying large numbers of documents according to subject matter or style can be streamlined with NLP systems, as in the sketch below.
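
An illustrative sketch, not tied to any particular vendor: the snippet below trains a tiny scikit-learn classifier that combines TF-IDF word weighting with a Naive Bayes model. The categories and training sentences are invented.

```python
# Minimal document-classification sketch; assumes scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "Quarterly revenue grew and margins improved.",
    "The striker scored twice in the final match.",
    "The new vaccine trial shows promising results.",
]
train_labels = ["business", "sport", "health"]

# TF-IDF turns each document into a weighted word-frequency vector;
# Naive Bayes then learns which words are typical of each category.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

print(classifier.predict(["Another striker scored in the derby."]))  # ['sport']
```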

Sentiment and opinion analysis

Comments on social networks about products and services are extremely important to companies, and NLP systems can extract relevant information from them; the sketch below shows the idea.
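
A minimal sketch of the idea, using the VADER sentiment analyser bundled with NLTK (the review text is invented):

```python
# Minimal sentiment-analysis sketch with NLTK's VADER lexicon.
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
review = "The delivery was fast and the product works great!"

# polarity_scores returns negative, neutral, positive and compound scores;
# the compound score summarises overall sentiment between -1 and 1.
print(sia.polarity_scores(review))
```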

Text comparison

NLP systems make it possible to find patterns in texts and detect matches between them, which facilitates plagiarism detection and quality control (see the sketch below).
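
One common approach, sketched below, is to represent each text as a TF-IDF vector and measure the cosine similarity between the vectors; near-duplicates score far higher than unrelated texts:

```python
# Minimal text-comparison sketch: TF-IDF vectors plus cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = [
    "NLP systems can detect matches between documents.",
    "Systems based on NLP detect matching passages in documents.",
    "The weather will be sunny with light winds tomorrow.",
]

vectors = TfidfVectorizer().fit_transform(texts)
scores = cosine_similarity(vectors)

# scores[0][1] (two paraphrases) is far higher than scores[0][2].
print(scores.round(2))
```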

Document anonymisation

Through NLP systems, documents can be processed to identify and remove mentions of personal data, thus ensuring the privacy of individuals and institutions; a sketch follows.
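
A minimal sketch of this idea uses spaCy's named-entity recogniser to redact people, organisations and places; the example sentence is invented, and production systems cover many more categories of personal data:

```python
# Minimal anonymisation sketch with spaCy's named-entity recogniser.
# Assumes the en_core_web_sm English model has been downloaded.
import spacy

nlp = spacy.load("en_core_web_sm")
text = "Maria Lopez met the IBM team in Madrid last Tuesday."
doc = nlp(text)

redacted = text
# Replace entities from the end of the string so character offsets stay valid.
for ent in reversed(doc.ents):
    if ent.label_ in {"PERSON", "ORG", "GPE"}:
        redacted = redacted[:ent.start_char] + "[REDACTED]" + redacted[ent.end_char:]

print(redacted)
```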

Machine translation

Instant machine translation applications use natural language processing techniques to deliver accurate, semantically and grammatically correct foreign language texts.

Content recommendation

Content platforms analyse users' language preferences to suggest books, films or songs, providing each user with relevant content.

Natural language processing tools

Numerous companies offer software tools for applying natural language processing techniques. To develop them, they use standard programming languages, especially Python, the most widely used for this purpose:

  • Natural Language Toolkit (NLTK): this Python library has a modular structure that facilitates NLP functions such as tagging and classification, among others (a minimal usage example follows this list).
  • MonkeyLearn: an NLP platform that provides models for text or sentiment analysis, topic classification and keyword extraction tasks.
  • IBM Watson: a set of AI services hosted in IBM's cloud that offers NLP systems, enabling the identification and extraction of categories, sentiments, entities, etc.
  • Google Cloud Natural Language: this natural language API provides several models for sentiment analysis, content classification and entity extraction, among others.
  • Amazon Comprehend: an NLP service integrated into the Amazon Web Services infrastructure for sentiment analysis, topic modelling or entity recognition, among others.
  • spaCy: an open-source library for NLP with Python. One of the most recent, it is highly usable and designed to analyse large volumes of data.
  • Gensim: a specialised Python library focused on topic modelling, recognising similarities between texts and navigating between different documents.
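
As a minimal usage example of the first tool in the list, the snippet below tokenises a sentence with NLTK and tags each word's part of speech (the resource names in the download step may vary between NLTK versions):

```python
# Minimal NLTK example: tokenisation and part-of-speech tagging.
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Python makes natural language processing accessible.")
print(nltk.pos_tag(tokens))
# e.g. [('Python', 'NNP'), ('makes', 'VBZ'), ...]
```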