AI-Natural Language Processing


Latest revision as of 13:07, 4 January 2025

Natural Language Processing

What is natural language processing?

Despite the complexity of human language, it contains common patterns that computers can exploit to automatically perform human-like activities related to verbal communication. This is the goal of natural language processing (NLP), a discipline that combines linguistics and computer science to emulate our capacity to use language.

NLP Tasks

From a linguistic approach, the NLP tasks can be divided into the following categories:

Syntax

NLP tasks related to sentence structures include:

  • **Part-of-Speech (POS) Tagging:** Automatically identifying the syntactic category (POS) of each word in a sentence. Example: *"Alice is a student of physics"* → [("Alice", NNP), ("is", VBZ), ("a", DT), ("student", NN), ("of", IN), ("physics", NNS)].
  • **Parsing:** Determining the syntactic relations among the words in a sentence. Parse trees help represent these relations (refer to Figure 1).
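
To make the tagging example concrete, here is a minimal sketch of a lookup-based tagger. It is only an illustration: the `LEXICON` dictionary and the `pos_tag` function are hypothetical names written for this page, the lexicon covers just the example sentence, and real taggers rely on statistical or neural models rather than a fixed word list.

```python
# Toy lexicon-based POS tagger (illustrative assumption, not a real tagger).
# Tags are copied from the example sentence above.
LEXICON = {
    "alice": "NNP",
    "is": "VBZ",
    "a": "DT",
    "student": "NN",
    "of": "IN",
    "physics": "NNS",
}


def pos_tag(sentence, default_tag="NN"):
    """Tag each whitespace-separated word by dictionary lookup.

    Words missing from the lexicon receive `default_tag`.
    """
    return [(word, LEXICON.get(word.lower(), default_tag))
            for word in sentence.split()]


tagged = pos_tag("Alice is a student of physics")
print(tagged)
```

A lookup table cannot resolve ambiguous words (for example, "book" as a noun or a verb), which is why practical taggers also use the surrounding context.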

Semantics

Semantics deals with the meaning of words, sentences, and texts. Common tasks include:

  • **Optical Character Recognition (OCR):** Recognizing hand-written or printed words.
  • **Natural Language Understanding (NLU):** Transforming sentences into semantic data structures.
  • **Sentiment Analysis:** Classifying the emotion or opinion expressed in a text (e.g., positive, negative, neutral).
  • **Machine Translation:** Automatically translating text between languages.
  • **Topic Classification:** Detecting topics or subjects within texts.

Speech

NLP tasks related to voice include:

  • **Speech Recognition:** Converting human speech into text.
  • **Speech Synthesis:** Converting text into speech.

Discourse and Dialogue

NLP tasks that address conversational and narrative interactions, such as:

  • **Automatic Summarization:** Extracting key ideas from text.
  • **Dialogue Act Classification:** Capturing the intention of utterances (e.g., questioning, greeting).

Factors for Success

The success of NLP applications is due to:

  • Increased computing power (e.g., parallel CPUs and GPUs).
  • Advancements in machine learning methods (e.g., deep learning).
  • Availability of linguistic datasets (corpora).
  • Insights from linguistic theories (e.g., Noam Chomsky's work on formal grammars).

An example of a dialogue system

A dialogue system (DS) is an NLP-based application capable of holding conversations with humans using speech. Dialogue systems often rely on modular architectures. Key components include:

  • **Automatic Speech Recognition (ASR):** Recognizes words from audio.
  • **Sentiment Analyzer (SA):** Classifies sentiment in speech.
  • **Natural Language Understanding (NLU):** Transforms words into semantic logical forms.
  • **Natural Language Generation (NLG):** Generates appropriate responses.
  • **Text-to-Speech (TTS):** Converts text responses to audio.

There are two working modes:

  1. **Long loop:** User → ATT → ASR → EV → DAT → SA → EM → NLU → DM → ASM → NLG → TTS → ECA.
  2. **Short loop:** User → ATT → IM → DM → ASM → ECA.

Introduction to machine translation

Machine translation (MT) focuses on transforming text between languages. Key points:

  • **Statistical MT:** Uses aligned parallel corpora for translations.
  • **BLEU (Bilingual Evaluation Understudy):** A scoring system to evaluate translation quality. Scores range from 0 to 1, with higher scores indicating closer matches to reference translations.
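
The BLEU idea can be illustrated with a simplified sentence-level sketch. The version below is an assumption-laden teaching example, not the full metric: it uses a single reference translation, only unigrams and bigrams (`max_n=2`), equal weights, and no smoothing. The function names are hypothetical. It combines clipped n-gram precision with a brevity penalty that lowers the score of candidates shorter than the reference.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with counts clipped."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of n-gram precisions times brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean


reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
score = simple_bleu(candidate, reference)
print(round(score, 4))
```

Here 5 of the 6 candidate unigrams and 3 of the 5 candidate bigrams appear in the reference, so the score is the square root of (5/6 × 3/5), about 0.71. An identical candidate would score 1. Production evaluations normally use up to 4-grams, several references, and corpus-level statistics.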

Summary

Natural language processing (NLP) is a branch of AI that enables computers to understand and process human language. Applications include:

  • Language translation.
  • Text classification.
  • Sentiment analysis.

Real-world examples:

  • Chatbots for customer service.
  • Translation apps.
  • Social media analytics.

Natural Language Processing (NLP): Key Points and Summary

Introduction

NLP bridges unstructured human language and structured data, enabling computers to process text and speech effectively.

Use Cases

Applications include:

  • Machine translation.
  • Virtual assistants.
  • Sentiment analysis.
  • Spam detection.

Tools and Techniques

  • **Tokenization:** Breaking text into tokens.
  • **Stemming and Lemmatization:** Reducing words to root forms.
  • **POS Tagging:** Identifying grammatical roles.
  • **Named Entity Recognition (NER):** Identifying entities in text.
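
The first two techniques in this list can be sketched in a few lines. This is a deliberately naive illustration under stated assumptions: `tokenize` and `naive_stem` are hypothetical helpers, the tokenizer is a single regular expression, and the stemmer strips a handful of English suffixes. Real stemmers (such as the Porter algorithm) apply many more rules, and lemmatization additionally needs a vocabulary, which is not shown here.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


# Suffixes are tried in this order; only the first match is removed.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip one known suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


tokens = tokenize("The students translated texts, testing two parsers.")
stems = [naive_stem(token) for token in tokens]
print(tokens)
print(stems)
```

Note that the stem of "translated" comes out as "translat", which is not a dictionary word. That is typical of stemming and is the main practical difference from lemmatization.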

Example: A Dialogue System

A dialogue system (DS) is an NLP-based application capable of holding conversations with humans. An example is ChatGPT, which allows human-like text-based conversations. Components of a modular DS architecture include:

  • **Acoustic Turn-Taking (ATT)**: Detecting when a user finishes speaking.
  • **Automatic Speech Recognition (ASR)**: Converting speech to text.
  • **Sentiment Analyzer (SA)**: Determining emotional tone.
  • **Dialogue Manager (DM)**: Generating appropriate system responses.
  • **Text-to-Speech Synthesizer (TTS)**: Rendering speech output.

Two operating modes can function simultaneously:

  1. **Long Loop**: User → ATT → ASR → Sentiment Analysis → DM → TTS.
  2. **Short Loop**: User → ATT → DM → TTS.
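
As a rough sketch of how such a modular architecture might be wired together, the code below chains stub versions of the components in the order of the two loops (ATT → ASR → SA → DM → TTS, and ATT → DM → TTS). Everything here is a hypothetical stand-in: the "audio" is plain text ending in a `<silence>` marker, each function replaces what would be a trained model, and the short loop's quick acknowledgement is an assumption about what a fast response could look like.

```python
# All components are illustrative stubs, not real speech or language models.

def acoustic_turn_taking(signal):
    """ATT stub: a trailing '<silence>' marker means the user finished speaking."""
    return signal.endswith("<silence>")


def speech_recognition(signal):
    """ASR stub: the 'audio' is already text, so only remove the marker."""
    return signal.replace("<silence>", "").strip()


def sentiment_analyzer(text):
    """SA stub: flag a few negative cue words (assumed list)."""
    negative_cues = {"not", "wrong", "broken"}
    return "negative" if negative_cues & set(text.lower().split()) else "neutral"


def dialogue_manager(text=None, sentiment=None):
    """DM stub: pick a response from whatever information the loop provides."""
    if text is None:
        return "Mm-hm."  # short loop: acknowledge without recognizing words
    if sentiment == "negative":
        return "Sorry about that. Let me help with: " + text
    return "You said: " + text


def text_to_speech(response):
    """TTS stub: tag the response instead of synthesizing audio."""
    return "[spoken] " + response


def long_loop(signal):
    """User -> ATT -> ASR -> SA -> DM -> TTS."""
    if not acoustic_turn_taking(signal):
        return None  # the user is still speaking
    text = speech_recognition(signal)
    return text_to_speech(dialogue_manager(text, sentiment_analyzer(text)))


def short_loop(signal):
    """User -> ATT -> DM -> TTS."""
    if not acoustic_turn_taking(signal):
        return None
    return text_to_speech(dialogue_manager())


print(long_loop("this is not working <silence>"))
print(short_loop("this is not working <silence>"))
```

Keeping each component behind its own function is what makes the design modular: any stub could be swapped for a real model without changing the loop functions.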

ChatGPT Interaction Example

**1. Key Language Signals for Frustration**

When analyzing text, I look at:

  • **Word Choice**: Phrases like "This isn’t working" or "Why does this keep happening?" often imply repeated failure or dissatisfaction.
  • **Repetition**: Repeated expressions of the same issue (e.g., "It’s still wrong") suggest emotional intensity, often associated with frustration.
  • **Tone of Questions**: Questions with emotionally charged words ("keep happening," "still wrong") can indicate frustration rather than neutral inquiry.

**2. Comparison: Frustration vs. Inquiry/Curiosity**

To differentiate between frustration and curiosity/inquiry, I consider:

  • **Context**: The broader situation often reveals the user's emotional state.
  • **Phrasing Style**:
      • **Frustration**: Includes emotionally charged or negative descriptors (e.g., "It’s not working").
      • **Inquiry/Curiosity**: Focuses on exploration or learning (e.g., "Can you explain why this happens?").

**3. How NLP Makes This Determination**

NLP techniques like sentiment analysis, lexical context, and behavioral patterns help determine emotional tone.
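
A very small cue-based sketch shows how lexical signals of this kind could be turned into a decision. It is not how ChatGPT or any production system works: the cue lists and the `classify_tone` function are hypothetical, chosen only to cover the example phrases quoted in this section, and the sketch ignores context and repetition entirely.

```python
# Hypothetical cue phrases taken from the examples in this section.
FRUSTRATION_CUES = ("isn't working", "not working", "keep happening", "still wrong")
INQUIRY_CUES = ("can you explain", "how does", "what is", "i'm curious")


def classify_tone(message):
    """Label a message by counting which kind of cue phrase it contains more of."""
    # Normalize case and curly apostrophes so the cues match either spelling.
    text = message.lower().replace("’", "'")
    frustration = sum(cue in text for cue in FRUSTRATION_CUES)
    inquiry = sum(cue in text for cue in INQUIRY_CUES)
    if frustration > inquiry:
        return "frustration"
    if inquiry > frustration:
        return "inquiry"
    return "unclear"


for message in ("Why does this keep happening?",
                "Can you explain why this happens?",
                "Hello there"):
    print(message, "->", classify_tone(message))
```

A trained sentiment model would replace the fixed cue lists with weights learned from labeled examples, which lets it generalize to phrasings that never appear in a hand-written list.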

**4. Why This Matters**

Recognizing emotional tone enables better adaptation of responses, providing reassurance for frustration and detailed explanations for curiosity.

Glossary of Terms

  1. **Natural Language Processing (NLP):** AI field enabling language understanding.
  2. **Tokenization:** Splitting text into smaller units.
  3. **Stemming:** Reducing words to a stem by stripping affixes; the result is not always a valid word.
  4. **Lemmatization:** Reducing words to their dictionary base forms (lemmas) using a vocabulary and morphological analysis.
  5. **Part of Speech (POS) Tagging:** Identifying grammatical roles of words.
  6. **Named Entity Recognition (NER):** Classifying entities like names and places.
  7. **Sentiment Analysis:** Determining the emotion behind text.
  8. **Machine Translation (MT):** Translating text between languages.
  9. **Word Cloud:** Visualizing word frequency.
  10. **BLEU:** Metric for evaluating translation quality.
  11. **Corpus/Corpora:** Linguistic datasets for NLP tasks.
  12. **Deep Learning:** Using neural networks with many layers for AI tasks.
  13. **Dialogue System (DS):** Applications enabling conversational AI.

Appendices

Refer to the provided resources for further study on NLP tools and techniques.