AI-Natural Language Processing

Revision as of 13:07, 4 January 2025 by Paulreed (talk | contribs) (→‎Unit 8: Natural Language Processing)

Natural Language Processing

What is natural language processing?

Despite the complexity of human language, it contains common patterns that computers can exploit to automatically perform human-like activities related to verbal communication. This is the goal of natural language processing (NLP), a discipline that combines linguistics and computing science to emulate the human capacity to use language.

NLP Tasks

From a linguistic approach, the NLP tasks can be divided into the following categories:

Syntax

NLP tasks related to sentence structures include:

  • **Part-of-Speech (POS) Tagging:** Automatically identifying the syntactic category (POS) of each word in a sentence. Example: *"Alice is a student of physics"* → [("Alice", NNP), ("is", VBZ), ("a", DT), ("student", NN), ("of", IN), ("physics", NNS)].
  • **Parsing:** Determining the syntactic relations among the words in a sentence. Parse trees represent these relations (refer to Figure 1).
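
To make the shape of POS tagger output concrete, here is a minimal sketch of a lookup tagger. The `TOY_LEXICON` dictionary and the `tag` function are hypothetical names introduced for illustration; the tags are copied from the example sentence above. Real taggers are trained statistically on annotated corpora rather than relying on a hand-written table.

```python
from typing import List, Tuple

# Hypothetical hand-written lexicon covering only the example sentence.
# The tags are copied from the example above, not produced by a trained model.
TOY_LEXICON = {
    "alice": "NNP",
    "is": "VBZ",
    "a": "DT",
    "student": "NN",
    "of": "IN",
    "physics": "NNS",
}


def tag(sentence: str, default: str = "NN") -> List[Tuple[str, str]]:
    """Return (word, tag) pairs; unknown words fall back to `default`."""
    return [(word, TOY_LEXICON.get(word.lower(), default))
            for word in sentence.split()]


print(tag("Alice is a student of physics"))
```

The fallback tag for unknown words is an assumption of this sketch; it shows why a plain dictionary is not enough and why practical taggers also use the surrounding context.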

Semantics

Semantics deals with the meaning of words, sentences, and texts. Common tasks include:

  • **Optical Character Recognition (OCR):** Recognizing hand-written or printed words.
  • **Natural Language Understanding (NLU):** Transforming sentences into semantic data structures.
  • **Sentiment Analysis:** Classifying the sentiment expressed in a text (e.g., positive, negative, neutral).
  • **Machine Translation:** Automatically translating text between languages.
  • **Topic Classification:** Detecting topics or subjects within texts.
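
As an illustration of one of these tasks, the sketch below classifies sentiment by counting words from two small word lists. The `POSITIVE` and `NEGATIVE` sets and the `classify_sentiment` function are assumptions made for this example; practical systems typically rely on much larger lexicons or on trained models.

```python
import re

# Hypothetical mini-lexicons, chosen only for illustration.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}


def classify_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("I love this great phone"))
```

A word-counting approach like this ignores negation and context ("not good" would count as positive), which is one reason machine learning methods are preferred in practice.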

Speech

NLP tasks related to voice include:

  • **Speech Recognition:** Converting spoken audio into text.
  • **Speech Synthesis:** Converting text into speech.

Discourse and Dialogue

NLP tasks that address conversational and narrative interactions include:

  • **Automatic Summarization:** Extracting key ideas from text.
  • **Dialogue Act Classification:** Capturing the intention of utterances (e.g., questioning, greeting).

Factors for Success

The success of NLP applications is due to:

  • Increased computing power (e.g., parallel CPUs and GPUs).
  • Advancements in machine learning methods (e.g., deep learning).
  • Availability of linguistic datasets (corpora).
  • Insights from linguistic theories (e.g., Noam Chomsky's generative grammar).

An example of a dialogue system

A dialogue system (DS) is an NLP-based application capable of holding conversations with humans using speech. Dialogue systems often rely on modular architectures. Key components include:

  • **Automatic Speech Recognition (ASR):** Recognizes words from audio.
  • **Sentiment Analyzer (SA):** Classifies sentiment in speech.
  • **Natural Language Understanding (NLU):** Transforms words into semantic logical forms.
  • **Natural Language Generation (NLG):** Generates appropriate responses.
  • **Text-to-Speech (TTS):** Converts text responses to audio.

There are two working modes:

  1. **Long loop:** User → ATT → ASR → EV → DAT → SA → EM → NLU → DM → ASM → NLG → TTS → ECA.
  2. **Short loop:** User → ATT → IM → DM → ASM → ECA.

Introduction to machine translation

Machine translation (MT) focuses on transforming text between languages. Key points:

  • **Statistical MT:** Uses aligned parallel corpora for translations.
  • **BLEU (Bilingual Evaluation Understudy):** A scoring system to evaluate translation quality. Scores range from 0 to 1, with higher scores indicating closer matches to reference translations.
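
The BLEU idea can be sketched in a few lines. The version below is a simplified, sentence-level variant with a single reference, uniform n-gram weights, and no smoothing; standard BLEU is computed over a whole corpus and may use several references. The function names `ngrams` and `bleu` and the `max_n` parameter are assumptions of this sketch.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: clipped n-gram precision with brevity penalty."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(round(bleu(candidate, reference, max_n=2), 4))
```

The usage line limits the n-gram order to 2 because, without smoothing, short sentences often share no 4-gram with the reference and would score 0.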

Summary

Natural language processing (NLP) is a branch of AI that enables computers to understand and process human language. Applications include:

  • Language translation.
  • Text classification.
  • Sentiment analysis.

Real-world examples:

  • Chatbots for customer service.
  • Translation apps.
  • Social media analytics.

Natural Language Processing (NLP): Key Points and Summary

Introduction

NLP converts unstructured human language into structured data, enabling computers to process it effectively.

Use Cases

Applications include:

  • Machine translation.
  • Virtual assistants.
  • Sentiment analysis.
  • Spam detection.

Tools and Techniques

  • **Tokenization:** Breaking text into tokens.
  • **Stemming and Lemmatization:** Reducing words to root forms.
  • **POS Tagging:** Identifying grammatical roles.
  • **Named Entity Recognition (NER):** Identifying entities in text.
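
The first two techniques can be sketched with the standard library alone. The `tokenize` and `naive_stem` functions below are hypothetical helpers written for this example; the suffix list is a crude assumption and is far simpler than an established stemming algorithm such as Porter's.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def naive_stem(token: str) -> str:
    """Strip a few common English suffixes, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("The cats are playing.")
print(tokens)
print([naive_stem(t) for t in tokens])
```

Lemmatization differs from this kind of suffix stripping in that it maps each word to a dictionary form using a vocabulary, so it can handle irregular forms that a rule list cannot.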

Example: A Dialogue System

A dialogue system (DS) is an NLP-based application capable of holding conversations with humans. An example is ChatGPT, which allows human-like text-based conversations. Components of a modular DS architecture include:

  • **Acoustic Turn-Taking (ATT)**: Detecting when a user finishes speaking.
  • **Automatic Speech Recognition (ASR)**: Converting speech to text.
  • **Sentiment Analyzer (SA)**: Determining emotional tone.
  • **Dialogue Manager (DM)**: Generating appropriate system responses.
  • **Text-to-Speech Synthesizer (TTS)**: Rendering speech output.

Two operating modes can function simultaneously:

  1. **Long Loop**: User → ATT → ASR → Sentiment Analysis → DM → TTS.
  2. **Short Loop**: User → ATT → DM → TTS.
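
The two loops can be sketched as chains of function calls. Every function below is a hypothetical stub: text stands in for audio, a `<pause>` marker stands in for end-of-turn detection, and the short loop's quick acknowledgement is an assumption about what that loop produces, not something stated above.

```python
from typing import Optional


def att(audio: str) -> bool:
    """Turn-taking stub: the turn is over when the input ends with a pause marker."""
    return audio.endswith("<pause>")


def asr(audio: str) -> str:
    """Speech recognition stub: the 'audio' is already text, so only drop the marker."""
    return audio.replace("<pause>", "").strip()


def sa(text: str) -> str:
    """Sentiment analyzer stub based on a single cue phrase."""
    return "negative" if "not working" in text.lower() else "neutral"


def dm(text: Optional[str], sentiment: Optional[str]) -> str:
    """Dialogue manager stub: choose a response from the available information."""
    if text is None:
        return "Mm-hmm."  # short loop: no transcript available yet
    if sentiment == "negative":
        return "Sorry about that. Let's fix it."
    return "You said: " + text


def tts(response: str) -> str:
    """Speech synthesis stub: wrap the text instead of producing audio."""
    return "<speech>" + response + "</speech>"


def long_loop(audio: str) -> Optional[str]:
    """User -> ATT -> ASR -> SA -> DM -> TTS."""
    if not att(audio):
        return None
    text = asr(audio)
    return tts(dm(text, sa(text)))


def short_loop(audio: str) -> Optional[str]:
    """User -> ATT -> DM -> TTS, skipping recognition and sentiment analysis."""
    if not att(audio):
        return None
    return tts(dm(None, None))


print(long_loop("It is not working <pause>"))
print(short_loop("It is not working <pause>"))
```

Keeping each component behind its own function mirrors the modular architecture: any stub could be replaced by a real model without changing the loop definitions.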

ChatGPT Interaction Example

**1. Key Language Signals for Frustration**

When analyzing text, I look at:

  • **Word Choice**: Phrases like "This isn’t working" or "Why does this keep happening?" often imply repeated failure or dissatisfaction.
  • **Repetition**: Repeated expressions of the same issue (e.g., "It’s still wrong") suggest emotional intensity, often associated with frustration.
  • **Tone of Questions**: Questions with emotionally charged words ("keep happening," "still wrong") can indicate frustration compared to neutral inquiry.

**2. Comparison: Frustration vs. Inquiry/Curiosity**

To differentiate between frustration and curiosity/inquiry, I consider:

  • **Context**: The broader situation often reveals the user's emotional state.
  • **Phrasing Style**:
      • **Frustration**: Includes emotionally charged or negative descriptors (e.g., "It’s not working").
      • **Inquiry/Curiosity**: Focuses on exploration or learning (e.g., "Can you explain why this happens?").

**3. How NLP Makes This Determination**

NLP techniques like sentiment analysis, lexical context, and behavioral patterns help determine emotional tone.

**4. Why This Matters**

Recognizing emotional tone enables better adaptation of responses, providing reassurance for frustration and detailed explanations for curiosity.
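
The signals described in this example can be approximated with a rule-based sketch. The cue phrases are taken from the quoted examples; the `classify_tone` function, the extra inquiry cues, and the rule that two or more frustration cues count as "strong frustration" are assumptions of this illustration. It does not show how ChatGPT itself works.

```python
from typing import List

# Cue phrases quoted in the example; the decision rule itself is an assumption.
FRUSTRATION_CUES = ("isn't working", "not working", "keep happening", "still wrong")
INQUIRY_CUES = ("can you explain", "why does", "how does")


def classify_tone(messages: List[str]) -> str:
    """Label a list of user messages as frustration, inquiry, or neutral."""
    # Normalize case and curly apostrophes so the cues match either spelling.
    normalized = [m.lower().replace("\u2019", "'") for m in messages]
    frustration_hits = sum(cue in m for m in normalized for cue in FRUSTRATION_CUES)
    inquiry_hits = sum(cue in m for m in normalized for cue in INQUIRY_CUES)
    if frustration_hits >= 2:
        return "strong frustration"  # repetition suggests emotional intensity
    if frustration_hits == 1:
        return "frustration"  # charged wording outweighs the question form
    if inquiry_hits:
        return "inquiry"
    return "neutral"


print(classify_tone(["This isn\u2019t working", "It\u2019s still wrong"]))
print(classify_tone(["Can you explain why this happens?"]))
```

Checking frustration cues before inquiry cues reflects the point made above that a question such as "Why does this keep happening?" can signal frustration rather than neutral curiosity.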

Glossary of Terms

  1. **Natural Language Processing (NLP):** AI field enabling language understanding.
  2. **Tokenization:** Splitting text into smaller units.
  3. **Stemming:** Reducing words to their root forms.
  4. **Lemmatization:** Reducing words to base forms using vocabulary mapping.
  5. **Part of Speech (POS) Tagging:** Identifying grammatical roles of words.
  6. **Named Entity Recognition (NER):** Classifying entities like names and places.
  7. **Sentiment Analysis:** Determining the emotion behind text.
  8. **Machine Translation (MT):** Translating text between languages.
  9. **Word Cloud:** Visualizing word frequency.
  10. **BLEU:** Metric for evaluating translation quality.
  11. **Corpus/Corpora:** Linguistic datasets for NLP tasks.
  12. **Deep Learning:** Using neural networks with many layers for AI tasks.
  13. **Dialogue System (DS):** Applications enabling conversational AI.

Appendices

Refer to the provided resources for further study on NLP tools and techniques.