AI-Natural Language Processing
Unit 8: Natural Language Processing
Contents
- What is natural language processing?
- An example of a dialogue system
- Introduction to machine translation
- Summary
- Natural Language Processing (NLP): Key Points and Summary
- Glossary of Terms
What is natural language processing?
Despite the complexity of human language, there are common patterns that computers can exploit to automatically perform human-like tasks related to verbal communication. This is the goal of natural language processing (NLP), a discipline that combines linguistics and computer science to emulate the human capacity for language.
NLP Tasks
From a linguistic perspective, NLP tasks can be divided into the following categories:
Syntax
NLP tasks related to sentence structures include:
- **Part-of-Speech (POS) Tagging:** Automatically identifying the syntactical category (POS) of each word in a sentence. Example:
* *"Alice is a student of physics"* → [("Alice", NNP), ("is", VBZ), ("a", DT), ("student", NN), ("of", IN), ("physics", NN)].
- **Parsing:** Determining the syntactic relations between the words in a sentence. Parse trees help represent these relations (refer to Figure 1).
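To make the idea of POS tagging concrete, here is a minimal sketch of a lookup-based tagger. The lexicon `TOY_LEXICON`, the function `tag_sentence`, and the fallback rule are all assumptions made for illustration; real taggers are trained on annotated corpora and use context rather than a fixed word list.

```python
# Hypothetical toy lexicon: lowercase word -> Penn Treebank-style tag.
TOY_LEXICON = {
    "is": "VBZ",
    "a": "DT",
    "student": "NN",
}


def tag_sentence(sentence):
    """Tag each whitespace-separated word with a tag from the toy lexicon."""
    tagged = []
    for word in sentence.split():
        tag = TOY_LEXICON.get(word.lower())
        if tag is None:
            # Assumed fallback: unknown capitalized words are proper nouns,
            # every other unknown word is treated as a common noun.
            tag = "NNP" if word[0].isupper() else "NN"
        tagged.append((word, tag))
    return tagged


print(tag_sentence("Alice is a student"))
```

A lookup table cannot resolve words whose category depends on context (for example "book" as a noun or a verb), which is one reason statistical taggers are preferred in practice.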
Semantics
Semantics deals with the meaning of words, sentences, and texts. Common tasks include:
- **Optical Character Recognition (OCR):** Recognizing hand-written or printed words.
- **Natural Language Understanding (NLU):** Transforming sentences into semantic data structures.
- **Sentiment Analysis:** Classifying the emotional tone of a text (e.g., positive, negative, neutral).
- **Machine Translation:** Automatically translating text between languages.
- **Topic Classification:** Detecting topics or subjects within texts.
Speech
NLP tasks related to voice include:
- **Speech Recognition:** Converting human speech into text.
- **Speech Synthesis:** Converting text into speech.
Discourse and Dialogue
NLP tasks that address conversational and narrative interactions, such as:
- **Automatic Summarization:** Extracting key ideas from text.
- **Dialogue Act Classification:** Capturing the intention of utterances (e.g., questioning, greeting).
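As a rough illustration of dialogue act classification, the sketch below assigns one of three labels using surface cues only. The label set, the `GREETINGS` word list, and the function `classify_dialogue_act` are assumptions for this example; practical systems usually learn these decisions from annotated dialogues.

```python
import re

# Hypothetical list of greeting words used as a cue.
GREETINGS = {"hello", "hi", "hey"}


def classify_dialogue_act(utterance):
    """Return 'greeting', 'question', or 'statement' from simple surface cues."""
    text = utterance.strip().lower()
    words = re.findall(r"[a-z']+", text)
    if words and words[0] in GREETINGS:
        return "greeting"
    if text.endswith("?"):
        return "question"
    return "statement"


for utterance in ["Hello there", "Where is the station?", "The train is late."]:
    print(utterance, "->", classify_dialogue_act(utterance))
```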
Factors for Success
The success of NLP applications is due to:
- Increased computing power (e.g., parallel CPUs and GPUs).
- Advancements in machine learning methods (e.g., deep learning).
- Availability of linguistic datasets (corpora).
- Insights from linguistic theories (e.g., Noam Chomsky's theory of generative grammar).
An example of a dialogue system
A dialogue system (DS) is an NLP-based application capable of holding conversations with humans using speech. Dialogue systems often rely on modular architectures. Key components include:
- **Automatic Speech Recognition (ASR):** Recognizes words from audio.
- **Sentiment Analyzer (SA):** Classifies sentiment in speech.
- **Natural Language Understanding (NLU):** Transforms words into semantic logical forms.
- **Natural Language Generation (NLG):** Generates appropriate responses.
- **Text-to-Speech (TTS):** Converts text responses to audio.
There are two working modes:
1. **Long loop:** User → ATT → ASR → EV → DAT → SA → EM → NLU → DM → ASM → NLG → TTS → ECA.
2. **Short loop:** User → ATT → IM → DM → ASM → ECA.
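The modular idea can be sketched as a chain of functions, each standing in for one of the components listed above (ASR, SA, NLU, NLG, TTS). Everything here is an assumption for illustration: the `Turn` record, the stub logic inside each module, and the use of a plain string in place of real audio. The sketch only shows how modules hand a shared record to one another, not how any real module works.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Shared record passed from module to module (hypothetical structure)."""
    audio: str            # stand-in for raw audio samples
    words: str = ""
    sentiment: str = ""
    intent: str = ""
    reply_text: str = ""
    reply_audio: str = ""


def asr(turn):
    # Stub: pretend the "audio" string is already its own transcript.
    turn.words = turn.audio.lower()
    return turn


def sentiment_analyzer(turn):
    # Stub: a single negation word marks the turn as negative.
    turn.sentiment = "negative" if "not" in turn.words.split() else "neutral"
    return turn


def nlu(turn):
    # Stub: reduce the semantic form to a single intent label.
    turn.intent = "greeting" if turn.words.startswith("hello") else "statement"
    return turn


def nlg(turn):
    if turn.intent == "greeting":
        turn.reply_text = "Hello! How can I help?"
    elif turn.sentiment == "negative":
        turn.reply_text = "I am sorry to hear that."
    else:
        turn.reply_text = "I see."
    return turn


def tts(turn):
    # Stub: wrap the text to mark it as synthesized audio.
    turn.reply_audio = f"<audio:{turn.reply_text}>"
    return turn


PIPELINE = [asr, sentiment_analyzer, nlu, nlg, tts]


def run_pipeline(audio):
    turn = Turn(audio=audio)
    for module in PIPELINE:
        turn = module(turn)
    return turn


print(run_pipeline("Hello there").reply_text)
print(run_pipeline("This is not working").reply_text)
```

Because each module reads and writes only the shared record, a module can be replaced or skipped by changing the `PIPELINE` list, which is the main appeal of a modular design.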
Introduction to machine translation
Machine translation (MT) focuses on transforming text between languages. Key points:
- **Statistical MT:** Uses aligned parallel corpora for translations.
- **BLEU (Bilingual Evaluation Understudy):** A scoring system to evaluate translation quality. Scores range from 0 to 1, with higher scores indicating closer matches to reference translations.
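To show how a BLEU-style score stays between 0 and 1, here is a simplified sentence-level sketch: clipped n-gram precisions combined by a geometric mean and multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization, and no smoothing; the function names are made up for this example, and standard implementations differ in details such as corpus-level aggregation and multiple references.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=4):
    """Simplified BLEU for one candidate and one reference sentence."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        # Without smoothing, one empty n-gram overlap zeroes the whole score.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_mean)


reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))
print(simple_bleu("the cat sat on a mat", reference, max_n=2))
```

An exact match scores 1, and a candidate that shares no words with the reference scores 0; partial matches fall in between.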
Summary
Natural language processing (NLP) is a branch of AI that enables computers to understand and process human language. Applications include:
- Language translation.
- Text classification.
- Sentiment analysis.
Real-world examples:
- Chatbots for customer service.
- Translation apps.
- Social media analytics.
Natural Language Processing (NLP): Key Points and Summary
Introduction
NLP bridges unstructured language (text and speech) and structured data, enabling computers to process human language effectively.
Use Cases
Applications include:
- Machine translation.
- Virtual assistants.
- Sentiment analysis.
- Spam detection.
Tools and Techniques
- **Tokenization:** Breaking text into tokens.
- **Stemming and Lemmatization:** Reducing words to root forms.
- **POS Tagging:** Identifying grammatical roles.
- **Named Entity Recognition (NER):** Identifying entities in text.
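The first two techniques in this list can be sketched in a few lines. This is a toy illustration under stated assumptions: the regular expression in `tokenize`, the suffix list in `naive_stem`, and the `TOY_LEMMAS` table are all hypothetical, and established stemmers and lemmatizers use far more careful rules and vocabularies.

```python
import re


def tokenize(text):
    """Split text into word tokens (keeping internal apostrophes) and punctuation."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]", text)


# Assumed suffix list, checked in order; longer suffixes come first.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one known suffix, provided at least three characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


# Hypothetical vocabulary mapping used to contrast lemmatization with stemming.
TOY_LEMMAS = {"mice": "mouse", "was": "be", "better": "good"}


def toy_lemmatize(word):
    """Look the word up in the toy vocabulary; unknown words are returned lowercased."""
    return TOY_LEMMAS.get(word.lower(), word.lower())


tokens = tokenize("The cats aren't sleeping.")
print(tokens)
print([naive_stem(token) for token in tokens])
print(toy_lemmatize("mice"), naive_stem("mice"))
```

The last line shows the difference between the two approaches: the vocabulary lookup maps "mice" to "mouse", while suffix stripping leaves it unchanged.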
Example: A Dialogue System
A dialogue system (DS) is an NLP-based application capable of holding conversations with humans. An example is ChatGPT, which allows human-like text-based conversations. Components of a modular DS architecture include:
- **Acoustic Turn-Taking (ATT)**: Detecting when a user finishes speaking.
- **Automatic Speech Recognition (ASR)**: Converting speech to text.
- **Sentiment Analyzer (SA)**: Determining emotional tone.
- **Dialogue Manager (DM)**: Generating appropriate system responses.
- **Text-to-Speech Synthesizer (TTS)**: Rendering speech output.
Two operating modes can function simultaneously:
1. **Long Loop**: User → ATT → ASR → Sentiment Analysis → DM → TTS.
2. **Short Loop**: User → ATT → DM → TTS.
ChatGPT Interaction Example
- **1. Key Language Signals for Frustration**
When analyzing text, I look at:
- **Word Choice**: Phrases like "This isn’t working" or "Why does this keep happening?" often imply repeated failure or dissatisfaction.
- **Repetition**: Repeated expressions of the same issue (e.g., "It’s still wrong") suggest emotional intensity, often associated with frustration.
- **Tone of Questions**: Questions with emotionally charged words ("keep happening," "still wrong") can indicate frustration compared to neutral inquiry.
- **2. Comparison: Frustration vs. Inquiry/Curiosity**
To differentiate between frustration and curiosity/inquiry, I consider:
- **Context**: The broader situation often reveals the user's emotional state.
- **Phrasing Style**:
  - **Frustration**: Includes emotionally charged or negative descriptors (e.g., "It’s not working").
  - **Inquiry/Curiosity**: Focuses on exploration or learning (e.g., "Can you explain why this happens?").
- **3. How NLP Makes This Determination**
NLP techniques such as sentiment analysis, combined with cues from lexical context and behavioral patterns (for example, repetition of the same complaint), help determine emotional tone.
- **4. Why This Matters**
Recognizing emotional tone enables better adaptation of responses, providing reassurance for frustration and detailed explanations for curiosity.
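The distinction between frustration and inquiry described in this example can be approximated with a small cue-based heuristic. This is only a sketch: the cue lists are taken from the phrases quoted above plus a few assumed variants, the function `classify_tone` is hypothetical, and it says nothing about how ChatGPT or any production system makes this decision.

```python
# Cue phrases drawn from the examples in the text, plus assumed variants.
FRUSTRATION_CUES = (
    "isn't working",
    "not working",
    "keep happening",
    "keeps happening",
    "still wrong",
)
INQUIRY_CUES = ("can you explain", "how does", "why does", "what is")


def classify_tone(message):
    """Return 'frustration', 'inquiry', or 'unclear' based on cue phrases."""
    # Normalize case and typographic apostrophes before matching.
    text = message.lower().replace("\u2019", "'")
    if any(cue in text for cue in FRUSTRATION_CUES):
        # Charged wording takes priority, even inside a question.
        return "frustration"
    if any(cue in text for cue in INQUIRY_CUES):
        return "inquiry"
    return "unclear"


for message in ["Why does this keep happening?", "Can you explain why this happens?"]:
    print(message, "->", classify_tone(message))
```

Giving frustration cues priority mirrors the point made under "Tone of Questions": a question containing charged wording is treated as frustration rather than neutral inquiry.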
Glossary of Terms
- **Natural Language Processing (NLP):** AI field enabling language understanding.
- **Tokenization:** Splitting text into smaller units.
- **Stemming:** Reducing words to their stems by stripping affixes, without consulting a vocabulary.
- **Lemmatization:** Reducing words to base forms using vocabulary mapping.
- **Part of Speech (POS) Tagging:** Identifying grammatical roles of words.
- **Named Entity Recognition (NER):** Classifying entities like names and places.
- **Sentiment Analysis:** Determining the emotion behind text.
- **Machine Translation (MT):** Translating text between languages.
- **Word Cloud:** A visualization in which the size of each word reflects its frequency.
- **BLEU:** Metric for evaluating translation quality.
- **Corpus/Corpora:** Linguistic datasets for NLP tasks.
- **Deep Learning:** Using neural networks with many layers for AI tasks.
- **Dialogue System (DS):** An NLP-based application capable of holding conversations with humans.
Appendices
Refer to the provided resources for further study on NLP tools and techniques.