Go Engineer Systematic Course 009 [Study Notes]


Other Features

  • Personal Center
  • Favorites
  • Manage Shipping Addresses (CRUD)
  • Messages

Copy inventory_srv → userop_srv, then search for and replace every occurrence of "inventory"

Elasticsearch In-depth Analysis Document

1. What is Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene, capable of rapidly storing, searching, and analyzing massive amounts of data. It is a core component of the Elastic Stack (formerly ELK Stack).

2. Problems Faced by MySQL Search - In-depth Analysis

2.1 Detailed Explanation of Performance Issues

Symptom:

-- With 1 million rows, the following query may take several seconds ('手机' = "phone")
SELECT * FROM products WHERE name LIKE '%手机%' OR description LIKE '%手机%';

Performance Comparison Data:

| Data Volume | MySQL LIKE Query | Elasticsearch Full-Text Search | Speedup |
|---|---|---|---|
| 10,000 records | 50 ms | 10 ms | 5× |
| 100,000 records | 500 ms | 15 ms | ~33× |
| 1 million records | 5,000 ms | 20 ms | 250× |
| 10 million records | 50,000+ ms | 30 ms | 1,600×+ |

Root Causes:

  1. Full Table Scan: LIKE '%keyword%' cannot use B+ tree indexes, requiring scanning all rows
  2. I/O Intensive: Each query needs to read a large amount of data from disk
  3. CPU Intensive: String matching operations performed on every row of data
  4. Memory Pressure: Large amounts of data loaded into memory for processing
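Root cause 1 can be seen in miniature with a sorted Go slice standing in for a B+ tree index. This is a sketch under that assumption, and the function names are made up: a prefix pattern (LIKE 'apple%') can binary-search into the sorted keys and stop early, while a leading wildcard (LIKE '%phone%') gives the sort order nothing to work with, so every row is visited.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixSearch mimics an index range scan for LIKE 'prefix%': the keys are
// sorted, so binary search jumps to the first candidate and the scan stops
// as soon as a key no longer starts with the prefix.
func prefixSearch(sorted []string, prefix string) (matches []string, visited int) {
	i := sort.SearchStrings(sorted, prefix)
	for ; i < len(sorted); i++ {
		visited++
		if !strings.HasPrefix(sorted[i], prefix) {
			break
		}
		matches = append(matches, sorted[i])
	}
	return matches, visited
}

// substringSearch mimics LIKE '%keyword%': sort order says nothing about
// where a substring occurs, so every row must be examined (a full scan).
func substringSearch(rows []string, keyword string) (matches []string, visited int) {
	for _, r := range rows {
		visited++
		if strings.Contains(r, keyword) {
			matches = append(matches, r)
		}
	}
	return matches, visited
}

func main() {
	names := []string{"xiaomi tv", "apple phone", "huawei phone", "apple tv", "xiaomi phone"}
	sort.Strings(names) // an index keeps its keys in sorted order

	m1, v1 := prefixSearch(names, "apple")
	m2, v2 := substringSearch(names, "phone")
	fmt.Println(m1, "visited", v1)
	fmt.Println(m2, "visited", v2)
}
```

On these five names the prefix lookup touches 3 keys before stopping, while the substring lookup touches all 5; on millions of rows that difference is the full table scan described above.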

Real-world Case:

An e-commerce platform's product table has 5 million records. Using MySQL fuzzy search for "苹果手机" (Apple phone):
- Query time: 8.3 seconds
- CPU usage: Spiked to 85%
- With 10 concurrent queries, response time increased to over 30 seconds

2.2 Detailed Explanation of the Lack of Relevance Ranking

Pain Points of MySQL Query Results:

-- MySQL can only sort by fixed rules
SELECT * FROM products
WHERE name LIKE '%手机%'
ORDER BY price DESC;  -- Can only sort by fields like price, time, etc.

Elasticsearch Relevance Scoring Mechanism:

Search term: "小米手机" (Xiaomi phone)

Relevance score calculation:

Document 1: "小米手机12 Pro" (Xiaomi Phone 12 Pro)
  • Term Frequency (TF): Both keywords appear
  • Inverse Document Frequency (IDF): Calculates term rarity
  • Field Length: Shorter title, higher weight
  • Score: 9.8

Document 2: "这是一款性价比很高的手机" (This is a very cost-effective phone)
  • Term Frequency (TF): Only "手机" (phone) appears
  • Inverse Document Frequency (IDF): "手机" (phone) is common
  • Field Length: Longer description, lower weight
  • Score: 3.2

Detailed Explanation of Relevance Factors:

  1. TF (Term Frequency): The frequency of a keyword appearing in a document
  2. IDF (Inverse Document Frequency): The rarity of a keyword across all documents
  3. Field Length Normalization: Matches in shorter fields have higher weight than in longer fields
  4. Field Weight Boost: Can set titles to be more important than content
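  5. Query-time Weight: Can specify certain query terms as more important

Since Elasticsearch 5.0, the default similarity (scoring model) is BM25, which combines factors 1-3; boosts (factors 4 and 5) then scale the score of the clauses they apply to.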
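The first three factors can be combined into one number. Below is a minimal Go sketch of the classic BM25 formula with k1 = 1.2 and b = 0.75 (Elasticsearch's defaults) and Lucene's IDF variant. It assumes documents are already tokenized, and the toy corpus is made up to mirror the 小米手机 example. Recent Lucene versions drop the constant (k1 + 1) factor from the numerator, which rescales scores without changing the ranking.

```go
package main

import (
	"fmt"
	"math"
)

// Elasticsearch's default BM25 parameters.
const (
	k1 = 1.2  // term-frequency saturation
	b  = 0.75 // strength of field-length normalization
)

// bm25 scores each pre-tokenized document against the query terms.
func bm25(docs [][]string, query []string) []float64 {
	n := float64(len(docs))
	df := map[string]int{} // number of documents containing each term
	totalLen := 0
	for _, d := range docs {
		totalLen += len(d)
		seen := map[string]bool{}
		for _, t := range d {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	avgLen := float64(totalLen) / n

	scores := make([]float64, len(docs))
	for i, d := range docs {
		tf := map[string]int{}
		for _, t := range d {
			tf[t]++
		}
		for _, q := range query {
			f := float64(tf[q])
			if f == 0 {
				continue
			}
			// IDF (Lucene variant): rarer terms contribute more.
			idf := math.Log(1 + (n-float64(df[q])+0.5)/(float64(df[q])+0.5))
			// Length normalization: longer-than-average fields are penalized.
			norm := k1 * (1 - b + b*float64(len(d))/avgLen)
			// TF saturation: each extra occurrence adds less than the previous one.
			scores[i] += idf * f * (k1 + 1) / (f + norm)
		}
	}
	return scores
}

func main() {
	docs := [][]string{
		{"小米", "手机", "12", "pro"},           // Document 1 (short title)
		{"这是", "一款", "性价比", "很高", "的", "手机"}, // Document 2 (longer description)
		{"小米", "电视"},                       // an extra document so IDF has something to compare
	}
	for i, s := range bm25(docs, []string{"小米", "手机"}) {
		fmt.Printf("doc %d: %.3f\n", i+1, s)
	}
}
```

Document 1 ranks first because it contains both terms in a short field; Document 2, a long text that only mentions 手机, ranks last.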

2.3 Detailed Explanation of Limited Full-Text Search

Limitations of MySQL Full-Text Index:

-- MySQL full-text index creation
ALTER TABLE products ADD FULLTEXT(name, description);

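-- Problem 1: Minimum token length limit (ft_min_word_len defaults to 4 for MyISAM,
-- innodb_ft_min_token_size to 3 for InnoDB; the ngram parser indexes tokens of
-- ngram_token_size characters, default 2)
-- e.g. with the ngram parser, "手机" (phone) can be searched, but "机" (machine) cannot

-- Problem 2: Poor Chinese word segmentation support
-- The default parser splits only on spaces and punctuation, so "苹果手机" (Apple phone)
-- is indexed as one token, and searching for "苹果" (Apple) won't find it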

Elasticsearch Full-Text Search Capabilities:

// ES analysis process example
Input text: "我想买一台苹果手机" (I want to buy an Apple phone)

Tokenization result (with a Chinese analyzer such as IK):
[我] [想] [买] [一台] [苹果] [手机] [苹果手机]

Synonym expansion (requires a configured synonym filter):
[苹果] → [Apple, iPhone]
[手机] → [手机, 电话, mobile]

Spell correction (via suggesters):
"苹果手击" (a typo) → suggest "苹果手机" (Apple phone)
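Synonym expansion itself is a small operation. Here is a minimal Go sketch that assumes a hand-written synonym table mirroring the example above; in Elasticsearch the same table would be configured as a synonym token filter, and the Go names here are illustrative only.

```go
package main

import "fmt"

// synonyms is a hypothetical table matching the example above.
var synonyms = map[string][]string{
	"苹果": {"Apple", "iPhone"},
	"手机": {"电话", "mobile"},
}

// expand returns the original tokens plus their synonyms, without duplicates,
// so a query for "手机" can also match documents that only say "mobile".
func expand(tokens []string) []string {
	seen := map[string]bool{}
	var out []string
	add := func(t string) {
		if !seen[t] {
			seen[t] = true
			out = append(out, t)
		}
	}
	for _, t := range tokens {
		add(t)
		for _, s := range synonyms[t] {
			add(s)
		}
	}
	return out
}

func main() {
	fmt.Println(expand([]string{"苹果", "手机"})) // [苹果 Apple iPhone 手机 电话 mobile]
}
```

Real synonym filters also handle multi-word synonyms and can be applied at index time or query time; this sketch only covers single-token, query-time expansion.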

2.4 Detailed Explanation of Inaccurate Matching and Lack of Word Segmentation

Problems with MySQL String Matching:

-- Search for "笔记本" (notebook)
SELECT * FROM products WHERE name LIKE '%笔记本%';
-- Result: Can find "笔记本电脑" (notebook computer)
-- Problem: Cannot find "笔记 本子" (notes notebook), "notebook", "手提电脑" (laptop)

Elasticsearch Smart Word Segmentation Process:

Original text: "ThinkPad X1 Carbon超轻薄笔记本电脑" (ThinkPad X1 Carbon ultra-thin laptop)

Standard analyzer (Chinese is split into single characters; tokens are lowercased):
[thinkpad] [x1] [carbon] [超] [轻] [薄] [笔] [记] [本] [电] [脑]

IK tokenizer (Chinese):
[ThinkPad] [X1] [Carbon] [超] [轻薄] [超轻薄]
[笔记] [本] [笔记本] [电脑] [笔记本电脑]

Pinyin tokenizer:
[si] [kao] [pad] → Can be searched by pinyin

N-gram tokenization (n = 3):
[Thi] [hin] [ink] [nkP] [kPa] [Pad] → Supports partial matching
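Two of these ideas fit in a few lines of Go. Forward maximum matching is a classic dictionary-based segmentation technique. It is not IK's actual algorithm, and the tiny dictionary below is made up, but it shows how a dictionary turns "超轻薄笔记本电脑" into words. The n-gram function reproduces the 3-grams of "ThinkPad" listed above.

```go
package main

import "fmt"

// maxMatch segments text by forward maximum matching: at each position it
// takes the longest dictionary word of at most maxLen runes, falling back to
// a single rune when nothing in the dictionary matches.
func maxMatch(text string, dict map[string]bool, maxLen int) []string {
	if maxLen < 1 {
		maxLen = 1
	}
	runes := []rune(text) // work on runes so CJK characters are not split
	var out []string
	for i := 0; i < len(runes); {
		n := maxLen
		if rem := len(runes) - i; rem < n {
			n = rem
		}
		for ; n > 1; n-- {
			if dict[string(runes[i:i+n])] {
				break
			}
		}
		out = append(out, string(runes[i:i+n]))
		i += n
	}
	return out
}

// ngrams returns every run of n consecutive runes, like an n-gram tokenizer
// with min_gram = max_gram = n.
func ngrams(s string, n int) []string {
	if n < 1 {
		return nil
	}
	runes := []rune(s)
	var out []string
	for i := 0; i+n <= len(runes); i++ {
		out = append(out, string(runes[i:i+n]))
	}
	return out
}

func main() {
	dict := map[string]bool{"超轻薄": true, "轻薄": true, "笔记": true, "笔记本": true, "电脑": true, "笔记本电脑": true}
	fmt.Println(maxMatch("超轻薄笔记本电脑", dict, 5)) // [超轻薄 笔记本电脑]
	fmt.Println(ngrams("ThinkPad", 3))           // [Thi hin ink nkP kPa Pad]
}
```

Because maximum matching always prefers the longest word, its output resembles IK's coarse ik_smart mode; ik_max_word instead emits all overlapping dictionary words, as in the IK example above.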

3. What is Full-Text Search - Core Principle Analysis

3.1 Structured Data vs Unstructured Data

Structured Data (MySQL storage method):
┌──────┬────────┬────────┬────────┐
│  ID  │  Name  │ Price  │ Stock  │
├──────┼────────┼────────┼────────┤
│  1   │iPhone  │ 5999   │  100   │
│  2   │ 小米   │ 2999   │  200   │
└──────┴────────┴────────┴────────┘

Unstructured Data (text content):
"This iPhone uses an A15 processor, with powerful performance,
excellent camera effects, and 20% improved battery life,
User review: 'Awesome, great value for money!'"

3.2 Detailed Explanation of the Inverted Index Principle

Forward Index (MySQL):

Document ID → Content
Doc1 → "小米手机" (Xiaomi phone)
Doc2 → "苹果手机" (Apple phone)
Doc3 → "小米电视" (Xiaomi TV)

Inverted Index (Elasticsearch):

Term → Document List
"小米" (Xiaomi) → [Doc1, Doc3]
"手机" (phone) → [Doc1, Doc2]
"苹果" (Apple) → [Doc2]
"电视" (TV) → [Doc3]

Search "小米手机" (Xiaomi phone):
1. Find "小米" (Xiaomi) → Get [Doc1, Doc3]
2. Find "手机" (phone) → Get [Doc1, Doc2]
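3. Intersect the two lists → Doc1 contains both terms, so it is the most relevant (a match query uses OR by default, so Doc2 and Doc3 also match but score lower)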
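The build-and-intersect steps above as a minimal Go sketch. It assumes documents arrive pre-tokenized, keeps each posting list sorted by document ID so two lists can be intersected in one linear merge, and shows AND semantics only (no scoring). The function names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// buildIndex maps each term to the sorted list of document IDs containing it.
func buildIndex(docs map[int][]string) map[string][]int {
	idx := map[string][]int{}
	ids := make([]int, 0, len(docs))
	for id := range docs {
		ids = append(ids, id)
	}
	sort.Ints(ids) // visit documents in ID order so posting lists come out sorted
	for _, id := range ids {
		seen := map[string]bool{}
		for _, term := range docs[id] {
			if !seen[term] {
				seen[term] = true
				idx[term] = append(idx[term], id)
			}
		}
	}
	return idx
}

// intersect merges two sorted posting lists in linear time.
func intersect(a, b []int) []int {
	var out []int
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

// searchAll returns the documents containing every query term.
func searchAll(idx map[string][]int, terms []string) []int {
	if len(terms) == 0 {
		return nil
	}
	result := idx[terms[0]]
	for _, t := range terms[1:] {
		result = intersect(result, idx[t])
	}
	return result
}

func main() {
	idx := buildIndex(map[int][]string{
		1: {"小米", "手机"},
		2: {"苹果", "手机"},
		3: {"小米", "电视"},
	})
	fmt.Println(idx["小米"], idx["手机"])                  // [1 3] [1 2]
	fmt.Println(searchAll(idx, []string{"小米", "手机"})) // [1]
}
```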

3.3 Detailed Structure of the Inverted Index

Complete Inverted Index Structure:

Term: "手机" (phone)
├── Document Frequency (DF): 1000 documents contain this term
├── Inverted List:
│   ├── Doc1:
│   │   ├── Term Frequency (TF): 3 times
│   │   ├── Positions: [5, 28, 102]
│   │   └── Fields: [title, description]
│   ├── Doc2:
│   │   ├── Term Frequency (TF): 1 time
│   │   ├── Positions: [15]
│   │   └── Fields: [title]
│   └── ...
└── Statistics: Highest term frequency, average term frequency, etc.
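A sketch of how the positional information above gets used, assuming a simple in-memory map (term → document → positions) rather than Lucene's compressed on-disk format; the type and method names are hypothetical. Document frequency is the number of documents under a term, TF is the length of a position list, and positions let a phrase query check that terms are adjacent.

```go
package main

import "fmt"

// positionalIndex (hypothetical) maps term -> document ID -> ascending positions.
type positionalIndex map[string]map[int][]int

func buildPositional(docs map[int][]string) positionalIndex {
	idx := positionalIndex{}
	for id, tokens := range docs {
		for pos, t := range tokens {
			if idx[t] == nil {
				idx[t] = map[int][]int{}
			}
			idx[t][id] = append(idx[t][id], pos)
		}
	}
	return idx
}

// df is the number of documents containing the term.
func (idx positionalIndex) df(term string) int { return len(idx[term]) }

// tf is how many times the term occurs in one document.
func (idx positionalIndex) tf(term string, doc int) int { return len(idx[term][doc]) }

// phraseMatch reports whether doc contains the phrase terms consecutively, in order.
func (idx positionalIndex) phraseMatch(doc int, phrase []string) bool {
	if len(phrase) == 0 {
		return false
	}
	for _, start := range idx[phrase[0]][doc] {
		matched := true
		for k := 1; k < len(phrase); k++ {
			if !containsInt(idx[phrase[k]][doc], start+k) {
				matched = false
				break
			}
		}
		if matched {
			return true
		}
	}
	return false
}

func containsInt(xs []int, v int) bool {
	for _, x := range xs {
		if x == v {
			return true
		}
	}
	return false
}

func main() {
	idx := buildPositional(map[int][]string{
		1: {"小米", "手机", "很", "好"},
		2: {"手机", "不是", "小米"},
	})
	fmt.Println(idx.df("手机"), idx.tf("手机", 1))            // 2 1
	fmt.Println(idx.phraseMatch(1, []string{"小米", "手机"})) // true
	fmt.Println(idx.phraseMatch(2, []string{"小米", "手机"})) // false
}
```

Both documents contain both terms, so a plain term lookup cannot tell them apart; only the stored positions show that "小米 手机" appears as a phrase in document 1 alone.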

4. Elasticsearch Architecture Explained

4.1 Cluster Architecture

Elasticsearch Cluster Architecture Diagram:

┌─────────────── ES Cluster ──────────────┐
│                                         │
│  ┌─────────────────────────────────┐    │
│  │     Master Node                 │    │
│  │  • Cluster management           │    │
│  │  • Index creation/deletion      │    │
│  │  • Shard allocation             │    │
│  └─────────────────────────────────┘    │
│                                         │
│  ┌──────────┐  ┌──────────┐             │
│  │ Data     │  │ Data     │             │
│  │ Node 1   │  │ Node 2   │             │
│  │ ┌──────┐ │  │ ┌──────┐ │             │
│  │ │ P0   │ │  │ │ R0   │ │             │
│  │ ├──────┤ │  │ ├──────┤ │             │
│  │ │ R1   │ │  │ │ P1   │ │             │
│  │ └──────┘ │  │ └──────┘ │             │
│  └──────────┘  └──────────┘             │
│                                         │
│  P = Primary Shard                      │
│  R = Replica Shard                      │
└─────────────────────────────────────────┘
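A toy model of the layout in the diagram, assuming simple round-robin placement. The allocate function is hypothetical, and Elasticsearch's real allocator also weighs shard balance, disk watermarks and allocation awareness. The one rule it does share with Elasticsearch is that a replica is never placed on the same node as its primary.

```go
package main

import "fmt"

// allocate (hypothetical) returns node index -> shard copies on that node.
// Primary shard s goes to node s % nodes, and its single replica to the next node.
func allocate(shards, nodes int) map[int][]string {
	layout := map[int][]string{}
	for s := 0; s < shards; s++ {
		p := s % nodes
		layout[p] = append(layout[p], fmt.Sprintf("P%d", s))
		if nodes < 2 {
			continue // a replica may not share a node with its primary, so it stays unassigned
		}
		r := (s + 1) % nodes
		layout[r] = append(layout[r], fmt.Sprintf("R%d", s))
	}
	return layout
}

func main() {
	layout := allocate(2, 2)
	fmt.Println("Node 1:", layout[0]) // Node 1: [P0 R1]
	fmt.Println("Node 2:", layout[1]) // Node 2: [R0 P1]
}
```

With two shards on two nodes this reproduces the diagram: if either node fails, the other still holds a full copy of P0 and P1's data.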

4.2 Data Write Process

Detailed Write Process:

Client → Coordinating Node → Primary Shard → Replica Shard

1. Client sends write request
   ↓
2. Coordinating node determines shard via hash routing
   ↓
3. Request forwarded to primary shard node
   ↓
4. Primary shard writes successfully
   ↓
5. Replicated in parallel to replica shards
   ↓
6. All replicas confirm
   ↓
7. Returns success response to client (the document becomes searchable after the next refresh, every 1 second by default)

Timeline:
T0 Receive ──→ T1 Route ──→ T2 Primary shard write ──→ T3 Replica write ──→ T4 Response
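Step 2's hash routing can be sketched as shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. Elasticsearch uses Murmur3 (plus, in recent versions, a routing-factor step that supports index splitting); this sketch substitutes Go's standard-library FNV-1a hash, so the concrete shard numbers will not match a real cluster.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor applies the simplified routing rule
// shard = hash(_routing) % number_of_primary_shards (primaryShards must be > 0).
// FNV-1a stands in for Elasticsearch's Murmur3 hash.
func shardFor(routing string, primaryShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(routing)) // _routing defaults to the document _id
	return h.Sum32() % primaryShards
}

func main() {
	for _, id := range []string{"1", "2", "3", "42"} {
		fmt.Printf("doc %s -> shard %d\n", id, shardFor(id, 2))
	}
}
```

Because the modulus is the primary shard count, the same _id always lands on the same shard, and changing that count would send existing IDs elsewhere. That is why the number of primary shards is fixed when an index is created (changing it later requires reindexing or the split/shrink APIs).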

4.3 Query Process

Query Execution Process:

Phase 1: Query
  • The coordinating node sends the query to every shard
  • Each shard returns the IDs and scores of its top N documents (N = from + size)
           ↓
Phase 2: Fetch
  • The coordinating node merges and sorts all shard results
  • It then retrieves the complete content of only the documents on the final page
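The coordinating node's work between the two phases, as a minimal Go sketch. The hit type and the in-memory store are hypothetical stand-ins for shard responses: merge every shard's local top-N list of (ID, score) pairs, keep the global top N, then fetch full documents only for those IDs. For a page starting at offset from, every shard must return its top from + size hits, which is the root of the deep-paging problem in section 9.2.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is what a shard returns in the query phase: an ID and a score only.
type hit struct {
	ID    string
	Score float64
}

// mergeTopN combines every shard's local top-N list and keeps the global
// top N by score (ties broken by ID so the order is deterministic).
func mergeTopN(shardResults [][]hit, n int) []hit {
	var all []hit
	for _, r := range shardResults {
		all = append(all, r...)
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].Score != all[j].Score {
			return all[i].Score > all[j].Score
		}
		return all[i].ID < all[j].ID
	})
	if len(all) > n {
		all = all[:n]
	}
	return all
}

// fetch stands in for the fetch phase: full documents are loaded only for the winning IDs.
func fetch(hits []hit, store map[string]string) []string {
	docs := make([]string, 0, len(hits))
	for _, h := range hits {
		docs = append(docs, store[h.ID])
	}
	return docs
}

func main() {
	shard0 := []hit{{"a", 3.1}, {"b", 1.2}}
	shard1 := []hit{{"c", 2.5}, {"d", 0.4}}
	top := mergeTopN([][]hit{shard0, shard1}, 2)
	store := map[string]string{"a": "小米手机12 Pro", "b": "小米电视", "c": "苹果手机", "d": "手机壳"}
	fmt.Println(top, fetch(top, store))
}
```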

5. Detailed Explanation of Elasticsearch Core Features

5.1 Detailed Explanation of Query Types

// 1. Match query - Full-text search
{
  "query": {
    "match": {
      "title": {
        "query": "苹果手机",
        "operator": "and"  // Must contain all terms
      }
    }
  }
}

// 2. Term query - Exact match
{
  "query": {
    "term": {
      "category.keyword": "手机"  // No tokenization, exact match
    }
  }
}

// 3. Range query - Range search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 5000
      }
    }
  }
}

// 4. Bool compound query
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "手机"}}
      ],
      "filter": [
        {"range": {"price": {"lte": 5000}}}
      ],
      "should": [
        {"match": {"brand": "苹果"}}  // Optional: matching documents score higher
      ],
      "must_not": [
        {"term": {"status": "discontinued"}}
      ]
    }
  }
}

5.2 Aggregation Analysis Functionality

// Sales data analysis example
{
  "aggs": {
    "sales_per_category": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "total_sales": {
          "sum": {
            "field": "sales_count"
          }
        },
        "price_ranges": {
          "range": {
            "field": "price",
            "ranges": [
              {"to": 1000},
              {"from": 1000, "to": 5000},
              {"from": 5000}
            ]
          }
        }
      }
    }
  }
}
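To make the semantics of this nested aggregation concrete, here is an in-memory Go equivalent over a hypothetical product slice: one terms bucket per category, with avg, sum and range sub-aggregations inside each bucket. As in Elasticsearch's range aggregation, each range includes its from value and excludes its to value, so a price of exactly 1000 lands in the second bucket. Real terms aggregations are computed per shard and return only the top buckets (10 by default).

```go
package main

import "fmt"

// product and bucket are hypothetical types for this sketch.
type product struct {
	Category   string
	Price      float64
	SalesCount int
}

type bucket struct {
	DocCount    int
	AvgPrice    float64
	TotalSales  int
	PriceRanges [3]int // [*, 1000), [1000, 5000), [5000, *)
}

// aggregate mimics terms(category) -> avg(price), sum(sales_count), range(price).
func aggregate(products []product) map[string]*bucket {
	out := map[string]*bucket{}
	for _, p := range products {
		b := out[p.Category]
		if b == nil {
			b = &bucket{}
			out[p.Category] = b
		}
		b.DocCount++
		b.AvgPrice += p.Price // running sum, divided by DocCount below
		b.TotalSales += p.SalesCount
		switch {
		case p.Price < 1000:
			b.PriceRanges[0]++
		case p.Price < 5000:
			b.PriceRanges[1]++
		default:
			b.PriceRanges[2]++
		}
	}
	for _, b := range out {
		b.AvgPrice /= float64(b.DocCount)
	}
	return out
}

func main() {
	buckets := aggregate([]product{
		{Category: "手机", Price: 5999, SalesCount: 10},
		{Category: "手机", Price: 2999, SalesCount: 20},
		{Category: "耳机", Price: 299, SalesCount: 100},
	})
	for _, c := range []string{"手机", "耳机"} {
		b := buckets[c]
		fmt.Printf("%s: docs=%d avg_price=%.2f total_sales=%d price_ranges=%v\n",
			c, b.DocCount, b.AvgPrice, b.TotalSales, b.PriceRanges)
	}
}
```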

6. Real-world Application Case Studies

6.1 E-commerce Search Optimization Case Study

Comparison of Search Optimization on an E-commerce Platform Before and After:

| Metric | MySQL Solution | Elasticsearch Solution | Improvement |
|---|---|---|---|
| Average Search Latency | 2.3 s | 0.05 s | 46× faster |
| Search Accuracy | 65% | 92% | +27 percentage points |
| Zero-Result Rate | 18% | 3% | −15 percentage points |
| Number of Servers | 8 | 3 | 62.5% fewer servers |
| Concurrency Capability | 100 QPS | 5,000 QPS | 50× |

Implementation Details:

  1. Data Synchronization Architecture:
     • MySQL (primary data) → Binlog → Logstash → Elasticsearch
     • Scheduled full synchronization (nightly)
  2. Search Optimization Strategies:
     • Pinyin Search: Supports searching "pingguo" to find "苹果" (Apple)
     • Synonyms: Configure "手机" (phone), "电话" (telephone), and "mobile" as synonyms
     • Search Suggestions: Real-time prompts for possible search terms
     • Correction Function: Automatically corrects common spelling errors

6.2 Log Analysis System Case Study

Log Analysis System of an Internet Company:

Log Processing Flow:

Application Server (generate logs) → Filebeat (collect) → Logstash (process & transform) → Elasticsearch (store & index) → Kibana (visualize & display)

Processing Scale:
• Log Volume: 100GB per day
• Number of Log Entries: 1 billion entries/day
• Query Response: Millisecond level
• Retention Period: 30 days hot data, 1 year cold data

7. Performance Optimization Best Practices

7.1 Index Design Optimization

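// Optimized mapping design (ik_max_word, ik_smart and pinyin come from the IK and pinyin analysis plugins, which must be installed)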
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword"  // Supports exact match
          },
          "pinyin": {
            "type": "text",
            "analyzer": "pinyin"  // Supports pinyin search
          }
        }
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100  // Stored internally as a long (price × 100), i.e. two decimal places
      },
      "category": {
        "type": "keyword"  // Categories do not need tokenization
      },
      "description": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "created_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
      }
    }
  }
}
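A scaled_float is stored as a long: the value is multiplied by scaling_factor and rounded to the nearest integer. A small Go sketch of that arithmetic follows; the helper names are hypothetical. With scaling_factor 100, prices keep exactly two decimal places and anything finer is rounded away.

```go
package main

import (
	"fmt"
	"math"
)

// toScaled (hypothetical) mimics scaled_float storage: value × factor, rounded to a long.
func toScaled(v, factor float64) int64 { return int64(math.Round(v * factor)) }

// fromScaled converts the stored long back to the value queries see.
func fromScaled(s int64, factor float64) float64 { return float64(s) / factor }

func main() {
	for _, price := range []float64{5999, 2999.5, 19.999} {
		s := toScaled(price, 100)
		fmt.Println(price, "->", s, "->", fromScaled(s, 100))
	}
}
```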

7.2 Query Performance Optimization Techniques

  1. Use filter instead of query when scoring is not needed
    ```json
    // Before optimization: query context (calculates a relevance score)
    {"query": {"term": {"status": "active"}}}

    // After optimization: filter context (no scoring, results can be cached)
    {"query": {"bool": {"filter": {"term": {"status": "active"}}}}}
    ```

  2. Set a reasonable number of shards
    ```
    Shard count reference formula:
    Number of shards ≈ Data volume (GB) / 30 GB

    Examples:
    - 100 GB of data: 3-4 shards
    - 1 TB of data: 35-40 shards
    ```

  3. Optimize batch operations
    ```json
    // Use the bulk API for batch indexing
    POST _bulk
    {"index": {"_index": "products", "_id": 1}}
    {"name": "iPhone", "price": 5999}
    {"index": {"_index": "products", "_id": 2}}
    {"name": "小米", "price": 2999}
    ```

8. Elasticsearch vs Traditional Databases

8.1 Comparison of Applicable Scenarios

| Scenario | MySQL | Elasticsearch | Recommended Choice |
|---|---|---|---|
| Full-text Search | ❌ Poor | ✅ Excellent | ES |
| Transaction Support | ✅ Full ACID | ❌ No transactions | MySQL |
| Real-time Statistical Analysis | ⚠️ Average | ✅ Excellent | ES |
| Relational Queries | ✅ Excellent | ❌ Limited | MySQL |
| Geospatial Search | ❌ Poor | ✅ Excellent | ES |
| Log Analysis | ❌ Unsuitable | ✅ Specialty | ES |
| Precise Numerical Calculation | ✅ Precise | ⚠️ Approximate | MySQL |

8.2 Hybrid Architecture Solution

Recommended Hybrid Architecture:

User Request
   ↓
Application Layer
   ├── Search requests          → ES
   ├── Transactional operations → MySQL
   └── Cache                    → Redis

Data Synchronization:
MySQL(Write) → Binlog → Canal/Debezium → Kafka → ES(Read)

9. Common Problems and Solutions

9.1 Data Consistency Issues

Problem: MySQL and ES data inconsistency

Solutions:
1. Dual-write Strategy: Simultaneously write to MySQL and ES, using message queues to ensure eventual consistency
2. CDC (Change Data Capture): Real-time synchronization via Binlog
3. Regular Verification: Scheduled tasks to compare data differences and fix them
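Solution 3 boils down to a diff between the two stores. A minimal sketch, assuming each row carries a version (for example an updated_at timestamp) that is also indexed into Elasticsearch; the record type and reconcile function are hypothetical. MySQL is treated as the source of truth: missing or stale documents are re-indexed, and documents that no longer exist in MySQL are deleted.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical snapshot row: primary key plus version.
type record struct {
	ID      int
	Version int64
}

// reconcile returns IDs to re-index into ES and IDs to delete from ES.
func reconcile(mysql, es []record) (reindex, remove []int) {
	esVersion := make(map[int]int64, len(es))
	for _, r := range es {
		esVersion[r.ID] = r.Version
	}
	for _, r := range mysql {
		if v, ok := esVersion[r.ID]; !ok || v != r.Version {
			reindex = append(reindex, r.ID) // missing in ES or out of date
		}
		delete(esVersion, r.ID)
	}
	for id := range esVersion {
		remove = append(remove, id) // exists only in ES
	}
	sort.Ints(remove) // map iteration order is random; sort for stable output
	return reindex, remove
}

func main() {
	mysql := []record{{1, 5}, {2, 3}, {3, 1}}
	es := []record{{1, 5}, {2, 2}, {4, 7}}
	reindex, remove := reconcile(mysql, es)
	fmt.Println("reindex:", reindex, "delete:", remove) // reindex: [2 3] delete: [4]
}
```

In practice both sides would be read in ID-ordered batches rather than loaded into memory whole.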

9.2 Deep Paging Issues

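Problem: With from/size paging, every shard must collect from + size hits and the coordinating node must sort all of them, so jumping to page 10,000 is extremely slow; by default, from + size may not exceed index.max_result_window (10,000).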

Solutions:

// 1. Use search_after (recommended)
{
  "size": 10,
  "sort": [{"product_id": "asc"}],  // Sort on a unique field; sorting on _id is discouraged (product_id is an example name)
  "search_after": [10000]  // Sort value of the last document on the previous page
}

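// 2. Use the scroll API (suitable for export; newer Elasticsearch versions recommend search_after with a point in time (PIT) for deep paging)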
POST /products/_search?scroll=1m
{
  "size": 100,
  "query": {"match_all": {}}
}
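Why search_after scales: instead of skipping from hits, each request asks for hits whose sort value is strictly greater than the last one already seen, so no shard has to collect the skipped pages. A minimal in-memory Go simulation, assuming a unique numeric sort key (with a non-unique key you would add a tie-breaker field); searchAfter is a hypothetical stand-in for the search request.

```go
package main

import (
	"fmt"
	"sort"
)

// searchAfter (hypothetical) returns up to size keys strictly greater than
// the last sort value of the previous page; after == nil means the first page.
func searchAfter(sortedKeys []int64, after *int64, size int) []int64 {
	start := 0
	if after != nil {
		a := *after
		start = sort.Search(len(sortedKeys), func(i int) bool { return sortedKeys[i] > a })
	}
	end := start + size
	if end > len(sortedKeys) {
		end = len(sortedKeys)
	}
	return sortedKeys[start:end]
}

func main() {
	keys := []int64{3, 7, 8, 12, 15, 21, 30}
	var after *int64
	for {
		page := searchAfter(keys, after, 3)
		if len(page) == 0 {
			break
		}
		fmt.Println(page) // [3 7 8], then [12 15 21], then [30]
		last := page[len(page)-1]
		after = &last
	}
}
```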

10. Summary

Elasticsearch addresses the main weaknesses of traditional databases in search scenarios through its inverted index, distributed architecture, and full-text search capabilities. Used appropriately, Elasticsearch can:

  1. Improve search performance: From seconds to milliseconds
  2. Improve search quality: Through relevance scoring and smart tokenization
  3. Support complex analysis: Real-time aggregation and statistical analysis
  4. Reduce operational costs: Fewer servers, higher efficiency

However, it is important to note that Elasticsearch is not a replacement for MySQL, but a supplement. In actual projects, the appropriate storage solution should be chosen based on the specific scenario, and a hybrid architecture of MySQL + Elasticsearch can typically leverage their respective advantages.

Theme test article, for testing purposes only. Publisher: Walker. Please credit the source when reposting: https://walker-learn.xyz/archives/4782
