Go Engineer Systematic Course 009 [Study Notes]


Other Features

  • Personal Center
  • Favorites
  • Manage Shipping Addresses (CRUD)
  • Messages

Copy inventory_srv to create userop_srv, then find and replace every occurrence of "inventory" with "userop".

Elasticsearch In-depth Analysis Document

1. What is Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene, capable of rapidly storing, searching, and analyzing massive amounts of data. It is a core component of the Elastic Stack (formerly ELK Stack).

2. Problems Faced by MySQL Search - In-depth Analysis

2.1 Detailed Explanation of Performance Issues

Symptom:

-- When data volume reaches 1 million records, the following query may take several seconds
SELECT * FROM products WHERE name LIKE '%手机%' OR description LIKE '%手机%';

Performance Comparison Data:

| Data Volume | MySQL LIKE Query | Elasticsearch Full-Text Search | Speedup |
|---|---|---|---|
| 10,000 records | 50 ms | 10 ms | 5x |
| 100,000 records | 500 ms | 15 ms | ~33x |
| 1 million records | 5,000 ms | 20 ms | 250x |
| 10 million records | 50,000+ ms | 30 ms | 1,600x+ |

Root Causes:

  1. Full Table Scan: LIKE '%keyword%' cannot use B+ tree indexes, requiring scanning all rows
  2. I/O Intensive: Each query needs to read a large amount of data from disk
  3. CPU Intensive: String matching operations performed on every row of data
  4. Memory Pressure: Large amounts of data loaded into memory for processing

Real-world Case:

An e-commerce platform's product table has 5 million records. Using MySQL fuzzy search for "苹果手机" (Apple phone):
- Query time: 8.3 seconds
- CPU usage: Spiked to 85%
- With 10 concurrent queries, response time increased to over 30 seconds

2.2 Detailed Explanation of the Missing Relevance Ranking

Pain Points of MySQL Query Results:

-- MySQL can only sort by fixed rules
SELECT * FROM products
WHERE name LIKE '%手机%'
ORDER BY price DESC;  -- Can only sort by fields like price, time, etc.

Elasticsearch Relevance Scoring Mechanism:

Search term: "小米手机" (Xiaomi phone)

Relevance score calculation:

Document 1: "小米手机12 Pro" (Xiaomi Phone 12 Pro)
  • Term Frequency (TF): both terms appear
  • Inverse Document Frequency (IDF): measures how rare each term is
  • Field Length: short title, higher weight
  • Score: 9.8

Document 2: "这是一款性价比很高的手机" (This is a very cost-effective phone)
  • Term Frequency (TF): only "手机" (phone) appears
  • Inverse Document Frequency (IDF): "手机" (phone) is common, so it contributes little
  • Field Length: long description, lower weight
  • Score: 3.2

Detailed Explanation of Relevance Factors:

  1. TF (Term Frequency): The frequency of a keyword appearing in a document
  2. IDF (Inverse Document Frequency): The rarity of a keyword across all documents
  3. Field Length Normalization: Matches in shorter fields have higher weight than in longer fields
  4. Field Weight Boost: Can set titles to be more important than content
  5. Query-time Weight: Can specify certain query terms as more important

2.3 Detailed Explanation of the Lack of Full-Text Search

Limitations of MySQL Full-Text Index:

-- MySQL full-text index creation
ALTER TABLE products ADD FULLTEXT(name, description);

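-- Problem 1: Minimum word length limit (ft_min_word_len defaults to 4 for MyISAM,
-- innodb_ft_min_token_size defaults to 3 for InnoDB); shorter words are not indexed,
-- so a short word such as "手机" (phone) or "机" (machine) cannot be searched on its own

-- Problem 2: Poor Chinese word segmentation support
-- The built-in parser splits words on spaces and punctuation, so "苹果手机" (Apple phone)
-- is indexed as a single word and searching for "苹果" (Apple) won't find it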
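The segmentation problem can be reproduced with a small Go sketch (the helper names are illustrative). Splitting on whitespace keeps "苹果手机" as one token, so "苹果" never matches, while overlapping character n-grams with n = 2 (the idea behind MySQL's optional ngram full-text parser, whose default ngram_token_size is 2) make "苹果" a token of its own. The same helper with n = 3 yields trigrams like those used for partial matching by N-gram tokenization.

```go
package main

import (
	"fmt"
	"strings"
)

// whitespaceTokens splits text on whitespace only, so unspaced Chinese stays one token.
func whitespaceTokens(text string) []string {
	return strings.Fields(text)
}

// ngramTokens returns every run of n consecutive characters (runes, not bytes).
func ngramTokens(text string, n int) []string {
	runes := []rune(text)
	var out []string
	for i := 0; i+n <= len(runes); i++ {
		out = append(out, string(runes[i:i+n]))
	}
	return out
}

// contains reports whether the token list includes the query token exactly.
func contains(tokens []string, q string) bool {
	for _, t := range tokens {
		if t == q {
			return true
		}
	}
	return false
}

func main() {
	text := "苹果手机"
	ws := whitespaceTokens(text)
	bi := ngramTokens(text, 2)
	fmt.Println(ws, contains(ws, "苹果")) // [苹果手机] false
	fmt.Println(bi, contains(bi, "苹果")) // [苹果 果手 手机] true
}
```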

Elasticsearch Full-Text Search Capabilities:

// ES analysis process example
Input text: "我想买一台苹果手机" (I want to buy an Apple phone)

Tokenization result:
[我] [想] [买] [一台] [苹果] [手机] [苹果手机]

Synonym expansion:
[苹果] → [Apple, iPhone]
[手机] → [手机, 电话, mobile]

Spell correction:
"苹果手击" (Apple hand hit) → suggest "苹果手机" (Apple phone)

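Spell correction of this kind is usually built on edit distance: Elasticsearch's term suggester suggests terms based on edit distance, and fuzzy queries use Levenshtein distance. The Go sketch below is a simplified stand-in, not Elasticsearch's implementation. It computes the character-level Levenshtein distance and suggests the closest term from a hypothetical dictionary, provided the distance is within a limit.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the minimum number of single-character insertions,
// deletions and substitutions that turn a into b (compared rune by rune).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev // prev now holds row i
	}
	return prev[len(rb)]
}

// suggest returns the dictionary term closest to input if it is within maxDist edits.
func suggest(input string, dict []string, maxDist int) (string, bool) {
	best, bestDist := "", maxDist+1
	for _, term := range dict {
		if d := levenshtein(input, term); d < bestDist {
			best, bestDist = term, d
		}
	}
	return best, bestDist <= maxDist
}

func main() {
	dict := []string{"苹果手机", "小米手机", "苹果电脑"} // hypothetical dictionary
	if s, ok := suggest("苹果手击", dict, 1); ok {
		fmt.Println("Did you mean:", s) // Did you mean: 苹果手机
	}
}
```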
2.4 Detailed Explanation of Inaccurate Matching and Missing Word Segmentation

Problems with MySQL String Matching:

-- Search for "笔记本" (notebook)
SELECT * FROM products WHERE name LIKE '%笔记本%';
-- Result: Can find "笔记本电脑" (notebook computer)
-- Problem: Cannot find "笔记 本子" (the same words with a space in between), the English "notebook", or the synonym "手提电脑" (laptop)

Elasticsearch Smart Word Segmentation Process:

Original text: "ThinkPad X1 Carbon超轻薄笔记本电脑" (ThinkPad X1 Carbon ultra-thin laptop)

Standard tokenizer (splits Chinese text into single characters):
[ThinkPad] [X1] [Carbon] [超] [轻] [薄] [笔] [记] [本] [电] [脑]

IK tokenizer (Chinese):
[ThinkPad] [X1] [Carbon] [超] [轻薄] [超轻薄]
[笔记] [本] [笔记本] [电脑] [笔记本电脑]

Pinyin tokenizer (applied to "笔记本电脑"):
[bi] [ji] [ben] [dian] [nao] → Can be searched by pinyin

N-gram tokenization (n = 3):
[Thi] [hin] [ink] [nkP] [kPa] [Pad] → Supports partial matching
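IK's actual algorithm is more involved, but the core idea of dictionary-based Chinese segmentation can be shown with forward maximum matching. This is a classic technique, not IK's implementation: at each position, take the longest dictionary word that matches, otherwise emit a single character. The dictionary below is hypothetical, and the coarse-grained output is closer in spirit to ik_smart than to the exhaustive ik_max_word listing above.

```go
package main

import "fmt"

// segmentFMM segments text with forward maximum matching: at each position it
// takes the longest dictionary word (up to maxLen runes), else one character.
func segmentFMM(text string, dict map[string]bool, maxLen int) []string {
	runes := []rune(text)
	var out []string
	for i := 0; i < len(runes); {
		n := maxLen
		if rest := len(runes) - i; rest < n {
			n = rest
		}
		// Shrink the window until it is a dictionary word or a single character.
		for n > 1 && !dict[string(runes[i:i+n])] {
			n--
		}
		out = append(out, string(runes[i:i+n]))
		i += n
	}
	return out
}

func main() {
	dict := map[string]bool{ // hypothetical dictionary
		"超轻薄": true, "轻薄": true, "笔记": true, "笔记本": true,
		"笔记本电脑": true, "电脑": true,
	}
	fmt.Println(segmentFMM("超轻薄笔记本电脑", dict, 5)) // [超轻薄 笔记本电脑]
}
```

Characters not covered by the dictionary (including Latin letters) fall back to single characters here; real analyzers treat English words and numbers separately.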

3. What is Full-Text Search - Core Principle Analysis

3.1 Structured Data vs Unstructured Data

Structured Data (MySQL storage method):
┌──────┬────────┬────────┬────────┐
│  ID  │  Name  │ Price  │ Stock  │
├──────┼────────┼────────┼────────┤
│  1   │iPhone  │ 5999   │  100   │
│  2   │ 小米   │ 2999   │  200   │
└──────┴────────┴────────┴────────┘

Unstructured Data (text content):
"This iPhone uses an A15 processor, with powerful performance,
excellent camera effects, and 20% improved battery life,
User review: 'Awesome, great value for money!'"

3.2 Detailed Explanation of Inverted Index Principle

Forward Index (MySQL):

Document ID → Content
Doc1 → "小米手机" (Xiaomi phone)
Doc2 → "苹果手机" (Apple phone)
Doc3 → "小米电视" (Xiaomi TV)

Inverted Index (Elasticsearch):

Term → Document List
"小米" (Xiaomi) → [Doc1, Doc3]
"手机" (phone) → [Doc1, Doc2]
"苹果" (Apple) → [Doc2]
"电视" (TV) → [Doc3]

Search "小米手机" (Xiaomi phone):
1. Find "小米" (Xiaomi) → Get [Doc1, Doc3]
2. Find "手机" (phone) → Get [Doc1, Doc2]
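3. Intersect the two lists → Doc1 contains both terms (most relevant); with the match query's default OR behavior, Doc2 and Doc3 also match but score lower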
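The lookup-and-intersect steps above can be sketched in Go. Documents are assumed to be tokenized already, and the sketch only computes the intersection (documents containing every query term). Ranking partial matches is left to a separate scoring step.

```go
package main

import (
	"fmt"
	"sort"
)

// buildIndex maps each term to the sorted list of document IDs that contain it.
func buildIndex(docs map[int][]string) map[string][]int {
	ids := make([]int, 0, len(docs))
	for id := range docs {
		ids = append(ids, id)
	}
	sort.Ints(ids) // visit docs in ID order so every posting list stays sorted
	index := map[string][]int{}
	for _, id := range ids {
		seen := map[string]bool{}
		for _, term := range docs[id] {
			if !seen[term] {
				seen[term] = true
				index[term] = append(index[term], id)
			}
		}
	}
	return index
}

// intersect merges two sorted posting lists, keeping IDs present in both.
func intersect(a, b []int) []int {
	var out []int
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

func main() {
	docs := map[int][]string{
		1: {"小米", "手机"},
		2: {"苹果", "手机"},
		3: {"小米", "电视"},
	}
	index := buildIndex(docs)
	fmt.Println(index["小米"], index["手机"])          // [1 3] [1 2]
	fmt.Println(intersect(index["小米"], index["手机"])) // [1]
}
```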

3.3 Detailed Structure of Inverted Index

Complete Inverted Index Structure:

Term: "手机" (phone)
├── Document Frequency (DF): 1000 documents contain this term
├── Inverted List:
│   ├── Doc1:
│   │   ├── Term Frequency (TF): 3 times
│   │   ├── Positions: [5, 28, 102]
│   │   └── Fields: [title, description]
│   ├── Doc2:
│   │   ├── Term Frequency (TF): 1 time
│   │   ├── Positions: [15]
│   │   └── Fields: [title]
│   └── ...
└── Statistics: Highest term frequency, average term frequency, etc.
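The positions stored in each posting are what make phrase queries (such as match_phrase) possible: a phrase matches only when its terms occur at consecutive positions. A minimal Go sketch, assuming a simplified in-memory layout (term → doc ID → positions) rather than Lucene's on-disk format:

```go
package main

import "fmt"

// postings maps term -> document ID -> positions where the term occurs.
type postings map[string]map[int][]int

// buildPositional records the position of every token in every document.
func buildPositional(docs map[int][]string) postings {
	idx := postings{}
	for id, tokens := range docs {
		for pos, t := range tokens {
			if idx[t] == nil {
				idx[t] = map[int][]int{}
			}
			idx[t][id] = append(idx[t][id], pos)
		}
	}
	return idx
}

// containsPos reports whether p is one of the recorded positions.
func containsPos(positions []int, p int) bool {
	for _, x := range positions {
		if x == p {
			return true
		}
	}
	return false
}

// phraseMatch reports whether doc contains the terms at consecutive positions.
func (idx postings) phraseMatch(doc int, terms []string) bool {
	if len(terms) == 0 {
		return false
	}
	for _, start := range idx[terms[0]][doc] {
		matched := true
		for k := 1; k < len(terms); k++ {
			if !containsPos(idx[terms[k]][doc], start+k) {
				matched = false
				break
			}
		}
		if matched {
			return true
		}
	}
	return false
}

func main() {
	idx := buildPositional(map[int][]string{
		1: {"苹果", "手机", "壳"},
		2: {"手机", "配件", "苹果"},
	})
	fmt.Println(idx.phraseMatch(1, []string{"苹果", "手机"})) // true
	fmt.Println(idx.phraseMatch(2, []string{"苹果", "手机"})) // false: both terms, wrong order
}
```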

4. Elasticsearch Architecture Explained

4.1 Cluster Architecture

Elasticsearch Cluster Architecture Diagram:

┌─────────────── ES Cluster ──────────────┐
│                                         │
│  ┌─────────────────────────────────┐    │
│  │     Master Node                 │    │
│  │  • Cluster management           │    │
│  │  • Index creation/deletion      │    │
│  │  • Shard allocation             │    │
│  └─────────────────────────────────┘    │
│                                         │
│  ┌──────────┐  ┌──────────┐             │
│  │ Data     │  │ Data     │             │
│  │ Node 1   │  │ Node 2   │             │
│  │ ┌──────┐ │  │ ┌──────┐ │             │
│  │ │ P0   │ │  │ │ R0   │ │             │
│  │ ├──────┤ │  │ ├──────┤ │             │
│  │ │ R1   │ │  │ │ P1   │ │             │
│  │ └──────┘ │  │ └──────┘ │             │
│  └──────────┘  └──────────┘             │
│                                         │
│  P = Primary Shard                      │
│  R = Replica Shard                      │
└─────────────────────────────────────────┘

4.2 Data Write Process

Detailed Write Process:

Client → Coordinating Node → Primary Shard → Replica Shard

1. Client sends write request
   ↓
2. Coordinating node determines shard via hash routing
   ↓
3. Request forwarded to primary shard node
   ↓
4. Primary shard writes successfully
   ↓
5. Replicated in parallel to replica shards
   ↓
6. All replicas confirm
   ↓
7. Returns success response to client

Timeline:
T0 Receive ──→ T1 Route ──→ T2 Primary shard write ──→ T3 Replica write ──→ T4 Response
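Step 2 uses a routing formula of the form shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. Elasticsearch hashes with Murmur3, and its real formula also includes a routing factor that supports index splitting. This Go sketch swaps in FNV-1a from the standard library purely for illustration, so its shard numbers will not match a real cluster.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor picks a primary shard for a routing value (by default the doc _id).
// FNV-1a stands in for Elasticsearch's Murmur3 hash in this sketch.
func shardFor(routing string, numPrimaries int) int {
	h := fnv.New32a()
	h.Write([]byte(routing)) // writing to a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numPrimaries))
}

func main() {
	for _, id := range []string{"1", "2", "42"} {
		fmt.Printf("doc %s -> shard %d\n", id, shardFor(id, 3))
	}
}
```

Because the result depends on the number of primary shards, changing that number would send existing IDs to different shards. That is why the primary shard count is fixed when an index is created and can only be changed by reindexing or with the split/shrink APIs.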

4.3 Query Process

Query Execution Process:

Phase 1: Query
┌──────────────────────────────────────────────────────┐
│ Coordinating node sends the query to every shard     │
│ Each shard returns its top N document IDs and scores │
└──────────────────────────────────────────────────────┘
           ↓
Phase 2: Fetch
┌──────────────────────────────────────────────────────┐
│ Coordinating node merges and sorts all shard results │
│ Fetches full documents (_source) for the final page  │
└──────────────────────────────────────────────────────┘
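The coordinating node's work between the two phases amounts to combining the per-shard top-N lists and re-sorting them. The Go sketch below assumes each shard has already returned its hits (IDs and scores, no document bodies). The hit type and the shard data are hypothetical, and only the winning IDs would then be fetched in phase 2.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is what a shard returns in the query phase: an ID and a score, no _source.
type hit struct {
	ID    string
	Score float64
}

// mergeTopN combines per-shard results and keeps the global top n by score.
func mergeTopN(shards [][]hit, n int) []hit {
	var all []hit
	for _, s := range shards {
		all = append(all, s...)
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].Score > all[j].Score })
	if len(all) > n {
		all = all[:n]
	}
	return all
}

func main() {
	shards := [][]hit{
		{{"p1", 9.8}, {"p4", 3.1}},
		{{"p2", 7.5}, {"p5", 2.0}},
		{{"p3", 3.2}},
	}
	for _, h := range mergeTopN(shards, 3) {
		fmt.Println(h.ID, h.Score) // phase 2 would fetch these documents
	}
}
```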

5. Detailed Explanation of Elasticsearch Core Features

5.1 Detailed Explanation of Query Types

// 1. Match query - Full-text search
{
  "query": {
    "match": {
      "title": {
        "query": "苹果手机",
        "operator": "and"  // Must contain all terms
      }
    }
  }
}

// 2. Term query - Exact match
{
  "query": {
    "term": {
      "category.keyword": "手机"  // No tokenization, exact match
    }
  }
}

// 3. Range query - Range search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 5000
      }
    }
  }
}

// 4. Bool compound query
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "手机"}}
      ],
      "filter": [
        {"range": {"price": {"lte": 5000}}}
      ],
      "should": [
        {"match": {"brand": "苹果"}}  // Optional: matching documents get a score boost
      ],
      "must_not": [
        {"term": {"status": "discontinued"}}
      ]
    }
  }
}

5.2 Aggregation Analysis Functionality

// Sales data analysis example
{
  "aggs": {
    "sales_per_category": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "total_sales": {
          "sum": {
            "field": "sales_count"
          }
        },
        "price_ranges": {
          "range": {
            "field": "price",
            "ranges": [
              {"to": 1000},
              {"from": 1000, "to": 5000},
              {"from": 5000}
            ]
          }
        }
      }
    }
  }
}
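To make the bucket semantics concrete, here is a Go sketch that computes the same metrics in memory over a hypothetical product slice: one bucket per category (like the terms aggregation), then average price, total sales and price-range counts per bucket. As in Elasticsearch's range aggregation, "from" is inclusive and "to" is exclusive. Unlike Elasticsearch, the sketch is exact and runs on a single node.

```go
package main

import "fmt"

// product is a hypothetical document with the fields used by the aggregation.
type product struct {
	Category   string
	Price      float64
	SalesCount int
}

type categoryStats struct {
	DocCount    int
	AvgPrice    float64
	TotalSales  int
	PriceRanges [3]int // [<1000, 1000 to <5000, >=5000]
}

// aggregate groups products by category and computes the sub-aggregations.
func aggregate(products []product) map[string]*categoryStats {
	out := map[string]*categoryStats{}
	priceSums := map[string]float64{}
	for _, p := range products {
		s := out[p.Category]
		if s == nil {
			s = &categoryStats{}
			out[p.Category] = s
		}
		s.DocCount++
		priceSums[p.Category] += p.Price
		s.TotalSales += p.SalesCount
		switch {
		case p.Price < 1000: // {"to": 1000}
			s.PriceRanges[0]++
		case p.Price < 5000: // {"from": 1000, "to": 5000}
			s.PriceRanges[1]++
		default: // {"from": 5000}
			s.PriceRanges[2]++
		}
	}
	for c, s := range out {
		s.AvgPrice = priceSums[c] / float64(s.DocCount)
	}
	return out
}

func main() {
	stats := aggregate([]product{
		{"手机", 5999, 120}, {"手机", 2999, 300}, {"耳机", 199, 800},
	})
	for c, s := range stats {
		fmt.Printf("%s: %+v\n", c, *s)
	}
}
```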

6. Real-world Application Case Studies

6.1 E-commerce Search Optimization Case Study

Search metrics on an e-commerce platform before and after the migration:

| Metric | MySQL Solution | Elasticsearch Solution | Improvement |
|---|---|---|---|
| Average Search Latency | 2.3 s | 0.05 s | 46x faster |
| Search Accuracy | 65% | 92% | +27 percentage points |
| Zero-Result Rate | 18% | 3% | -15 percentage points |
| Number of Servers | 8 | 3 | 62.5% fewer servers |
| Concurrency | 100 QPS | 5,000 QPS | 50x |

Implementation Details:

  1. Data Synchronization Architecture:
    • Incremental: MySQL (primary data) → Binlog → Logstash → Elasticsearch
    • Scheduled full synchronization (nightly)
  2. Search Optimization Strategies:
    • Pinyin Search: searching "pingguo" finds "苹果" (Apple)
    • Synonyms: "手机" (phone), "电话" (telephone), and "mobile" configured as synonyms
    • Search Suggestions: real-time prompts for likely search terms
    • Spelling Correction: automatically corrects common misspellings

6.2 Log Analysis System Case Study

Log Analysis System of an Internet Company:

Log Processing Flow:

Application Server → Filebeat → Logstash → Elasticsearch → Kibana

  • Application Server: generates logs
  • Filebeat: collects logs
  • Logstash: processes and transforms
  • Elasticsearch: stores and indexes
  • Kibana: visualizes and displays

Processing Scale:
• Log Volume: 100GB per day
• Number of Log Entries: 1 billion entries/day
• Query Response: Millisecond level
• Retention Period: 30 days hot data, 1 year cold data

7. Performance Optimization Best Practices

7.1 Index Design Optimization

// Optimized Mapping Design (the ik_* and pinyin analyzers require the IK and pinyin analysis plugins)
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword"  // Supports exact match
          },
          "pinyin": {
            "type": "text",
            "analyzer": "pinyin"  // Supports pinyin search
          }
        }
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100  // Price precision optimization
      },
      "category": {
        "type": "keyword"  // Categories do not need tokenization
      },
      "description": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "created_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
      }
    }
  }
}
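scaled_float stores each value as a long: the value is multiplied by scaling_factor and rounded to the nearest long, so with "scaling_factor": 100 a price keeps two decimal places. A small Go sketch of that encoding (encode and decode are illustrative names, not an Elasticsearch API):

```go
package main

import (
	"fmt"
	"math"
)

const scalingFactor = 100 // matches "scaling_factor": 100 in the mapping above

// encode converts a price to the long that a scaled_float field would store.
func encode(price float64) int64 {
	return int64(math.Round(price * scalingFactor))
}

// decode converts the stored long back to a float for display.
func decode(stored int64) float64 {
	return float64(stored) / scalingFactor
}

func main() {
	for _, p := range []float64{5999, 5999.99, 19.999} {
		s := encode(p)
		fmt.Printf("%v -> stored %d -> %v\n", p, s, decode(s))
	}
}
```

The last value shows the trade-off: anything beyond two decimal places (19.999) is rounded away (stored as 2000, read back as 20).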

7.2 Query Performance Optimization Techniques

  1. Use filter instead of query when scoring is not needed
    ```json
    // Before optimization: query context (calculates a relevance score)
    {"query": {"term": {"status": "active"}}}

    // After optimization: filter context (no scoring, results can be cached)
    {"query": {"bool": {"filter": {"term": {"status": "active"}}}}}
    ```

  2. Set a reasonable number of shards
    ```
    Shard count reference formula (rule of thumb):
    Number of shards ≈ Data volume (GB) / 30 GB

    Examples:
    - 100 GB of data: 3-4 shards
    - 1 TB of data: 35-40 shards
    ```

  3. Use batch operations
    ```json
    // Use the bulk API for batch indexing: one action line plus one source line
    // per document (NDJSON); the request body must end with a newline
    POST _bulk
    {"index": {"_index": "products", "_id": 1}}
    {"name": "iPhone", "price": 5999}
    {"index": {"_index": "products", "_id": 2}}
    {"name": "小米", "price": 2999}
    ```

8. Elasticsearch vs Traditional Databases

8.1 Comparison of Applicable Scenarios

| Scenario | MySQL | Elasticsearch | Recommended Choice |
|---|---|---|---|
| Full-text Search | ❌ Poor | ✅ Excellent | ES |
| Transaction Support | ✅ Full ACID | ❌ No transactions | MySQL |
| Real-time Statistical Analysis | ⚠️ Average | ✅ Excellent | ES |
| Relational Queries | ✅ Excellent | ❌ Limited | MySQL |
| Geospatial Search | ❌ Poor | ✅ Excellent | ES |
| Log Analysis | ❌ Unsuitable | ✅ Specialty | ES |
| Precise Numerical Calculation | ✅ Precise | ⚠️ Approximate | MySQL |

8.2 Hybrid Architecture Solution

Recommended Hybrid Architecture:

        User Request
             ↓
    ┌───────────────────┐
    │ Application Layer │
    └───────────────────┘
             ↓
    ┌──────────────────────────────────┐
    │ Search requests          → ES    │
    │ Transactional operations → MySQL │
    │ Cache                    → Redis │
    └──────────────────────────────────┘

Data Synchronization:
MySQL(Write) → Binlog → Canal/Debezium → Kafka → ES(Read)

9. Common Problems and Solutions

9.1 Data Consistency Issues

Problem: MySQL and ES data inconsistency

Solutions:
1. Dual-write Strategy: Simultaneously write to MySQL and ES, using message queues to ensure eventual consistency
2. CDC (Change Data Capture): Real-time synchronization via Binlog
3. Regular Verification: Scheduled tasks to compare data differences and fix them
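Solution 3 boils down to comparing two snapshots. A Go sketch, assuming each side can be read as a map from document ID to a version (or updated_at) value; how those maps are loaded from MySQL and ES is not shown, and diff is a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
)

// diff compares id->version snapshots from MySQL (source of truth) and ES.
// It returns IDs to (re)index into ES and IDs to delete from ES.
func diff(mysql, es map[int64]int64) (toIndex, toDelete []int64) {
	for id, v := range mysql {
		if esV, ok := es[id]; !ok || esV != v {
			toIndex = append(toIndex, id) // missing or stale in ES
		}
	}
	for id := range es {
		if _, ok := mysql[id]; !ok {
			toDelete = append(toDelete, id) // deleted in MySQL but still in ES
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(toIndex, func(i, j int) bool { return toIndex[i] < toIndex[j] })
	sort.Slice(toDelete, func(i, j int) bool { return toDelete[i] < toDelete[j] })
	return toIndex, toDelete
}

func main() {
	mysql := map[int64]int64{1: 3, 2: 5, 3: 1}
	es := map[int64]int64{1: 3, 2: 4, 4: 2}
	toIndex, toDelete := diff(mysql, es)
	fmt.Println(toIndex, toDelete) // [2 3] [4]
}
```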

9.2 Deep Paging Issues

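Problem: Deep pagination with from/size is expensive, because every shard must return from + size hits to the coordinating node. By default, from + size is capped at 10,000 (index.max_result_window), so a request for the 10,000th page fails outright.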

Solutions:

// 1. Use search_after (recommended)
{
  "size": 10,
  "sort": [{"id": "asc"}],  // "id" is assumed to be a unique numeric field; sorting on _id is restricted
  "search_after": [10000]   // Sort value of the last document on the previous page
}

// 2. Use scroll API (suitable for export)
POST /products/_search?scroll=1m
{
  "size": 100,
  "query": {"match_all": {}}
}
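search_after works like a keyset cursor: instead of skipping "from" hits, each page starts strictly after the sort values of the previous page's last hit, so the cost does not grow with the page number. The Go sketch below mimics this on an in-memory slice sorted by a unique numeric ID, which stands in (as an assumption) for the index's sort order.

```go
package main

import (
	"fmt"
	"sort"
)

// pageAfter returns up to size IDs strictly greater than after from a sorted slice,
// mirroring search_after with a single ascending sort on a unique field.
func pageAfter(sortedIDs []int, after, size int) []int {
	start := sort.SearchInts(sortedIDs, after+1) // index of the first ID > after
	end := start + size
	if end > len(sortedIDs) {
		end = len(sortedIDs)
	}
	return sortedIDs[start:end]
}

func main() {
	ids := []int{3, 7, 10, 15, 21, 28, 30}
	// A real first request omits search_after; cursor 0 plays that role here
	// because every ID is positive.
	cursor := 0
	for {
		page := pageAfter(ids, cursor, 3)
		if len(page) == 0 {
			break
		}
		fmt.Println(page)
		cursor = page[len(page)-1] // becomes "search_after" for the next request
	}
}
```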

10. Summary

Elasticsearch addresses many of the problems traditional databases face in search scenarios through its inverted index, distributed architecture, and powerful full-text search capabilities. Used properly, Elasticsearch can:

  1. Improve search performance: From seconds to milliseconds
  2. Improve search quality: Through relevance scoring and smart tokenization
  3. Support complex analysis: Real-time aggregation and statistical analysis
  4. Reduce operational costs: Fewer servers, higher efficiency

However, it is important to note that Elasticsearch is not a replacement for MySQL, but a supplement. In actual projects, the appropriate storage solution should be chosen based on the specific scenario, and a hybrid architecture of MySQL + Elasticsearch can typically leverage their respective advantages.

Theme test article, for testing purposes only. Publisher: Walker. Please credit the source when reposting: https://walker-learn.xyz/archives/4782
