Other Features

User Center
Favorites
Manage Shipping Addresses (CRUD)
Messages

Copy inventory_srv to userop_srv, then find and replace every occurrence of inventory

Elasticsearch In-depth Analysis Document

1. What is Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene, capable of rapidly storing, searching, and analyzing massive amounts of data. It is a core component of the Elastic Stack (formerly ELK Stack).

2. Problems Faced by MySQL Search - In-depth Analysis

2.1 Detailed Explanation of Low Performance Issues

Problem Phenomenon:

-- Once the table reaches about 1 million rows, the following query can take several seconds
SELECT * FROM products WHERE name LIKE '%手机%' OR description LIKE '%手机%';

Performance Comparison Data:

Data Volume	MySQL LIKE Query	Elasticsearch Full-Text Search	Performance Improvement
10K records	50ms	10ms	5x
100K records	500ms	15ms	33x
1M records	5000ms	20ms	250x
10M records	50000ms+	30ms	1600x+

Root Causes:

Full table scan: LIKE '%keyword%' cannot use B+ tree indexes and must scan all rows
I/O intensive: Each query requires reading a large amount of data from disk
CPU intensive: String matching operations are performed on every row of data
Memory pressure: Large amounts of data are loaded into memory for processing

Real-world Case:

A product table on an e-commerce platform has 5 million records. Using MySQL fuzzy search for "Apple phone":
- Query time: 8.3 seconds
- CPU utilization: soared to 85%
- When 10 concurrent queries were made, response time increased to over 30 seconds

2.2 Detailed Explanation of No Relevance Ranking Issue

Pain Points of MySQL Query Results:

-- MySQL can only sort by fixed rules
SELECT * FROM products
WHERE name LIKE '%手机%'
ORDER BY price DESC;  -- Can only sort by columns such as price or time

Elasticsearch's Relevance Scoring Mechanism:

Search term: "小米手机" (Xiaomi phone)

Relevance score calculation:
Document 1: "小米手机12 Pro" (Xiaomi Phone 12 Pro)
  • Term Frequency (TF): Both keywords appear
  • Inverse Document Frequency (IDF): Calculates the rarity of the term
  • Field length: Shorter title, higher weight
  • Score: 9.8

Document 2: "这是一款性价比很高的手机" (This is a very cost-effective phone)
  • Term Frequency (TF): Only "手机" (phone) appears
  • Inverse Document Frequency (IDF): "手机" (phone) is more common
  • Field length: Longer description, lower weight
  • Score: 3.2

Detailed Explanation of Relevance Factors:

TF (Term Frequency): The frequency of keywords appearing in a document
IDF (Inverse Document Frequency): The rarity of keywords across all documents
Field length normalization: Matches in shorter fields have higher weight than in longer fields
Field weight boost: Can set titles to be more important than content
Query-time weight: Can specify certain query terms as more important
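
The way these factors combine can be sketched with the classic BM25 formula, which is the default similarity in Elasticsearch since version 5.0. This is a minimal illustration, not Lucene's exact implementation: the three-document corpus, the pre-tokenized terms, and the parameter values k1 = 1.2 and b = 0.75 are assumptions, and the resulting numbers will not match the 9.8 and 3.2 scores shown above.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against the query terms (classic BM25)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        # IDF: the rarer the term across the corpus, the larger the weight.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = doc.count(term)  # term frequency in this document
        # Length normalization: longer fields get a larger norm and a lower score.
        norm = 1 - b + b * len(doc) / avg_len
        score += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return score

# Hypothetical, already-tokenized documents.
corpus = [
    ["小米", "手机", "12", "Pro"],
    ["这是", "一款", "性价比", "很高", "的", "手机"],
    ["小米", "电视"],
]
query = ["小米", "手机"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

The first document matches both terms in a short field, so it ranks first; the long description that matches only one term ranks last.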

2.3 Detailed Explanation of Inability to Perform Full-Text Search

Limitations of MySQL Full-Text Index:

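-- Create a MySQL full-text index
ALTER TABLE products ADD FULLTEXT(name, description);

-- Problem 1: Minimum token length (default 3 characters for InnoDB, 4 for MyISAM)
-- Shorter words such as "手机" (phone, 2 characters) are not indexed with the default settings

-- Problem 2: Poor Chinese word segmentation support
-- The default parser splits on whitespace and punctuation, so "苹果手机" (Apple phone) is treated as a whole and searching for "苹果" (Apple) won't find it
-- The ngram parser (MySQL 5.7.6+) helps, but it only produces fixed-length character grams, not real words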

Elasticsearch Full-Text Search Capabilities:

// ES analysis process example (illustrative; assumes the IK analysis plugin, a configured synonym filter, and a suggester)
Input text: "我想买一台苹果手机" (I want to buy an Apple phone)

Tokenization result:
[我] [想] [买] [一台] [苹果] [手机] [苹果手机]

Synonym expansion:
[苹果] → [Apple, iPhone]
[手机] → [phone, telephone, mobile]

Spell correction:
"苹果手击" (Apple hand strike) → Suggest "苹果手机" (Apple phone)

2.4 Detailed Explanation of Inaccurate Search and No Word Segmentation Issues

Problems with MySQL String Matching:

-- Search for "笔记本" (notebook)
SELECT * FROM products WHERE name LIKE '%笔记本%';
-- Result: Can find "笔记本电脑" (laptop computer)
-- Problem: Cannot find "笔记 本子" (notes notebook), "notebook", "手提电脑" (portable computer)

Elasticsearch Smart Tokenization Process:

Original text: "ThinkPad X1 Carbon超轻薄笔记本电脑" (ThinkPad X1 Carbon ultra-thin laptop computer)

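Standard tokenizer (splits Chinese text into single characters):
[ThinkPad] [X1] [Carbon] [超] [轻] [薄] [笔] [记] [本] [电] [脑]

IK tokenizer (Chinese; illustrative output, the exact tokens depend on the IK mode and dictionary):
[ThinkPad] [X1] [Carbon] [超轻薄] (ultra-thin) [轻薄] (thin and light)
[笔记本电脑] (laptop computer) [笔记本] (notebook) [笔记] (notes) [电脑] (computer)

Pinyin tokenizer:
[bi] [ji] [ben] [dian] [nao] → "笔记本电脑" can be searched by Pinyin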

N-gram tokenization:
[Thi] [hin] [ink] [nkP] → Supports partial matching
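
The N-gram output above can be reproduced with a few lines of Python. This is only a sketch of the idea; the helper name `char_ngrams` is made up, and Elasticsearch's own ngram tokenizer has additional options (minimum and maximum gram length, character classes) that are not modeled here.

```python
def char_ngrams(text, n=3):
    """Return every contiguous substring of length n, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

grams = char_ngrams("ThinkPad", 3)
# A partial query such as "nkP" matches because it is one of the indexed grams.
partial_match = "nkP" in grams
```

Indexing every gram is what makes substring matching cheap at query time, at the cost of a larger index.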

3. What is Full-Text Search - Core Principle Analysis

3.1 Structured Data vs Unstructured Data

Structured Data (MySQL storage method):
┌──────┬────────┬────────┬────────┐
│  ID  │  Name  │ Price  │ Stock  │
├──────┼────────┼────────┼────────┤
│  1   │iPhone  │ 5999   │  100   │
│  2   │ Xiaomi │ 2999   │  200   │
└──────┴────────┴────────┴────────┘

Unstructured Data (Text content):
"This iPhone uses an A15 processor, offering powerful performance,
excellent camera effects, and 20% improved battery life.
User review: 'Amazing, great value for money!'"

3.2 Detailed Explanation of Inverted Index Principle

Forward Index (MySQL):

Document ID → Content
Doc1 → "Xiaomi Phone"
Doc2 → "Apple Phone"
Doc3 → "Xiaomi TV"

Inverted Index (Elasticsearch):

Term → Document List
"Xiaomi" → [Doc1, Doc3]
"Phone" → [Doc1, Doc2]
"Apple" → [Doc2]
"TV" → [Doc3]

Search "Xiaomi Phone":
1. Search "Xiaomi" → Get [Doc1, Doc3]
2. Search "Phone" → Get [Doc1, Doc2]
3. Calculate intersection → Doc1 (most relevant)
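
The lookup steps above can be sketched in Python as follows. It is a toy model under simple assumptions: terms are split on whitespace and lowercased, and the function names are illustrative rather than anything Elasticsearch exposes.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all_terms(index, query):
    """Return the documents that contain every term of the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {"Doc1": "Xiaomi Phone", "Doc2": "Apple Phone", "Doc3": "Xiaomi TV"}
index = build_inverted_index(docs)
result = search_all_terms(index, "Xiaomi Phone")
```

Each term lookup is a dictionary access, so the cost depends on the number of query terms and the size of their posting lists, not on the total number of rows.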

3.3 Detailed Structure of Inverted Index

Complete Inverted Index Structure:

Term: "phone"
├── Document Frequency (DF): 1000 documents contain this term
├── Inverted List:
│   ├── Doc1:
│   │   ├── Term Frequency (TF): 3 times
│   │   ├── Positions: [5, 28, 102]
│   │   └── Fields: [title, description]
│   ├── Doc2:
│   │   ├── Term Frequency (TF): 1 time
│   │   ├── Positions: [15]
│   │   └── Fields: [title]
│   └── ...
└── Statistics: Highest term frequency, average term frequency, etc.
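
A minimal sketch of such a posting structure, assuming whitespace tokenization and a single text field per document (the field list and the statistics block from the diagram are left out). Positions here are token offsets, and both DF and TF can be read directly off the structure.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [token positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

# Hypothetical documents.
docs = {
    "Doc1": "phone case for phone",
    "Doc2": "budget phone",
}
index = build_positional_index(docs)
postings = index["phone"]
doc_freq = len(postings)                 # DF: documents containing the term
term_freq_doc1 = len(postings["Doc1"])   # TF: occurrences within Doc1
```

Storing positions is what allows phrase and proximity queries, since the engine can check that two terms appear next to each other.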

4. Detailed Explanation of Elasticsearch Architecture

4.1 Cluster Architecture

Elasticsearch Cluster Architecture Diagram:

┌─────────────── ES Cluster ──────────────┐
│                                         │
│  ┌─────────────────────────────────┐   │
│  │     Master Node                 │   │
│  │  • Cluster management           │   │
│  │  • Index creation/deletion      │   │
│  │  • Shard allocation             │   │
│  └─────────────────────────────────┘   │
│                                         │
│  ┌──────────┐  ┌──────────┐           │
│  │ Data     │  │ Data     │           │
│  │ Node 1   │  │ Node 2   │           │
│  │ ┌──────┐ │  │ ┌──────┐ │           │
│  │ │ P0   │ │  │ │ R0   │ │           │
│  │ ├──────┤ │  │ ├──────┤ │           │
│  │ │ R1   │ │  │ │ P1   │ │           │
│  │ └──────┘ │  │ └──────┘ │           │
│  └──────────┘  └──────────┘           │
│                                         │
│  P = Primary Shard                     │
│  R = Replica Shard                     │
└─────────────────────────────────────────┘

4.2 Data Write Process

Detailed Write Process:

Client → Coordinating Node → Primary Shard → Replica Shard

1. Client sends write request
   ↓
2. Coordinating node determines shard via hash routing
   ↓
3. Request forwarded to primary shard node
   ↓
4. Primary shard writes successfully
   ↓
5. Replicated to replica shards in parallel
   ↓
6. All replicas acknowledge
   ↓
7. Returns success response to client

Timeline:
T0 ──→ T1 ──→ T2 ──→ T3 ──→ T4
Receive   Route   Primary Shard  Replica   Respond
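
The routing step (step 2) can be sketched as a hash of the routing value modulo the number of primary shards. This is a simplification: Elasticsearch hashes the routing value (the document ID by default) with Murmur3, while the sketch below uses CRC32 from the standard library as a stand-in, so the shard numbers it prints will not match a real cluster.

```python
import zlib

def route_to_shard(doc_id, num_primary_shards):
    """Pick a primary shard for a document ID (simplified routing)."""
    # Stand-in hash: Elasticsearch uses Murmur3 on the routing value, not CRC32.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# Every node computes the same shard for the same ID, so no lookup table is needed.
shards = {doc_id: route_to_shard(doc_id, 2) for doc_id in ["1", "2", "3", "4"]}
```

Because the result depends on the number of primary shards, that number is fixed when the index is created; changing it means reindexing or using the split and shrink APIs.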

4.3 Query Process

Query Execution Process:

Phase 1: Query
┌─────────────────────────────────┐
│ Coordinating node sends query requests to all shards   │
│ Each shard returns Top N document IDs and scores  │
└─────────────────────────────────┘
           ↓
Phase 2: Fetch
┌─────────────────────────────────┐
│ Coordinating node consolidates and sorts all results       │
│ Retrieves the complete content of the final required documents       │
└─────────────────────────────────┘
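
The two phases can be simulated in memory as below. The shard contents and the precomputed scores are assumptions made for the illustration; in a real cluster each shard computes scores itself and the phases run over the network.

```python
import heapq

def query_phase(shard, top_n):
    """Each shard returns only (score, doc_id) pairs for its local top N."""
    return heapq.nlargest(top_n, ((doc["score"], doc_id) for doc_id, doc in shard.items()))

def search(shards, top_n):
    # Query phase: collect lightweight (score, doc_id) pairs from every shard.
    candidates = [hit for shard in shards for hit in query_phase(shard, top_n)]
    # The coordinating node merges the candidates and keeps the global top N.
    winners = heapq.nlargest(top_n, candidates)
    # Fetch phase: load full documents only for the winners.
    lookup = {doc_id: doc for shard in shards for doc_id, doc in shard.items()}
    return [(doc_id, lookup[doc_id]["title"]) for _, doc_id in winners]

# Hypothetical shards with precomputed relevance scores.
shards = [
    {"d1": {"score": 9.8, "title": "Xiaomi Phone 12 Pro"},
     "d2": {"score": 3.2, "title": "Budget phone"}},
    {"d3": {"score": 7.1, "title": "Xiaomi Phone case"}},
]
top = search(shards, top_n=2)
```

Splitting the work this way keeps the first round small: full documents travel over the network only for the hits that actually make the final page.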

5. Detailed Explanation of Elasticsearch Core Features

5.1 Detailed Explanation of Query Types

// 1. Match Query - Full-text search
{
  "query": {
    "match": {
      "title": {
        "query": "苹果手机",
        "operator": "and"  // Must contain all terms
      }
    }
  }
}

// 2. Term Query - Exact match
{
  "query": {
    "term": {
      "category.keyword": "手机"  // No tokenization, exact match
    }
  }
}

// 3. Range Query - Range search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 5000
      }
    }
  }
}

// 4. Bool Compound Query
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "手机"}}
      ],
      "filter": [
        {"range": {"price": {"lte": 5000}}}
      ],
      "should": [
        {"match": {"brand": "苹果"}}  // Optional: a match raises the relevance score
      ],
      "must_not": [
        {"term": {"status": "discontinued"}}
      ]
    }
  }
}
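
In application code, a bool query like the one above is usually assembled from user input. The helper below is a hypothetical sketch that only builds the request body as a Python dictionary; sending it to Elasticsearch (for example with an HTTP client) is left out, and the function name and parameters are assumptions.

```python
import json

def build_product_query(keyword, max_price, preferred_brand=None):
    """Build a bool query: must match the title, filter by price, exclude discontinued items."""
    bool_query = {
        "must": [{"match": {"title": keyword}}],
        "filter": [{"range": {"price": {"lte": max_price}}}],
        "must_not": [{"term": {"status": "discontinued"}}],
    }
    if preferred_brand:
        # should clauses are optional; a match only raises the score.
        bool_query["should"] = [{"match": {"brand": preferred_brand}}]
    return {"query": {"bool": bool_query}}

body = build_product_query("手机", 5000, preferred_brand="苹果")
payload = json.dumps(body, ensure_ascii=False)  # JSON text to send as the request body
```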

5.2 Aggregation Analysis Function

// Sales data analysis example
{
  "aggs": {
    "sales_per_category": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "total_sales": {
          "sum": {
            "field": "sales_count"
          }
        },
        "price_ranges": {
          "range": {
            "field": "price",
            "ranges": [
              {"to": 1000},
              {"from": 1000, "to": 5000},
              {"from": 5000}
            ]
          }
        }
      }
    }
  }
}
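
To make clear what the nested aggregation computes, here is a rough in-memory equivalent of the terms bucket with its avg and sum sub-aggregations (the price_ranges sub-aggregation is omitted). The sample products are made up, and a real cluster computes this per shard and merges the partial results.

```python
from collections import defaultdict

def sales_per_category(products):
    """Group products by category, then compute per-bucket metrics."""
    buckets = defaultdict(list)
    for product in products:
        buckets[product["category"]].append(product)
    return {
        category: {
            "doc_count": len(items),
            "avg_price": sum(p["price"] for p in items) / len(items),
            "total_sales": sum(p["sales_count"] for p in items),
        }
        for category, items in buckets.items()
    }

# Hypothetical sample data.
products = [
    {"category": "phone", "price": 5999, "sales_count": 10},
    {"category": "phone", "price": 2999, "sales_count": 30},
    {"category": "tv", "price": 3999, "sales_count": 5},
]
report = sales_per_category(products)
```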

6. Real-world Application Case Studies

6.1 E-commerce Search Optimization Case Study

Comparison of an e-commerce platform's search before and after optimization:

Metric	MySQL Solution	Elasticsearch Solution	Improvement
Average Search Time	2.3 seconds	0.05 seconds	46x faster
Search Accuracy	65%	92%	+27 percentage points
Zero Result Rate	18%	3%	-15 percentage points
Number of Servers	8	3	62.5% fewer servers
Concurrency Capability	100 QPS	5000 QPS	50x higher

Implementation Details:

Data Synchronization Architecture:

MySQL (Primary Data) → Binlog → Logstash → Elasticsearch
        ↓
Scheduled full synchronization (nightly)

Search Optimization Strategies:

Pinyin search: Supports searching "pinguo" to find "苹果" (Apple)
Synonyms: Configure "手机" (phone), "电话" (telephone), "mobile" as synonyms
Search suggestions: Real-time prompts for possible search terms
Correction function: Automatically corrects common spelling errors

6.2 Log Analysis System Case Study

Log analysis system of an internet company:

Log Processing Flow:

Application Server → Filebeat → Logstash → Elasticsearch → Kibana

Application Server: Generate logs
Filebeat: Collect
Logstash: Process & Transform
Elasticsearch: Store & Index
Kibana: Visualize & Display

Processing Scale:
• Log volume: 100GB per day
• Number of log entries: 1 billion entries/day
• Query response: Millisecond level
• Retention period: 30 days hot data, 1 year cold data

7. Performance Optimization Best Practices

7.1 Index Design Optimization

// Optimized Mapping Design
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword"  // Supports exact matching
          },
          "pinyin": {
            "type": "text",
            "analyzer": "pinyin"  // Supports Pinyin search
          }
        }
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100  // Price precision optimization
      },
      "category": {
        "type": "keyword"  // Category does not require tokenization
      },
      "description": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "created_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
      }
    }
  }
}

7.2 Query Performance Optimization Techniques

Use Filter instead of Query (when scoring is not needed)
```json
// Before optimization: using query (calculates score)
{"query": {"term": {"status": "active"}}}

// After optimization: using filter (does not calculate score, can be cached)
{"query": {"bool": {"filter": {"term": {"status": "active"}}}}}
```

Set a reasonable number of shards
```
Shard count reference formula:
Number of shards = Data volume (GB) / 30GB

Example:
- 100GB data: 3-4 shards
- 1TB data: 35-40 shards
```
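
The reference formula can be written as a small helper. It is a rule of thumb only; the 30GB target per shard comes from the formula above, and the function name is made up for the example.

```python
import math

def suggested_shard_count(data_gb, target_shard_gb=30):
    """Round up so that no shard exceeds the target size; always at least one shard."""
    return max(1, math.ceil(data_gb / target_shard_gb))

shards_100gb = suggested_shard_count(100)   # 100 / 30 = 3.33, rounded up
shards_1tb = suggested_shard_count(1024)    # 1024 / 30 = 34.13, rounded up
```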

Batch operation optimization
```json
// Use bulk API for batch indexing
POST _bulk
{"index": {"_index": "products", "_id": 1}}
{"name": "iPhone", "price": 5999}
{"index": {"_index": "products", "_id": 2}}
{"name": "Xiaomi", "price": 2999}
```
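
The bulk body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. A hypothetical helper that builds such a body from Python data might look like this; it only produces the text and does not send the request.

```python
import json

def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk endpoint from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body("products", [
    (1, {"name": "iPhone", "price": 5999}),
    (2, {"name": "Xiaomi", "price": 2999}),
])
```

Batching many documents into one request saves a network round trip per document, which is where most of the indexing speedup comes from.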

8. Elasticsearch vs Traditional Databases

8.1 Comparison of Applicable Scenarios

Scenario	MySQL	Elasticsearch	Recommended Choice
Full-text Search	❌ Poor	✅ Excellent	ES
Transaction Support	✅ Full ACID	❌ No Transactions	MySQL
Real-time Statistical Analysis	⚠️ Fair	✅ Excellent	ES
Relational Queries	✅ Excellent	❌ Limited	MySQL
Geolocation Search	❌ Poor	✅ Excellent	ES
Log Analysis	❌ Not suitable	✅ Specialized	ES
Precise Numerical Calculations	✅ Precise	⚠️ Approximate	MySQL

8.2 Hybrid Architecture Solution

Recommended Hybrid Architecture:

        User Request
           ↓
    ┌──────────────┐
    │ Application Layer     │
    └──────────────┘
           ↓
    ┌──────────────────────────┐
    │      Search Request  → ES       │
    │      Transactional Operations  → MySQL    │
    │      Cache     → Redis    │
    └──────────────────────────┘

Data Synchronization:
MySQL(Write) → Binlog → Canal/Debezium → Kafka → ES(Read)

9. Common Problems and Solutions

9.1 Data Consistency Issues

Problem: MySQL and ES data inconsistency

Solutions:
1. Dual-write strategy: Write to both MySQL and ES simultaneously, using a message queue to ensure eventual consistency
2. CDC (Change Data Capture): Real-time synchronization via Binlog
3. Regular verification: Scheduled tasks compare data differences and fix them
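
The regular verification task (solution 3) boils down to comparing two keyed snapshots. The sketch below assumes both sides have already been loaded into dictionaries keyed by primary key; how the rows are read from MySQL and Elasticsearch, and how the differences are repaired, is outside its scope.

```python
def diff_records(mysql_rows, es_docs):
    """Compare MySQL rows with ES documents, both keyed by primary key."""
    mysql_ids, es_ids = set(mysql_rows), set(es_docs)
    return {
        "missing_in_es": sorted(mysql_ids - es_ids),    # needs indexing
        "orphaned_in_es": sorted(es_ids - mysql_ids),   # needs deleting
        "stale_in_es": sorted(                          # needs re-indexing
            key for key in mysql_ids & es_ids if mysql_rows[key] != es_docs[key]
        ),
    }

# Hypothetical snapshots.
mysql_rows = {1: {"name": "iPhone", "price": 5999}, 2: {"name": "Xiaomi", "price": 2999}, 3: {"name": "TV", "price": 3999}}
es_docs = {1: {"name": "iPhone", "price": 5999}, 2: {"name": "Xiaomi", "price": 3299}, 4: {"name": "Old", "price": 1}}
report = diff_records(mysql_rows, es_docs)
```

For large tables, comparing a checksum or an updated-at timestamp per row instead of the full record keeps the check cheap.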

9.2 Deep Paging Issues

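Problem: Extremely poor performance when querying the 10,000th page. With from/size paging, every shard must collect and sort from + size hits and the coordinating node merges them all, so the cost grows with page depth; by default, index.max_result_window also rejects requests where from + size exceeds 10,000.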

Solutions:

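// 1. Use search_after (recommended)
// The sort needs a unique tiebreaker; "product_id" is a hypothetical unique keyword field
{
  "size": 10,
  "sort": [{"price": "asc"}, {"product_id": "asc"}],
  "search_after": [2999, "10000"]  // Sort values of the last document on the previous page
}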

// 2. Use scroll API (suitable for export)
POST /products/_search?scroll=1m
{
  "size": 100,
  "query": {"match_all": {}}
}
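
The cursor logic behind search_after can be simulated without a cluster. In this sketch `search_page` stands in for one search request, and the sort uses price plus a unique id as the tiebreaker; the function names and sample documents are assumptions for the example.

```python
def search_page(docs, size, search_after=None):
    """Stand-in for one search request: sort by (price, id) and resume after the cursor."""
    ordered = sorted(docs, key=lambda d: (d["price"], d["id"]))
    if search_after is not None:
        ordered = [d for d in ordered if (d["price"], d["id"]) > tuple(search_after)]
    return ordered[:size]

def iterate_all(docs, size):
    """Walk through every document page by page, carrying only the last sort values."""
    cursor = None
    while True:
        page = search_page(docs, size, cursor)
        if not page:
            return
        yield from page
        last = page[-1]
        cursor = [last["price"], last["id"]]  # sort values of the last hit

# Hypothetical documents; several share the same price, so the id tiebreaker matters.
docs = [
    {"id": "a", "price": 10}, {"id": "b", "price": 10}, {"id": "c", "price": 5},
    {"id": "d", "price": 20}, {"id": "e", "price": 10},
]
ids = [doc["id"] for doc in iterate_all(docs, size=2)]
```

Without the unique tiebreaker, documents that share the last page's price could be skipped or repeated across pages.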

10. Summary

Elasticsearch addresses many of the problems traditional databases face in search scenarios through its inverted index, distributed architecture, and full-text search capabilities. Used properly, Elasticsearch can:

Improve search performance: From seconds to milliseconds
Improve search quality: Through relevance scoring and smart tokenization
Support complex analysis: Real-time aggregation and statistical analysis
Reduce operational costs: Fewer servers, higher efficiency

However, it is important to note that Elasticsearch is not a replacement for MySQL, but a supplement. In actual projects, the appropriate storage solution should be chosen based on specific scenarios, and a hybrid architecture of MySQL+Elasticsearch can usually leverage their respective strengths.

Theme test article, for testing purposes only. Published by: Walker. Please credit the source when reposting: https://walker-learn.xyz/archives/6775

Go Engineering Comprehensive Course 009