
Other Features

Personal Center
Favorites
Manage Shipping Addresses (Add, Delete, Modify, Query)
Messages

Copy inventory_srv to create userop_srv, then replace all references to inventory in the copy.

Elasticsearch In-depth Analysis Document

1. What is Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene, capable of rapidly storing, searching, and analyzing massive amounts of data. It is a core component of the Elastic Stack (formerly ELK Stack).

2. Problems Faced by MySQL Search - In-depth Analysis

2.1 Detailed Explanation of Low Performance Issues

Problem Phenomenon:

-- With 1 million rows, the following query can take several seconds
SELECT * FROM products WHERE name LIKE '%手机%' OR description LIKE '%手机%';  -- 手机 = "mobile phone"

Performance Comparison Data:

Data Volume	MySQL LIKE Query	Elasticsearch Full-Text Search	Performance Improvement
10,000 records	50ms	10ms	5x
100,000 records	500ms	15ms	33x
1 million records	5000ms	20ms	250x
10 million records	50000ms+	30ms	1600x+

Root Causes:

Full Table Scan: LIKE '%keyword%' cannot use B+ tree indexes, requiring a scan of all rows.
I/O Intensive: Each query needs to read a large amount of data from disk.
CPU Intensive: String matching operations are performed on every row of data.
Memory Pressure: A large amount of data is loaded into memory for processing.

Real-world Case:

An e-commerce platform's product table has 5 million records. Using MySQL fuzzy search for "Apple phone":
- Query time: 8.3 seconds
- CPU usage: Spiked to 85%
- With 10 concurrent queries, response time increased to over 30 seconds

2.2 Detailed Explanation of the Lack of Relevance Ranking

Pain Points of MySQL Query Results:

-- MySQL can only sort by fixed rules
SELECT * FROM products
WHERE name LIKE '%手机%'
ORDER BY price DESC;  -- only by fields such as price or time, not by relevance

Elasticsearch's Relevance Scoring Mechanism:

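Search term: "小米手机" (Xiaomi phone)

Relevance score calculation:
┌────────────────────────────────────────────────┐
│ Doc 1: "小米手机12 Pro"                        │
│ • TF: both query terms appear                  │
│ • IDF: measures how rare each term is          │
│ • Field length: short title, higher weight     │
│ • Score: 9.8                                   │
└────────────────────────────────────────────────┘

┌────────────────────────────────────────────────┐
│ Doc 2: "这是一款性价比很高的手机"              │
│ ("a very cost-effective phone")                │
│ • TF: only "手机" appears                      │
│ • IDF: "手机" is a common term                 │
│ • Field length: long description, lower weight │
│ • Score: 3.2                                   │
└────────────────────────────────────────────────┘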

Detailed Explanation of Relevance Factors:

TF (Term Frequency): The frequency of a keyword appearing in a document.
IDF (Inverse Document Frequency): The rarity of a keyword across all documents.
Field Length Normalization: Matches in shorter fields have higher weight than in longer fields.
Field Weight Boost: Allows setting titles to be more important than content.
Query-time Boost: Allows specific query terms or clauses to be weighted as more important.
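To make these factors concrete, here is a minimal sketch of the classic Okapi BM25 formula, which Elasticsearch uses as its default similarity. It is a simplification under stated assumptions: documents are already tokenized, only TF, IDF and field-length normalization are modeled (no boosts), and k1 = 1.2 and b = 0.75 are the usual defaults. Lucene's implementation differs in constant factors, so the numbers will not match an ES explain output.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against query_terms with classic Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many docs each term appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Field-length normalization: docs longer than average get a larger denominator
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # IDF: rarer terms contribute more (this variant never goes negative)
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # TF saturates: each extra occurrence adds less than the previous one
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    ["小米", "手机", "12", "pro"],                            # short title, both terms
    ["这", "是", "一款", "性价比", "很", "高", "的", "手机"],  # long text, only 手机
    ["苹果", "手机"],                                        # short text, only 手机
]
scores = bm25_scores(["小米", "手机"], docs)
print([round(s, 3) for s in scores])
```

The first document ranks highest because it also contains the rarer term 小米; of the two documents that only contain 手机, the shorter one scores higher, which is the field-length effect described above.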

2.3 Detailed Explanation of the Inability to Perform Full-Text Search

Limitations of MySQL Full-Text Indexing:

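-- Create a MySQL full-text index
ALTER TABLE products ADD FULLTEXT(name, description);

-- Problem 1: minimum token length (InnoDB innodb_ft_min_token_size defaults to 3,
-- MyISAM ft_min_word_len to 4), so short words such as the two-character "手机" (phone) are not indexed by default

-- Problem 2: poor Chinese word segmentation; the default parser splits only on spaces and punctuation,
-- so "苹果手机" (Apple phone) is indexed as a single token and searching "苹果" (Apple) does not find it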

Elasticsearch Full-Text Search Capabilities:

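// Example of the ES analysis process
Input text: "我想买一台苹果手机" ("I want to buy an Apple phone")

Tokens:
[我] [想] [买] [一台] [苹果] [手机] [苹果手机]

Synonym expansion (requires a configured synonym token filter):
[苹果] → [Apple, iPhone]
[手机] → [手机, 电话, mobile]

Spelling correction (via a suggester):
"苹果手击" (typo: 击 for 机) → suggests "苹果手机"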

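Synonym expansion is the easiest of these steps to picture in code. Below is a toy sketch, assuming a hypothetical hand-written synonym table that mirrors the example; in Elasticsearch the same idea is configured as a synonym token filter inside an analyzer, which also places the extra tokens at the same position as the original token so phrase queries keep working (this sketch ignores positions).

```python
# Hypothetical synonym table, mirroring the example above
SYNONYMS = {
    "苹果": ["Apple", "iPhone"],
    "手机": ["手机", "电话", "mobile"],
}


def expand_synonyms(tokens, synonyms):
    """Return tokens with each token's synonyms appended right after it."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        # Skip the token itself if the table lists it among its own synonyms
        expanded.extend(s for s in synonyms.get(token, []) if s != token)
    return expanded


print(expand_synonyms(["我", "想", "买", "一台", "苹果", "手机"], SYNONYMS))
```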
2.4 Detailed Explanation of Inaccurate Matching and the Lack of Word Segmentation

Problems with MySQL String Matching:

-- Search for "笔记本" (notebook)
SELECT * FROM products WHERE name LIKE '%笔记本%';
-- Result: finds "笔记本电脑" (notebook computer)
-- Problem: misses "笔记 本子", "notebook", and "手提电脑" (laptop)

Elasticsearch Smart Word Segmentation Process:

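Original text: "ThinkPad X1 Carbon超轻薄笔记本电脑" ("ThinkPad X1 Carbon ultra-thin notebook computer")

Standard analyzer (lowercases and splits Chinese into single characters):
[thinkpad] [x1] [carbon] [超] [轻] [薄] [笔] [记] [本] [电] [脑]

IK analyzer (Chinese):
[ThinkPad] [X1] [Carbon] [超] [轻薄] [超轻薄]
[笔记] [本] [笔记本] [电脑] [笔记本电脑]

Pinyin analyzer:
[chao] [qing] [bao] [bi] [ji] [ben] [dian] [nao] ... → enables searching by pinyin

N-gram tokenizer (3-grams):
[Thi] [hin] [ink] [nkP] [kPa] [Pad] ... → supports partial matching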
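Two of these strategies are simple enough to sketch. The first function below is a toy version of the overlapping, all-dictionary-words output shown for IK: it emits every dictionary word that starts at every offset. The dictionary is a hypothetical mini word list; the real IK plugin ships its own large dictionary and additional rules. The second function produces character n-grams like the N-gram example.

```python
def all_dictionary_words(text, dictionary, max_word_len=5):
    """Emit every dictionary word found at every start offset (toy ik_max_word-style)."""
    tokens = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_word_len) + 1):
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens


def char_ngrams(text, n):
    """Sliding window of n characters (what an n-gram tokenizer with min = max = n yields)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical mini dictionary covering the example sentence
DICTIONARY = {"超", "轻薄", "超轻薄", "笔记", "本", "笔记本", "电脑", "笔记本电脑"}

print(all_dictionary_words("超轻薄笔记本电脑", DICTIONARY))
print(char_ngrams("ThinkPad", 3))
```

Overlapping tokens such as 超, 轻薄 and 超轻薄 let a search for any of them hit the same document, at the cost of a larger index.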

3. What is Full-Text Search - Core Principle Analysis

3.1 Structured Data vs Unstructured Data

Structured data (how MySQL stores it):
┌──────┬────────┬────────┬────────┐
│  ID  │  Name  │ Price  │ Stock  │
├──────┼────────┼────────┼────────┤
│  1   │iPhone  │ 5999   │  100   │
│  2   │ 小米   │ 2999   │  200   │
└──────┴────────┴────────┴────────┘

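Unstructured data (free text):
"This iPhone uses the A15 processor with strong performance,
excellent photos, and 20% longer battery life.
User review: 'Amazing, great value for money!'"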

3.2 Detailed Explanation of Inverted Index Principle

Forward Index (MySQL):

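Doc ID → Content
Doc1 → "小米手机" (Xiaomi phone)
Doc2 → "苹果手机" (Apple phone)
Doc3 → "小米电视" (Xiaomi TV)

Inverted Index (Elasticsearch):

Term → Document list
"小米" → [Doc1, Doc3]
"手机" → [Doc1, Doc2]
"苹果" → [Doc2]
"电视" → [Doc3]

Searching "小米手机":
1. Look up "小米" → [Doc1, Doc3]
2. Look up "手机" → [Doc1, Doc2]
3. Intersect the lists → Doc1 (the only document containing both terms, hence the most relevant)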
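The lookup-and-intersect steps above fit in a few lines. This sketch assumes the documents are already segmented into terms (a hypothetical input format). AND semantics intersect the posting lists; OR semantics, the default for a match query, return the union, ranked here by how many query terms each document contains as a crude stand-in for real scoring.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of doc IDs that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, terms, operator="or"):
    postings = [set(index.get(term, [])) for term in terms]
    if not postings:
        return []
    if operator == "and":
        # Every query term must be present: intersect the posting lists
        return sorted(set.intersection(*postings))
    # OR: union of posting lists, docs matching more terms first
    matched = defaultdict(int)
    for plist in postings:
        for doc_id in plist:
            matched[doc_id] += 1
    return sorted(matched, key=lambda d: (-matched[d], d))


docs = {
    "Doc1": ["小米", "手机"],
    "Doc2": ["苹果", "手机"],
    "Doc3": ["小米", "电视"],
}
index = build_inverted_index(docs)
print(index["小米"])
print(search(index, ["小米", "手机"], operator="and"))
print(search(index, ["小米", "手机"]))
```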

3.3 Detailed Structure of Inverted Index

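Complete inverted index structure:

Term: "手机"
├── Document frequency (DF): 1000 documents contain this term
├── Postings list:
│   ├── Doc1:
│   │   ├── Term frequency (TF): 3
│   │   ├── Positions: [5, 28, 102]
│   │   └── Fields: [title, description]
│   ├── Doc2:
│   │   ├── Term frequency (TF): 1
│   │   ├── Positions: [15]
│   │   └── Fields: [title]
│   └── ...
└── Statistics: max term frequency, average term frequency, etc.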
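Storing positions is what makes phrase queries possible. A minimal sketch, assuming pre-tokenized documents and a single field (Lucene actually keeps a separate inverted index per field): the index records each term's positions per document, TF is simply the number of positions, and a phrase matches when its terms occur at consecutive positions.

```python
from collections import defaultdict


def build_positional_index(docs):
    """term -> {doc_id -> [positions]}, where positions are token offsets."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in docs.items():
        for position, term in enumerate(tokens):
            index[term][doc_id].append(position)
    # Freeze into plain dicts so lookups never create empty entries
    return {term: dict(postings) for term, postings in index.items()}


def phrase_match(index, phrase):
    """Return doc IDs in which the phrase terms appear at consecutive positions."""
    if not phrase:
        return []
    hits = []
    for doc_id, starts in index.get(phrase[0], {}).items():
        for start in starts:
            if all(start + offset in index.get(term, {}).get(doc_id, [])
                   for offset, term in enumerate(phrase[1:], start=1)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    "Doc1": ["小米", "手机", "12", "pro"],
    "Doc2": ["手机", "壳", "适用", "小米"],  # both terms, but not adjacent
}
index = build_positional_index(docs)
print(index["手机"])                         # {'Doc1': [1], 'Doc2': [0]}
print(phrase_match(index, ["小米", "手机"]))  # ['Doc1']
```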

4. Elasticsearch Architecture Explained

4.1 Cluster Architecture

Elasticsearch cluster architecture:

┌──────────────── ES Cluster ────────────────┐
│                                            │
│  ┌──────────────────────────────────────┐  │
│  │  Master Node                         │  │
│  │  • Cluster management                │  │
│  │  • Index creation/deletion           │  │
│  │  • Shard allocation                  │  │
│  └──────────────────────────────────────┘  │
│                                            │
│  ┌──────────┐  ┌──────────┐                │
│  │ Data     │  │ Data     │                │
│  │ Node 1   │  │ Node 2   │                │
│  │ ┌──────┐ │  │ ┌──────┐ │                │
│  │ │ P0   │ │  │ │ R0   │ │                │
│  │ ├──────┤ │  │ ├──────┤ │                │
│  │ │ R1   │ │  │ │ P1   │ │                │
│  │ └──────┘ │  │ └──────┘ │                │
│  └──────────┘  └──────────┘                │
│                                            │
│  P = Primary Shard                         │
│  R = Replica Shard                         │
└────────────────────────────────────────────┘

4.2 Data Write Process

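Write process in detail:

Client → Coordinating node → Primary shard → Replica shards

1. Client sends the write request
   ↓
2. Coordinating node determines the target shard by hash routing
   ↓
3. Request is forwarded to the node holding the primary shard
   ↓
4. Write succeeds on the primary shard
   ↓
5. Operation is replicated to the replica shards in parallel
   ↓
6. All in-sync replicas acknowledge
   ↓
7. Success response is returned to the client

Timeline:
T0 (receive) ──→ T1 (route) ──→ T2 (primary write) ──→ T3 (replica writes) ──→ T4 (respond)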
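Step 2 relies on a routing formula. A simplified sketch, assuming routing defaults to the document _id: shard = hash(_routing) % number_of_primary_shards. Elasticsearch hashes with Murmur3; zlib.crc32 below is only a stand-in so the example runs with the standard library, and newer versions also fold in index.number_of_routing_shards, which this sketch omits.

```python
import zlib


def route_to_shard(routing, num_primary_shards):
    """Simplified routing: hash(_routing) % number_of_primary_shards.

    zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses.
    """
    return zlib.crc32(str(routing).encode("utf-8")) % num_primary_shards


doc_ids = [f"product-{i}" for i in range(10)]
print({doc_id: route_to_shard(doc_id, 3) for doc_id in doc_ids})
```

Because the result depends on the number of primary shards, that number is fixed when the index is created; changing it means using the _split/_shrink APIs or reindexing.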

4.3 Query Process

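Query execution process:

Phase 1: Query
┌────────────────────────────────────────────────────┐
│ Coordinating node sends the query to every shard   │
│ Each shard returns the IDs and scores of its top N │
└────────────────────────────────────────────────────┘
           ↓
Phase 2: Fetch
┌────────────────────────────────────────────────────┐
│ Coordinating node merges and sorts all results     │
│ and fetches the full documents for the final page  │
└────────────────────────────────────────────────────┘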
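The two phases can be sketched in-process. Assumptions: each shard has already produced its local top-N list of (score, doc_id) pairs, and a plain dict stands in for the stored documents. The coordinator keeps only the global top N and loads full documents for those alone, which is why the fetch phase stays cheap.

```python
import heapq


def query_then_fetch(shard_results, doc_store, size):
    """Merge per-shard top hits (query phase), then load winning docs (fetch phase)."""
    # Query phase: every shard returned its own top `size` hits as (score, doc_id)
    candidates = [hit for hits in shard_results for hit in hits]
    top_hits = heapq.nlargest(size, candidates, key=lambda hit: hit[0])
    # Fetch phase: only the global winners are loaded in full
    return [(doc_id, score, doc_store[doc_id]) for score, doc_id in top_hits]


shard_results = [
    [(9.8, "d1"), (3.2, "d2")],  # shard 0
    [(7.5, "d5"), (1.0, "d6")],  # shard 1
    [(8.1, "d9")],               # shard 2
]
doc_store = {doc_id: {"title": f"product {doc_id}"} for doc_id in ["d1", "d2", "d5", "d6", "d9"]}
for doc_id, score, source in query_then_fetch(shard_results, doc_store, size=3):
    print(doc_id, score, source["title"])
```

With from/size paging, each shard has to return from + size hits in the query phase, so the candidate list grows with page depth; section 9.2 returns to this problem.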

5. Elasticsearch Core Features Explained

5.1 Query Types Explained

// 1. Match query - full-text search
{
  "query": {
    "match": {
      "title": {
        "query": "苹果手机",
        "operator": "and"  // every term must match
      }
    }
  }
}

// 2. Term query - exact match
{
  "query": {
    "term": {
      "category.keyword": "手机"  // not analyzed; exact match
    }
  }
}

// 3. Range query - numeric range
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 5000
      }
    }
  }
}

// 4. Bool compound query
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "手机"}}
      ],
      "filter": [
        {"range": {"price": {"lte": 5000}}}  // must match, but not scored
      ],
      "should": [
        {"match": {"brand": "苹果"}}  // optional; matching it raises the score
      ],
      "must_not": [
        {"term": {"status": "discontinued"}}  // excluded, not scored
      ]
    }
  }
}
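In application code, a bool query like this is usually assembled programmatically. A minimal sketch with a hypothetical helper, build_product_query, that returns the request body as a plain dict; how it is sent (for example as the body of POST /products/_search through whichever client you use) is left out.

```python
import json


def build_product_query(keyword, max_price, brand=None, excluded_status="discontinued"):
    """Hypothetical helper that assembles a bool query like the example above."""
    bool_query = {
        "must": [{"match": {"title": keyword}}],              # full-text, contributes to score
        "filter": [{"range": {"price": {"lte": max_price}}}],  # yes/no, not scored, cacheable
        "must_not": [{"term": {"status": excluded_status}}],   # excluded docs, not scored
    }
    if brand:
        # Optional clause: with a must clause present it only raises the score of matches
        bool_query["should"] = [{"match": {"brand": brand}}]
    return {"query": {"bool": bool_query}}


body = build_product_query("手机", 5000, brand="苹果")
print(json.dumps(body, ensure_ascii=False, indent=2))
```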

5.2 Aggregation Analysis Functionality

// Sales data analysis example
{
  "aggs": {
    "sales_per_category": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "total_sales": {
          "sum": {
            "field": "sales_count"
          }
        },
        "price_ranges": {
          "range": {
            "field": "price",
            "ranges": [
              {"to": 1000},
              {"from": 1000, "to": 5000},
              {"from": 5000}
            ]
          }
        }
      }
    }
  }
}
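To see what this aggregation computes, here is a local sketch over an in-memory list of products. It only illustrates the semantics, not how ES executes aggregations (ES computes them per shard and merges the results, and terms buckets are ordered by doc_count and capped at 10 by default). It groups by category, then computes average price, total sales and price-range counts per bucket, treating each range's from as inclusive and to as exclusive, as the range aggregation does. The bucket key strings are simplified.

```python
from collections import defaultdict

# Same boundaries as the range aggregation: "from" inclusive, "to" exclusive
PRICE_RANGES = [("*-1000", None, 1000), ("1000-5000", 1000, 5000), ("5000-*", 5000, None)]


def in_range(value, low, high):
    return (low is None or value >= low) and (high is None or value < high)


def sales_per_category(products):
    """Group products by category and compute the sub-aggregations per bucket."""
    buckets = defaultdict(list)
    for product in products:
        buckets[product["category"]].append(product)
    result = {}
    for category, items in buckets.items():
        prices = [p["price"] for p in items]
        result[category] = {
            "doc_count": len(items),
            "avg_price": sum(prices) / len(prices),
            "total_sales": sum(p["sales_count"] for p in items),
            "price_ranges": {key: sum(in_range(x, lo, hi) for x in prices)
                             for key, lo, hi in PRICE_RANGES},
        }
    return result


products = [
    {"category": "phone", "price": 5999, "sales_count": 10},
    {"category": "phone", "price": 2999, "sales_count": 30},
    {"category": "tv", "price": 899, "sales_count": 5},
]
print(sales_per_category(products))
```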

6. Practical Application Case Studies

6.1 E-commerce Search Optimization Case Study

Before-and-after comparison of an e-commerce platform's search optimization:

Metric	MySQL Solution	Elasticsearch Solution	Improvement
Average Search Time	2.3 seconds	0.05 seconds	46x improvement
Search Accuracy	65%	92%	+27 percentage points
Zero Result Rate	18%	3%	-15 percentage points
Number of Servers	8	3	62.5% fewer servers
Concurrency Capability	100 QPS	5000 QPS	50x improvement

Implementation Details:

Data Synchronization Architecture:
MySQL (primary data) → Binlog → Logstash → Elasticsearch, plus a scheduled full synchronization every night

Search Optimization Strategies:
Pinyin Search: Searching for "pingguo" finds "苹果" (Apple)
Synonyms: "手机" (phone), "电话" (telephone), and "mobile" are configured as synonyms
Search Suggestions: Real-time suggestions for likely search terms
Spelling Correction: Automatically corrects common spelling errors
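The spelling-correction idea can be sketched with edit distance. This is a simplified illustration, assuming a small hypothetical vocabulary of known terms: it returns the closest vocabulary entry if it lies within max_edits edits. Elasticsearch's term suggester works on the same principle, drawing candidates from indexed terms and limiting them by max_edits (default 2), with more elaborate candidate generation and ranking.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(term, vocabulary, max_edits=2):
    """Closest known term within max_edits, or None (ties broken alphabetically)."""
    best = min(vocabulary, key=lambda word: (levenshtein(term, word), word))
    return best if levenshtein(term, best) <= max_edits else None


VOCABULARY = ["苹果手机", "小米手机", "苹果电脑", "pingguo", "xiaomi"]  # hypothetical
print(suggest("苹果手击", VOCABULARY))  # typo: 击 for 机
print(suggest("pinguo", VOCABULARY))    # missing "g" in the pinyin
```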

6.2 Log Analysis System Case Study

Log Analysis System of an Internet Company:

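Log processing pipeline:

App servers (generate logs)
  → Filebeat (collect)
  → Logstash (parse and transform)
  → Elasticsearch (store and index)
  → Kibana (visualize)

Processing scale:
• Log volume: 100 GB per day
• Log entries: 1 billion per day
• Query response: milliseconds
• Retention: 30 days of hot data, 1 year of cold data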

7. Performance Optimization Best Practices

7.1 Index Design Optimization

// Optimized mapping design (ik_max_word/ik_smart and pinyin require the IK and pinyin analysis plugins)
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword"  // supports exact matching
          },
          "pinyin": {
            "type": "text",
            "analyzer": "pinyin"  // supports pinyin search
          }
        }
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100  // stored as a long of price × 100, i.e. 2 decimal places
      },
      "category": {
        "type": "keyword"  // categories need no tokenization
      },
      "description": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "created_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
      }
    }
  }
}
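The scaled_float setting is easier to see with numbers. A minimal sketch of the encoding, based on how the field type is documented: at index time the value is multiplied by scaling_factor and rounded to the nearest long, so with a factor of 100 prices keep two decimal places. Python's round uses banker's rounding on exact halves, so half-way cases could differ from Elasticsearch; the sample values avoid them.

```python
def encode_scaled_float(value, scaling_factor=100):
    """Store value * scaling_factor rounded to the nearest integer (a long in ES)."""
    return round(value * scaling_factor)


def decode_scaled_float(stored, scaling_factor=100):
    """Recover the (rounded) value by dividing the stored integer by the factor."""
    return stored / scaling_factor


for price in [19.99, 5999.0, 5999.999]:
    stored = encode_scaled_float(price)
    print(price, "->", stored, "->", decode_scaled_float(stored))
```

The last value shows the trade-off: precision beyond the factor is lost, which is acceptable for prices quoted to the cent and keeps the stored values compact.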

7.2 Query Performance Optimization Techniques

Use Filter Instead of Query (when scoring is not needed)
```json
// Before optimization: using query (calculates a score)
{"query": {"term": {"status": "active"}}}

// After optimization: using filter (does not calculate a score; results can be cached)
{"query": {"bool": {"filter": {"term": {"status": "active"}}}}}
```

Set Shard Count Appropriately
Shard count formula (reference):
Shard count ≈ data size (GB) / 30 GB

Examples:
- 100 GB of data: 3-4 shards
- 1 TB of data: 35-40 shards
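The rule of thumb translates directly into arithmetic. A minimal sketch, assuming the 30 GB-per-shard target from the formula above and rounding up so that no shard exceeds it:

```python
import math


def suggested_primary_shards(data_size_gb, target_shard_size_gb=30):
    """Rule of thumb: shards = data size / target shard size, rounded up, at least 1."""
    return max(1, math.ceil(data_size_gb / target_shard_size_gb))


for size_gb in [10, 100, 1024]:
    print(f"{size_gb} GB -> {suggested_primary_shards(size_gb)} shards")
```

Rounding up gives 4 shards for 100 GB and 35 for 1 TB (1,024 GB), consistent with the ranges above.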

Batch Operation Optimization
// Use the bulk API for batch indexing
POST _bulk
{"index": {"_index": "products", "_id": 1}}
{"name": "iPhone", "price": 5999}
{"index": {"_index": "products", "_id": 2}}
{"name": "小米", "price": 2999}
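Bulk request bodies are newline-delimited JSON: an action line, then (for index operations) the document source, one JSON object per line, and the body must end with a newline. A minimal sketch of building such a body with a hypothetical build_bulk_body helper; sending it (with Content-Type: application/x-ndjson) is left to your HTTP client or ES client library.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for POST _bulk from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body("products", [
    (1, {"name": "iPhone", "price": 5999}),
    (2, {"name": "小米", "price": 2999}),
])
print(body, end="")
```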

8. Elasticsearch vs Traditional Databases

8.1 Comparison of Applicable Scenarios

Scenario	MySQL	Elasticsearch	Recommended Choice
Full-Text Search	❌ Poor	✅ Excellent	ES
Transaction Support	✅ Full ACID	❌ No Transactions	MySQL
Real-time Statistical Analysis	⚠️ Average	✅ Excellent	ES
Relational Queries	✅ Excellent	❌ Limited	MySQL
Geospatial Search	❌ Poor	✅ Excellent	ES
Log Analysis	❌ Unsuitable	✅ Specialty	ES
Precise Numerical Calculation	✅ Precise	⚠️ Approximate	MySQL

8.2 Hybrid Architecture Solution

Recommended hybrid architecture:

        User requests
              ↓
    ┌───────────────────┐
    │ Application layer │
    └───────────────────┘
              ↓
    ┌──────────────────────────┐
    │ Search requests → ES     │
    │ Transactions    → MySQL  │
    │ Cache           → Redis  │
    └──────────────────────────┘

Data synchronization:
MySQL (writes) → Binlog → Canal/Debezium → Kafka → ES (reads)

9. Common Problems and Solutions

9.1 Data Consistency Issues

Problem: Inconsistency between MySQL and ES data.

Solutions:
1. Dual-write Strategy: Write to both MySQL and ES simultaneously, using a message queue to ensure eventual consistency.
2. CDC (Change Data Capture): Real-time synchronization via Binlog.
3. Regular Verification: Scheduled tasks to compare data differences and fix them.
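Whichever sync path is used, change events can arrive late or out of order. One common safeguard, sketched here under assumptions, is to attach a monotonically increasing version to each event (for example derived from the binlog position or an updated_at column) and apply a write only if it is newer than what is stored, which is the idea behind indexing with version_type=external. The in-memory dict stands in for the ES index, and the event format is hypothetical.

```python
def apply_change(index, event):
    """Apply a change event only if its version is newer than the stored one."""
    current = index.get(event["id"])
    if current is not None and event["version"] <= current["version"]:
        return False  # stale or duplicate event: ignore it
    # Deletes leave a versioned tombstone so an older, late update cannot revive the doc
    source = None if event["op"] == "delete" else event["source"]
    index[event["id"]] = {"version": event["version"], "source": source}
    return True


es_index = {}
events = [
    {"id": 1, "op": "upsert", "version": 2, "source": {"price": 5899}},
    {"id": 1, "op": "upsert", "version": 1, "source": {"price": 5999}},  # arrives late
    {"id": 1, "op": "delete", "version": 3},
    {"id": 1, "op": "upsert", "version": 2, "source": {"price": 5899}},  # replayed duplicate
]
print([apply_change(es_index, e) for e in events])  # [True, False, True, False]
print(es_index)
```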

9.2 Deep Paging Issues

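Problem: With from/size paging, every shard must collect from + size hits and the coordinating node must sort all of them, so deep pages (for example, page 10,000) perform extremely poorly. By default, from + size may not exceed index.max_result_window (10,000).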

Solutions:

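// 1. Use search_after (recommended)
{
  "size": 10,
  "sort": [
    {"created_time": "asc"},
    {"product_id": "asc"}  // hypothetical unique field as a tiebreaker; avoid sorting on _id
  ],
  "search_after": [1672531200000, 10000]  // sort values of the last hit on the previous page
}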

// 2. Use scroll API (suitable for exporting)
POST /products/_search?scroll=1m
{
  "size": 100,
  "query": {"match_all": {}}
}
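search_after is keyset pagination: instead of skipping from hits, each request resumes strictly after the sort values of the previous page's last hit. A minimal in-memory sketch, assuming documents sorted by (created_time, product_id), where product_id is a hypothetical unique field acting as a tiebreaker so the order is total:

```python
from bisect import bisect_right


def search_after_page(docs, size, search_after=None):
    """Return one page sorted by (created_time, product_id) plus the cursor for the next page."""
    # Sorting the whole list here is only for illustration
    ordered = sorted(docs, key=lambda d: (d["created_time"], d["product_id"]))
    keys = [(d["created_time"], d["product_id"]) for d in ordered]
    # Resume strictly after the last sort values seen (no offset counting)
    start = 0 if search_after is None else bisect_right(keys, tuple(search_after))
    page = ordered[start:start + size]
    cursor = list(keys[start + len(page) - 1]) if page else None
    return page, cursor


docs = [{"created_time": t, "product_id": i} for i, t in enumerate([5, 3, 3, 1, 9, 7, 3])]
seen, cursor = [], None
while True:
    page, cursor = search_after_page(docs, size=3, search_after=cursor)
    if not page:
        break
    seen.extend(d["product_id"] for d in page)
print(seen)  # [3, 1, 2, 6, 0, 5, 4]
```

Because each page is located by its sort values rather than by an offset, a deep page costs about the same as the first one; the trade-off is that you can only move forward, not jump to an arbitrary page number.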

10. Summary

Elasticsearch effectively addresses the main problems traditional databases face in search scenarios through its inverted index, distributed architecture, and powerful full-text search capabilities. Used properly, Elasticsearch can:

Improve search performance: From seconds to milliseconds.
Enhance search quality: Through relevance scoring and smart word segmentation.
Support complex analysis: Real-time aggregation and statistical analysis.
Reduce operational costs: Fewer servers, higher efficiency.

However, it is important to note that Elasticsearch is not a replacement for MySQL, but rather a complement. In actual projects, the appropriate storage solution should be chosen based on specific scenarios, and a hybrid architecture of MySQL + Elasticsearch can usually leverage their respective advantages.

Theme test article, for testing purposes only. Publisher: Walker. When reposting, please credit the source: https://walker-learn.xyz/archives/4782

Go Engineer Comprehensive Course 009 [Study Notes]