Other Features
- Personal Center
- Favorites
- Manage Shipping Addresses (Add, Delete, Modify, Query)
- Messages
Copy inventory_srv --> userop_srv and replace all inventory
Elasticsearch In-depth Analysis Document
1. What is Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene, capable of rapidly storing, searching, and analyzing massive amounts of data. It is a core component of the Elastic Stack (formerly ELK Stack).
2. Problems Faced by MySQL Search - In-depth Analysis
2.1 Poor Performance in Detail
Problem Phenomenon:
-- Once the table reaches 1 million rows, the query below (searching for "手机", i.e. "mobile phone") can take several seconds
SELECT * FROM products WHERE name LIKE '%手机%' OR description LIKE '%手机%';
Performance Comparison Data:
| Data Volume | MySQL LIKE Query | Elasticsearch Full-Text Search | Performance Improvement |
|---|---|---|---|
| 10,000 records | 50ms | 10ms | 5x |
| 100,000 records | 500ms | 15ms | 33x |
| 1 million records | 5000ms | 20ms | 250x |
| 10 million records | 50000ms+ | 30ms | 1600x+ |
Root Causes:
- Full Table Scan: A pattern with a leading wildcard, such as LIKE '%keyword%', cannot use a B+ tree index, so every row must be scanned.
- I/O Intensive: Each query needs to read a large amount of data from disk.
- CPU Intensive: String matching operations are performed on every row of data.
- Memory Pressure: A large amount of data is loaded into memory for processing.
Real-world Case:
An e-commerce platform's product table has 5 million records. Using MySQL fuzzy search for "Apple phone":
- Query time: 8.3 seconds
- CPU usage: Spiked to 85%
- With 10 concurrent queries, response time increased to over 30 seconds
2.2 No Relevance Ranking in Detail
Pain Points of MySQL Query Results:
-- MySQL can only sort by fixed rules
SELECT * FROM products
WHERE name LIKE '%手机%'
ORDER BY price DESC; -- can only sort by columns such as price or time
Elasticsearch's Relevance Scoring Mechanism:
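Search term: "小米手机" (Xiaomi phone)
Relevance score calculation:
┌────────────────────────────────────────────────────────────┐
│ Document 1: "小米手机12 Pro"                               │
│ • Term frequency (TF): both query terms appear             │
│ • Inverse document frequency (IDF): rarity of each term    │
│ • Field length: short title, so higher weight              │
│ • Score: 9.8                                               │
└────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────┐
│ Document 2: "这是一款性价比很高的手机"                     │
│ (English: "This is a phone with great value for money")    │
│ • Term frequency (TF): only "手机" appears                 │
│ • Inverse document frequency (IDF): "手机" is common       │
│ • Field length: long description, so lower weight          │
│ • Score: 3.2                                               │
└────────────────────────────────────────────────────────────┘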
Detailed Explanation of Relevance Factors:
- TF (Term Frequency): The frequency of a keyword appearing in a document.
- IDF (Inverse Document Frequency): The rarity of a keyword across all documents.
- Field Length Normalization: Matches in shorter fields have higher weight than in longer fields.
- Field Weight Boost: Allows setting titles to be more important than content.
- Query Time Weight: Allows specifying certain query terms as more important.
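The factors above can be combined into a toy scoring function. The sketch below is a simplified TF-IDF with length normalization, not the BM25 formula that Elasticsearch uses by default; the corpus statistics (`total_docs`, `doc_freq`) and the pre-tokenized documents are made-up values for illustration.

```python
import math

def score(query_terms, doc_tokens, doc_freq, total_docs):
    """Toy relevance score: sqrt(TF) * IDF per term, normalized by field length."""
    if not doc_tokens:
        return 0.0
    total = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)  # term frequency in this field
        if tf == 0:
            continue
        idf = math.log(1 + total_docs / doc_freq[term])  # rarer terms weigh more
        total += math.sqrt(tf) * idf
    # Shorter fields get a higher score for the same matches
    return total / math.sqrt(len(doc_tokens))

# Hypothetical corpus statistics (assumed, not measured)
total_docs = 1000
doc_freq = {"小米": 50, "手机": 400}

query = ["小米", "手机"]
title = ["小米", "手机", "12", "Pro"]
description = ["这是", "一款", "性价比", "很高", "的", "手机"]

title_score = score(query, title, doc_freq, total_docs)
description_score = score(query, description, doc_freq, total_docs)
print(title_score > description_score)
```

The short title that contains both terms scores higher than the long description that contains only the common term, which mirrors the ranking in the example above.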
2.3 Limited Full-Text Search in Detail
Limitations of MySQL Full-Text Indexing:
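-- Creating a MySQL full-text index
ALTER TABLE products ADD FULLTEXT(name, description);
-- Issue 1: minimum token length (3 by default for InnoDB, 4 for MyISAM); shorter tokens are not indexed
-- Even with the built-in ngram parser (token size 2 by default), "手机" (phone) is searchable but the single character "机" is not
-- Issue 2: weak Chinese word segmentation
-- The default parser splits on whitespace and punctuation, so "苹果手机" (Apple phone) is one token and searching for "苹果" (Apple) finds nothing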
Elasticsearch Full-Text Search Capabilities:
// Example of the ES analysis process
Input text: "我想买一台苹果手机" (I want to buy an Apple phone)
Tokens:
[我] [想] [买] [一台] [苹果] [手机] [苹果手机]
Synonym expansion:
[苹果] → [Apple, iPhone]
[手机] → [手机, 电话, mobile]
Spelling correction:
"苹果手击" → suggests "苹果手机"
2.4 Inaccurate Matching and No Word Segmentation in Detail
Problems with MySQL String Matching:
-- Search for "笔记本" (notebook/laptop)
SELECT * FROM products WHERE name LIKE '%笔记本%';
-- Result: finds "笔记本电脑" (laptop computer)
-- Problem: misses "笔记 本子", "notebook", and "手提电脑" (portable computer)
Elasticsearch Smart Word Segmentation Process:
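Original text: "ThinkPad X1 Carbon超轻薄笔记本电脑" (ThinkPad X1 Carbon ultra-thin laptop)
Standard analyzer (lowercases; emits each Chinese character as its own token):
[thinkpad] [x1] [carbon] [超] [轻] [薄] [笔] [记] [本] [电] [脑]
IK analyzer (Chinese):
[ThinkPad] [X1] [Carbon] [超] [轻薄] [超轻薄]
[笔记] [本] [笔记本] [电脑] [笔记本电脑]
Pinyin analyzer:
[bi] [ji] [ben] [dian] [nao] → allows searching by pinyin
N-gram tokenizer:
[Thi] [hin] [ink] [nkP] → supports partial matching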
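The N-gram row in the example above can be reproduced with a few lines of code. This is a minimal sketch of character trigram tokenization, not the Elasticsearch `ngram` tokenizer itself; the function names `char_ngrams` and `partial_match` are illustrative.

```python
def char_ngrams(text, n=3):
    """Slide a window of n characters over the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def partial_match(fragment, indexed_text, n=3):
    """Treat the fragment as a candidate match when all of its n-grams were indexed."""
    fragment_grams = char_ngrams(fragment, n)
    if not fragment_grams:
        return False  # fragment shorter than n characters
    indexed = set(char_ngrams(indexed_text, n))
    return all(gram in indexed for gram in fragment_grams)

grams = char_ngrams("ThinkPad")
print(grams)
print(partial_match("nkPa", "ThinkPad"))
```

Because every trigram of the indexed text is stored as a term, a fragment from the middle of a word can still be found, at the cost of a much larger index.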
3. What is Full-Text Search - Core Principle Analysis
3.1 Structured Data vs Unstructured Data
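Structured data (how MySQL stores it):
┌──────┬────────┬────────┬────────┐
│ ID   │ Name   │ Price  │ Stock  │
├──────┼────────┼────────┼────────┤
│ 1    │ iPhone │ 5999   │ 100    │
│ 2    │ Xiaomi │ 2999   │ 200    │
└──────┴────────┴────────┴────────┘
Unstructured data (free text):
"This iPhone uses the A15 processor and delivers strong performance.
Photo quality is excellent and battery life is up 20%.
User review: 'Fantastic, great value for money!'"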
3.2 How the Inverted Index Works
Forward Index (MySQL):
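Document ID → Content
Doc1 → "小米手机" (Xiaomi phone)
Doc2 → "苹果手机" (Apple phone)
Doc3 → "小米电视" (Xiaomi TV)
Inverted Index (Elasticsearch):
Term → Document list
"小米" (Xiaomi) → [Doc1, Doc3]
"手机" (phone) → [Doc1, Doc2]
"苹果" (Apple) → [Doc2]
"电视" (TV) → [Doc3]
Searching for "小米手机":
1. Look up "小米" → [Doc1, Doc3]
2. Look up "手机" → [Doc1, Doc2]
3. Intersect the two lists → Doc1 (most relevant)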
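The lookup-and-intersect steps above can be sketched in a few lines. This is a minimal in-memory illustration that assumes the documents are already tokenized; the function names are illustrative and nothing here reflects how Lucene stores its postings on disk.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def search_all_terms(index, query_terms):
    """Return the documents that contain every query term (set intersection)."""
    postings = [index.get(term, set()) for term in query_terms]
    if not postings:
        return set()
    return set.intersection(*postings)

# The three documents from the example, already split into terms
docs = {
    "Doc1": ["小米", "手机"],
    "Doc2": ["苹果", "手机"],
    "Doc3": ["小米", "电视"],
}

index = build_inverted_index(docs)
result = search_all_terms(index, ["小米", "手机"])
print(result)
```

Each lookup is a dictionary access rather than a scan over every document, which is the core reason an inverted index avoids the full table scan described in section 2.1.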
3.3 Detailed Structure of Inverted Index
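Complete inverted index structure:
Term: "手机" (phone)
├── Document frequency (DF): 1000 documents contain this term
├── Postings list:
│   ├── Doc1:
│   │   ├── Term frequency (TF): 3 occurrences
│   │   ├── Positions: [5, 28, 102]
│   │   └── Fields: [title, description]
│   ├── Doc2:
│   │   ├── Term frequency (TF): 1 occurrence
│   │   ├── Positions: [15]
│   │   └── Fields: [title]
│   └── ...
└── Statistics: maximum term frequency, average term frequency, etc.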
4. Elasticsearch Architecture Explained
4.1 Cluster Architecture
Elasticsearch cluster architecture:
┌─────────────── ES Cluster ──────────────┐
│                                         │
│  ┌─────────────────────────────────┐    │
│  │  Master Node                    │    │
│  │  • Cluster management           │    │
│  │  • Index creation/deletion      │    │
│  │  • Shard allocation             │    │
│  └─────────────────────────────────┘    │
│                                         │
│  ┌──────────┐      ┌──────────┐         │
│  │  Data    │      │  Data    │         │
│  │  Node 1  │      │  Node 2  │         │
│  │ ┌──────┐ │      │ ┌──────┐ │         │
│  │ │  P0  │ │      │ │  R0  │ │         │
│  │ ├──────┤ │      │ ├──────┤ │         │
│  │ │  R1  │ │      │ │  P1  │ │         │
│  │ └──────┘ │      │ └──────┘ │         │
│  └──────────┘      └──────────┘         │
│                                         │
│  P = Primary Shard                      │
│  R = Replica Shard                      │
└─────────────────────────────────────────┘
4.2 Data Write Process
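Write path in detail:
Client → Coordinating node → Primary shard → Replica shards
1. The client sends a write request
↓
2. The coordinating node picks the target shard by hashing the routing value
↓
3. The request is forwarded to the node that holds the primary shard
↓
4. The primary shard applies the write
↓
5. The write is replicated to the replica shards in parallel
↓
6. All in-sync replicas acknowledge
↓
7. A success response is returned to the client
Timeline:
T0 ────→ T1 ────→ T2 ─────→ T3 ─────→ T4
Receive  Route    Primary   Replicas  Respond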
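The hash routing in step 2 can be illustrated with a short sketch. Elasticsearch hashes the routing value (the document ID by default) with Murmur3 and takes it modulo the number of primary shards; the code below uses `zlib.crc32` purely as a stand-in hash, so the shard numbers it produces will not match a real cluster.

```python
import zlib

def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    """Pick a primary shard: hash(routing) % number_of_primary_shards.

    crc32 is a stand-in for the Murmur3 hash that Elasticsearch uses.
    """
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % num_primary_shards

doc_ids = ["1", "2", "42", "product-1001"]
placement = {doc_id: route_to_shard(doc_id, 5) for doc_id in doc_ids}
print(placement)
```

Since the result depends on the number of primary shards, changing that number would send existing documents to different shards, which is why the primary shard count is fixed when an index is created.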
4.3 Query Process
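Query execution:
Phase 1: Query
┌─────────────────────────────────────────────────────────┐
│ The coordinating node sends the query to every shard    │
│ Each shard returns the IDs and scores of its top N hits │
└─────────────────────────────────────────────────────────┘
↓
Phase 2: Fetch
┌─────────────────────────────────────────────────────────┐
│ The coordinating node merges and sorts all results      │
│ It then fetches the full content of the final documents │
└─────────────────────────────────────────────────────────┘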
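The two phases above follow a scatter-gather pattern, sketched below with plain Python lists standing in for shards. The data, scores, and function names are illustrative assumptions; a real cluster does this over the network and with far more bookkeeping.

```python
import heapq

def query_phase(shard_hits, size):
    """Each shard returns only its own top `size` (score, doc_id) pairs."""
    return heapq.nlargest(size, shard_hits)

def merge_on_coordinator(shards, size):
    """The coordinating node merges the per-shard candidates into a global top `size`."""
    candidates = []
    for shard_hits in shards:
        candidates.extend(query_phase(shard_hits, size))
    return [doc_id for _, doc_id in heapq.nlargest(size, candidates)]

def fetch_phase(doc_ids, store):
    """Only the winning documents are loaded in full."""
    return [store[doc_id] for doc_id in doc_ids]

# Two hypothetical shards holding (score, doc_id) pairs
shards = [
    [(9.8, "a"), (1.2, "b")],
    [(3.2, "c"), (7.5, "d")],
]
store = {"a": {"title": "A"}, "b": {"title": "B"},
         "c": {"title": "C"}, "d": {"title": "D"}}

top_ids = merge_on_coordinator(shards, size=2)
documents = fetch_phase(top_ids, store)
print(top_ids, documents)
```

Splitting the work this way keeps the first phase cheap: only IDs and scores travel between nodes, and full documents are read for the final page alone.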
5. Elasticsearch Core Features Explained
5.1 Query Types Explained
// 1. Match query: full-text search
{
  "query": {
    "match": {
      "title": {
        "query": "苹果手机",
        "operator": "and" // every term must match
      }
    }
  }
}
// 2. Term query: exact match
{
  "query": {
    "term": {
      "category.keyword": "手机" // not analyzed; matches the exact value
    }
  }
}
// 3. Range query
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 5000
      }
    }
  }
}
// 4. Bool compound query
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "手机"}}
      ],
      "filter": [
        {"range": {"price": {"lte": 5000}}}
      ],
      "should": [
        {"match": {"brand": "苹果"}} // optional clause that boosts the score
      ],
      "must_not": [
        {"term": {"status": "discontinued"}}
      ]
    }
  }
}
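In application code, a bool query like the last one is usually assembled from user input rather than written by hand. The sketch below builds the same request body as a Python dictionary; the function name `build_product_query` and its parameters are hypothetical, and sending the body to a cluster is left out.

```python
import json

def build_product_query(keyword, max_price, preferred_brand=None):
    """Assemble a bool query: must match, filter by price, optionally boost a brand."""
    bool_query = {
        "must": [{"match": {"title": keyword}}],                 # scored full-text match
        "filter": [{"range": {"price": {"lte": max_price}}}],    # not scored
        "must_not": [{"term": {"status": "discontinued"}}],      # excluded documents
    }
    if preferred_brand:
        # should clauses are optional and only raise the score when they match
        bool_query["should"] = [{"match": {"brand": preferred_brand}}]
    return {"query": {"bool": bool_query}}

body = build_product_query("手机", 5000, preferred_brand="苹果")
print(json.dumps(body, ensure_ascii=False, indent=2))
```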
5.2 Aggregation Analysis Functionality
// Sales data analysis example
{
  "aggs": {
    "sales_per_category": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "total_sales": {
          "sum": {
            "field": "sales_count"
          }
        },
        "price_ranges": {
          "range": {
            "field": "price",
            "ranges": [
              {"to": 1000},
              {"from": 1000, "to": 5000},
              {"from": 5000}
            ]
          }
        }
      }
    }
  }
}
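To make the nesting concrete, the sketch below computes roughly what the `terms` bucket with its `avg` and `sum` sub-aggregations would return, using an in-memory list instead of an index. The sample products are invented, and the `price_ranges` sub-aggregation is left out for brevity.

```python
from collections import defaultdict

def sales_per_category(products):
    """Group by category, then compute per-bucket metrics (like terms + avg + sum)."""
    buckets = defaultdict(list)
    for product in products:
        buckets[product["category"]].append(product)

    result = {}
    for category, items in buckets.items():
        result[category] = {
            "doc_count": len(items),
            "avg_price": sum(item["price"] for item in items) / len(items),
            "total_sales": sum(item["sales_count"] for item in items),
        }
    return result

# Hypothetical sample data
products = [
    {"category": "phone", "price": 5999, "sales_count": 120},
    {"category": "phone", "price": 2999, "sales_count": 300},
    {"category": "tv", "price": 3999, "sales_count": 40},
]

summary = sales_per_category(products)
print(summary)
```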
6. Practical Application Case Studies
6.1 E-commerce Search Optimization Case Study
Comparison of an E-commerce Platform's Search Optimization Before and After:
| Metric | MySQL Solution | Elasticsearch Solution | Improvement |
|---|---|---|---|
| Average Search Time | 2.3 seconds | 0.05 seconds | 46x improvement |
| Search Accuracy | 65% | 92% | 27 percentage points higher |
| Zero Result Rate | 18% | 3% | 15 percentage points lower |
| Number of Servers | 8 | 3 | 62.5% fewer servers |
| Concurrency Capability | 100 QPS | 5000 QPS | 50x improvement |
Implementation Details:
- Data Synchronization Architecture:
  MySQL (Primary Data) → Binlog → Logstash → Elasticsearch
  ↓
  Scheduled Full Synchronization (Nightly)
- Search Optimization Strategies:
  - Pinyin Search: Supports searching for "pinguo" to find "苹果" (Apple)
  - Synonyms: Configured "手机" (phone), "电话" (telephone), "mobile" as synonyms
  - Search Suggestions: Real-time suggestions for possible search terms
  - Correction Function: Automatically corrects common spelling errors
6.2 Log Analysis System Case Study
Log Analysis System of an Internet Company:
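Log processing pipeline:
App servers  →  Filebeat  →  Logstash   →  Elasticsearch  →  Kibana
     ↓             ↓            ↓                ↓             ↓
Produce logs    Collect     Transform      Store + index   Visualize
Processing scale:
• Log volume: 100 GB per day
• Log entries: 1 billion per day
• Query response: milliseconds
• Retention: 30 days as hot data, 1 year as cold data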
7. Performance Optimization Best Practices
7.1 Index Design Optimization
// Optimized mapping design (the ik_* and pinyin analyzers require their analysis plugins)
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword" // enables exact matching
          },
          "pinyin": {
            "type": "text",
            "analyzer": "pinyin" // enables pinyin search
          }
        }
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100 // stores the price as a scaled integer (two decimal places)
      },
      "category": {
        "type": "keyword" // categories do not need analysis
      },
      "description": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "created_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
      }
    }
  }
}
7.2 Query Performance Optimization Techniques
- Use Filter Instead of Query (when scoring is not needed)
```json
// Before optimization: using query (calculates score)
{"query": {"term": {"status": "active"}}}
// After optimization: using filter (does not calculate score, can be cached)
{"query": {"bool": {"filter": {"term": {"status": "active"}}}}}
```
- Set Shard Count Appropriately
```
Shard count formula reference:
Shard count = Data size (GB) / 30GB
Example:
- 100GB data: 3-4 shards
- 1TB data: 35-40 shards
```
- Batch Operation Optimization
```json
// Use bulk API for batch indexing
POST _bulk
{"index": {"_index": "products", "_id": 1}}
{"name": "iPhone", "price": 5999}
{"index": {"_index": "products", "_id": 2}}
{"name": "小米", "price": 2999}
```
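The shard count rule of thumb from this section is simple arithmetic, sketched below. The 30 GB target comes from the formula above; the function name and the choice to round up are assumptions, and real sizing should also account for growth and node count.

```python
import math

def suggested_shard_count(data_size_gb, target_shard_size_gb=30):
    """Shard count = data size / target shard size, rounded up, at least 1."""
    if data_size_gb <= 0:
        raise ValueError("data_size_gb must be positive")
    return max(1, math.ceil(data_size_gb / target_shard_size_gb))

small = suggested_shard_count(100)    # 100 GB of data
large = suggested_shard_count(1024)   # 1 TB of data
print(small, large)
```

Both results fall inside the ranges quoted in the examples (3-4 shards for 100GB, 35-40 shards for 1TB).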
8. Elasticsearch vs Traditional Databases
8.1 Comparison of Applicable Scenarios
| Scenario | MySQL | Elasticsearch | Recommended Choice |
|---|---|---|---|
| Full-Text Search | ❌ Poor | ✅ Excellent | ES |
| Transaction Support | ✅ Full ACID | ❌ No Transactions | MySQL |
| Real-time Statistical Analysis | ⚠️ Average | ✅ Excellent | ES |
| Relational Queries | ✅ Excellent | ❌ Limited | MySQL |
| Geospatial Search | ❌ Poor | ✅ Excellent | ES |
| Log Analysis | ❌ Unsuitable | ✅ Specialty | ES |
| Precise Numerical Calculation | ✅ Precise | ⚠️ Approximate | MySQL |
8.2 Hybrid Architecture Solution
Recommended hybrid architecture:
User request
↓
┌──────────────┐
│ Application  │
└──────────────┘
↓
┌──────────────────────────┐
│ Search requests → ES     │
│ Transactions → MySQL     │
│ Caching → Redis          │
└──────────────────────────┘
Data synchronization:
MySQL (writes) → Binlog → Canal/Debezium → Kafka → ES (reads)
9. Common Problems and Solutions
9.1 Data Consistency Issues
Problem: Inconsistency between MySQL and ES data.
Solutions:
1. Dual-write Strategy: Write to both MySQL and ES simultaneously, using a message queue to ensure eventual consistency.
2. CDC (Change Data Capture): Real-time synchronization via Binlog.
3. Regular Verification: Scheduled tasks to compare data differences and fix them.
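The regular verification step can be sketched as a comparison of two ID-to-version maps. The sketch assumes each side can export a mapping from primary key to some version marker (an update timestamp or checksum); how those maps are produced, and the function name `diff_stores`, are assumptions for illustration.

```python
def diff_stores(mysql_rows, es_docs):
    """Compare id -> version maps from MySQL (source of truth) and Elasticsearch."""
    mysql_ids = set(mysql_rows)
    es_ids = set(es_docs)
    return {
        # present in MySQL but never indexed
        "missing_in_es": sorted(mysql_ids - es_ids),
        # indexed, but with an outdated version
        "stale_in_es": sorted(
            key for key in mysql_ids & es_ids if mysql_rows[key] != es_docs[key]
        ),
        # deleted from MySQL but still searchable
        "orphaned_in_es": sorted(es_ids - mysql_ids),
    }

# Hypothetical snapshots: id -> version marker
mysql_rows = {1: "v2", 2: "v1", 3: "v1"}
es_docs = {1: "v1", 2: "v1", 4: "v1"}

report = diff_stores(mysql_rows, es_docs)
print(report)
```

A scheduled job could then re-index the missing and stale IDs and delete the orphaned ones.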
9.2 Deep Paging Issues
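Problem: With from + size pagination, performance degrades sharply on deep pages such as page 10,000, because every shard must collect and sort all preceding hits. By default Elasticsearch also rejects requests that reach past the first 10,000 hits (index.max_result_window).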
Solutions:
// 1. Use search_after (recommended)
{
  "size": 10,
  "sort": [{"_id": "asc"}],
  "search_after": [10000] // The sort value of the last document on the previous page
}
// 2. Use scroll API (suitable for exporting)
POST /products/_search?scroll=1m
{
  "size": 100,
  "query": {"match_all": {}}
}
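The search_after approach works as a loop: each request carries the sort values of the last hit from the previous page. The sketch below shows that loop against `fake_search`, an in-memory stand-in for a real search call that supports ascending sorts only; the `product_id` tiebreaker field and the sample data are hypothetical.

```python
def fake_search(docs, body):
    """In-memory stand-in for a search request: sort, skip past search_after, take size."""
    fields = [next(iter(clause)) for clause in body["sort"]]
    ordered = sorted(docs, key=lambda doc: [doc[f] for f in fields])
    after = body.get("search_after")
    if after is not None:
        ordered = [doc for doc in ordered if [doc[f] for f in fields] > after]
    page = ordered[: body["size"]]
    # Like real hits, each result exposes the sort values it was ordered by
    return [{"_source": doc, "sort": [doc[f] for f in fields]} for doc in page]

def iterate_all(docs, body):
    """Page through every result, feeding the last hit's sort values into the next request."""
    results = []
    while True:
        hits = fake_search(docs, body)
        if not hits:
            break
        results.extend(hit["_source"] for hit in hits)
        body = {**body, "search_after": hits[-1]["sort"]}
    return results

prices = {1: 2999, 2: 5999, 3: 2999, 4: 1999, 5: 5999, 6: 999, 7: 2999}
docs = [{"product_id": pid, "price": price} for pid, price in prices.items()]

request = {"size": 3, "sort": [{"price": "asc"}, {"product_id": "asc"}]}
ids = [doc["product_id"] for doc in iterate_all(docs, request)]
print(ids)
```

The second sort field acts as a tiebreaker: without a unique value in the sort, documents with equal prices could be skipped or repeated at page boundaries.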
10. Summary
Elasticsearch addresses many of the problems traditional databases face in search scenarios through its inverted index, distributed architecture, and powerful full-text search capabilities. Proper use of Elasticsearch can:
- Improve search performance: From seconds to milliseconds.
- Enhance search quality: Through relevance scoring and smart word segmentation.
- Support complex analysis: Real-time aggregation and statistical analysis.
- Reduce operational costs: Fewer servers, higher efficiency.
However, it is important to note that Elasticsearch is not a replacement for MySQL, but rather a complement. In actual projects, the appropriate storage solution should be chosen based on specific scenarios, and a hybrid architecture of MySQL + Elasticsearch can usually leverage their respective advantages.
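This is a test article for the theme and is intended for testing only. Published by Walker. When reposting, please credit the source: https://walker-learn.xyz/archives/4782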