Other Features
- User Center
- Favorites
- Manage Shipping Addresses (CRUD)
- Messages
inventory_srv --> userop_srv: query and replace all inventory
Elasticsearch In-depth Analysis Document
1. What is Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene, capable of rapidly storing, searching, and analyzing massive amounts of data. It is a core component of the Elastic Stack (formerly ELK Stack).
2. Problems Faced by MySQL Search - In-depth Analysis
2.1 Detailed Explanation of Low Performance Issues
Symptom:
-- With 1 million rows, the following query can take several seconds
SELECT * FROM products WHERE name LIKE '%手机%' OR description LIKE '%手机%';
Performance Comparison Data:
| Data Volume | MySQL LIKE Query | Elasticsearch Full-Text Search | Performance Improvement |
|---|---|---|---|
| 10K records | 50ms | 10ms | 5x |
| 100K records | 500ms | 15ms | 33x |
| 1M records | 5000ms | 20ms | 250x |
| 10M records | 50000ms+ | 30ms | 1600x+ |
Root Causes:
- Full table scan: LIKE '%keyword%' cannot use B+ tree indexes and must scan all rows
- I/O intensive: Each query requires reading a large amount of data from disk
- CPU intensive: String matching operations are performed on every row of data
- Memory pressure: Large amounts of data are loaded into memory for processing
Real-world Case:
A product table on an e-commerce platform has 5 million records. Using MySQL fuzzy search for "Apple phone":
- Query time: 8.3 seconds
- CPU utilization: soared to 85%
- When 10 concurrent queries were made, response time increased to over 30 seconds
2.2 Detailed Explanation of No Relevance Ranking Issue
Pain Points of MySQL Query Results:
-- MySQL can only sort by fixed rules
SELECT * FROM products
WHERE name LIKE '%手机%'
ORDER BY price DESC; -- Can only sort by fields such as price or time
Elasticsearch's Relevance Scoring Mechanism:
Search term: "小米手机" (Xiaomi phone)
Relevance score calculation:
┌─────────────────────────────────────┐
│ Document 1: "小米手机12 Pro" (Xiaomi Phone 12 Pro) │
│ • Term Frequency (TF): Both keywords appear │
│ • Inverse Document Frequency (IDF): Calculates the rarity of the term │
│ • Field length: Shorter title, higher weight │
│ • Score: 9.8 │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Document 2: "这是一款性价比很高的手机" (This is a very cost-effective phone) │
│ • Term Frequency (TF): Only "手机" (phone) appears │
│ • Inverse Document Frequency (IDF): "手机" (phone) is more common │
│ • Field length: Longer description, lower weight │
│ • Score: 3.2 │
└─────────────────────────────────────┘
Detailed Explanation of Relevance Factors:
- TF (Term Frequency): The frequency of keywords appearing in a document
- IDF (Inverse Document Frequency): The rarity of keywords across all documents
- Field length normalization: Matches in shorter fields have higher weight than in longer fields
- Field weight boost: Can set titles to be more important than content
- Query-time weight: Can specify certain query terms as more important
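How term frequency, inverse document frequency, and field length combine can be sketched with the textbook BM25 formula, which Elasticsearch has used as its default similarity since version 5.0. This is a minimal illustration, not Lucene's implementation: the function name `bm25_score`, the pre-tokenized sample corpus, and the parameter values `k1=1.2` and `b=0.75` (the usual defaults) are assumptions made for the example.

```python
import math

def bm25_score(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against the query terms (textbook BM25)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)          # term frequency in this document
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        # IDF: the rarer the term across the corpus, the larger its weight
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: matches in longer-than-average fields count less
        length_ratio = len(doc_tokens) / avg_len
        norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * length_ratio))
        score += idf * norm_tf
    return score

# Hypothetical, already-tokenized titles
corpus = [
    ["小米", "手机", "12", "pro"],
    ["这", "是", "一款", "性价比", "很高", "的", "手机"],
    ["小米", "电视"],
    ["苹果", "手机"],
]
query = ["小米", "手机"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
for doc, s in sorted(zip(corpus, scores), key=lambda pair: -pair[1]):
    print(round(s, 3), "".join(doc))
```

The short title that matches both terms ranks first, while the long description that matches only the more common term ranks last among the matching documents.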
2.3 Detailed Explanation of Inability to Perform Full-Text Search
Limitations of MySQL Full-Text Index:
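-- Create a MySQL full-text index
ALTER TABLE products ADD FULLTEXT(name, description);
-- Problem 1: Minimum token length limit (default 3 characters for InnoDB via innodb_ft_min_token_size, 4 for MyISAM via ft_min_word_len)
-- Tokens shorter than the limit are not indexed, so a single character such as "机" (machine/device) cannot be searched
-- Problem 2: Poor Chinese word segmentation support
-- The default parser splits on spaces and punctuation, so "苹果手机" (Apple phone) is indexed as one token and searching for "苹果" (Apple) won't find it
-- (the built-in ngram parser, available since MySQL 5.7.6, helps, but it splits by fixed character length rather than by words)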
Elasticsearch Full-Text Search Capabilities:
// ES analysis process example
Input text: "我想买一台苹果手机" (I want to buy an Apple phone)
Tokenization result:
[我] [想] [买] [一台] [苹果] [手机] [苹果手机]
Synonym expansion:
[苹果] → [Apple, iPhone]
[手机] → [phone, telephone, mobile]
Spell correction:
"苹果手击" (Apple hand strike) → Suggest "苹果手机" (Apple phone)
2.4 Detailed Explanation of Inaccurate Search and No Word Segmentation Issues
Problems with MySQL String Matching:
-- Search for "笔记本" (notebook)
SELECT * FROM products WHERE name LIKE '%笔记本%';
-- Result: Can find "笔记本电脑" (laptop computer)
-- Problem: Cannot find "笔记 本子" (notes notebook), "notebook", "手提电脑" (portable computer)
Elasticsearch Smart Tokenization Process:
Original text: "ThinkPad X1 Carbon超轻薄笔记本电脑" (ThinkPad X1 Carbon ultra-thin laptop computer)
Standard tokenizer:
[ThinkPad] [X1] [Carbon] [超] [轻] [薄] [笔] [记] [本] [电] [脑] (Chinese text is split into single characters)
IK tokenizer (Chinese; illustrative, the exact output depends on the dictionary):
[ThinkPad] [X1] [Carbon] [超轻薄] (ultra-thin)
[笔记本电脑] (laptop computer) [笔记本] (notebook) [笔记] (note) [电脑] (computer)
Pinyin tokenizer:
[bi] [ji] [ben] [dian] [nao] → Can be searched by Pinyin
N-gram tokenization:
[Thi] [hin] [ink] [nkP] → Supports partial matching
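The N-gram idea above can be sketched in a few lines. The helper names `char_ngrams` and `may_contain` are made up for this illustration; a real n-gram tokenizer in Elasticsearch is configured in the index settings rather than written by hand.

```python
def char_ngrams(text, n=3):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def may_contain(indexed_text, fragment, n=3):
    """Partial match: every n-gram of the fragment must be among the indexed n-grams."""
    indexed = set(char_ngrams(indexed_text, n))
    return all(gram in indexed for gram in char_ngrams(fragment, n))

grams = char_ngrams("ThinkPad", 3)
print(grams)
print(may_contain("ThinkPad", "inkPa"))  # fragment taken from the middle of the word
```

Because every 3-character window is indexed, a fragment from the middle of a word can be matched without scanning the original text, at the cost of a larger index.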
3. What is Full-Text Search - Core Principle Analysis
3.1 Structured Data vs Unstructured Data
Structured Data (MySQL storage method):
┌──────┬────────┬────────┬────────┐
│ ID │ Name │ Price │ Stock │
├──────┼────────┼────────┼────────┤
│ 1 │iPhone │ 5999 │ 100 │
│ 2 │ Xiaomi │ 2999 │ 200 │
└──────┴────────┴────────┴────────┘
Unstructured Data (Text content):
"This iPhone uses an A15 processor, offering powerful performance,
excellent camera effects, and 20% improved battery life.
User review: 'Amazing, great value for money!'"
3.2 Detailed Explanation of Inverted Index Principle
Forward Index (MySQL):
Document ID → Content
Doc1 → "Xiaomi Phone"
Doc2 → "Apple Phone"
Doc3 → "Xiaomi TV"
Inverted Index (Elasticsearch):
Term → Document List
"Xiaomi" → [Doc1, Doc3]
"Phone" → [Doc1, Doc2]
"Apple" → [Doc2]
"TV" → [Doc3]
Search "Xiaomi Phone":
1. Search "Xiaomi" → Get [Doc1, Doc3]
2. Search "Phone" → Get [Doc1, Doc2]
3. Calculate intersection → Doc1 (most relevant)
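The lookup steps above can be reproduced with a small in-memory sketch. It assumes whitespace tokenization and lowercasing, which is far simpler than a real analyzer, and the function names are illustrative only.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all_terms(index, query):
    """Return the documents containing every query term (set intersection)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {"Doc1": "Xiaomi Phone", "Doc2": "Apple Phone", "Doc3": "Xiaomi TV"}
index = build_inverted_index(docs)
print(sorted(index["xiaomi"]))                   # documents containing "xiaomi"
print(search_all_terms(index, "Xiaomi Phone"))   # documents containing both terms
```

Each query term costs one dictionary lookup, so the work depends on the number of query terms and the size of their posting lists rather than on the total number of documents.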
3.3 Detailed Structure of Inverted Index
Complete Inverted Index Structure:
Term: "phone"
├── Document Frequency (DF): 1000 documents contain this term
├── Inverted List:
│ ├── Doc1:
│ │ ├── Term Frequency (TF): 3 times
│ │ ├── Positions: [5, 28, 102]
│ │ └── Fields: [title, description]
│ ├── Doc2:
│ │ ├── Term Frequency (TF): 1 time
│ │ ├── Positions: [15]
│ │ └── Fields: [title]
│ └── ...
└── Statistics: Highest term frequency, average term frequency, etc.
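A posting list that also records positions can be sketched as a nested dictionary. This is only a conceptual model with whitespace tokenization; Lucene stores the same information in compressed on-disk structures, and the names `build_positional_index` and `term_stats` are invented for the example.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Build term -> {doc_id: [token positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

def term_stats(index, term):
    """Derive document frequency and per-document term frequency from the postings."""
    postings = index.get(term, {})
    return {
        "df": len(postings),
        "tf": {doc_id: len(positions) for doc_id, positions in postings.items()},
        "positions": postings,
    }

docs = {"Doc1": "phone case for phone", "Doc2": "budget phone"}
stats = term_stats(build_positional_index(docs), "phone")
print(stats)
```

Positions are what make phrase and proximity queries possible, since the engine can check that two terms appear next to each other without re-reading the document.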
4. Detailed Explanation of Elasticsearch Architecture
4.1 Cluster Architecture
Elasticsearch Cluster Architecture Diagram:
┌─────────────── ES Cluster ──────────────┐
│ │
│ ┌─────────────────────────────────┐ │
│ │ Master Node │ │
│ │ • Cluster management │ │
│ │ • Index creation/deletion │ │
│ │ • Shard allocation │ │
│ └─────────────────────────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Data │ │ Data │ │
│ │ Node 1 │ │ Node 2 │ │
│ │ ┌──────┐ │ │ ┌──────┐ │ │
│ │ │ P0 │ │ │ │ R0 │ │ │
│ │ ├──────┤ │ │ ├──────┤ │ │
│ │ │ R1 │ │ │ │ P1 │ │ │
│ │ └──────┘ │ │ └──────┘ │ │
│ └──────────┘ └──────────┘ │
│ │
│ P = Primary Shard │
│ R = Replica Shard │
└─────────────────────────────────────────┘
4.2 Data Write Process
Detailed Write Process:
Client → Coordinating Node → Primary Shard → Replica Shard
1. Client sends write request
↓
2. Coordinating node determines shard via hash routing
↓
3. Request forwarded to primary shard node
↓
4. Primary shard writes successfully
↓
5. Replicated to replica shards in parallel
↓
6. All in-sync replicas acknowledge
↓
7. Returns success response to client
Timeline:
T0 ──→ T1 ──→ T2 ──→ T3 ──→ T4
Receive Route Primary Shard Replica Respond
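The routing and replication steps can be modeled roughly as follows. Conceptually the shard is chosen as hash(routing value) modulo the number of primary shards, where the routing value defaults to the document ID. Elasticsearch uses a Murmur3 hash internally; `zlib.crc32` below is only a stand-in, and the dictionaries standing in for shards are assumptions of this sketch.

```python
import zlib

def route_to_shard(doc_id, num_primary_shards):
    """Pick a primary shard from the routing value (stand-in hash, not Murmur3)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def write(doc_id, doc, primaries, replicas):
    """Simulate steps 2-7: route, write to the primary, then copy to its replicas."""
    shard = route_to_shard(doc_id, len(primaries))
    primaries[shard][doc_id] = doc        # primary shard applies the write first
    for replica in replicas[shard]:       # then the write is replicated
        replica[doc_id] = doc
    return shard                          # success is reported after replication

primaries = [{}, {}]                      # two primary shards (P0, P1)
replicas = [[{}], [{}]]                   # one replica per primary (R0, R1)
doc = {"name": "iPhone", "price": 5999}
shard = write("product-42", doc, primaries, replicas)
print("stored on shard", shard)
```

Since the shard number depends on the count of primary shards, changing that count would send existing IDs to different shards, which is why the number of primary shards is fixed when an index is created.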
4.3 Query Process
Query Execution Process:
Phase 1: Query
┌─────────────────────────────────┐
│ Coordinating node sends query requests to all shards │
│ Each shard returns Top N document IDs and scores │
└─────────────────────────────────┘
↓
Phase 2: Fetch
┌─────────────────────────────────┐
│ Coordinating node consolidates and sorts all results │
│ Retrieves the complete content of the final required documents │
└─────────────────────────────────┘
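A simplified model of the two phases is sketched below. Scores are precomputed in the sample data, which is an assumption of the example; in a real cluster each shard computes them during the query phase.

```python
import heapq

def query_phase(shard_no, shard, n):
    """Each shard returns only (score, shard number, doc ID) for its local top n."""
    hits = ((doc["score"], shard_no, doc_id) for doc_id, doc in shard.items())
    return heapq.nlargest(n, hits)

def search(shards, n):
    # Query phase: gather lightweight hits from every shard
    candidates = []
    for shard_no, shard in enumerate(shards):
        candidates.extend(query_phase(shard_no, shard, n))
    # The coordinating node merges the hits and keeps the global top n
    top = heapq.nlargest(n, candidates)
    # Fetch phase: load full documents only for the winners, from the owning shard
    return [shards[shard_no][doc_id] for _, shard_no, doc_id in top]

shards = [
    {"p1": {"score": 9.8, "title": "Xiaomi Phone 12 Pro"},
     "p2": {"score": 1.1, "title": "Phone case"}},
    {"p3": {"score": 3.2, "title": "Budget phone"},
     "p4": {"score": 5.5, "title": "Xiaomi Phone 11"}},
]
results = search(shards, 2)
print([doc["title"] for doc in results])
```

Splitting the work this way keeps network traffic small: only IDs and scores travel in the first phase, and full documents are transferred for the final page alone.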
5. Detailed Explanation of Elasticsearch Core Features
5.1 Detailed Explanation of Query Types
// 1. Match Query - Full-text search
{
"query": {
"match": {
"title": {
"query": "苹果手机",
"operator": "and" // Must contain all terms
}
}
}
}
// 2. Term Query - Exact match
{
"query": {
"term": {
"category.keyword": "手机" // No tokenization, exact match
}
}
}
// 3. Range Query - Range search
{
"query": {
"range": {
"price": {
"gte": 1000,
"lte": 5000
}
}
}
}
// 4. Bool Compound Query
{
"query": {
"bool": {
"must": [
{"match": {"title": "手机"}}
],
"filter": [
{"range": {"price": {"lte": 5000}}}
],
"should": [
{"match": {"brand": "苹果"}} // Optional: a match raises the relevance score but is not required
],
"must_not": [
{"term": {"status": "discontinued"}}
]
}
}
}
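In application code, a bool query like the one above is usually assembled from user input. The helper below is a hypothetical sketch that only builds the request body as a Python dictionary; the function name and its parameters are not part of any Elasticsearch client.

```python
import json

def build_product_query(keyword, max_price=None, preferred_brand=None, exclude_status=None):
    """Assemble a bool query body; optional clauses are added only when requested."""
    bool_query = {"must": [{"match": {"title": keyword}}]}
    if max_price is not None:
        # filter clauses restrict results without affecting the score
        bool_query["filter"] = [{"range": {"price": {"lte": max_price}}}]
    if preferred_brand:
        # should clauses raise the score of matching documents
        bool_query["should"] = [{"match": {"brand": preferred_brand}}]
    if exclude_status:
        bool_query["must_not"] = [{"term": {"status": exclude_status}}]
    return {"query": {"bool": bool_query}}

body = build_product_query("手机", max_price=5000, preferred_brand="苹果",
                           exclude_status="discontinued")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dictionary can be serialized to JSON and sent as the body of a search request.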
5.2 Aggregation Analysis Function
// Sales data analysis example
{
"aggs": {
"sales_per_category": {
"terms": {
"field": "category"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
},
"total_sales": {
"sum": {
"field": "sales_count"
}
},
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{"to": 1000},
{"from": 1000, "to": 5000},
{"from": 5000}
]
}
}
}
}
}
}
6. Real-world Application Case Studies
6.1 E-commerce Search Optimization Case Study
Comparison of an e-commerce platform's search before and after optimization:
| Metric | MySQL Solution | Elasticsearch Solution | Improvement |
|---|---|---|---|
| Average Search Time | 2.3 seconds | 0.05 seconds | 46x faster |
| Search Accuracy | 65% | 92% | +27 percentage points |
| Zero Result Rate | 18% | 3% | 15 percentage points lower |
| Number of Servers | 8 | 3 | 62.5% fewer servers |
| Concurrency Capability | 100 QPS | 5000 QPS | 50x improvement |
Implementation Details:
- Data Synchronization Architecture:
  MySQL (Primary Data) → Binlog → Logstash → Elasticsearch
  ↓
  Scheduled full synchronization (nightly)
- Search Optimization Strategies:
  - Pinyin search: Supports searching "pingguo" to find "苹果" (Apple)
  - Synonyms: Configure "手机" (phone), "电话" (telephone), "mobile" as synonyms
  - Search suggestions: Real-time prompts for possible search terms
  - Correction function: Automatically corrects common spelling errors
6.2 Log Analysis System Case Study
Log analysis system of an internet company:
Log Processing Flow:
Application Server → Filebeat → Logstash → Elasticsearch → Kibana
↓ ↓ ↓ ↓ ↓
Generate logs Collect Process & Transform Store & Index Visualize & Display
Processing Scale:
• Log volume: 100GB per day
• Number of log entries: 1 billion entries/day
• Query response: Millisecond level
• Retention period: 30 days hot data, 1 year cold data
7. Performance Optimization Best Practices
7.1 Index Design Optimization
// Optimized Mapping Design
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart",
"fields": {
"keyword": {
"type": "keyword" // Supports exact matching
},
"pinyin": {
"type": "text",
"analyzer": "pinyin" // Supports Pinyin search
}
}
},
"price": {
"type": "scaled_float",
"scaling_factor": 100 // Stored internally as a long scaled by 100, i.e. two decimal places
},
"category": {
"type": "keyword" // Category does not require tokenization
},
"description": {
"type": "text",
"analyzer": "ik_smart"
},
"created_time": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
}
}
}
}
7.2 Query Performance Optimization Techniques
- Use Filter instead of Query (when scoring is not needed)
```json
// Before optimization: using query (calculates score)
{"query": {"term": {"status": "active"}}}
// After optimization: using filter (does not calculate score, can be cached)
{"query": {"bool": {"filter": {"term": {"status": "active"}}}}}
```
- Reasonably set the number of shards
```
Shard count reference formula:
Number of shards = Data volume (GB) / 30GB
Example:
- 100GB data: 3-4 shards
- 1TB data: 35-40 shards
```
- Batch operation optimization
```json
// Use bulk API for batch indexing
POST _bulk
{"index": {"_index": "products", "_id": 1}}
{"name": "iPhone", "price": 5999}
{"index": {"_index": "products", "_id": 2}}
{"name": "Xiaomi", "price": 2999}
```
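Two of the tips above, the shard-count rule of thumb and the bulk request format, can be sketched as small helpers. Both function names are hypothetical, and the 30 GB target per shard is simply the figure from the reference formula, not a hard limit.

```python
import json
import math

def suggested_shards(data_gb, target_shard_gb=30):
    """Rule of thumb: total data volume divided by the target shard size, rounded up."""
    return max(1, math.ceil(data_gb / target_shard_gb))

def bulk_body(index, docs):
    """Build the newline-delimited body for a _bulk request: action line, then source line."""
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"

print(suggested_shards(100), suggested_shards(1024))
body = bulk_body("products", {1: {"name": "iPhone", "price": 5999},
                              2: {"name": "Xiaomi", "price": 2999}})
print(body, end="")
```

Sending many documents in one bulk request avoids a network round trip per document; the batch size that works best depends on document size and cluster capacity.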
8. Elasticsearch vs Traditional Databases
8.1 Comparison of Applicable Scenarios
| Scenario | MySQL | Elasticsearch | Recommended Choice |
|---|---|---|---|
| Full-text Search | ❌ Poor | ✅ Excellent | ES |
| Transaction Support | ✅ Full ACID | ❌ No Transactions | MySQL |
| Real-time Statistical Analysis | ⚠️ Fair | ✅ Excellent | ES |
| Relational Queries | ✅ Excellent | ❌ Limited | MySQL |
| Geolocation Search | ❌ Poor | ✅ Excellent | ES |
| Log Analysis | ❌ Not suitable | ✅ Specialized | ES |
| Precise Numerical Calculations | ✅ Precise | ⚠️ Approximate | MySQL |
8.2 Hybrid Architecture Solution
Recommended Hybrid Architecture:
User Request
↓
┌──────────────┐
│ Application Layer │
└──────────────┘
↓
┌──────────────────────────┐
│ Search Request → ES │
│ Transactional Operations → MySQL │
│ Cache → Redis │
└──────────────────────────┘
Data Synchronization:
MySQL(Write) → Binlog → Canal/Debezium → Kafka → ES(Read)
9. Common Problems and Solutions
9.1 Data Consistency Issues
Problem: MySQL and ES data inconsistency
Solutions:
1. Dual-write strategy: Write to both MySQL and ES simultaneously, using a message queue to ensure eventual consistency
2. CDC (Change Data Capture): Real-time synchronization via Binlog
3. Regular verification: Scheduled tasks compare data differences and fix them
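The regular verification step can be sketched as a comparison of two ID-to-version maps. How the maps are obtained (for example an `updated_at` column in MySQL and a matching field in ES) is left out and would be an assumption; the function name `diff_stores` is likewise invented for the example.

```python
def diff_stores(mysql_rows, es_docs):
    """Compare id -> version maps and report what must be repaired in ES."""
    mysql_ids, es_ids = set(mysql_rows), set(es_docs)
    missing = mysql_ids - es_ids                      # in MySQL but never indexed
    stale = {i for i in mysql_ids & es_ids if mysql_rows[i] != es_docs[i]}
    orphaned = es_ids - mysql_ids                     # deleted in MySQL, still in ES
    return {"reindex": sorted(missing | stale), "delete": sorted(orphaned)}

mysql_rows = {1: "v3", 2: "v1", 3: "v2"}
es_docs = {1: "v3", 2: "v0", 4: "v1"}
plan = diff_stores(mysql_rows, es_docs)
print(plan)
```

MySQL is treated as the source of truth here, so every difference is resolved by changing Elasticsearch, never the other way around.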
9.2 Deep Paging Issues
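Problem: Deep pagination with from/size performs extremely poorly, for example when querying the 10,000th page. Every shard must collect and sort from + size hits, and by default Elasticsearch rejects requests where from + size exceeds 10,000 (index.max_result_window).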
Solutions:
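// 1. Use search_after (recommended)
{
"size": 10,
"sort": [{"_id": "asc"}], // The sort must be unique per document; in production prefer a unique doc-values field over _id
"search_after": ["10000"] // Sort values of the last document on the previous page (_id values are strings)
}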
// 2. Use scroll API (suitable for export)
POST /products/_search?scroll=1m
{
"size": 100,
"query": {"match_all": {}}
}
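Why search_after stays fast on deep pages can be illustrated with a sorted list of unique sort keys. This is a simulation only: `search_after_page` and `iterate_all` are hypothetical names, and a real request would carry the sort values of the last hit back to Elasticsearch.

```python
import bisect

def search_after_page(sorted_keys, size, after=None):
    """Return the next page by seeking past the last sort value seen so far."""
    start = 0 if after is None else bisect.bisect_right(sorted_keys, after)
    return sorted_keys[start:start + size]

def iterate_all(sorted_keys, size):
    """Walk through every page, passing the last key of each page to the next call."""
    after, pages = None, []
    while True:
        page = search_after_page(sorted_keys, size, after)
        if not page:
            return pages
        pages.append(page)
        after = page[-1]

keys = list(range(1, 26))            # 25 documents with unique sort keys
pages = iterate_all(keys, 10)
print([len(p) for p in pages])
```

Each call seeks directly to the position after the previous page instead of collecting and discarding all earlier hits, so the cost per page stays roughly constant no matter how deep the pagination goes.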
10. Summary
Elasticsearch addresses many of the problems that traditional databases face in search scenarios through its inverted index, distributed architecture, and powerful full-text search capabilities. Proper use of Elasticsearch can:
- Improve search performance: From seconds to milliseconds
- Improve search quality: Through relevance scoring and smart tokenization
- Support complex analysis: Real-time aggregation and statistical analysis
- Reduce operational costs: Fewer servers, higher efficiency
However, it is important to note that Elasticsearch is not a replacement for MySQL, but a supplement. In actual projects, the appropriate storage solution should be chosen based on specific scenarios, and a hybrid architecture of MySQL+Elasticsearch can usually leverage their respective strengths.
Theme test article, for testing purposes only. Published by Walker. Please credit the source when reposting: https://walker-learn.xyz/archives/6775