# ES Installation
Elasticsearch can be thought of as the database, and Kibana as the client tool used to connect to it.
The ES and Kibana versions must match (Kibana listens on port 5601 by default).
## Learning Elasticsearch (ES) with MySQL Comparison
### Terminology Comparison
| MySQL | Elasticsearch |
|---|---|
| database | index |
| table | type (fixed as _doc from 7.x, completely removed in 8.x) |
| row | document |
| column | field |
| schema | mapping |
| SQL | DSL (Domain Specific Language query syntax) |
Note: Since ES 6.x an index can hold only one type; in 7.x types are deprecated and the single type is conventionally named _doc; in 8.x the type concept is no longer exposed, and modeling is done directly against "index + mapping".
### Core Concepts Overview
- Index: similar to a "database" in a relational database. A logical collection of documents of the same kind, stored internally as primary shards and replica shards.
- Document: similar to a row of data. A JSON object uniquely identified by _id, which can be auto-generated by ES or supplied by the client.
- Field: an attribute of a document, similar to a column. The field type affects how the inverted index is built and which queries are available.
- Mapping: similar to a table structure definition. Declares field types, indexing options, and analyzers. Once created, a field's type is effectively immutable (changing it requires rebuilding the index and re-importing the data).
- Analyzer (tokenization): how text is split into terms and written to the inverted index, which determines full-text search quality (for Chinese, third-party plugins such as ik_max_word/ik_smart are commonly used).
### Modeling Guide (Comparing with MySQL Thinking)
- First determine query dimensions and retrieval methods, then design fields and mappings; don't blindly copy MySQL's third normal form.
- Moderate redundancy, eliminate joins: ES has no cross-index joins, queries are per-index; complex scenarios use denormalization or nested/parent-child.
- Distinguish clearly between numeric, time, keyword, and text:
  - keyword: exact matching, aggregation, sorting;
  - text: full-text search (tokenized), not suitable for aggregation/sorting;
- Use date for time, geo_point for geographic location, etc.
### Common Operations Comparison
- Create Index (with Mapping)
SQL (create database/table/fields):
-- MySQL example
CREATE DATABASE shop;
CREATE TABLE product (
id BIGINT PRIMARY KEY,
title VARCHAR(255),
price DECIMAL(10,2),
tags JSON
);
ES (create index + mapping):
PUT /shop_product
{
"settings": {"number_of_shards": 1, "number_of_replicas": 1},
"mappings": {
"properties": {
"title": {"type": "text", "analyzer": "standard"},
"price": {"type": "double"},
"tags": {"type": "keyword"},
"createdAt": {"type": "date"}
}
}
}
- Insert a Row/Document
SQL:
INSERT INTO product(id,title,price) VALUES(1,'iPhone',5999.00);
ES:
POST /shop_product/_doc/1
{
"title": "iPhone",
"price": 5999.00,
"tags": ["phone", "apple"],
"createdAt": "2025-09-18T12:00:00Z"
}
- Query by Primary Key
SQL:
SELECT * FROM product WHERE id=1;
ES:
GET /shop_product/_doc/1
- Conditional Query (DSL vs SQL)
SQL:
SELECT id,title FROM product
WHERE price BETWEEN 3000 AND 8000 AND title LIKE '%phone%'
ORDER BY price DESC LIMIT 10 OFFSET 0;
ES:
POST /shop_product/_search
{
"from": 0,
"size": 10,
"sort": [{"price": "desc"}],
"_source": ["id","title","price"],
"query": {
"bool": {
"must": [ {"match": {"title": "phone"}} ],
"filter": [ {"range": {"price": {"gte": 3000, "lte": 8000}}} ]
}
}
}
- Update and Delete
SQL: UPDATE ... WHERE id=? / DELETE FROM ... WHERE id=?
ES:
POST /shop_product/_update/1
{"doc": {"price": 5799}}
DELETE /shop_product/_doc/1
- Aggregation (GROUP BY Comparison)
SQL:
SELECT tags, COUNT(*) AS cnt FROM product GROUP BY tags;
ES:
POST /shop_product/_search
{
"size": 0,
"aggs": {
"by_tag": {"terms": {"field": "tags"}}
}
}
### Index Lifecycle and Performance Key Points
- The number of primary shards is fixed when the index is created; changing it later requires rebuilding the index (or using the split/shrink APIs). The number of replicas can be adjusted online.
- Write-heavy scenarios: reduce replicas and increase the refresh interval. Read-heavy scenarios: increase replicas appropriately, rely on caching, and keep doc_values enabled on fields used for sorting and aggregation.
- For major mapping/type changes, rebuild the index (reindex): new index -> import data -> switch alias.
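The "new index -> import data -> switch alias" flow can be sketched as the two request bodies a client would send. This is a minimal Python sketch, not a definitive procedure: the index names `shop_product_v1` and `shop_product_v2` and the alias `shop_product_alias` are hypothetical, and the functions only build the JSON for `POST /_reindex` and `POST /_aliases` without contacting a cluster.

```python
import json


def reindex_body(old_index: str, new_index: str) -> dict:
    """Body for POST /_reindex: copy documents from old_index into new_index."""
    return {"source": {"index": old_index}, "dest": {"index": new_index}}


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases: both actions are applied in one atomic step,
    so clients reading through the alias always have an index behind it."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(json.dumps(reindex_body("shop_product_v1", "shop_product_v2"), indent=2))
    print(json.dumps(
        alias_swap_body("shop_product_alias", "shop_product_v1", "shop_product_v2"),
        indent=2,
    ))
```

Applications that query the alias rather than a concrete index name do not need to change when the underlying index is rebuilt.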
### Kibana and Port
- Kibana default port is 5601, version must be consistent with ES (7.x with 7.x, 8.x with 8.x).
- You can directly paste the REST/DSL examples above in Kibana Dev Tools to execute.
## We Mainly Use ES's Query Functionality
GET _search?q=bobby // will query all indexes
Query through request body
### Request Body Query Detailed Explanation
ES supports two query methods:
- URL Parameter Query: GET _search?q=field:value (simple and fast)
- Request Body Query: POST _search + JSON body (powerful, recommended)
#### Why Recommend Request Body Query?
- Complete Functionality: supports complex queries, aggregations, sorting, pagination, etc.
- Strong Readability: JSON structure is clear, easy to maintain
- No URL Limits: the query travels in the request body, so it is not constrained by URL length and can express more complex query logic
- Debug Friendly: directly paste and execute in Kibana Dev Tools
#### Basic Query Examples
- Simple Match Query
POST /shop_product/_search
{
"query": {
"match": {
"title": "iPhone"
}
}
}
- Multi-condition Combined Query
POST /shop_product/_search
{
"query": {
"bool": {
"must": [
{"match": {"title": "phone"}},
{"range": {"price": {"gte": 1000, "lte": 8000}}}
],
"must_not": [
{"term": {"status": "discontinued"}}
],
"should": [
{"match": {"tags": "apple"}}
]
}
}
}
- Exact Match vs Full-text Search
# Exact match (no tokenization)
POST /shop_product/_search
{
"query": {
"term": {
"tags": "phone"
}
}
}
# Full-text search (tokenized)
POST /shop_product/_search
{
"query": {
"match": {
"title": "iPhone 15 Pro"
}
}
}
- Pagination and Sorting
POST /shop_product/_search
{
"from": 0,
"size": 10,
"sort": [
{"price": {"order": "desc"}},
{"_score": {"order": "desc"}}
],
"query": {
"match_all": {}
}
}
- Specify Return Fields
POST /shop_product/_search
{
"_source": ["title", "price", "tags"],
"query": {
"match_all": {}
}
}
- Aggregation Query (Statistics)
POST /shop_product/_search
{
"size": 0,
"aggs": {
"price_stats": {
"stats": {"field": "price"}
},
"tags_count": {
"terms": {"field": "tags", "size": 10}
},
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{"to": 1000},
{"from": 1000, "to": 5000},
{"from": 5000}
]
}
}
}
}
- Highlighting
POST /shop_product/_search
{
"query": {
"match": {"title": "iPhone"}
},
"highlight": {
"fields": {
"title": {}
}
}
}
#### Common Query Type Comparison
| Requirement | SQL | ES Request Body |
|---|---|---|
| Full table scan | SELECT * FROM table | {"query": {"match_all": {}}} |
| Exact match | WHERE id = 1 | {"query": {"term": {"id": 1}}} |
| Fuzzy match | WHERE title LIKE '%phone%' | {"query": {"match": {"title": "phone"}}} |
| Range query | WHERE price BETWEEN 1000 AND 5000 | {"query": {"range": {"price": {"gte": 1000, "lte": 5000}}}} |
| Multiple conditions AND | WHERE a=1 AND b=2 | {"query": {"bool": {"must": [{"term": {"a": 1}}, {"term": {"b": 2}}]}}} |
| Multiple conditions OR | WHERE a=1 OR b=2 | {"query": {"bool": {"should": [{"term": {"a": 1}}, {"term": {"b": 2}}]}}} |
| Group statistics | SELECT tag, COUNT(*) FROM table GROUP BY tag | {"aggs": {"by_tag": {"terms": {"field": "tag"}}}} |
#### Performance Optimization Recommendations
- Use filter context instead of query context when scoring is not needed: filter clauses skip relevance scoring and perform better
POST /shop_product/_search
{
"query": {
"bool": {
"filter": [
{"range": {"price": {"gte": 1000}}}
]
}
}
}
- Use size reasonably: avoid returning large amounts of data at once
- Use _source filtering: only return needed fields
- Cache common queries: ES automatically caches filter query results
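The recommendations above (filter for non-scoring conditions, bounded size, _source filtering) can be combined in one request body. The following is a small Python sketch under stated assumptions: the function name `build_search_body` and its parameters are hypothetical, and the field names `title` and `price` follow the shop_product mapping used earlier. It only builds the JSON body for `POST /shop_product/_search`.

```python
import json
from typing import Optional


def build_search_body(keyword: str,
                      min_price: Optional[float] = None,
                      max_price: Optional[float] = None,
                      page: int = 1,
                      size: int = 10) -> dict:
    """Put the full-text condition in `must` (scored) and the price range
    in `filter` (not scored), and return only the fields that are needed."""
    bool_query = {"must": [{"match": {"title": keyword}}]}

    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        bool_query["filter"] = [{"range": {"price": price_range}}]

    return {
        "from": (page - 1) * size,  # page is 1-based
        "size": size,
        "_source": ["title", "price"],
        "query": {"bool": bool_query},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("phone", 1000, 8000, page=2), indent=2))
```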
## POST Update Operations Detailed Explanation
ES has two update methods, **overwrite update** and **partial update**. Understanding the difference between them is important.
### 1. Overwrite Update (PUT Method)
**Characteristics**: Completely replaces the entire document; any field not included in the request is removed
# Original document
{
"id": 1,
"title": "iPhone 15",
"price": 5999,
"tags": ["phone", "apple"],
"description": "Latest iPhone",
"stock": 100
}
# Overwrite update (only specified fields are retained)
PUT /shop_product/_doc/1
{
"title": "iPhone 15 Pro",
"price": 7999
}
# Updated document (id, tags, description, and stock are all removed)
{
"title": "iPhone 15 Pro",
"price": 7999
}
### 2. Partial Update (POST _update Method)
**Characteristics**: Only updates specified fields, other fields remain unchanged
# Original document
{
"id": 1,
"title": "iPhone 15",
"price": 5999,
"tags": ["phone", "apple"],
"description": "Latest iPhone",
"stock": 100
}
# Partial update (only specified fields are updated)
POST /shop_product/_update/1
{
"doc": {
"title": "iPhone 15 Pro",
"price": 7999
}
}
# Updated document (other fields remain unchanged)
{
"id": 1,
"title": "iPhone 15 Pro",
"price": 7999,
"tags": ["phone", "apple"],
"description": "Latest iPhone",
"stock": 100
}
### 3. Advanced Update Operations
#### 3.1 Conditional Update (upsert)
If the document doesn't exist, create it; if it exists, update it:
POST /shop_product/_update/999
{
"doc": {
"title": "New Product",
"price": 1000
},
"upsert": {
"title": "New Product",
"price": 1000,
"tags": ["new"],
"created_at": "2025-01-18"
}
}
#### 3.2 Script Update
Use scripts for complex updates:
# Increase stock
POST /shop_product/_update/1
{
"script": {
"source": "ctx._source.stock += params.increment",
"params": {
"increment": 50
}
}
}
# Conditional update (only update if price is greater than 5000)
POST /shop_product/_update/1
{
"script": {
"source": "if (ctx._source.price > 5000) { ctx._source.price = params.new_price }",
"params": {
"new_price": 7500
}
}
}
#### 3.3 Array Operations
# Add tag
POST /shop_product/_update/1
{
"script": {
"source": "if (ctx._source.tags == null) { ctx._source.tags = [] } ctx._source.tags.add(params.tag)",
"params": {
"tag": "premium"
}
}
}
# Remove tag
POST /shop_product/_update/1
{
"script": {
"source": "ctx._source.tags.removeIf(item -> item == params.tag)",
"params": {
"tag": "old"
}
}
}
### 4. Bulk Update
POST /shop_product/_bulk
{"update":{"_id":"1"}}
{"doc":{"price":5999}}
{"update":{"_id":"2"}}
{"doc":{"price":6999}}
{"update":{"_id":"3"}}
{"doc":{"price":7999}}
### 5. Update Operations Comparison Table
| Operation Method | Method | Characteristics | Use Cases |
|---|---|---|---|
| Overwrite update | PUT /index/_doc/id | Completely replace document | Document structure changes significantly, need to delete fields |
| Partial update | POST /index/_update/id | Only update specified fields | Daily business updates, retain other fields |
| Conditional update | POST /index/_update/id + upsert | Update if exists, create if not | Uncertain whether document exists |
| Script update | POST /index/_update/id + script | Complex logic updates | Updates requiring calculations and conditional logic |
### 6. Performance Considerations
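- Partial updates save bandwidth: only the changed fields are transmitted, although ES still reindexes the whole document internally
- Script updates are slower: the script must be compiled and executed, so performance is relatively lower
- Bulk operations: use the _bulk API for large numbers of updates to improve efficiency
- Concurrency control: ES uses optimistic concurrency control (_seq_no and _primary_term); a conflicting write is rejected with a 409 version conflict, and the update API can retry via retry_on_conflict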
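In ES 7.x and later, a write can be made conditional on the document being unchanged by passing the if_seq_no and if_primary_term URL parameters, whose values come from a previous read of the document. The sketch below illustrates this under stated assumptions: the helper names `conditional_write_path` and `is_version_conflict` are hypothetical, and the code only builds the request path and interprets a status code.

```python
def conditional_write_path(index: str, doc_id: str, get_response: dict) -> str:
    """Build the path for PUT /<index>/_doc/<id> that succeeds only if the
    document is unchanged since `get_response` (a GET /<index>/_doc/<id> result)."""
    seq_no = get_response["_seq_no"]
    primary_term = get_response["_primary_term"]
    return f"/{index}/_doc/{doc_id}?if_seq_no={seq_no}&if_primary_term={primary_term}"


def is_version_conflict(status_code: int) -> bool:
    """A lost race is reported as HTTP 409 (version_conflict_engine_exception)."""
    return status_code == 409


if __name__ == "__main__":
    # Illustrative values, as they might appear in a GET response.
    previous_read = {"_id": "1", "_seq_no": 7, "_primary_term": 1}
    print(conditional_write_path("shop_product", "1", previous_read))
```

On a 409, a client would typically re-read the document, reapply its change, and try again.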
## Delete Data Operations Detailed Explanation
ES has multiple delete methods, from deleting single documents to entire indexes. Understanding different deletion methods for different scenarios is important.
### 1. Delete Single Document
#### 1.1 Delete by ID
# Delete document with specified ID
DELETE /shop_product/_doc/1
# Response example
{
"_index": "shop_product",
"_id": "1",
"_version": 2,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
}
}
#### 1.2 Conditional Delete (by Query)
# Delete all products with price less than 1000
POST /shop_product/_delete_by_query
{
"query": {
"range": {
"price": {
"lt": 1000
}
}
}
}
# Delete products with specific tag
POST /shop_product/_delete_by_query
{
"query": {
"term": {
"tags": "discontinued"
}
}
}
### 2. Bulk Delete
#### 2.1 Using _bulk API
POST /shop_product/_bulk
{"delete":{"_id":"1"}}
{"delete":{"_id":"2"}}
{"delete":{"_id":"3"}}
#### 2.2 Bulk Conditional Delete
# Delete products with multiple conditions
POST /shop_product/_delete_by_query
{
"query": {
"bool": {
"should": [
{"term": {"status": "discontinued"}},
{"range": {"last_updated": {"lt": "2020-01-01"}}}
]
}
}
}
### 3. Delete Entire Index
# Delete entire index (dangerous operation!)
DELETE /shop_product
# Response example
{
"acknowledged": true
}
### 4. Delete Type in Index (Legacy Versions Only)
# Delete a specific type in an index (legacy syntax; mapping types can no longer be deleted since ES 2.0, and types are removed entirely in 8.x)
DELETE /shop_product/product_type
### 5. Advanced Delete Operations
#### 5.1 Delete with Version Control
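# Only delete if the document has not changed since it was read (optimistic concurrency control)
# The if_seq_no and if_primary_term values come from a previous GET of the document; 5 and 1 are illustrative
DELETE /shop_product/_doc/1?if_seq_no=5&if_primary_term=1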
#### 5.2 Asynchronous Delete of Large Data
# Asynchronous delete (suitable for large data); wait_for_completion is a URL parameter, not a body field
POST /shop_product/_delete_by_query?wait_for_completion=false&conflicts=proceed
{
"query": {
"match_all": {}
}
}
# Response includes task ID for querying delete progress
{
"task": "r1A2WoRbTwKZ516z6NEs5A:36619"
}
# Query task status
GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619
#### 5.3 Keep Snapshot Before Delete
# Create snapshot before delete (backup)
PUT /_snapshot/backup_repo/snapshot_before_delete
{
"indices": "shop_product",
"ignore_unavailable": true,
"include_global_state": false
}
# Then execute delete operation
POST /shop_product/_delete_by_query
{
"query": {
"range": {
"created_at": {
"lt": "2020-01-01"
}
}
}
}
### 6. Delete Operations Comparison Table
| Delete Method | Method | Characteristics | Use Cases |
|---|---|---|---|
| Delete by ID | DELETE /index/_doc/id | Precisely delete single document | Delete when document ID is known |
| Conditional delete | POST /index/_delete_by_query | Delete by query conditions | Bulk delete documents matching conditions |
| Bulk delete | POST /index/_bulk | Delete multiple specified documents at once | Delete when multiple document IDs are known |
| Delete index | DELETE /index | Delete entire index | Clean test data or rebuild index |
| Asynchronous delete | _delete_by_query + wait_for_completion=false | Non-blocking, executes in background | Delete large data to avoid timeout |
### 7. Delete Operations Precautions
#### 7.1 Performance Considerations
# Tune the scroll batch size when deleting large amounts (scroll_size is a URL parameter)
POST /shop_product/_delete_by_query?scroll_size=1000&conflicts=proceed
{
"query": {
"range": {
"price": {
"lt": 100
}
}
}
}
#### 7.2 Safe Delete
# Query first to confirm before delete
POST /shop_product/_search
{
"query": {
"range": {
"price": {
"lt": 100
}
}
},
"size": 0
}
# Execute delete after confirming
POST /shop_product/_delete_by_query
{
"query": {
"range": {
"price": {
"lt": 100
}
}
}
}
#### 7.3 Delete Monitoring
# Monitor delete progress
GET /_tasks?detailed=true&actions=*delete*
# Cancel delete task
POST /_tasks/task_id/_cancel
### 8. Delete vs Soft Delete
In actual business, soft delete is usually used instead of physical delete:
# Soft delete: mark as deleted
POST /shop_product/_update/1
{
"doc": {
"deleted": true,
"deleted_at": "2025-01-18T12:00:00Z"
}
}
# Exclude deleted documents when querying
POST /shop_product/_search
{
"query": {
"bool": {
"must": [
{"match": {"title": "iPhone"}}
],
"must_not": [
{"term": {"deleted": true}}
]
}
}
}
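Because every search has to exclude soft-deleted documents, it can help to apply the must_not clause in one place. A minimal Python sketch, assuming the `deleted` flag shown above; the helper name `exclude_soft_deleted` is hypothetical, and it only wraps a query dict without sending anything.

```python
import json


def exclude_soft_deleted(query: dict) -> dict:
    """Wrap any query so that documents marked deleted=true are filtered out.
    The original query stays in `must`, so its relevance scoring is preserved."""
    return {
        "bool": {
            "must": [query],
            "must_not": [{"term": {"deleted": True}}],
        }
    }


if __name__ == "__main__":
    body = {"query": exclude_soft_deleted({"match": {"title": "iPhone"}})}
    print(json.dumps(body, indent=2))
```

Documents that have no `deleted` field at all still match, since must_not only rejects documents where the term is present.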
### 9. Recover Deleted Data
ES deleted data cannot be directly recovered, but can be recovered through:
- Restore from snapshot: if snapshot was created previously
- Restore from backup: if data backup exists
- Re-import: re-import from original data source
# Restore index from snapshot
POST /_snapshot/backup_repo/snapshot_name/_restore
{
"indices": "shop_product",
"ignore_unavailable": true,
"include_global_state": false
}
## Bulk Insert Operations Detailed Explanation
Bulk insert in ES is an efficient way to handle large amounts of data. Through the _bulk API, multiple operations including insert, update, delete, etc. can be executed at once.
### 1. Basic Bulk Insert
#### 1.1 Insert Using _bulk API
POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"iPhone 15","price":5999,"tags":["phone","apple"],"stock":100}
{"index":{"_id":"2"}}
{"title":"Samsung Galaxy","price":4999,"tags":["phone","android"],"stock":50}
{"index":{"_id":"3"}}
{"title":"MacBook Pro","price":12999,"tags":["laptop","apple"],"stock":20}
#### 1.2 Bulk Insert with Auto-generated IDs
POST /shop_product/_bulk
{"index":{}}
{"title":"iPad Air","price":3999,"tags":["tablet","apple"],"stock":30}
{"index":{}}
{"title":"Dell XPS","price":8999,"tags":["laptop","windows"],"stock":15}
{"index":{}}
{"title":"Surface Pro","price":6999,"tags":["tablet","windows"],"stock":25}
### 2. Mixed Bulk Operations
POST /shop_product/_bulk
{"index":{"_id":"10"}}
{"title":"New Product 1","price":1000,"tags":["new"],"stock":100}
{"update":{"_id":"1"}}
{"doc":{"price":5799}}
{"delete":{"_id":"2"}}
{"index":{"_id":"11"}}
{"title":"New Product 2","price":2000,"tags":["new"],"stock":200}
### 3. Bulk Insert Response Handling
# Bulk operation response example
{
"took": 30,
"errors": false,
"items": [
{
"index": {
"_index": "shop_product",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 201
}
},
{
"index": {
"_index": "shop_product",
"_id": "2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 201
}
}
]
}
### 4. Error Handling
#### 4.1 Check for Errors in Bulk Operations
# Bulk operation with errors (create fails if the ID already exists, whereas index would overwrite the document)
POST /shop_product/_bulk
{"create":{"_id":"1"}}
{"title":"Product 1","price":1000}
{"create":{"_id":"1"}}
{"title":"Product 2","price":2000}
# Error information in response (the second create hits the duplicate ID)
{
"took": 5,
"errors": true,
"items": [
{
"create": {
"_index": "shop_product",
"_id": "1",
"status": 201,
"result": "created"
}
},
{
"create": {
"_index": "shop_product",
"_id": "1",
"status": 409,
"error": {
"type": "version_conflict_engine_exception",
"reason": "[1]: version conflict, document already exists"
}
}
}
]
}
#### 4.2 Handle Partial Failures
# Use filter_path to return only error items
POST /shop_product/_bulk?filter_path=items.*.error
{"create":{"_id":"1"}}
{"title":"Product 1","price":1000}
{"create":{"_id":"1"}}
{"title":"Product 2","price":2000}
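A _bulk request returns HTTP 200 even when individual items fail, so client code has to inspect the `errors` flag and each entry in `items`. The following Python sketch shows one way to collect the failures; the function name `failed_bulk_items` and the shape of the returned records are assumptions made for illustration.

```python
def failed_bulk_items(bulk_response: dict) -> list:
    """Return one record per failed item in a _bulk response."""
    if not bulk_response.get("errors"):
        return []  # fast path: nothing failed

    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item has a single key naming the action: index, create, update, or delete.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append({
                "position": position,  # index of the action within the request
                "action": action,
                "_id": result.get("_id"),
                "status": result.get("status"),
                "type": result["error"].get("type"),
            })
    return failures


if __name__ == "__main__":
    sample = {
        "took": 5,
        "errors": True,
        "items": [
            {"create": {"_id": "1", "status": 201, "result": "created"}},
            {"create": {"_id": "1", "status": 409,
                        "error": {"type": "version_conflict_engine_exception"}}},
        ],
    }
    print(failed_bulk_items(sample))
```

The `position` value maps each failure back to the action at the same position in the request, which is useful when retrying only the failed documents.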
### 5. Performance Optimization
#### 5.1 Batch Size Control
# Recommended batch size: 1000-5000 documents
POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
# ... more documents
{"index":{"_id":"1000"}}
{"title":"Product 1000","price":1000}
#### 5.2 Refresh Strategy
# Don't refresh immediately during bulk insert (improves performance)
POST /shop_product/_bulk?refresh=false
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
{"index":{"_id":"2"}}
{"title":"Product 2","price":2000}
# Manually refresh after bulk insert
POST /shop_product/_refresh
#### 5.3 Concurrency Control
# Set timeout for bulk operation
POST /shop_product/_bulk?timeout=60s
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
### 6. Bulk Import from File
#### 6.1 Prepare Data File
# data.json file content
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000,"tags":["tag1"],"stock":100}
{"index":{"_id":"2"}}
{"title":"Product 2","price":2000,"tags":["tag2"],"stock":200}
{"index":{"_id":"3"}}
{"title":"Product 3","price":3000,"tags":["tag3"],"stock":300}
#### 6.2 Import Using curl
# Bulk import from file
curl -X POST "localhost:9200/shop_product/_bulk" \
-H "Content-Type: application/json" \
--data-binary @data.json
### 7. Bulk Insert Best Practices
#### 7.1 Data Preprocessing
# Create index and mapping before bulk insert
PUT /shop_product
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"title": {"type": "text", "analyzer": "standard"},
"price": {"type": "double"},
"tags": {"type": "keyword"},
"stock": {"type": "integer"}
}
}
}
#### 7.2 Process Large Data in Batches
# Batch insert large amounts of data
POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
# ... 1000 documents
# Wait a while then continue with next batch
POST /shop_product/_bulk
{"index":{"_id":"1001"}}
{"title":"Product 1001","price":1000}
# ... next 1000 documents
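Splitting a large import into fixed-size batches is usually done in client code. The sketch below is one possible approach in Python: it turns an iterable of documents into NDJSON bodies for `POST /shop_product/_bulk`. The function name `bulk_payloads` and the convention that each input dict carries its ID under an `id` key are assumptions for illustration.

```python
import json
from typing import Iterable, Iterator


def bulk_payloads(docs: Iterable[dict], batch_size: int = 1000) -> Iterator[str]:
    """Yield NDJSON request bodies, each holding at most batch_size documents."""
    lines = []
    count = 0
    for doc in docs:
        # The ID goes into the action line; the remaining keys form the document source.
        source = {key: value for key, value in doc.items() if key != "id"}
        lines.append(json.dumps({"index": {"_id": str(doc["id"])}}))
        lines.append(json.dumps(source))
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"  # a bulk body must end with a newline
            lines, count = [], 0
    if lines:
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    products = [{"id": i, "title": f"Product {i}", "price": 1000} for i in range(1, 6)]
    for payload in bulk_payloads(products, batch_size=2):
        print(payload)
```

Because it is a generator, only one batch is held in memory at a time, and the caller decides how long to wait between requests.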
#### 7.3 Monitor Bulk Insert Progress
# Check index status
GET /shop_product/_stats
# Check document count
GET /shop_product/_count
# Check index health status
GET /_cluster/health/shop_product
### 8. Bulk Insert vs Single Insert Comparison
| Operation Method | Method | Performance | Use Cases |
|---|---|---|---|
| Single insert | POST /index/_doc | Slower | Small amounts of data, real-time insert |
| Bulk insert | POST /index/_bulk | Very fast | Large amounts of data, batch import |
| Mixed operations | POST /index/_bulk | Medium | Need to insert, update, delete simultaneously |
### 9. Common Issues Resolution
#### 9.1 Insufficient Memory
# Reduce batch size
POST /shop_product/_bulk
# Only include 500 documents instead of 1000
#### 9.2 Timeout Issues
# Increase timeout
POST /shop_product/_bulk?timeout=120s
#### 9.3 Version Conflict
# Use upsert to avoid version conflict
POST /shop_product/_bulk
{"update":{"_id":"1"}}
{"doc":{"price":1000},"upsert":{"title":"Product 1","price":1000}}
### 10. Bulk Insert Monitoring
# Monitor bulk operation performance
GET /_nodes/stats/indices/indexing
# View index statistics
GET /shop_product/_stats/indexing
# Monitor cluster status
GET /_cluster/health?pretty
## mget Bulk Get Operations Detailed Explanation
The mget (multi-get) API in ES allows fetching multiple documents at once, more efficient than individual fetches, especially suitable for scenarios requiring batch reading.
### 1. Basic Bulk Get
#### 1.1 Get Multiple Documents from Same Index
POST /shop_product/_mget
{
"docs": [
{"_id": "1"},
{"_id": "2"},
{"_id": "3"}
]
}
#### 1.2 Get Documents from Different Indexes
POST /_mget
{
"docs": [
{"_index": "shop_product", "_id": "1"},
{"_index": "shop_user", "_id": "100"},
{"_index": "shop_order", "_id": "200"}
]
}
#### 1.3 Specify Return Fields
POST /shop_product/_mget
{
"docs": [
{
"_id": "1",
"_source": ["title", "price"]
},
{
"_id": "2",
"_source": ["title", "tags"]
}
]
}
### 2. Bulk Get Response Handling
# mget response example
{
"docs": [
{
"_index": "shop_product",
"_id": "1",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"title": "iPhone 15",
"price": 5999,
"tags": ["phone", "apple"],
"stock": 100
}
},
{
"_index": "shop_product",
"_id": "2",
"_version": 1,
"_seq_no": 1,
"_primary_term": 1,
"found": true,
"_source": {
"title": "Samsung Galaxy",
"price": 4999,
"tags": ["phone", "android"],
"stock": 50
}
},
{
"_index": "shop_product",
"_id": "999",
"found": false
}
]
}
### 3. Advanced Bulk Get
#### 3.1 Using ids Parameter (Simplified Syntax)
POST /shop_product/_mget
{
"ids": ["1", "2", "3", "999"]
}
#### 3.2 Exclude Unnecessary Fields
POST /shop_product/_mget
{
"docs": [
{
"_id": "1",
"_source": {
"excludes": ["description", "created_at"]
}
},
{
"_id": "2",
"_source": {
"includes": ["title", "price"]
}
}
]
}
#### 3.3 Get Stored Fields
# Only fields mapped with "store": true can be returned through stored_fields
POST /shop_product/_mget
{
"docs": [
{
"_id": "1",
"stored_fields": ["title", "price"]
}
]
}
### 4. Bulk Get Performance Optimization
#### 4.1 Control Batch Size Reasonably
# Recommended batch size: 100-1000 documents
POST /shop_product/_mget
{
"ids": [
"1", "2", "3", "4", "5",
# ... more IDs
"100"
]
}
#### 4.2 Use Routing for Optimization
POST /shop_product/_mget
{
"docs": [
{
"_id": "1",
"routing": "user123"
},
{
"_id": "2",
"routing": "user123"
}
]
}
### 5. Error Handling
#### 5.1 Handle Non-existent Documents
POST /shop_product/_mget
{
"ids": ["1", "999", "2"]
}
# Response includes documents with found: false
{
"docs": [
{
"_index": "shop_product",
"_id": "1",
"found": true,
"_source": {...}
},
{
"_index": "shop_product",
"_id": "999",
"found": false
},
{
"_index": "shop_product",
"_id": "2",
"found": true,
"_source": {...}
}
]
}
#### 5.2 Filter Non-existent Documents
# Return only found documents
POST /shop_product/_mget?filter_path=docs._source
{
"ids": ["1", "999", "2"]
}
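On the client side, an mget response is usually split into the documents that were found and the IDs that were not. A minimal Python sketch; the function name `split_mget_response` is hypothetical, and the response shape follows the examples above.

```python
def split_mget_response(mget_response: dict):
    """Return (found, missing): found maps _id to _source, missing lists the other IDs."""
    found = {}
    missing = []
    for doc in mget_response.get("docs", []):
        if doc.get("found"):
            found[doc["_id"]] = doc.get("_source", {})
        else:
            # Covers found: false as well as per-document errors (e.g. a missing index).
            missing.append(doc["_id"])
    return found, missing


if __name__ == "__main__":
    sample = {
        "docs": [
            {"_index": "shop_product", "_id": "1", "found": True,
             "_source": {"title": "iPhone 15", "price": 5999}},
            {"_index": "shop_product", "_id": "999", "found": False},
        ]
    }
    found, missing = split_mget_response(sample)
    print(found)
    print(missing)
```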
### 6. Real-world Application Scenarios
#### 6.1 Get Shopping Cart Product Information
# Bulk get product information by product IDs in shopping cart
POST /shop_product/_mget
{
"ids": ["cart_item_1", "cart_item_2", "cart_item_3"]
}
#### 6.2 Get User Order Details
# Bulk get all user order details
POST /shop_order/_mget
{
"docs": [
{"_id": "order_001", "_source": ["order_id", "total", "status"]},
{"_id": "order_002", "_source": ["order_id", "total", "status"]},
{"_id": "order_003", "_source": ["order_id", "total", "status"]}
]
}
#### 6.3 Cross-index Data Association
# Get user information and user orders simultaneously
POST /_mget
{
"docs": [
{"_index": "shop_user", "_id": "user_123", "_source": ["name", "email"]},
{"_index": "shop_order", "_id": "order_456", "_source": ["order_id", "total"]},
{"_index": "shop_product", "_id": "product_789", "_source": ["title", "price"]}
]
}
### 7. mget vs Single Get Comparison
| Operation Method | Method | Performance | Use Cases |
|---|---|---|---|
| Single get | GET /index/_doc/id | Slower | Get single document |
| Bulk get | POST /index/_mget | Very fast | Bulk get multiple documents |
| Search get | POST /index/_search | Medium | Get documents by conditions |
### 8. Bulk Get Best Practices
#### 8.1 Data Preprocessing
# Check index status before bulk get
GET /shop_product/_stats
# Check if documents exist
GET /shop_product/_count
#### 8.2 Process Large Data in Batches
# Batch get large amounts of documents
POST /shop_product/_mget
{
"ids": ["1", "2", "3", "4", "5"]
# First batch of 100 documents
}
# Continue with next batch
POST /shop_product/_mget
{
"ids": ["6", "7", "8", "9", "10"]
# Next batch of 100 documents
}
#### 8.3 Shard Preference
# Prefer shard copies on the local node where possible (this avoids extra network hops; it is not a cache)
POST /shop_product/_mget?preference=_local
{
"ids": ["1", "2", "3"]
}
### 9. Common Issues Resolution
#### 9.1 Insufficient Memory
# Reduce batch size
POST /shop_product/_mget
{
"ids": ["1", "2", "3"] # Only get 3 documents
}
#### 9.2 Timeout Issues
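# _mget does not accept a timeout URL parameter; set the timeout on the client side and reduce the batch size if requests are slow
POST /shop_product/_mget
{
"ids": ["1", "2", "3"]
}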
#### 9.3 Index Doesn't Exist
# Handle case where index doesn't exist
POST /_mget
{
"docs": [
{"_index": "shop_product", "_id": "1"},
{"_index": "non_existent_index", "_id": "2"}
]
}
### 10. Bulk Get Monitoring
# Monitor bulk get performance
GET /_nodes/stats/indices/search
# View index search statistics
GET /shop_product/_stats/search
# Monitor cluster status
GET /_cluster/health?pretty
### 11. Combine with Other Bulk Operations
#### 11.1 Bulk Get + Bulk Update
# First bulk get
POST /shop_product/_mget
{
"ids": ["1", "2", "3"]
}
# Then bulk update
POST /shop_product/_bulk
{"update":{"_id":"1"}}
{"doc":{"stock":95}}
{"update":{"_id":"2"}}
{"doc":{"stock":45}}
{"update":{"_id":"3"}}
{"doc":{"stock":15}}
#### 11.2 Bulk Get + Search
# First search to get relevant document IDs
POST /shop_product/_search
{
"query": {"match": {"tags": "phone"}},
"_source": false,
"size": 10
}
# Then bulk get detailed information
POST /shop_product/_mget
{
"ids": ["1", "2", "3", "4", "5"]
}
## ES's query:{} Functionality Detailed Explanation
The query object is the core part of a search request and defines the search conditions and logic. query:{} denotes the query object itself; to match every document, place match_all inside it.
### 1. query Object Basic Structure
POST /shop_product/_search
{
"query": {
// Query conditions defined here
}
}
### 2. Common Query Types
#### 2.1 match_all Query (Full Table Scan)
POST /shop_product/_search
{
"query": {
"match_all": {}
}
}
**Functionality**:
- Matches all documents in the index
- Equivalent to SELECT * FROM table in SQL (by default only the first 10 hits are returned)
- Commonly used for getting all data or as a base query
#### 2.2 match Query (Full-text Search)
POST /shop_product/_search
{
"query": {
"match": {
"title": "iPhone"
}
}
}
**Functionality**:
- Performs full-text search on specified field
- Tokenizes query terms
- Supports fuzzy matching and relevance scoring
#### 2.3 term Query (Exact Match)
POST /shop_product/_search
{
"query": {
"term": {
"tags": "phone"
}
}
}
**Functionality**:
- Exact match without tokenization
- Suitable for keyword type fields
- Better performance than match query
#### 2.4 range Query (Range Query)
POST /shop_product/_search
{
"query": {
"range": {
"price": {
"gte": 1000,
"lte": 5000
}
}
}
}
**Functionality**:
- Range query on numeric, date and other fields
- Supports gte (greater than or equal), gt (greater than), lte (less than or equal), lt (less than)
- Commonly used for price range, time range queries
#### 2.5 bool Query (Compound Query)
POST /shop_product/_search
{
"query": {
"bool": {
"must": [
{"match": {"title": "phone"}},
{"range": {"price": {"gte": 1000}}}
],
"must_not": [
{"term": {"status": "discontinued"}}
],
"should": [
{"match": {"tags": "apple"}}
],
"filter": [
{"term": {"category": "electronics"}}
]
}
}
}
**Functionality**:
- must: must match, affects relevance scoring
- must_not: must not match, doesn't affect scoring
- should: should match, increases relevance scoring
- filter: must match, but doesn't affect scoring, better performance
#### 2.6 wildcard Query (Wildcard Query)
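POST /shop_product/_search
{
"query": {
"wildcard": {
"title": {
"value": "iph*"
}
}
}
}
**Functionality**:
- Supports * (match any characters) and ? (match single character)
- The pattern is not analyzed: title is a text field whose terms are lowercased by the standard analyzer, so the pattern must be lowercase (iph*, not iPh*)
- Slower performance, not recommended on large data
- Suitable for fuzzy matching scenarios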
#### 2.7 prefix Query (Prefix Query)
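POST /shop_product/_search
{
"query": {
"prefix": {
"title": "iph"
}
}
}
**Functionality**:
- Matches documents containing a term that starts with the specified prefix
- The prefix is not analyzed, so against a text field indexed with the standard analyzer it must be lowercase (iph, not iPh)
- Suitable for auto-complete, search suggestions
- Better performance than wildcard query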
#### 2.8 fuzzy Query (Fuzzy Query)
POST /shop_product/_search
{
"query": {
"fuzzy": {
"title": {
"value": "iphone",
"fuzziness": "AUTO"
}
}
}
}
**Functionality**:
- Supports typo-tolerant queries
- The fuzziness parameter controls the tolerance level
- Suitable for search correction scenarios
### 3. Query Performance Optimization
#### 3.1 Use filter Instead of query
POST /shop_product/_search
{
"query": {
"bool": {
"filter": [
{"range": {"price": {"gte": 1000}}}
]
}
}
}
**Advantages**:
- filter doesn't calculate relevance scores, better performance
- Results are cached, repeated queries are faster
- Suitable for exact match conditions
#### 3.2 Use _source Filtering Reasonably
POST /shop_product/_search
{
"_source": ["title", "price"],
"query": {
"match_all": {}
}
}
**Advantages**:
- Only return needed fields, reduce network transmission
- Improve query performance
- Lower memory usage
#### 3.3 Use size to Control Return Quantity
POST /shop_product/_search
{
"size": 10,
"query": {
"match_all": {}
}
}
**Advantages**:
- Avoid returning large amounts of data at once
- Improve query response speed
- Reduce memory consumption
### 4. Query Debugging Techniques
#### 4.1 Use explain Parameter
POST /shop_product/_search
{
"explain": true,
"query": {
"match": {
"title": "iPhone"
}
}
}
**Functionality**:
- Shows scoring calculation process for each document
- Helps understand why results are ranked this way
- Used for query optimization and debugging
#### 4.2 Use profile Parameter
POST /shop_product/_search
{
"profile": true,
"query": {
"bool": {
"must": [
{"match": {"title": "phone"}},
{"range": {"price": {"gte": 1000}}}
]
}
}
}
**Functionality**:
- Shows detailed timing information of query execution
- Helps identify performance bottlenecks
- Used for query performance optimization
### 5. Query Best Practices
#### 5.1 Query Structure Optimization
# Good query structure
POST /shop_product/_search
{
"query": {
"bool": {
"must": [
{"match": {"title": "phone"}}
],
"filter": [
{"range": {"price": {"gte": 1000, "lte": 5000}}},
{"term": {"status": "active"}}
]
}
},
"sort": [{"price": "desc"}],
"size": 20
}
#### 5.2 Avoid Deep Pagination
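# Use search_after instead of from/size: pass the sort values of the last hit from the previous page
# Assumes a unique keyword field as tiebreaker (sku here is hypothetical); sorting on _id is discouraged
POST /shop_product/_search
{
"query": {"match_all": {}},
"size": 100,
"sort": [{"price": "asc"}, {"sku": "asc"}],
"search_after": [5999, "last_sku"]
}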
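With search_after, each page is requested by copying the `sort` array of the last hit on the previous page into the next request. A short Python sketch of that step; the function name `next_page_body` is hypothetical, and it assumes each hit carries a `sort` array, which ES returns whenever the request specifies a sort.

```python
import copy


def next_page_body(body: dict, last_hit: dict) -> dict:
    """Return a copy of `body` that continues after `last_hit`."""
    next_body = copy.deepcopy(body)
    next_body.pop("from", None)  # search_after replaces offset-based paging
    next_body["search_after"] = last_hit["sort"]
    return next_body


if __name__ == "__main__":
    first_page = {
        "query": {"match_all": {}},
        "size": 100,
        "sort": [{"price": "asc"}],
    }
    # Illustrative last hit of the first page.
    last_hit = {"_id": "42", "sort": [5999.0]}
    print(next_page_body(first_page, last_hit))
```

The loop ends when a page comes back with no hits; the sort should include a unique tiebreaker so that documents with equal sort values are not skipped or repeated.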
#### 5.3 Use Aggregations Reasonably
POST /shop_product/_search
{
"size": 0,
"aggs": {
"price_stats": {
"stats": {"field": "price"}
},
"tags_count": {
"terms": {"field": "tags", "size": 10}
}
}
}
### 6. Common Query Patterns
#### 6.1 Search + Filter
POST /shop_product/_search
{
"query": {
"bool": {
"must": [
{"match": {"title": "user search term"}}
],
"filter": [
{"term": {"category": "electronics"}},
{"range": {"price": {"gte": 1000}}}
]
}
}
}
#### 6.2 Multi-field Search
POST /shop_product/_search
{
"query": {
"multi_match": {
"query": "iPhone",
"fields": ["title^2", "description", "tags"]
}
}
}
#### 6.3 Nested Object Query
POST /shop_product/_search
{
"query": {
"nested": {
"path": "reviews",
"query": {
"bool": {
"must": [
{"match": {"reviews.comment": "good"}},
{"range": {"reviews.rating": {"gte": 4}}}
]
}
}
}
}
}
### 7. Query Performance Monitoring
# Monitor query performance
GET /_nodes/stats/indices/search
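# View cumulative query time per node (slow queries themselves are written to the search slow log, not returned by this API)
GET /_nodes/stats/indices/search?filter_path=nodes.*.indices.search.query_time_in_millis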
# Monitor cluster query status
GET /_cluster/health?pretty
### 8. Query Error Handling
#### 8.1 Handle Query Syntax Errors
# Incorrect query (the final closing brace is missing)
POST /shop_product/_search
{
"query": {
"match": {
"title": "iPhone"
}
}
# Error response
{
"error": {
"type": "parsing_exception",
"reason": "Unexpected end-of-input"
}
}
#### 8.2 Handle Field Not Exist Error
# Query non-existent field
POST /shop_product/_search
{
"query": {
"match": {
"non_existent_field": "value"
}
}
}
# Response (no error, but no documents matched)
{
"hits": {
"total": {"value": 0, "relation": "eq"},
"hits": []
}
}
### 9. Query Caching Strategy
# Use query caching
POST /shop_product/_search
{
"query": {
"bool": {
"filter": [
{"term": {"category": "electronics"}}
]
}
}
}
# Cache will be automatically used, improving performance for repeated queries
### 10. Summary
The query:{} functionality in ES is the core of search. Mastering various query types and optimization techniques is crucial for building high-performance search applications:
- Basic queries: match_all, match, term, range
- Compound queries: bool query combining multiple conditions
- Performance optimization: reasonable use of filter, _source filtering, size control
- Debugging techniques: explain and profile parameters
- Best practices: avoid deep pagination, reasonable aggregation use, performance monitoring
By properly using these query features, you can build efficient and accurate search systems.
Theme test article, for testing purposes only. Published by Walker. When reposting, please credit the source: https://walker-learn.xyz/archives/4783