Go Engineer System Course 010 [Study Notes]

# ES Installation

Elasticsearch (ES) can be thought of as the database, and Kibana as the client tool used to connect to it.
The ES and Kibana versions must match. Kibana listens on port 5601 by default.

## Learning Elasticsearch (ES) with MySQL Comparison

### Terminology Comparison

| MySQL | Elasticsearch |
| --- | --- |
| database | index |
| table | type (fixed as `_doc` from 7.x, removed in 8.x) |
| row | document |
| column | field |
| schema | mapping |
| SQL | DSL (domain-specific query language, written as JSON) |

Note: Starting from ES 7.x, an index can have only one type, conventionally named `_doc`. In 8.x, types no longer appear in the APIs, so modeling is done directly against "index + mapping".

### Core Concepts Overview

- **Index**: comparable to a "database" in the table above; a logical collection of similar documents, stored internally as primary shards and replica shards.
- **Document**: comparable to a row; a JSON object uniquely identified by `_id`, which ES can generate or the client can supply.
- **Field**: an attribute of a document, comparable to a column. The field type determines how the inverted index is built and which queries are available.
- **Mapping**: comparable to a table definition; declares field types, whether fields are indexed, and how they are analyzed. Once a field is mapped, its type cannot be changed in place; changing it requires creating a new index and re-importing the data.
- **Analyzer (tokenization)**: determines how text is split into terms and written to the inverted index, which in turn determines full-text search quality. For Chinese, the `ik_max_word` and `ik_smart` analyzers from the third-party IK plugin are common choices.
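
To make the analyzer and inverted-index ideas concrete, here is a minimal local sketch in Python. It does not call Elasticsearch: `analyze`, `build_inverted_index`, and `match` are hypothetical helpers, and the analyzer is a deliberately crude stand-in (lowercase, split on anything that is not an ASCII letter or digit) for what a real analyzer does.

```python
import re
from collections import defaultdict


def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def match(index, query):
    """Return IDs of documents containing at least one analyzed query term."""
    hits = set()
    for term in analyze(query):
        hits.update(index.get(term, []))
    return sorted(hits)


# Illustrative sample data, not taken from a real index
docs = {1: "iPhone 15 Pro", 2: "Phone case for iPhone", 3: "MacBook Pro"}
index = build_inverted_index(docs)

print(match(index, "phone"))       # "iphone" and "phone" are different terms
print(match(index, "iPhone pro"))  # any document containing "iphone" or "pro"
```

The point of the sketch is that lookups happen on whole terms: a query for `phone` finds the document containing the word "phone", but not one that only contains "iPhone".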

### Modeling Guide (Comparing with MySQL Thinking)

- Decide the query dimensions and retrieval methods first, then design fields and mappings; don't copy MySQL's third normal form blindly.
- Accept moderate redundancy to eliminate joins: ES cannot join across indexes (a search may target several indexes, but each document is matched on its own). Complex cases use denormalization, `nested` fields, or parent-child relations.
- Distinguish clearly between numeric, time, keyword, and text fields:
  - `keyword`: exact matching, aggregation, sorting;
  - `text`: full-text search (analyzed); not suitable for aggregation or sorting;
  - use `date` for time, `geo_point` for geographic locations, and so on.

### Common Operations Comparison

Create Index (with Mapping)

SQL (create database/table/fields):

-- MySQL example
CREATE DATABASE shop;
CREATE TABLE product (
  id BIGINT PRIMARY KEY,
  title VARCHAR(255),
  price DECIMAL(10,2),
  tags JSON
);

ES (create index + mapping):

PUT /shop_product
{
  "settings": {"number_of_shards": 1, "number_of_replicas": 1},
  "mappings": {
    "properties": {
      "title": {"type": "text", "analyzer": "standard"},
      "price": {"type": "double"},
      "tags": {"type": "keyword"},
      "createdAt": {"type": "date"}
    }
  }
}

Insert a Row/Document

SQL:

INSERT INTO product(id,title,price) VALUES(1,'iPhone',5999.00);

ES:

POST /shop_product/_doc/1
{
  "title": "iPhone",
  "price": 5999.00,
  "tags": ["phone", "apple"],
  "createdAt": "2025-09-18T12:00:00Z"
}

Query by Primary Key

SQL:

SELECT * FROM product WHERE id=1;

ES:

GET /shop_product/_doc/1

Conditional Query (DSL vs SQL)

SQL:

SELECT id,title FROM product
WHERE price BETWEEN 3000 AND 8000 AND title LIKE '%phone%'
ORDER BY price DESC LIMIT 10 OFFSET 0;

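ES (`match` compares whole analyzed terms, so it is only the nearest equivalent of `LIKE '%phone%'`: it finds titles containing the word "phone", but not "iPhone"):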

POST /shop_product/_search
{
  "from": 0,
  "size": 10,
  "sort": [{"price": "desc"}],
  "_source": ["id","title","price"],
  "query": {
    "bool": {
      "must": [ {"match": {"title": "phone"}} ],
      "filter": [ {"range": {"price": {"gte": 3000, "lte": 8000}}} ]
    }
  }
}
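
In application code, a request body like this is usually assembled from optional user input. The following Python sketch shows one way to do that with plain dictionaries; `build_product_search` is a hypothetical helper, and the field names `title` and `price` are taken from the mapping above. It only builds the JSON body and does not send it.

```python
import json


def build_product_search(keyword, min_price=None, max_price=None, size=10):
    """Build a _search body: a scored full-text match plus an optional price filter."""
    bool_query = {"must": [{"match": {"title": keyword}}]}

    # Only add the range filter when at least one bound was supplied
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        bool_query["filter"] = [{"range": {"price": price_range}}]

    return {
        "size": size,
        "sort": [{"price": "desc"}],
        "_source": ["title", "price"],
        "query": {"bool": bool_query},
    }


body = build_product_search("phone", min_price=3000, max_price=8000)
print(json.dumps(body, indent=2))
```

Keeping the price condition in `filter` rather than `must` means it narrows the result set without taking part in relevance scoring.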

Update and Delete

SQL: UPDATE ... WHERE id=? / DELETE FROM ... WHERE id=?

ES:

POST /shop_product/_update/1
{"doc": {"price": 5799}}

DELETE /shop_product/_doc/1

Aggregation (GROUP BY Comparison)

SQL:

SELECT tags, COUNT(*) AS cnt FROM product GROUP BY tags;

ES:

POST /shop_product/_search
{
  "size": 0,
  "aggs": {
    "by_tag": {"terms": {"field": "tags"}}
  }
}
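
What the `terms` aggregation computes can be imitated locally. The sketch below is a plain-Python simulation, not an Elasticsearch call: `terms_aggregation` is a hypothetical function, the sample documents are illustrative, and the tie-break by key is an assumption made to keep the output deterministic. It shows one difference from `GROUP BY`: a document whose `tags` field is an array is counted once under each of its values.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Count documents per distinct value of `field`.

    A document holding an array is counted once for each distinct element.
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))
    # Highest count first; ties broken by key (assumed ordering for this sketch)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


# Illustrative sample documents
products = [
    {"title": "iPhone 15", "tags": ["phone", "apple"]},
    {"title": "Samsung Galaxy", "tags": ["phone", "android"]},
    {"title": "MacBook Pro", "tags": ["laptop", "apple"]},
]

for bucket in terms_aggregation(products, "tags"):
    print(bucket["key"], bucket["doc_count"])
```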

### Index Lifecycle and Performance Key Points

- The number of primary shards is fixed when the index is created; changing it later means building a new index (by reindexing, or with the `_split`/`_shrink` APIs). The number of replicas can be changed online.
- Write-heavy scenarios: reduce replicas and increase the refresh interval. Read-heavy scenarios: add replicas as appropriate, rely on caching, and keep `doc_values` enabled on fields used for sorting and aggregation.
- For major mapping or type changes, rebuild the index (reindex): create the new index -> import the data -> switch the alias.
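
The "new index -> import data -> switch alias" flow can be sketched as two request bodies. The Python below only builds the JSON for `POST /_reindex` and `POST /_aliases`; the helper names, the alias `products`, and the index names `shop_product_v1`/`shop_product_v2` are assumptions for illustration, and sending the requests is left to whatever HTTP client is in use.

```python
import json


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex: copy documents from source_index into dest_index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases: move `alias` from old_index to new_index.

    Both actions travel in one request, so readers of the alias never see
    a moment where it points at no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: applications query the alias, not the versioned indexes
print(json.dumps(reindex_body("shop_product_v1", "shop_product_v2")))
print(json.dumps(alias_swap_actions("products", "shop_product_v1", "shop_product_v2")))
```

This design assumes applications always read and write through the alias, so the switch needs no application change.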

### Kibana and Port

Kibana's default port is 5601, and its version must match the ES version (7.x with 7.x, 8.x with 8.x).
The REST/DSL examples above can be pasted directly into Kibana Dev Tools and executed.

## We Mainly Use ES's Query Functionality

GET _search?q=bobby // no index in the path, so all indexes are searched

The same search can also be expressed through a request body.

### Request Body Query Detailed Explanation

ES supports two query methods:

- URL parameter query: `GET _search?q=field:value` (simple and fast)
- Request body query: `POST _search` + JSON body (powerful, recommended)

#### Why Recommend Request Body Query?

- Complete functionality: supports complex queries, aggregations, sorting, pagination, and more
- Readability: the JSON structure is clear and easy to maintain
- No URL constraints: avoids URL length limits and escaping problems, so more complex query logic can be expressed
- Debug friendly: can be pasted and executed directly in Kibana Dev Tools

#### Basic Query Examples

Simple Match Query

POST /shop_product/_search
{
  "query": {
    "match": {
      "title": "iPhone"
    }
  }
}

Multi-condition Combined Query

POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "phone"}},
        {"range": {"price": {"gte": 1000, "lte": 8000}}}
      ],
      "must_not": [
        {"term": {"status": "discontinued"}}
      ],
      "should": [
        {"match": {"tags": "apple"}}
      ]
    }
  }
}

Exact Match vs Full-text Search

# Exact match (no tokenization)
POST /shop_product/_search
{
  "query": {
    "term": {
      "tags": "phone"
    }
  }
}

# Full-text search (tokenized)
POST /shop_product/_search
{
  "query": {
    "match": {
      "title": "iPhone 15 Pro"
    }
  }
}

Pagination and Sorting

POST /shop_product/_search
{
  "from": 0,
  "size": 10,
  "sort": [
    {"price": {"order": "desc"}},
    {"_score": {"order": "desc"}}
  ],
  "query": {
    "match_all": {}
  }
}
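
The `from` value is normally derived from a page number. The sketch below is a small hypothetical helper (`page_window`) that does the arithmetic and also guards against deep paging: by default an index rejects searches where `from + size` exceeds 10,000 (the `index.max_result_window` setting). The default is passed in as a parameter here, on the assumption that the index has not changed it.

```python
def page_window(page, page_size, max_result_window=10000):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    if start + page_size > max_result_window:
        # Beyond this point from/size paging is rejected by default
        raise ValueError(
            "from + size exceeds the result window; consider search_after for deep paging"
        )
    return {"from": start, "size": page_size}


print(page_window(1, 10))
print(page_window(3, 20))
```

The returned dictionary can be merged into any of the search bodies shown in this section.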

Specify Return Fields

POST /shop_product/_search
{
  "_source": ["title", "price", "tags"],
  "query": {
    "match_all": {}
  }
}

Aggregation Query (Statistics)

POST /shop_product/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {"field": "price"}
    },
    "tags_count": {
      "terms": {"field": "tags", "size": 10}
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          {"to": 1000},
          {"from": 1000, "to": 5000},
          {"from": 5000}
        ]
      }
    }
  }
}

Highlighting

POST /shop_product/_search
{
  "query": {
    "match": {"title": "iPhone"}
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

#### Common Query Type Comparison

| Requirement | SQL | ES Request Body |
| --- | --- | --- |
| Full table scan | `SELECT * FROM table` | `{"query": {"match_all": {}}}` |
| Exact match | `WHERE id = 1` | `{"query": {"term": {"id": 1}}}` |
| Full-text match (nearest to `LIKE`) | `WHERE title LIKE '%phone%'` | `{"query": {"match": {"title": "phone"}}}` |
| Range query | `WHERE price BETWEEN 1000 AND 5000` | `{"query": {"range": {"price": {"gte": 1000, "lte": 5000}}}}` |
| Multiple conditions AND | `WHERE a=1 AND b=2` | `{"query": {"bool": {"must": [{"term": {"a": 1}}, {"term": {"b": 2}}]}}}` |
| Multiple conditions OR | `WHERE a=1 OR b=2` | `{"query": {"bool": {"should": [{"term": {"a": 1}}, {"term": {"b": 2}}]}}}` |
| Group statistics | `SELECT tag, COUNT(*) FROM table GROUP BY tag` | `{"aggs": {"by_tag": {"terms": {"field": "tag"}}}}` |

#### Performance Optimization Recommendations

Use filter context instead of query context where scoring is not needed: `filter` clauses don't calculate relevance scores, which is usually cheaper

POST /shop_product/_search
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"price": {"gte": 1000}}}
      ]
    }
  }
}

- Use `size` reasonably: avoid returning large amounts of data at once
- Use `_source` filtering: return only the fields you need
- Cache-friendly queries: ES can automatically cache the results of frequently used filter clauses

## POST Update Operations Detailed Explanation

ES has two update methods: **overwrite update** and **partial update**. Understanding the difference between them is important.

### 1. Overwrite Update (PUT Method)

**Characteristics**: Replaces the entire document; any field not present in the request body is dropped

# Original document
{
  "id": 1,
  "title": "iPhone 15",
  "price": 5999,
  "tags": ["phone", "apple"],
  "description": "Latest iPhone",
  "stock": 100
}

# Overwrite update (only specified fields are retained)
PUT /shop_product/_doc/1
{
  "title": "iPhone 15 Pro",
  "price": 7999
}

# Updated document (_source now holds only the submitted fields: id, tags, description, and stock are gone, while the document's _id is still 1)
{
  "title": "iPhone 15 Pro",
  "price": 7999
}

### 2. Partial Update (POST _update Method)

**Characteristics**: Only updates specified fields, other fields remain unchanged

# Original document
{
  "id": 1,
  "title": "iPhone 15",
  "price": 5999,
  "tags": ["phone", "apple"],
  "description": "Latest iPhone",
  "stock": 100
}

# Partial update (only specified fields are updated)
POST /shop_product/_update/1
{
  "doc": {
    "title": "iPhone 15 Pro",
    "price": 7999
  }
}

# Updated document (other fields remain unchanged)
{
  "id": 1,
  "title": "iPhone 15 Pro",
  "price": 7999,
  "tags": ["phone", "apple"],
  "description": "Latest iPhone",
  "stock": 100
}
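
The difference between the two update styles can be reproduced on plain dictionaries. The following Python sketch is a local simulation only (no Elasticsearch involved); `overwrite` and `partial_update` are hypothetical names. It assumes the merge rule used for partial documents: nested objects are merged recursively, while scalar values and arrays are replaced.

```python
import copy


def overwrite(_existing, new_source):
    """PUT /index/_doc/id semantics: the new body becomes the whole _source."""
    return copy.deepcopy(new_source)


def partial_update(existing, partial_doc):
    """POST /index/_update/id with "doc": merge the partial document into _source."""
    merged = copy.deepcopy(existing)
    for key, value in partial_doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # merge objects recursively
        else:
            merged[key] = copy.deepcopy(value)  # scalars and arrays are replaced
    return merged


original = {
    "title": "iPhone 15",
    "price": 5999,
    "tags": ["phone", "apple"],
    "description": "Latest iPhone",
    "stock": 100,
}
change = {"title": "iPhone 15 Pro", "price": 7999}

print(sorted(overwrite(original, change)))       # only the submitted fields remain
print(sorted(partial_update(original, change)))  # all original fields are still present
```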

### 3. Advanced Update Operations

#### 3.1 Conditional Update (upsert)

If the document doesn't exist, the content of `upsert` is indexed as a new document; if it exists, the fields in `doc` are merged into it:

POST /shop_product/_update/999
{
  "doc": {
    "title": "New Product",
    "price": 1000
  },
  "upsert": {
    "title": "New Product",
    "price": 1000,
    "tags": ["new"],
    "created_at": "2025-01-18"
  }
}

#### 3.2 Script Update

Use scripts for complex updates:

# Increase stock
POST /shop_product/_update/1
{
  "script": {
    "source": "ctx._source.stock += params.increment",
    "params": {
      "increment": 50
    }
  }
}

# Conditional update (only update if price is greater than 5000)
POST /shop_product/_update/1
{
  "script": {
    "source": "if (ctx._source.price > 5000) { ctx._source.price = params.new_price }",
    "params": {
      "new_price": 7500
    }
  }
}

#### 3.3 Array Operations

# Add tag
POST /shop_product/_update/1
{
  "script": {
    "source": "if (ctx._source.tags == null) { ctx._source.tags = [] } ctx._source.tags.add(params.tag)",
    "params": {
      "tag": "premium"
    }
  }
}

# Remove tag
POST /shop_product/_update/1
{
  "script": {
    "source": "ctx._source.tags.removeIf(item -> item == params.tag)",
    "params": {
      "tag": "old"
    }
  }
}

### 4. Bulk Update

POST /shop_product/_bulk
{"update":{"_id":"1"}}
{"doc":{"price":5999}}
{"update":{"_id":"2"}}
{"doc":{"price":6999}}
{"update":{"_id":"3"}}
{"doc":{"price":7999}}
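
A bulk body like the one above is newline-delimited JSON: one action line, then one payload line, with a newline after every line including the last. The Python sketch below generates such a body from a mapping of document ID to new price; `bulk_update_body` is a hypothetical helper and it only produces the text, it does not send it.

```python
import json


def bulk_update_body(price_by_id):
    """Build an NDJSON _bulk body that partially updates the price of each document."""
    lines = []
    for doc_id, price in price_by_id.items():
        # Action line: which operation and which document
        lines.append(json.dumps({"update": {"_id": str(doc_id)}}, separators=(",", ":")))
        # Payload line: the partial document to merge
        lines.append(json.dumps({"doc": {"price": price}}, separators=(",", ":")))
    # Every line, including the last one, must end with a newline
    return "\n".join(lines) + "\n"


body = bulk_update_body({1: 5999, 2: 6999, 3: 7999})
print(body, end="")
```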

### 5. Update Operations Comparison Table

| Update Type | Request | Characteristics | Use Cases |
| --- | --- | --- | --- |
| Overwrite update | `PUT /index/_doc/id` | Completely replaces the document | Document structure changes significantly, or fields need to be removed |
| Partial update | `POST /index/_update/id` | Only updates the specified fields | Day-to-day business updates that keep the other fields |
| Conditional update (upsert) | `POST /index/_update/id` + `upsert` | Update if the document exists, create it if not | Unsure whether the document exists |
| Script update | `POST /index/_update/id` + `script` | Updates driven by custom logic | Updates requiring calculations and conditional logic |

### 6. Performance Considerations

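- Partial updates send less data: only the changed fields cross the network. Internally ES still fetches the existing document, merges the change, and reindexes the whole document.
- Script updates are slower: the script has to be compiled and executed for each update.
- Bulk operations: use the `_bulk` API for large numbers of updates to improve efficiency.
- Concurrency control: ES uses optimistic concurrency control. A write that conflicts with a concurrent change fails with a 409 version conflict rather than being resolved automatically; `_update` can retry through `retry_on_conflict`, and index or delete requests can be made conditional with `if_seq_no` and `if_primary_term`.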

## Delete Data Operations Detailed Explanation

ES offers several ways to delete data, from a single document up to an entire index. Choose the method that fits the scenario.

### 1. Delete Single Document

#### 1.1 Delete by ID

# Delete document with specified ID
DELETE /shop_product/_doc/1

# Response example
{
  "_index": "shop_product",
  "_id": "1",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}

#### 1.2 Conditional Delete (by Query)

# Delete all products with price less than 1000
POST /shop_product/_delete_by_query
{
  "query": {
    "range": {
      "price": {
        "lt": 1000
      }
    }
  }
}

# Delete products with specific tag
POST /shop_product/_delete_by_query
{
  "query": {
    "term": {
      "tags": "discontinued"
    }
  }
}

### 2. Bulk Delete

#### 2.1 Using _bulk API

POST /shop_product/_bulk
{"delete":{"_id":"1"}}
{"delete":{"_id":"2"}}
{"delete":{"_id":"3"}}

#### 2.2 Bulk Conditional Delete

# Delete products that match either condition (should clauses are combined with OR)
POST /shop_product/_delete_by_query
{
  "query": {
    "bool": {
      "should": [
        {"term": {"status": "discontinued"}},
        {"range": {"last_updated": {"lt": "2020-01-01"}}}
      ]
    }
  }
}

### 3. Delete Entire Index

# Delete entire index (dangerous operation!)
DELETE /shop_product

# Response example
{
  "acknowledged": true
}

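### 4. Delete Type in Index (legacy ES versions only)

# Deleting a single type from an index was possible only in very old releases (ES 1.x);
# the delete-mapping API was removed in 2.0. On later versions, remove the documents with
# _delete_by_query or reindex into a new index instead.
DELETE /shop_product/product_type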

### 5. Advanced Delete Operations

#### 5.1 Delete with Version Control

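# Only delete if the document has not changed since it was read (optimistic concurrency control).
# Pass the _seq_no and _primary_term returned when the document was fetched; 5 and 1 are placeholders.
DELETE /shop_product/_doc/1?if_seq_no=5&if_primary_term=1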

#### 5.2 Asynchronous Delete of Large Data

# Asynchronous delete (suitable for large data)
# wait_for_completion and conflicts are passed as URL parameters, not in the request body
POST /shop_product/_delete_by_query?wait_for_completion=false&conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}

# Response includes task ID for querying delete progress
{
  "task": "r1A2WoRbTwKZ516z6NEs5A:36619"
}

# Query task status
GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619

#### 5.3 Keep Snapshot Before Delete

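# Create snapshot before delete (backup); the repository backup_repo must already be registered,
# and wait_for_completion=true makes the request return only once the snapshot has finished
PUT /_snapshot/backup_repo/snapshot_before_delete?wait_for_completion=true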
{
  "indices": "shop_product",
  "ignore_unavailable": true,
  "include_global_state": false
}

# Then execute delete operation
POST /shop_product/_delete_by_query
{
  "query": {
    "range": {
      "created_at": {
        "lt": "2020-01-01"
      }
    }
  }
}

### 6. Delete Operations Comparison Table

| Delete Type | Request | Characteristics | Use Cases |
| --- | --- | --- | --- |
| Delete by ID | `DELETE /index/_doc/id` | Precisely deletes a single document | The document ID is known |
| Conditional delete | `POST /index/_delete_by_query` | Deletes by query conditions | Bulk delete of documents matching conditions |
| Bulk delete | `POST /index/_bulk` | Deletes multiple specified documents at once | Multiple document IDs are known |
| Delete index | `DELETE /index` | Deletes the entire index | Cleaning test data or rebuilding the index |
| Asynchronous delete | `_delete_by_query` + `wait_for_completion=false` | Non-blocking, executes in the background | Deleting large amounts of data without hitting a timeout |

### 7. Delete Operations Precautions

#### 7.1 Performance Considerations

# Tune the batch size used to scroll through matching documents when deleting large amounts
# (scroll_size is a URL parameter; the default is 1000)
POST /shop_product/_delete_by_query?scroll_size=1000&conflicts=proceed
{
  "query": {
    "range": {
      "price": {
        "lt": 100
      }
    }
  }
}

#### 7.2 Safe Delete

# Query first to confirm before delete
POST /shop_product/_search
{
  "query": {
    "range": {
      "price": {
        "lt": 100
      }
    }
  },
  "size": 0
}

# Execute delete after confirming
POST /shop_product/_delete_by_query
{
  "query": {
    "range": {
      "price": {
        "lt": 100
      }
    }
  }
}

#### 7.3 Delete Monitoring

# Monitor delete progress
GET /_tasks?detailed=true&actions=*delete*

# Cancel delete task (replace task_id with the task ID returned by the asynchronous request)
POST /_tasks/task_id/_cancel

### 8. Delete vs Soft Delete

In actual business, soft delete is usually used instead of physical delete:

# Soft delete: mark as deleted
POST /shop_product/_update/1
{
  "doc": {
    "deleted": true,
    "deleted_at": "2025-01-18T12:00:00Z"
  }
}

# Exclude deleted documents when querying
POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "iPhone"}}
      ],
      "must_not": [
        {"term": {"deleted": true}}
      ]
    }
  }
}
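
Since every read has to exclude soft-deleted documents, it is easy to forget the `must_not` clause in one place. One possible way to centralize it is a small wrapper around any query; the Python sketch below builds the JSON only, and `exclude_deleted` is a hypothetical helper that assumes the `deleted` flag used above.

```python
import json


def exclude_deleted(query):
    """Wrap any query so that documents flagged deleted=true are filtered out."""
    return {
        "bool": {
            "must": [query],
            # Documents without a "deleted" field are unaffected by this clause
            "must_not": [{"term": {"deleted": True}}],
        }
    }


body = {"query": exclude_deleted({"match": {"title": "iPhone"}})}
print(json.dumps(body, indent=2))
```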

### 9. Recover Deleted Data

Data deleted from ES cannot be recovered directly, but it can be brought back in the following ways:

- Restore from snapshot: if a snapshot was created beforehand
- Restore from backup: if a backup of the data exists
- Re-import: load the data again from the original data source

# Restore index from snapshot (an existing index with the same name must be closed or deleted first)
POST /_snapshot/backup_repo/snapshot_name/_restore
{
  "indices": "shop_product",
  "ignore_unavailable": true,
  "include_global_state": false
}

## Bulk Insert Operations Detailed Explanation

Bulk insert in ES is an efficient way to handle large amounts of data. Through the _bulk API, multiple operations including insert, update, delete, etc. can be executed at once.

### 1. Basic Bulk Insert

#### 1.1 Insert Using _bulk API

POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"iPhone 15","price":5999,"tags":["phone","apple"],"stock":100}
{"index":{"_id":"2"}}
{"title":"Samsung Galaxy","price":4999,"tags":["phone","android"],"stock":50}
{"index":{"_id":"3"}}
{"title":"MacBook Pro","price":12999,"tags":["laptop","apple"],"stock":20}

#### 1.2 Bulk Insert with Auto-generated IDs

POST /shop_product/_bulk
{"index":{}}
{"title":"iPad Air","price":3999,"tags":["tablet","apple"],"stock":30}
{"index":{}}
{"title":"Dell XPS","price":8999,"tags":["laptop","windows"],"stock":15}
{"index":{}}
{"title":"Surface Pro","price":6999,"tags":["tablet","windows"],"stock":25}

### 2. Mixed Bulk Operations

POST /shop_product/_bulk
{"index":{"_id":"10"}}
{"title":"New Product 1","price":1000,"tags":["new"],"stock":100}
{"update":{"_id":"1"}}
{"doc":{"price":5799}}
{"delete":{"_id":"2"}}
{"index":{"_id":"11"}}
{"title":"New Product 2","price":2000,"tags":["new"],"stock":200}

### 3. Bulk Insert Response Handling

# Bulk operation response example
{
  "took": 30,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "shop_product",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 201
      }
    },
    {
      "index": {
        "_index": "shop_product",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 201
      }
    }
  ]
}

### 4. Error Handling

#### 4.1 Check for Errors in Bulk Operations

# Bulk operation with errors
# An "index" action on an existing ID overwrites the document and does not fail;
# a "create" action fails with 409 when the ID already exists.
POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
{"create":{"_id":"1"}}
{"title":"Product 2","price":2000}

# Error information in response
{
  "took": 5,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "shop_product",
        "_id": "1",
        "status": 201,
        "result": "created"
      }
    },
    {
      "create": {
        "_index": "shop_product",
        "_id": "1",
        "status": 409,
        "error": {
          "type": "version_conflict_engine_exception",
          "reason": "[1]: version conflict, document already exists"
        }
      }
    }
  ]
}

#### 4.2 Handle Partial Failures

# Use filter_path to return only error items
# (the create action fails here because a document with ID 1 already exists)
POST /shop_product/_bulk?filter_path=items.*.error
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
{"create":{"_id":"1"}}
{"title":"Product 2","price":2000}
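
On the client side, partial failures are handled by walking the `items` array of the bulk response. The Python sketch below works on an already parsed response dictionary; `failed_items` is a hypothetical helper, and the sample response is a shortened illustration of the shape shown in this section.

```python
def failed_items(bulk_response):
    """Return (position, action, _id, status, error type) for every failed bulk item."""
    failures = []
    if not bulk_response.get("errors"):
        return failures  # fast path: the top-level flag says nothing failed
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item has exactly one key: the action name (index/create/update/delete)
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append(
                (position, action, result.get("_id"), result.get("status"),
                 result["error"].get("type"))
            )
    return failures


# Shortened, illustrative response
sample_response = {
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_index": "shop_product", "_id": "1", "status": 201, "result": "created"}},
        {"create": {"_index": "shop_product", "_id": "1", "status": 409,
                    "error": {"type": "version_conflict_engine_exception",
                              "reason": "[1]: version conflict, document already exists"}}},
    ],
}

for failure in failed_items(sample_response):
    print(failure)
```

The position in `items` matches the order of actions in the request, which is what makes it possible to retry only the failed entries.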

### 5. Performance Optimization

#### 5.1 Batch Size Control

# A common starting point is 1000-5000 documents per request; tune it by payload size as well, not by document count alone
POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
# ... more documents
{"index":{"_id":"1000"}}
{"title":"Product 1000","price":1000}

#### 5.2 Refresh Strategy

# refresh=false (the default) skips the immediate refresh during bulk insert, which improves performance
POST /shop_product/_bulk?refresh=false
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
{"index":{"_id":"2"}}
{"title":"Product 2","price":2000}

# Manually refresh after bulk insert
POST /shop_product/_refresh

#### 5.3 Timeout Control

# Set how long each bulk action waits for its shard to become available (the default is 1m, so 60s only makes it explicit)
POST /shop_product/_bulk?timeout=60s
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}

### 6. Bulk Import from File

#### 6.1 Prepare Data File

# data.json file content
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000,"tags":["tag1"],"stock":100}
{"index":{"_id":"2"}}
{"title":"Product 2","price":2000,"tags":["tag2"],"stock":200}
{"index":{"_id":"3"}}
{"title":"Product 3","price":3000,"tags":["tag3"],"stock":300}

#### 6.2 Import Using curl

# Bulk import from file (the file must end with a newline)
curl -X POST "localhost:9200/shop_product/_bulk" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @data.json

### 7. Bulk Insert Best Practices

#### 7.1 Data Preprocessing

# Create index and mapping before bulk insert
PUT /shop_product
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "title": {"type": "text", "analyzer": "standard"},
      "price": {"type": "double"},
      "tags": {"type": "keyword"},
      "stock": {"type": "integer"}
    }
  }
}

#### 7.2 Process Large Data in Batches

# Batch insert large amounts of data
POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
# ... 1000 documents

# Once the previous batch's response has come back and been checked, continue with the next batch
POST /shop_product/_bulk
{"index":{"_id":"1001"}}
{"title":"Product 1001","price":1000}
# ... next 1000 documents
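
Splitting a large import into batches can be done with a small generator. The Python sketch below yields NDJSON bulk bodies capped by both document count and approximate payload size; `bulk_batches` and its default limits (1000 documents, 5 MB) are assumptions chosen for illustration, and sending each body is left to the HTTP client in use.

```python
import json


def bulk_batches(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON _bulk bodies from (doc_id, source) pairs.

    A batch is closed when it holds max_docs documents or when adding the
    next document would push it past max_bytes.
    """
    lines, size, count = [], 0, 0
    for doc_id, source in docs:
        pair = json.dumps({"index": {"_id": str(doc_id)}}) + "\n" + json.dumps(source) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if count and (count >= max_docs or size + pair_size > max_bytes):
            yield "".join(lines)
            lines, size, count = [], 0, 0
        lines.append(pair)
        size += pair_size
        count += 1
    if lines:
        yield "".join(lines)  # final, possibly smaller batch


# Illustrative data: 2500 generated products
products = ((i, {"title": f"Product {i}", "price": 1000}) for i in range(1, 2501))
for body in bulk_batches(products):
    print(body.count("\n") // 2, "documents in this batch")
```

Because the input is consumed lazily, the whole data set never has to sit in memory at once.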

#### 7.3 Monitor Bulk Insert Progress

# Check index status
GET /shop_product/_stats

# Check document count
GET /shop_product/_count

# Check index health status
GET /_cluster/health/shop_product

### 8. Bulk Insert vs Single Insert Comparison

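| Insert Type | Request | Performance | Use Cases |
| --- | --- | --- | --- |
| Single insert | `POST /index/_doc` | Slower: one request per document | Small amounts of data, real-time insert |
| Bulk insert | `POST /index/_bulk` | Much faster for large volumes: many documents per request | Large amounts of data, batch import |
| Mixed operations | `POST /index/_bulk` | In between: depends on the mix of actions | Need to insert, update, and delete in one request |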

### 9. Common Issues Resolution

#### 9.1 Insufficient Memory

# Reduce batch size
POST /shop_product/_bulk
# Only include 500 documents instead of 1000

#### 9.2 Timeout Issues

# Increase timeout
POST /shop_product/_bulk?timeout=120s

#### 9.3 Version Conflict

# Let ES retry an update that hits a version conflict (upsert only covers the case where the document is missing)
POST /shop_product/_bulk
{"update":{"_id":"1","retry_on_conflict":3}}
{"doc":{"price":1000},"upsert":{"title":"Product 1","price":1000}}

### 10. Bulk Insert Monitoring

# Monitor bulk operation performance
GET /_nodes/stats/indices/indexing

# View index statistics
GET /shop_product/_stats/indexing

# Monitor cluster status
GET /_cluster/health?pretty

## mget Bulk Get Operations Detailed Explanation

The mget (multi-get) API in ES allows fetching multiple documents at once, more efficient than individual fetches, especially suitable for scenarios requiring batch reading.

### 1. Basic Bulk Get

#### 1.1 Get Multiple Documents from Same Index

POST /shop_product/_mget
{
  "docs": [
    {"_id": "1"},
    {"_id": "2"},
    {"_id": "3"}
  ]
}

#### 1.2 Get Documents from Different Indexes

POST /_mget
{
  "docs": [
    {"_index": "shop_product", "_id": "1"},
    {"_index": "shop_user", "_id": "100"},
    {"_index": "shop_order", "_id": "200"}
  ]
}

#### 1.3 Specify Return Fields

POST /shop_product/_mget
{
  "docs": [
    {
      "_id": "1",
      "_source": ["title", "price"]
    },
    {
      "_id": "2",
      "_source": ["title", "tags"]
    }
  ]
}

### 2. Bulk Get Response Handling

# mget response example
{
  "docs": [
    {
      "_index": "shop_product",
      "_id": "1",
      "_version": 1,
      "_seq_no": 0,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "title": "iPhone 15",
        "price": 5999,
        "tags": ["phone", "apple"],
        "stock": 100
      }
    },
    {
      "_index": "shop_product",
      "_id": "2",
      "_version": 1,
      "_seq_no": 1,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "title": "Samsung Galaxy",
        "price": 4999,
        "tags": ["phone", "android"],
        "stock": 50
      }
    },
    {
      "_index": "shop_product",
      "_id": "999",
      "found": false
    }
  ]
}
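
Client code usually needs to separate the documents that were found from the IDs that were not. The Python sketch below does this for a parsed mget response; `split_mget` is a hypothetical helper and the sample response is a shortened version of the one above.

```python
def split_mget(response):
    """Split an mget response into {_id: _source} for hits and a list of missing IDs."""
    found, missing = {}, []
    for doc in response.get("docs", []):
        if doc.get("found"):
            found[doc["_id"]] = doc.get("_source", {})
        else:
            # Covers found: false as well as entries that carry an error
            missing.append(doc["_id"])
    return found, missing


# Shortened, illustrative response
sample_response = {
    "docs": [
        {"_index": "shop_product", "_id": "1", "found": True,
         "_source": {"title": "iPhone 15", "price": 5999}},
        {"_index": "shop_product", "_id": "2", "found": True,
         "_source": {"title": "Samsung Galaxy", "price": 4999}},
        {"_index": "shop_product", "_id": "999", "found": False},
    ]
}

found, missing = split_mget(sample_response)
print(sorted(found), missing)
```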

### 3. Advanced Bulk Get

#### 3.1 Using ids Parameter (Simplified Syntax)

POST /shop_product/_mget
{
  "ids": ["1", "2", "3", "999"]
}

#### 3.2 Exclude Unnecessary Fields

POST /shop_product/_mget
{
  "docs": [
    {
      "_id": "1",
      "_source": {
        "excludes": ["description", "created_at"]
      }
    },
    {
      "_id": "2",
      "_source": {
        "includes": ["title", "price"]
      }
    }
  ]
}

#### 3.3 Get Stored Fields

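# stored_fields returns only fields mapped with "store": true; with the mapping shown earlier (no stored fields) nothing is returned for title and price
POST /shop_product/_mget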
{
  "docs": [
    {
      "_id": "1",
      "stored_fields": ["title", "price"]
    }
  ]
}

### 4. Bulk Get Performance Optimization

#### 4.1 Control Batch Size Reasonably

# Recommended batch size: 100-1000 documents
POST /shop_product/_mget
{
  "ids": [
    "1", "2", "3", "4", "5",
    # ... more IDs
    "100"
  ]
}

#### 4.2 Use Routing for Optimization

POST /shop_product/_mget
{
  "docs": [
    {
      "_id": "1",
      "routing": "user123"
    },
    {
      "_id": "2",
      "routing": "user123"
    }
  ]
}

### 5. Error Handling

#### 5.1 Handle Non-existent Documents

POST /shop_product/_mget
{
  "ids": ["1", "999", "2"]
}

# Response includes documents with found: false
{
  "docs": [
    {
      "_index": "shop_product",
      "_id": "1",
      "found": true,
      "_source": {...}
    },
    {
      "_index": "shop_product",
      "_id": "999",
      "found": false
    },
    {
      "_index": "shop_product",
      "_id": "2",
      "found": true,
      "_source": {...}
    }
  ]
}

#### 5.2 Filter Non-existent Documents

# Return only the _source of documents that were found (entries with found: false have no _source and drop out)
POST /shop_product/_mget?filter_path=docs._source
{
  "ids": ["1", "999", "2"]
}

### 6. Real-world Application Scenarios

#### 6.1 Get Shopping Cart Product Information

# Bulk get product information by product IDs in shopping cart
POST /shop_product/_mget
{
  "ids": ["cart_item_1", "cart_item_2", "cart_item_3"]
}

#### 6.2 Get User Order Details

# Bulk get all user order details
POST /shop_order/_mget
{
  "docs": [
    {"_id": "order_001", "_source": ["order_id", "total", "status"]},
    {"_id": "order_002", "_source": ["order_id", "total", "status"]},
    {"_id": "order_003", "_source": ["order_id", "total", "status"]}
  ]
}

#### 6.3 Cross-index Data Association

# Get user information and user orders simultaneously
POST /_mget
{
  "docs": [
    {"_index": "shop_user", "_id": "user_123", "_source": ["name", "email"]},
    {"_index": "shop_order", "_id": "order_456", "_source": ["order_id", "total"]},
    {"_index": "shop_product", "_id": "product_789", "_source": ["title", "price"]}
  ]
}

### 7. mget vs Single Get Comparison

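| Get Type | Request | Performance | Use Cases |
| --- | --- | --- | --- |
| Single get | `GET /index/_doc/id` | Slower when repeated: one request per document | Get a single document |
| Bulk get | `POST /index/_mget` | Faster for many IDs: one request for all of them | Bulk get of multiple documents by ID |
| Search get | `POST /index/_search` | Medium: a query has to be executed | Get documents by conditions |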

### 8. Bulk Get Best Practices

#### 8.1 Data Preprocessing

# Check index status before bulk get
GET /shop_product/_stats

# Check how many documents the index contains
GET /shop_product/_count

#### 8.2 Process Large Data in Batches

# Batch get large amounts of documents
POST /shop_product/_mget
{
  "ids": ["1", "2", "3", "4", "5"]
  # First batch of 100 documents
}

# Continue with next batch
POST /shop_product/_mget
{
  "ids": ["6", "7", "8", "9", "10"]
  # Next batch of 100 documents
}

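#### 8.3 Shard Preference

# preference=_local prefers shard copies on the node that receives the request, which can save a network hop (it does not enable a cache)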
POST /shop_product/_mget?preference=_local
{
  "ids": ["1", "2", "3"]
}

### 9. Common Issues Resolution

#### 9.1 Insufficient Memory

# Reduce batch size
POST /shop_product/_mget
{
  "ids": ["1", "2", "3"]  # Only get 3 documents
}

#### 9.2 Timeout Issues

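# _mget does not take a timeout URL parameter; set the request timeout in the HTTP client
# and send smaller batches if requests take too long
POST /shop_product/_mget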
{
  "ids": ["1", "2", "3"]
}

#### 9.3 Index Doesn't Exist

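# Handle case where index doesn't exist: check each entry in docs, since the entry for the missing index carries an error instead of a document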
POST /_mget
{
  "docs": [
    {"_index": "shop_product", "_id": "1"},
    {"_index": "non_existent_index", "_id": "2"}
  ]
}

### 10. Bulk Get Monitoring

# Monitor bulk get performance (mget is counted in the "get" statistics, not in "search")
GET /_nodes/stats/indices/get

# View index get statistics
GET /shop_product/_stats/get

# Monitor cluster status
GET /_cluster/health?pretty

### 11. Combine with Other Bulk Operations

#### 11.1 Bulk Get + Bulk Update

# First bulk get
POST /shop_product/_mget
{
  "ids": ["1", "2", "3"]
}

# Then bulk update
POST /shop_product/_bulk
{"update":{"_id":"1"}}
{"doc":{"stock":95}}
{"update":{"_id":"2"}}
{"doc":{"stock":45}}
{"update":{"_id":"3"}}
{"doc":{"stock":15}}

#### 11.2 Bulk Get + Search

# First search to get relevant document IDs
POST /shop_product/_search
{
  "query": {"match": {"tags": "phone"}},
  "_source": false,
  "size": 10
}

# Then bulk get detailed information
POST /shop_product/_mget
{
  "ids": ["1", "2", "3", "4", "5"]
}
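
The glue between the two steps is extracting the `_id` of each hit from the search response. The Python sketch below builds the mget body from a parsed search response; `mget_body_from_search` is a hypothetical helper and the sample response is a shortened illustration.

```python
def mget_body_from_search(search_response, source_fields=None):
    """Build an _mget body from the hit IDs of a search response."""
    ids = [hit["_id"] for hit in search_response["hits"]["hits"]]
    if source_fields is None:
        return {"ids": ids}  # simplified syntax when every document returns its full _source
    # Per-document form, needed to restrict the returned fields
    return {"docs": [{"_id": doc_id, "_source": source_fields} for doc_id in ids]}


# Shortened, illustrative search response (requested with "_source": false)
sample_search = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "shop_product", "_id": "1", "_score": 1.2},
            {"_index": "shop_product", "_id": "2", "_score": 0.9},
        ],
    }
}

print(mget_body_from_search(sample_search))
print(mget_body_from_search(sample_search, ["title", "price"]))
```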

## ES's query:{} Functionality Detailed Explanation

The `query` object is the core of a search request: it defines the search conditions and how they are combined. When `query` is omitted, the search behaves like `match_all` and matches every document.

### 1. query Object Basic Structure

POST /shop_product/_search
{
  "query": {
    // Query conditions defined here
  }
}

### 2. Common Query Types

#### 2.1 match_all Query (Full Table Scan)

POST /shop_product/_search
{
  "query": {
    "match_all": {}
  }
}

**Functionality**:

- Matches every document in the index
- Comparable to `SELECT * FROM table` in SQL, except that a search returns only the first 10 hits unless `size` is set
- Commonly used to fetch all data or as a base query

#### 2.2 match Query (Full-text Search)

POST /shop_product/_search
{
  "query": {
    "match": {
      "title": "iPhone"
    }
  }
}

**Functionality**:

- Performs full-text search on the specified field
- Analyzes (tokenizes) the query text before matching
- Scores hits by relevance; typo tolerance is available through the optional `fuzziness` parameter

#### 2.3 term Query (Exact Match)

POST /shop_product/_search
{
  "query": {
    "term": {
      "tags": "phone"
    }
  }
}

**Functionality**:

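- Matches the exact term; the query value is not analyzed
- Suitable for `keyword` fields (terms in `text` fields are analyzed at index time, so an unanalyzed value often fails to match them)
- Usually cheaper than a `match` query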

#### 2.4 range Query (Range Query)

POST /shop_product/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 5000
      }
    }
  }
}

**Functionality**:

- Range query on numeric, date, and other ordered fields
- Supports `gte` (greater than or equal), `gt` (greater than), `lte` (less than or equal), `lt` (less than)
- Commonly used for price range and time range queries

#### 2.5 bool Query (Compound Query)

POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "phone"}},
        {"range": {"price": {"gte": 1000}}}
      ],
      "must_not": [
        {"term": {"status": "discontinued"}}
      ],
      "should": [
        {"match": {"tags": "apple"}}
      ],
      "filter": [
        {"term": {"category": "electronics"}}
      ]
    }
  }
}

**Functionality**:

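- `must`: the clause must match and contributes to the relevance score
- `must_not`: the clause must not match; it does not affect scoring
- `should`: a matching clause raises the relevance score; when the query has no `must` or `filter` clause, at least one `should` clause has to match
- `filter`: the clause must match but does not affect scoring, which usually performs better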

#### 2.6 wildcard Query (Wildcard Query)

POST /shop_product/_search
{
  "query": {
    "wildcard": {
      "title": {
        "value": "iPh*"
      }
    }
  }
}

**Functionality**:

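- Supports `*` (matches any sequence of characters) and `?` (matches a single character)
- Can be slow, especially with a leading wildcard; not recommended on large data
- The pattern is compared with indexed terms. The standard analyzer lowercases the terms of `title`, so a lowercase pattern such as `iph*` is the safer choice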

#### 2.7 prefix Query (Prefix Query)

POST /shop_product/_search
{
  "query": {
    "prefix": {
      "title": "iPh"
    }
  }
}

**Functionality**:

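- Matches documents that contain a term starting with the specified prefix (for an analyzed `text` field, a lowercase prefix such as `iph` is the safer choice)
- Suitable for auto-complete and search suggestions
- Usually cheaper than an equivalent wildcard query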

#### 2.8 fuzzy Query (Fuzzy Query)

POST /shop_product/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "iphone",
        "fuzziness": "AUTO"
      }
    }
  }
}

**Functionality**:

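Supports typo-tolerant queries based on edit distance (inserting, deleting, or changing a character, or swapping two adjacent characters)
The fuzziness parameter sets the maximum edit distance; AUTO picks it from the term length (0 for 1-2 characters, 1 for 3-5, 2 for longer terms)
Suitable for search correction scenarios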
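
The idea behind fuzziness can be sketched in Go as follows. This is a simplified illustration, not Lucene's implementation: `autoFuzziness` encodes the length thresholds of AUTO, and `editDistance` is a plain dynamic-programming edit distance that counts a swap of two adjacent characters as one edit. Both function names are hypothetical.

```go
package main

import "fmt"

// autoFuzziness returns the maximum edit distance that "fuzziness": "AUTO"
// allows for a term of the given length (default thresholds 3 and 6).
func autoFuzziness(term string) int {
	switch n := len([]rune(term)); {
	case n <= 2:
		return 0
	case n <= 5:
		return 1
	default:
		return 2
	}
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// editDistance counts insertions, deletions, substitutions, and
// transpositions of adjacent characters (optimal string alignment).
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	d := make([][]int, len(ra)+1)
	for i := range d {
		d[i] = make([]int, len(rb)+1)
		d[i][0] = i
	}
	for j := 0; j <= len(rb); j++ {
		d[0][j] = j
	}
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			d[i][j] = minInt(minInt(d[i-1][j]+1, d[i][j-1]+1), d[i-1][j-1]+cost)
			if i > 1 && j > 1 && ra[i-1] == rb[j-2] && ra[i-2] == rb[j-1] {
				d[i][j] = minInt(d[i][j], d[i-2][j-2]+1)
			}
		}
	}
	return d[len(ra)][len(rb)]
}

func main() {
	query := "ipohne" // typo: 'o' and 'h' swapped
	fmt.Println(autoFuzziness(query))                                     // 2
	fmt.Println(editDistance(query, "iphone"))                            // 1
	fmt.Println(editDistance(query, "iphone") <= autoFuzziness(query))    // true
}
```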

### 3. Query Performance Optimization

#### 3.1 Use filter Context Instead of query Context

POST /shop_product/_search
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"price": {"gte": 1000}}}
      ]
    }
  }
}

**Advantages**:

Clauses in filter context skip relevance scoring, so they are cheaper to evaluate
Frequently used filters can be cached automatically, so repeated queries get faster
Suitable for yes/no conditions such as exact values and ranges

#### 3.2 Use _source Filtering Reasonably

POST /shop_product/_search
{
  "_source": ["title", "price"],
  "query": {
    "match_all": {}
  }
}

**Advantages**:

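Returns only the needed fields, reducing response size and network transfer
Less JSON to serialize on the server and to parse on the client
Matching itself is not faster; the saving is in fetching and transferring the results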

#### 3.3 Use size to Control Return Quantity

POST /shop_product/_search
{
  "size": 10,
  "query": {
    "match_all": {}
  }
}

**Advantages**:

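Avoids returning large amounts of data at once (the default size is 10)
Improves response time, since fewer documents are fetched and serialized
Reduces memory consumption on both the coordinating node and the client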

### 4. Query Debugging Techniques

#### 4.1 Use explain Parameter

POST /shop_product/_search
{
  "explain": true,
  "query": {
    "match": {
      "title": "iPhone"
    }
  }
}

**Functionality**:

Shows scoring calculation process for each document
Helps understand why results are ranked this way
Used for query optimization and debugging

#### 4.2 Use profile Parameter

POST /shop_product/_search
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "phone"}},
        {"range": {"price": {"gte": 1000}}}
      ]
    }
  }
}

**Functionality**:

Shows detailed timing information of query execution
Helps identify performance bottlenecks
Used for query performance optimization

### 5. Query Best Practices

#### 5.1 Query Structure Optimization

# Recommended structure: scored conditions in must, yes/no conditions in filter
POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "phone"}}
      ],
      "filter": [
        {"range": {"price": {"gte": 1000, "lte": 5000}}},
        {"term": {"status": "active"}}
      ]
    }
  },
  "sort": [{"price": "desc"}],
  "size": 20
}

#### 5.2 Avoid Deep Pagination

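# Use search_after instead of a large from/size (from + size is capped at 10,000 by default)
# search_after takes the sort values of the last hit on the previous page
# _id is used here for brevity; the official docs advise against sorting on _id, so prefer a unique keyword field as the tiebreaker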
POST /shop_product/_search
{
  "query": {"match_all": {}},
  "size": 100,
  "sort": [{"_id": "asc"}],
  "search_after": ["last_doc_id"]
}
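
On the client side, paging with search_after means copying the `sort` array of the last hit into the next request. A minimal Go sketch of that step, using only the standard library; `NextPageBody` and the sort field names in the example are assumptions made for illustration, and sending the request is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchPage decodes only the parts of a search response needed for paging.
type searchPage struct {
	Hits struct {
		Hits []struct {
			ID string `json:"_id"`
			// RawMessage passes the sort values through unchanged,
			// so large numbers are not rounded by float64 decoding.
			Sort json.RawMessage `json:"sort"`
		} `json:"hits"`
	} `json:"hits"`
}

// NextPageBody returns the request body for the next page, or nil when the
// previous page had no hits (there is nothing left to fetch).
func NextPageBody(prev []byte, size int, sort []map[string]string) ([]byte, error) {
	var page searchPage
	if err := json.Unmarshal(prev, &page); err != nil {
		return nil, err
	}
	hits := page.Hits.Hits
	if len(hits) == 0 {
		return nil, nil
	}
	return json.Marshal(map[string]any{
		"query":        map[string]any{"match_all": map[string]any{}},
		"size":         size,
		"sort":         sort, // must be the same sort as on the previous page
		"search_after": hits[len(hits)-1].Sort,
	})
}

func main() {
	// Made-up response for illustration; "sku" is a hypothetical unique keyword field.
	prev := []byte(`{"hits":{"hits":[
		{"_id":"1","sort":[999,"sku-001"]},
		{"_id":"2","sort":[1999,"sku-002"]}]}}`)
	sort := []map[string]string{{"price": "asc"}, {"sku": "asc"}}
	body, err := NextPageBody(prev, 100, sort)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The loop around this function would stop as soon as `NextPageBody` returns nil.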

#### 5.3 Use Aggregations Reasonably

POST /shop_product/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {"field": "price"}
    },
    "tags_count": {
      "terms": {"field": "tags", "size": 10}
    }
  }
}
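
The response to this request carries the results under `aggregations`, keyed by the names chosen in the request. A minimal Go sketch of decoding it, assuming `tags` is a keyword field (so bucket keys are strings); the struct names and the sample response values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aggResponse mirrors the "aggregations" part of the response for the
// price_stats (stats) and tags_count (terms) aggregations.
type aggResponse struct {
	Aggregations struct {
		PriceStats struct {
			Count int64   `json:"count"`
			Min   float64 `json:"min"`
			Max   float64 `json:"max"`
			Avg   float64 `json:"avg"`
			Sum   float64 `json:"sum"`
		} `json:"price_stats"`
		TagsCount struct {
			Buckets []struct {
				Key      string `json:"key"`
				DocCount int64  `json:"doc_count"`
			} `json:"buckets"`
		} `json:"tags_count"`
	} `json:"aggregations"`
}

// decodeAggs parses a raw search response body.
func decodeAggs(raw []byte) (aggResponse, error) {
	var r aggResponse
	err := json.Unmarshal(raw, &r)
	return r, err
}

func main() {
	// Made-up sample response, abridged to the fields decoded above.
	raw := []byte(`{"aggregations":{
		"price_stats":{"count":3,"min":999,"max":5999,"avg":2999,"sum":8997},
		"tags_count":{"buckets":[{"key":"phone","doc_count":2},{"key":"apple","doc_count":1}]}}}`)
	r, err := decodeAggs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("avg price:", r.Aggregations.PriceStats.Avg)
	for _, b := range r.Aggregations.TagsCount.Buckets {
		fmt.Println(b.Key, b.DocCount)
	}
}
```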

### 6. Common Query Patterns

#### 6.1 Search + Filter

POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "user search term"}}
      ],
      "filter": [
        {"term": {"category": "electronics"}},
        {"range": {"price": {"gte": 1000}}}
      ]
    }
  }
}

#### 6.2 Multi-field Search

POST /shop_product/_search
{
  "query": {
    "multi_match": {
      "query": "iPhone",
      "fields": ["title^2", "description", "tags"]
    }
  }
}

#### 6.3 Nested Object Query

POST /shop_product/_search
{
  "query": {
    "nested": {
      "path": "reviews",
      "query": {
        "bool": {
          "must": [
            {"match": {"reviews.comment": "good"}},
            {"range": {"reviews.rating": {"gte": 4}}}
          ]
        }
      }
    }
  }
}

### 7. Query Performance Monitoring

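# Search statistics per node (query count, cumulative query time, fetch time, ...)
GET /_nodes/stats/indices/search

# Cumulative query time per node only
# (this is not the slow log; slow logs are written to log files once the index.search.slowlog.threshold.* settings are configured)
GET /_nodes/stats/indices/search?filter_path=nodes.*.indices.search.query_time_in_millis

# Overall cluster health (green / yellow / red)
GET /_cluster/health?pretty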

### 8. Query Error Handling

#### 8.1 Handle Query Syntax Errors

# Incorrect query: the closing brace of "query" is missing
POST /shop_product/_search
{
  "query": {
    "match": {
      "title": "iPhone"
    }
}

# Error response (abridged; returned with HTTP status 400, and the exact type and reason vary by version)
{
  "error": {
    "type": "parsing_exception",
    "reason": "Unexpected end-of-input"
  }
}
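
In client code, such a response can be turned into a readable message by decoding the `error` object and the top-level `status` field. A minimal Go sketch with the standard library; `esError` and `describeError` are hypothetical names, and real responses contain more fields (such as `root_cause`) that are ignored here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// esError decodes the parts of an error response used for reporting.
type esError struct {
	Error struct {
		Type   string `json:"type"`
		Reason string `json:"reason"`
	} `json:"error"`
	Status int `json:"status"`
}

// describeError returns a one-line description of an error response,
// or an empty string if the body carries no "error" object.
func describeError(body []byte) (string, error) {
	var e esError
	if err := json.Unmarshal(body, &e); err != nil {
		return "", err
	}
	if e.Error.Type == "" {
		return "", nil
	}
	return fmt.Sprintf("%s: %s (status %d)", e.Error.Type, e.Error.Reason, e.Status), nil
}

func main() {
	body := []byte(`{"error":{"type":"parsing_exception","reason":"Unexpected end-of-input"},"status":400}`)
	msg, err := describeError(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```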

#### 8.2 Handle Field Not Exist Error

# Query non-existent field
POST /shop_product/_search
{
  "query": {
    "match": {
      "non_existent_field": "value"
    }
  }
}

# Response (abridged): no error is raised, but no documents match
{
  "hits": {
    "total": {"value": 0, "relation": "eq"},
    "hits": []
  }
}

### 9. Query Caching Strategy

# Put cacheable yes/no conditions in filter context
POST /shop_product/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"category": "electronics"}}
      ]
    }
  }
}

# No extra syntax is needed: Elasticsearch decides automatically which filter-context clauses to cache, which speeds up repeated queries

### 10. Summary

The `query` clause is the core of search in ES. Mastering the query types and optimization techniques is essential for building high-performance search applications:

Basic queries: match_all, match, term, range
Compound queries: bool query combining multiple conditions
Performance optimization: reasonable use of filter, _source filtering, size control
Debugging techniques: explain and profile parameters
Best practices: avoid deep pagination, reasonable aggregation use, performance monitoring

By properly using these query features, you can build efficient and accurate search systems.

Theme test article, for testing purposes only. Publisher: Walker. When reposting, please credit the source: https://walker-learn.xyz/archives/4783
