Go Engineer System Course 010 [Study Notes]

# ES Installation

Install Elasticsearch (think of it as the database) and Kibana (think of it as the client tool used to connect to it).
The ES and Kibana versions must match; Kibana listens on port 5601 by default.

## Learning Elasticsearch (ES) with MySQL Comparison

### Terminology Comparison

| MySQL | Elasticsearch |
| --- | --- |
| database | index |
| table | type (fixed as _doc from 7.x, completely removed in 8.x) |
| row | document |
| column | field |
| schema | mapping |
| SQL | DSL (Domain Specific Language query syntax) |

Note: Since ES 6.x an index can hold only one type. In 7.x types are deprecated and the type name is conventionally _doc; in 8.x types are removed from the APIs, and modeling is done directly against "index + mapping".

### Core Concepts Overview

  • Index: Similar to a "database" in a relational database. A logical collection of documents of the same kind, stored internally in primary shards and replica shards.
  • Document: Similar to a row of data. A JSON object uniquely identified by _id, which can be auto-generated by ES or supplied by the client.
  • Field: An attribute of a document, similar to a column. The field type determines how the inverted index is built and which queries are available.
  • Mapping: Similar to a table structure definition. Declares field types and how each field is indexed and analyzed. Once a field is mapped, its type is effectively immutable; changing it requires building a new index and re-importing the data.
  • Tokenization and analyzer: How text is split into terms and written to the inverted index, which determines full-text search quality. For Chinese text, the third-party IK plugin is commonly used; it provides the ik_max_word and ik_smart analyzers.
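To make the analyzer and inverted index idea concrete, here is a minimal Python sketch. It is a toy model, not how ES is implemented: the `analyze` function is an assumed, simplified stand-in for the standard analyzer (split on non-alphanumeric characters, lowercase, ASCII only), and the function names are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def build_inverted_index(docs):
    """Map each term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def match(index, query):
    """Return IDs of documents containing at least one analyzed query term (OR semantics)."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "iPhone 15 Pro", 2: "Samsung Galaxy phone", 3: "MacBook Pro"}
index = build_inverted_index(docs)
print(sorted(match(index, "pro")))    # [1, 3]
print(sorted(match(index, "phone")))  # [2]  ("iPhone" is indexed as the single term "iphone")
```

The second lookup shows why a full-text match is not a substring search: the query term must equal an indexed term after analysis.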

### Modeling Guide (Comparing with MySQL Thinking)

  • First determine query dimensions and retrieval methods, then design fields and mappings; don't blindly copy MySQL's third normal form.
  • Use moderate redundancy and avoid joins: ES has no SQL-style joins. A search can target several indexes at once but cannot join them, so complex scenarios use denormalization or nested/parent-child modeling.
  • Distinguish clearly between numeric, time, keyword, and text fields:
    • keyword: exact matching, aggregation, sorting;
    • text: full-text search (tokenized), not suitable for aggregation/sorting;
    • use date for time, geo_point for geographic location, etc.

### Common Operations Comparison

  1. Create Index (with Mapping)

SQL (create database/table/fields):

-- MySQL example
CREATE DATABASE shop;
CREATE TABLE product (
  id BIGINT PRIMARY KEY,
  title VARCHAR(255),
  price DECIMAL(10,2),
  tags JSON
);

ES (create index + mapping):

PUT /shop_product
{
  "settings": {"number_of_shards": 1, "number_of_replicas": 1},
  "mappings": {
    "properties": {
      "title": {"type": "text", "analyzer": "standard"},
      "price": {"type": "double"},
      "tags": {"type": "keyword"},
      "createdAt": {"type": "date"}
    }
  }
}
  2. Insert a Row/Document

SQL:

INSERT INTO product(id,title,price) VALUES(1,'iPhone',5999.00);

ES:

POST /shop_product/_doc/1
{
  "title": "iPhone",
  "price": 5999.00,
  "tags": ["phone", "apple"],
  "createdAt": "2025-09-18T12:00:00Z"
}
  3. Query by Primary Key

SQL:

SELECT * FROM product WHERE id=1;

ES:

GET /shop_product/_doc/1
  4. Conditional Query (DSL vs SQL)

SQL:

SELECT id,title FROM product
WHERE price BETWEEN 3000 AND 8000 AND title LIKE '%phone%'
ORDER BY price DESC LIMIT 10 OFFSET 0;

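ES (match is token-based, so it only approximates LIKE '%phone%': it finds titles containing the word phone, but not iPhone, which the standard analyzer indexes as the single term iphone):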

POST /shop_product/_search
{
  "from": 0,
  "size": 10,
  "sort": [{"price": "desc"}],
  "_source": ["id","title","price"],
  "query": {
    "bool": {
      "must": [ {"match": {"title": "phone"}} ],
      "filter": [ {"range": {"price": {"gte": 3000, "lte": 8000}}} ]
    }
  }
}
  5. Update and Delete

SQL: UPDATE ... WHERE id=? / DELETE FROM ... WHERE id=?

ES:

POST /shop_product/_update/1
{"doc": {"price": 5799}}

DELETE /shop_product/_doc/1
  6. Aggregation (GROUP BY Comparison)

SQL:

SELECT tags, COUNT(*) AS cnt FROM product GROUP BY tags;

ES:

POST /shop_product/_search
{
  "size": 0,
  "aggs": {
    "by_tag": {"terms": {"field": "tags"}}
  }
}
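The conditional query above (full-text match plus price range) can also be assembled in application code. The following Python sketch builds the same kind of request body; the function name `build_product_search` and its parameters are hypothetical, and the field names title and price are taken from the shop_product mapping above.

```python
import json


def build_product_search(keyword=None, min_price=None, max_price=None, size=10):
    """Build a _search body: the full-text clause goes in must (scored),
    the price range goes in filter (not scored)."""
    bool_query = {}
    if keyword:
        bool_query["must"] = [{"match": {"title": keyword}}]
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        bool_query["filter"] = [{"range": {"price": price_range}}]
    return {
        "size": size,
        "sort": [{"price": "desc"}],
        # With no conditions at all, fall back to match_all.
        "query": {"bool": bool_query} if bool_query else {"match_all": {}},
    }


print(json.dumps(build_product_search("phone", 3000, 8000), indent=2))
```

The printed JSON can be pasted after POST /shop_product/_search in Kibana Dev Tools.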

### Index Lifecycle and Performance Key Points

  • The number of primary shards is fixed when the index is created; changing it later requires rebuilding into a new index (or using the _split/_shrink APIs). The number of replicas can be adjusted online.
  • Write-heavy scenarios: reduce replicas and increase the refresh interval (index.refresh_interval). Read-heavy scenarios: increase replicas appropriately, rely on filter caching, and keep doc_values enabled on fields used for sorting and aggregation.
  • For major mapping/type changes, rebuild the index (reindex): create new index -> import data -> switch alias.
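The rebuild flow (new index -> import data -> switch alias) maps onto two request bodies: one for POST /_reindex and one for POST /_aliases. The Python sketch below only builds those bodies; the helper names and the index names shop_product_v1 and shop_product_v2 are assumptions for illustration.

```python
import json


def reindex_body(old_index, new_index):
    """Body for POST /_reindex: copy every document from old_index into new_index."""
    return {"source": {"index": old_index}, "dest": {"index": new_index}}


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases: move the alias in a single request so readers
    never see a moment where the alias points at nothing."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


print(json.dumps(reindex_body("shop_product_v1", "shop_product_v2")))
print(json.dumps(alias_swap_actions("shop_product", "shop_product_v1", "shop_product_v2")))
```

This assumes the application always reads and writes through the alias rather than the concrete index name.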

### Kibana and Port

  • Kibana default port is 5601, version must be consistent with ES (7.x with 7.x, 8.x with 8.x).
  • You can directly paste the REST/DSL examples above in Kibana Dev Tools to execute.

## We Mainly Use ES's Query Functionality

GET _search?q=bobby // URL parameter query; with no index in the path it searches all indexes

Queries can also be sent in the request body.

### Request Body Query Detailed Explanation

ES supports two query methods:

  1. URL Parameter Query: GET _search?q=field:value (simple and fast)
  2. Request Body Query: POST _search + JSON body (powerful, recommended)

#### Why Recommend Request Body Query?

  • Complete Functionality: supports complex queries, aggregations, sorting, pagination, etc.
  • Strong Readability: JSON structure is clear, easy to maintain
  • No URL Constraints: avoids URL length limits and escaping issues, supports more complex query logic
  • Debug Friendly: directly paste and execute in Kibana Dev Tools

#### Basic Query Examples

  1. Simple Match Query
POST /shop_product/_search
{
  "query": {
    "match": {
      "title": "iPhone"
    }
  }
}
  2. Multi-condition Combined Query
POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "phone"}},
        {"range": {"price": {"gte": 1000, "lte": 8000}}}
      ],
      "must_not": [
        {"term": {"status": "discontinued"}}
      ],
      "should": [
        {"match": {"tags": "apple"}}
      ]
    }
  }
}
  3. Exact Match vs Full-text Search
# Exact match (no tokenization)
POST /shop_product/_search
{
  "query": {
    "term": {
      "tags": "phone"
    }
  }
}

# Full-text search (tokenized)
POST /shop_product/_search
{
  "query": {
    "match": {
      "title": "iPhone 15 Pro"
    }
  }
}
  4. Pagination and Sorting
POST /shop_product/_search
{
  "from": 0,
  "size": 10,
  "sort": [
    {"price": {"order": "desc"}},
    {"_score": {"order": "desc"}}
  ],
  "query": {
    "match_all": {}
  }
}
  5. Specify Return Fields
POST /shop_product/_search
{
  "_source": ["title", "price", "tags"],
  "query": {
    "match_all": {}
  }
}
  6. Aggregation Query (Statistics)
POST /shop_product/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {"field": "price"}
    },
    "tags_count": {
      "terms": {"field": "tags", "size": 10}
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          {"to": 1000},
          {"from": 1000, "to": 5000},
          {"from": 5000}
        ]
      }
    }
  }
}
  7. Highlighting
POST /shop_product/_search
{
  "query": {
    "match": {"title": "iPhone"}
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}
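The from and size values in the pagination example are usually derived from a page number. A small Python sketch of that calculation follows; `page_params` is a hypothetical helper, and the 10000 limit reflects the default of the index.max_result_window setting (it would differ if that setting was changed).

```python
MAX_RESULT_WINDOW = 10_000  # default value of the index.max_result_window setting


def page_params(page, page_size=10):
    """Translate a 1-based page number into from/size and reject windows
    that ES would refuse under the default max_result_window."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds 10000; use search_after for deep pagination")
    return {"from": offset, "size": page_size}


print(page_params(1))      # {'from': 0, 'size': 10}
print(page_params(3, 20))  # {'from': 40, 'size': 20}
```

The returned dictionary can be merged into any search body, for example `{**page_params(2), "query": {"match_all": {}}}`.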

#### Common Query Type Comparison

| Requirement | SQL | ES Request Body |
| --- | --- | --- |
| Full table scan | SELECT * FROM table | {"query": {"match_all": {}}} |
| Exact match | WHERE id = 1 | {"query": {"term": {"id": 1}}} |
| Fuzzy match | WHERE title LIKE '%phone%' | {"query": {"match": {"title": "phone"}}} |
| Range query | WHERE price BETWEEN 1000 AND 5000 | {"query": {"range": {"price": {"gte": 1000, "lte": 5000}}}} |
| Multiple conditions AND | WHERE a=1 AND b=2 | {"query": {"bool": {"must": [{"term": {"a": 1}}, {"term": {"b": 2}}]}}} |
| Multiple conditions OR | WHERE a=1 OR b=2 | {"query": {"bool": {"should": [{"term": {"a": 1}}, {"term": {"b": 2}}]}}} |
| Group statistics | SELECT tag, COUNT(*) FROM table GROUP BY tag | {"aggs": {"by_tag": {"terms": {"field": "tag"}}}} |

#### Performance Optimization Recommendations

  1. Use filter context for conditions that don't need scoring: filter clauses skip relevance scoring, so they perform better
POST /shop_product/_search
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"price": {"gte": 1000}}}
      ]
    }
  }
}
  2. Use size reasonably: avoid returning large amounts of data at once
  3. Use _source filtering: only return needed fields
  4. Cache common queries: ES automatically caches frequently used filter clauses

## POST Update Operations Detailed Explanation

ES has two update methods, **overwrite update** and **partial update**. Understanding the difference between them is important.

### 1. Overwrite Update (PUT Method)

**Characteristics**: Completely replaces the entire document, unspecified fields will be deleted

# Original document
{
  "id": 1,
  "title": "iPhone 15",
  "price": 5999,
  "tags": ["phone", "apple"],
  "description": "Latest iPhone",
  "stock": 100
}

# Overwrite update (only specified fields are retained)
PUT /shop_product/_doc/1
{
  "title": "iPhone 15 Pro",
  "price": 7999
}

# Updated document (tags, description and stock are deleted; id is also gone from _source, although the document's _id is still 1)
{
  "title": "iPhone 15 Pro",
  "price": 7999
}

### 2. Partial Update (POST _update Method)

**Characteristics**: Only updates specified fields, other fields remain unchanged

# Original document
{
  "id": 1,
  "title": "iPhone 15",
  "price": 5999,
  "tags": ["phone", "apple"],
  "description": "Latest iPhone",
  "stock": 100
}

# Partial update (only specified fields are updated)
POST /shop_product/_update/1
{
  "doc": {
    "title": "iPhone 15 Pro",
    "price": 7999
  }
}

# Updated document (other fields remain unchanged)
{
  "id": 1,
  "title": "iPhone 15 Pro",
  "price": 7999,
  "tags": ["phone", "apple"],
  "description": "Latest iPhone",
  "stock": 100
}
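The difference between the two update styles can be modeled on plain dictionaries. The Python sketch below is an in-memory illustration only (the function names are hypothetical and no ES call is made): an overwrite replaces the stored _source, while a partial update merges objects recursively and replaces scalars and arrays.

```python
import copy


def overwrite(stored, new_source):
    """PUT /index/_doc/id: the stored _source is replaced wholesale."""
    return copy.deepcopy(new_source)


def partial_update(stored, doc):
    """POST /index/_update/id with "doc": nested objects are merged recursively,
    every other value (including arrays) replaces the stored one."""
    merged = copy.deepcopy(stored)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


original = {"title": "iPhone 15", "price": 5999, "tags": ["phone", "apple"], "stock": 100}
change = {"title": "iPhone 15 Pro", "price": 7999}

print(overwrite(original, change))       # only title and price remain
print(partial_update(original, change))  # tags and stock are kept
```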

### 3. Advanced Update Operations

#### 3.1 Conditional Update (upsert)

If the document doesn't exist, the upsert body is indexed as a new document; if it exists, the doc fields are merged into it:

POST /shop_product/_update/999
{
  "doc": {
    "title": "New Product",
    "price": 1000
  },
  "upsert": {
    "title": "New Product",
    "price": 1000,
    "tags": ["new"],
    "created_at": "2025-01-18"
  }
}

#### 3.2 Script Update

Use scripts for complex updates:

# Increase stock
POST /shop_product/_update/1
{
  "script": {
    "source": "ctx._source.stock += params.increment",
    "params": {
      "increment": 50
    }
  }
}

# Conditional update (only update if price is greater than 5000)
POST /shop_product/_update/1
{
  "script": {
    "source": "if (ctx._source.price > 5000) { ctx._source.price = params.new_price }",
    "params": {
      "new_price": 7500
    }
  }
}

#### 3.3 Array Operations

# Add tag
POST /shop_product/_update/1
{
  "script": {
    "source": "if (ctx._source.tags == null) { ctx._source.tags = [] } ctx._source.tags.add(params.tag)",
    "params": {
      "tag": "premium"
    }
  }
}

# Remove tag
POST /shop_product/_update/1
{
  "script": {
    "source": "ctx._source.tags.removeIf(item -> item == params.tag)",
    "params": {
      "tag": "old"
    }
  }
}

### 4. Bulk Update

POST /shop_product/_bulk
{"update":{"_id":"1"}}
{"doc":{"price":5999}}
{"update":{"_id":"2"}}
{"doc":{"price":6999}}
{"update":{"_id":"3"}}
{"doc":{"price":7999}}

### 5. Update Operations Comparison Table

| Operation | Request | Characteristics | Use Cases |
| --- | --- | --- | --- |
| Overwrite update | PUT /index/_doc/id | Completely replace document | Document structure changes significantly, need to delete fields |
| Partial update | POST /index/_update/id | Only update specified fields | Daily business updates, retain other fields |
| Conditional update | POST /index/_update/id + upsert | Update if exists, create if not | Uncertain whether document exists |
| Script update | POST /index/_update/id + script | Complex logic updates | Updates requiring calculations and conditional logic |

### 6. Performance Considerations

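  1. Partial updates send less data: only the changed fields travel over the network, although ES still reindexes the whole document internally
  2. Script updates are slower: the script must be compiled and executed, so performance is relatively lower
  3. Bulk operations: use the _bulk API for large numbers of updates to improve efficiency
  4. Concurrency control: ES uses optimistic concurrency control. A conflicting write is rejected with HTTP 409 (version conflict); handle it with the retry_on_conflict parameter on _update, or with if_seq_no and if_primary_term on index/delete requests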

## Delete Data Operations Detailed Explanation

ES has multiple delete methods, from deleting single documents to entire indexes. Understanding different deletion methods for different scenarios is important.

### 1. Delete Single Document

#### 1.1 Delete by ID

# Delete document with specified ID
DELETE /shop_product/_doc/1

# Response example
{
  "_index": "shop_product",
  "_id": "1",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}

#### 1.2 Conditional Delete (by Query)

# Delete all products with price less than 1000
POST /shop_product/_delete_by_query
{
  "query": {
    "range": {
      "price": {
        "lt": 1000
      }
    }
  }
}

# Delete products with specific tag
POST /shop_product/_delete_by_query
{
  "query": {
    "term": {
      "tags": "discontinued"
    }
  }
}

### 2. Bulk Delete

#### 2.1 Using _bulk API

POST /shop_product/_bulk
{"delete":{"_id":"1"}}
{"delete":{"_id":"2"}}
{"delete":{"_id":"3"}}

#### 2.2 Bulk Conditional Delete

# Delete products matching either condition (should clauses are combined with OR)
POST /shop_product/_delete_by_query
{
  "query": {
    "bool": {
      "should": [
        {"term": {"status": "discontinued"}},
        {"range": {"last_updated": {"lt": "2020-01-01"}}}
      ]
    }
  }
}

### 3. Delete Entire Index

# Delete entire index (dangerous operation!)
DELETE /shop_product

# Response example
{
  "acknowledged": true
}

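### 4. Delete Type in Index (legacy, ES 1.x only)

# Deleting a single type was only possible in ES 1.x; the API was removed in 2.0.
# On later versions, use _delete_by_query, or delete and recreate the index.
DELETE /shop_product/product_type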

### 5. Advanced Delete Operations

#### 5.1 Delete with Version Control

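# Only delete if the document is unchanged since it was last read (optimistic concurrency control);
# use the _seq_no and _primary_term values returned by an earlier GET
DELETE /shop_product/_doc/1?if_seq_no=0&if_primary_term=1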

#### 5.2 Asynchronous Delete of Large Data

# Asynchronous delete (suitable for large data)
# wait_for_completion is a URL query parameter, not a body field
POST /shop_product/_delete_by_query?wait_for_completion=false&conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}

# Response includes task ID for querying delete progress
{
  "task": "r1A2WoRbTwKZ516z6NEs5A:36619"
}

# Query task status
GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619
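Checking the task status is typically done in a polling loop. The Python sketch below keeps the HTTP call out of the picture: `get_task` is an assumed caller-supplied function that performs GET /_tasks/<task_id> and returns the parsed JSON, whose completed field tells whether the task has finished. The helper name and its parameters are hypothetical.

```python
import time


def wait_for_task(get_task, task_id, interval=1.0, max_checks=60):
    """Poll until the task reports completed=true, then return the last response.

    get_task(task_id) must return the parsed JSON of GET /_tasks/<task_id>.
    """
    for _ in range(max_checks):
        status = get_task(task_id)
        if status.get("completed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not complete after {max_checks} checks")


# Demo with a stub instead of a real cluster: the third poll reports completion.
_fake_responses = iter([
    {"completed": False},
    {"completed": False},
    {"completed": True, "response": {"deleted": 42}},
])
result = wait_for_task(lambda task_id: next(_fake_responses), "r1A2WoRbTwKZ516z6NEs5A:36619", interval=0)
print(result["response"]["deleted"])  # 42
```

In real use, `get_task` would wrap whatever HTTP client the project already uses.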

#### 5.3 Keep Snapshot Before Delete

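# Create snapshot before delete (backup); the repository backup_repo must already be registered,
# and wait_for_completion=true makes the request return only after the snapshot has finished
PUT /_snapshot/backup_repo/snapshot_before_delete?wait_for_completion=true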
{
  "indices": "shop_product",
  "ignore_unavailable": true,
  "include_global_state": false
}

# Then execute delete operation (the date field is named createdAt in the mapping above)
POST /shop_product/_delete_by_query
{
  "query": {
    "range": {
      "createdAt": {
        "lt": "2020-01-01"
      }
    }
  }
}

### 6. Delete Operations Comparison Table

| Delete Method | Request | Characteristics | Use Cases |
| --- | --- | --- | --- |
| Delete by ID | DELETE /index/_doc/id | Precisely delete single document | Delete when document ID is known |
| Conditional delete | POST /index/_delete_by_query | Delete by query conditions | Bulk delete documents matching conditions |
| Bulk delete | POST /index/_bulk | Delete multiple specified documents at once | Delete when multiple document IDs are known |
| Delete index | DELETE /index | Delete entire index | Clean test data or rebuild index |
| Asynchronous delete | POST /index/_delete_by_query?wait_for_completion=false | Non-blocking, executes in background | Delete large data to avoid timeout |

### 7. Delete Operations Precautions

#### 7.1 Performance Considerations

# Tune the batch size of the internal scroll when deleting large amounts of data
# (scroll_size is a URL query parameter, not a body field)
POST /shop_product/_delete_by_query?scroll_size=1000&conflicts=proceed
{
  "query": {
    "range": {
      "price": {
        "lt": 100
      }
    }
  }
}

#### 7.2 Safe Delete

# Query first to confirm before delete
POST /shop_product/_search
{
  "query": {
    "range": {
      "price": {
        "lt": 100
      }
    }
  },
  "size": 0
}

# Execute delete after confirming
POST /shop_product/_delete_by_query
{
  "query": {
    "range": {
      "price": {
        "lt": 100
      }
    }
  }
}

#### 7.3 Delete Monitoring

# Monitor delete progress
GET /_tasks?detailed=true&actions=*delete*

# Cancel delete task
POST /_tasks/task_id/_cancel

### 8. Delete vs Soft Delete

In actual business, soft delete is usually used instead of physical delete:

# Soft delete: mark as deleted
POST /shop_product/_update/1
{
  "doc": {
    "deleted": true,
    "deleted_at": "2025-01-18T12:00:00Z"
  }
}

# Exclude deleted documents when querying
POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "iPhone"}}
      ],
      "must_not": [
        {"term": {"deleted": true}}
      ]
    }
  }
}
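Because every read has to exclude soft-deleted documents, it helps to apply the must_not clause in one place. A minimal Python sketch, assuming the deleted flag used above; `exclude_deleted` is a hypothetical helper name.

```python
import json


def exclude_deleted(query):
    """Wrap any query clause so documents flagged deleted=true are left out."""
    return {
        "bool": {
            "must": [query],
            "must_not": [{"term": {"deleted": True}}],
        }
    }


body = {"query": exclude_deleted({"match": {"title": "iPhone"}})}
print(json.dumps(body, indent=2))
```

Documents that have no deleted field at all still match, since must_not only removes documents where the term is present and true.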

### 9. Recover Deleted Data

ES deleted data cannot be directly recovered, but can be recovered through:

  1. Restore from snapshot: if snapshot was created previously
  2. Restore from backup: if data backup exists
  3. Re-import: re-import from original data source
# Restore index from snapshot (an existing index with the same name must be closed or deleted first)
POST /_snapshot/backup_repo/snapshot_name/_restore
{
  "indices": "shop_product",
  "ignore_unavailable": true,
  "include_global_state": false
}

## Bulk Insert Operations Detailed Explanation

Bulk insert in ES is an efficient way to handle large amounts of data. Through the _bulk API, multiple operations including insert, update, delete, etc. can be executed at once.

### 1. Basic Bulk Insert

#### 1.1 Insert Using _bulk API

POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"iPhone 15","price":5999,"tags":["phone","apple"],"stock":100}
{"index":{"_id":"2"}}
{"title":"Samsung Galaxy","price":4999,"tags":["phone","android"],"stock":50}
{"index":{"_id":"3"}}
{"title":"MacBook Pro","price":12999,"tags":["laptop","apple"],"stock":20}

#### 1.2 Bulk Insert with Auto-generated IDs

POST /shop_product/_bulk
{"index":{}}
{"title":"iPad Air","price":3999,"tags":["tablet","apple"],"stock":30}
{"index":{}}
{"title":"Dell XPS","price":8999,"tags":["laptop","windows"],"stock":15}
{"index":{}}
{"title":"Surface Pro","price":6999,"tags":["tablet","windows"],"stock":25}

### 2. Mixed Bulk Operations

POST /shop_product/_bulk
{"index":{"_id":"10"}}
{"title":"New Product 1","price":1000,"tags":["new"],"stock":100}
{"update":{"_id":"1"}}
{"doc":{"price":5799}}
{"delete":{"_id":"2"}}
{"index":{"_id":"11"}}
{"title":"New Product 2","price":2000,"tags":["new"],"stock":200}
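Request bodies like the one above are newline-delimited JSON: one action line, followed by a source line for every action except delete. The Python sketch below serializes a list of operations into that format; `bulk_body` and the tuple layout are assumptions made for this example.

```python
import json


def bulk_body(operations):
    """Serialize (action, doc_id, payload) tuples into the NDJSON body for _bulk.

    action is "index", "create", "update" or "delete"; doc_id may be None to let
    ES generate an ID; payload is ignored for "delete", which has no source line.
    """
    lines = []
    for action, doc_id, payload in operations:
        meta = {} if doc_id is None else {"_id": doc_id}
        lines.append(json.dumps({action: meta}))
        if action != "delete":
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = bulk_body([
    ("index", "10", {"title": "New Product 1", "price": 1000}),
    ("update", "1", {"doc": {"price": 5799}}),
    ("delete", "2", None),
])
print(body, end="")
```

The resulting string is what would be sent as the body of POST /shop_product/_bulk.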

### 3. Bulk Insert Response Handling

# Bulk operation response example
{
  "took": 30,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "shop_product",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 201
      }
    },
    {
      "index": {
        "_index": "shop_product",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 201
      }
    }
  ]
}

### 4. Error Handling

#### 4.1 Check for Errors in Bulk Operations

# Bulk operation with an error: the create action fails because ID 1 already exists
# (an index action with the same ID would not fail, it would overwrite the document)
POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
{"create":{"_id":"1"}}
{"title":"Product 2","price":2000}

# Error information in response
{
  "took": 5,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "shop_product",
        "_id": "1",
        "status": 201,
        "result": "created"
      }
    },
    {
      "create": {
        "_index": "shop_product",
        "_id": "1",
        "status": 409,
        "error": {
          "type": "version_conflict_engine_exception",
          "reason": "[1]: version conflict, document already exists"
        }
      }
    }
  ]
}
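A _bulk request can return HTTP 200 even when individual items failed, so the client has to inspect errors and items itself. A Python sketch of that check; `failed_items` is a hypothetical helper and the sample response is made up for the demonstration.

```python
def failed_items(bulk_response):
    """Return (position, action, _id, status, error) for every failed item."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for position, item in enumerate(bulk_response["items"]):
        # Each item has exactly one key: the action name (index, create, update, delete).
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append(
                (position, action, result.get("_id"), result.get("status"), result["error"])
            )
    return failures


sample = {
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_index": "shop_product", "_id": "1", "status": 201, "result": "created"}},
        {"create": {"_index": "shop_product", "_id": "1", "status": 409,
                    "error": {"type": "version_conflict_engine_exception",
                              "reason": "[1]: version conflict, document already exists"}}},
    ],
}
for position, action, doc_id, status, error in failed_items(sample):
    print(position, action, doc_id, status, error["type"])
```

The position is the index of the action in the request, which makes it possible to retry only the failed operations.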

#### 4.2 Handle Partial Failures

# Use filter_path to return only error items (the create action fails because ID 1 already exists)
POST /shop_product/_bulk?filter_path=items.*.error
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
{"create":{"_id":"1"}}
{"title":"Product 2","price":2000}

### 5. Performance Optimization

#### 5.1 Batch Size Control

# Batch size depends on the workload; 1000-5000 documents per request is a common starting point, then tune by benchmarking
POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
# ... more documents
{"index":{"_id":"1000"}}
{"title":"Product 1000","price":1000}

#### 5.2 Refresh Strategy

# refresh=false (the default) does not force a refresh after the request, which keeps bulk inserts fast
POST /shop_product/_bulk?refresh=false
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
{"index":{"_id":"2"}}
{"title":"Product 2","price":2000}

# Manually refresh after bulk insert
POST /shop_product/_refresh

#### 5.3 Timeout Control

# Set timeout for bulk operation
POST /shop_product/_bulk?timeout=60s
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}

### 6. Bulk Import from File

#### 6.1 Prepare Data File

# data.json file content (one JSON object per line; the file must end with a newline)
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000,"tags":["tag1"],"stock":100}
{"index":{"_id":"2"}}
{"title":"Product 2","price":2000,"tags":["tag2"],"stock":200}
{"index":{"_id":"3"}}
{"title":"Product 3","price":3000,"tags":["tag3"],"stock":300}

#### 6.2 Import Using curl

# Bulk import from file
curl -X POST "localhost:9200/shop_product/_bulk" \
  -H "Content-Type: application/json" \
  --data-binary @data.json

### 7. Bulk Insert Best Practices

#### 7.1 Data Preprocessing

# Create index and mapping before bulk insert
PUT /shop_product
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "title": {"type": "text", "analyzer": "standard"},
      "price": {"type": "double"},
      "tags": {"type": "keyword"},
      "stock": {"type": "integer"}
    }
  }
}

#### 7.2 Process Large Data in Batches

# Batch insert large amounts of data
POST /shop_product/_bulk
{"index":{"_id":"1"}}
{"title":"Product 1","price":1000}
# ... 1000 documents

# Wait a while then continue with next batch
POST /shop_product/_bulk
{"index":{"_id":"1001"}}
{"title":"Product 1001","price":1000}
# ... next 1000 documents
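Splitting a large import into batches can be done by document count, payload size, or both. The Python sketch below yields one NDJSON body per batch; the function name and the limits of 1000 documents and 5 MiB are assumptions to be tuned, not fixed recommendations.

```python
import json


def bulk_batches(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON _bulk bodies for (doc_id, source) pairs, starting a new batch
    when the document count or the payload size limit would be exceeded."""
    lines, count, size = [], 0, 0
    for doc_id, source in docs:
        pair = json.dumps({"index": {"_id": doc_id}}) + "\n" + json.dumps(source) + "\n"
        pair_bytes = len(pair.encode("utf-8"))
        if count and (count >= max_docs or size + pair_bytes > max_bytes):
            yield "".join(lines)
            lines, count, size = [], 0, 0
        lines.append(pair)
        count += 1
        size += pair_bytes
    if lines:
        yield "".join(lines)


docs = [(str(i), {"title": f"Product {i}", "price": 1000}) for i in range(1, 2501)]
batches = list(bulk_batches(docs))
print(len(batches))  # 3
```

Each yielded string is a complete body for POST /shop_product/_bulk; a single document larger than max_bytes still gets a batch of its own.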

#### 7.3 Monitor Bulk Insert Progress

# Check index status
GET /shop_product/_stats

# Check document count
GET /shop_product/_count

# Check index health status
GET /_cluster/health/shop_product

### 8. Bulk Insert vs Single Insert Comparison

| Operation | Request | Performance | Use Cases |
| --- | --- | --- | --- |
| Single insert | POST /index/_doc | Slower | Small amounts of data, real-time insert |
| Bulk insert | POST /index/_bulk | Very fast | Large amounts of data, batch import |
| Mixed operations | POST /index/_bulk | Medium | Need to insert, update, delete simultaneously |

### 9. Common Issues Resolution

#### 9.1 Insufficient Memory

# Reduce batch size
POST /shop_product/_bulk
# Only include 500 documents instead of 1000

#### 9.2 Timeout Issues

# Increase timeout
POST /shop_product/_bulk?timeout=120s

#### 9.3 Version Conflict

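# Retry the update on version conflicts with retry_on_conflict; upsert additionally covers the case where the document does not exist yet
POST /shop_product/_bulk
{"update":{"_id":"1","retry_on_conflict":3}}
{"doc":{"price":1000},"upsert":{"title":"Product 1","price":1000}}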

### 10. Bulk Insert Monitoring

# Monitor bulk operation performance
GET /_nodes/stats/indices/indexing

# View index statistics
GET /shop_product/_stats/indexing

# Monitor cluster status
GET /_cluster/health?pretty

## mget Bulk Get Operations Detailed Explanation

The mget (multi-get) API in ES allows fetching multiple documents at once, more efficient than individual fetches, especially suitable for scenarios requiring batch reading.

### 1. Basic Bulk Get

#### 1.1 Get Multiple Documents from Same Index

POST /shop_product/_mget
{
  "docs": [
    {"_id": "1"},
    {"_id": "2"},
    {"_id": "3"}
  ]
}

#### 1.2 Get Documents from Different Indexes

POST /_mget
{
  "docs": [
    {"_index": "shop_product", "_id": "1"},
    {"_index": "shop_user", "_id": "100"},
    {"_index": "shop_order", "_id": "200"}
  ]
}

#### 1.3 Specify Return Fields

POST /shop_product/_mget
{
  "docs": [
    {
      "_id": "1",
      "_source": ["title", "price"]
    },
    {
      "_id": "2",
      "_source": ["title", "tags"]
    }
  ]
}

### 2. Bulk Get Response Handling

# mget response example
{
  "docs": [
    {
      "_index": "shop_product",
      "_id": "1",
      "_version": 1,
      "_seq_no": 0,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "title": "iPhone 15",
        "price": 5999,
        "tags": ["phone", "apple"],
        "stock": 100
      }
    },
    {
      "_index": "shop_product",
      "_id": "2",
      "_version": 1,
      "_seq_no": 1,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "title": "Samsung Galaxy",
        "price": 4999,
        "tags": ["phone", "android"],
        "stock": 50
      }
    },
    {
      "_index": "shop_product",
      "_id": "999",
      "found": false
    }
  ]
}

### 3. Advanced Bulk Get

#### 3.1 Using ids Parameter (Simplified Syntax)

POST /shop_product/_mget
{
  "ids": ["1", "2", "3", "999"]
}

#### 3.2 Exclude Unnecessary Fields

POST /shop_product/_mget
{
  "docs": [
    {
      "_id": "1",
      "_source": {
        "excludes": ["description", "created_at"]
      }
    },
    {
      "_id": "2",
      "_source": {
        "includes": ["title", "price"]
      }
    }
  ]
}

#### 3.3 Get Stored Fields

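# stored_fields only returns fields mapped with "store": true; the shop_product mapping above stores none separately, so _source filtering is usually the better choice
POST /shop_product/_mget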
{
  "docs": [
    {
      "_id": "1",
      "stored_fields": ["title", "price"]
    }
  ]
}

### 4. Bulk Get Performance Optimization

#### 4.1 Control Batch Size Reasonably

# Recommended batch size: 100-1000 documents
POST /shop_product/_mget
{
  "ids": [
    "1", "2", "3", "4", "5",
    # ... more IDs
    "100"
  ]
}

#### 4.2 Use Routing for Optimization

POST /shop_product/_mget
{
  "docs": [
    {
      "_id": "1",
      "routing": "user123"
    },
    {
      "_id": "2",
      "routing": "user123"
    }
  ]
}

### 5. Error Handling

#### 5.1 Handle Non-existent Documents

POST /shop_product/_mget
{
  "ids": ["1", "999", "2"]
}

# Response includes documents with found: false
{
  "docs": [
    {
      "_index": "shop_product",
      "_id": "1",
      "found": true,
      "_source": {...}
    },
    {
      "_index": "shop_product",
      "_id": "999",
      "found": false
    },
    {
      "_index": "shop_product",
      "_id": "2",
      "found": true,
      "_source": {...}
    }
  ]
}
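Since found: false entries come back inside a successful response, the caller has to separate hits from misses. A Python sketch of that step; `split_mget` is a hypothetical helper and the sample response is invented for the demonstration.

```python
def split_mget(response):
    """Separate an _mget response into {_id: _source} for found documents
    and a list of IDs that were not found."""
    found, missing = {}, []
    for doc in response["docs"]:
        if doc.get("found"):
            found[doc["_id"]] = doc.get("_source", {})
        else:
            missing.append(doc["_id"])
    return found, missing


sample = {
    "docs": [
        {"_index": "shop_product", "_id": "1", "found": True,
         "_source": {"title": "iPhone 15", "price": 5999}},
        {"_index": "shop_product", "_id": "999", "found": False},
        {"_index": "shop_product", "_id": "2", "found": True,
         "_source": {"title": "Samsung Galaxy", "price": 4999}},
    ]
}
found, missing = split_mget(sample)
print(sorted(found))  # ['1', '2']
print(missing)        # ['999']
```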

#### 5.2 Filter Non-existent Documents

# Return only found documents
POST /shop_product/_mget?filter_path=docs._source
{
  "ids": ["1", "999", "2"]
}

### 6. Real-world Application Scenarios

#### 6.1 Get Shopping Cart Product Information

# Bulk get product information by product IDs in shopping cart
POST /shop_product/_mget
{
  "ids": ["cart_item_1", "cart_item_2", "cart_item_3"]
}

#### 6.2 Get User Order Details

# Bulk get all user order details
POST /shop_order/_mget
{
  "docs": [
    {"_id": "order_001", "_source": ["order_id", "total", "status"]},
    {"_id": "order_002", "_source": ["order_id", "total", "status"]},
    {"_id": "order_003", "_source": ["order_id", "total", "status"]}
  ]
}

#### 6.3 Cross-index Data Association

# Get user information and user orders simultaneously
POST /_mget
{
  "docs": [
    {"_index": "shop_user", "_id": "user_123", "_source": ["name", "email"]},
    {"_index": "shop_order", "_id": "order_456", "_source": ["order_id", "total"]},
    {"_index": "shop_product", "_id": "product_789", "_source": ["title", "price"]}
  ]
}

### 7. mget vs Single Get Comparison

| Operation | Request | Performance | Use Cases |
| --- | --- | --- | --- |
| Single get | GET /index/_doc/id | Slower | Get single document |
| Bulk get | POST /index/_mget | Very fast | Bulk get multiple documents |
| Search get | POST /index/_search | Medium | Get documents by conditions |

### 8. Bulk Get Best Practices

#### 8.1 Data Preprocessing

# Check index status before bulk get
GET /shop_product/_stats

# Check how many documents the index holds
GET /shop_product/_count

#### 8.2 Process Large Data in Batches

# Batch get large amounts of documents
POST /shop_product/_mget
{
  "ids": ["1", "2", "3", "4", "5"]
  # First batch of 100 documents
}

# Continue with next batch
POST /shop_product/_mget
{
  "ids": ["6", "7", "8", "9", "10"]
  # Next batch of 100 documents
}

#### 8.3 Shard Preference

# preference=_local prefers shard copies on the node handling the request, avoiding an extra network hop (this is not a cache)
POST /shop_product/_mget?preference=_local
{
  "ids": ["1", "2", "3"]
}

### 9. Common Issues Resolution

#### 9.1 Insufficient Memory

# Reduce batch size
POST /shop_product/_mget
{
  "ids": ["1", "2", "3"]  # Only get 3 documents
}

#### 9.2 Timeout Issues

# _mget has no timeout query parameter; raise the HTTP client's request timeout
# and/or split the IDs into smaller batches
POST /shop_product/_mget
{
  "ids": ["1", "2", "3"]
}

#### 9.3 Index Doesn't Exist

# A missing index does not fail the whole request: the entry for that document carries an error object instead of found/_source
POST /_mget
{
  "docs": [
    {"_index": "shop_product", "_id": "1"},
    {"_index": "non_existent_index", "_id": "2"}
  ]
}

### 10. Bulk Get Monitoring

# Monitor bulk get performance (document gets are reported in the get statistics, not search)
GET /_nodes/stats/indices/get

# View index get statistics
GET /shop_product/_stats/get

# Monitor cluster status
GET /_cluster/health?pretty

### 11. Combine with Other Bulk Operations

#### 11.1 Bulk Get + Bulk Update

# First bulk get
POST /shop_product/_mget
{
  "ids": ["1", "2", "3"]
}

# Then bulk update
POST /shop_product/_bulk
{"update":{"_id":"1"}}
{"doc":{"stock":95}}
{"update":{"_id":"2"}}
{"doc":{"stock":45}}
{"update":{"_id":"3"}}
{"doc":{"stock":15}}

#### 11.2 Bulk Get + Search

# First search to get relevant document IDs
POST /shop_product/_search
{
  "query": {"match": {"tags": "phone"}},
  "_source": false,
  "size": 10
}

# Then bulk get detailed information
POST /shop_product/_mget
{
  "ids": ["1", "2", "3", "4", "5"]
}
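The link between the two requests is the list of _id values in hits.hits of the search response. A Python sketch of that hand-off; both helper names are hypothetical and the sample response is trimmed to the fields used here.

```python
def ids_from_search(search_response):
    """Collect the _id of every hit in a _search response, in ranking order."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def mget_body(ids):
    """Body for POST /index/_mget using the simplified ids syntax."""
    return {"ids": list(ids)}


sample = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "shop_product", "_id": "1", "_score": 1.2},
            {"_index": "shop_product", "_id": "2", "_score": 0.8},
        ],
    }
}
print(mget_body(ids_from_search(sample)))  # {'ids': ['1', '2']}
```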

## ES's query:{} Functionality Detailed Explanation

The query in ES is the core part of a search request: it defines the search conditions and logic. The query object wraps one query clause such as match_all, match, term or bool; leaving the query out entirely is equivalent to match_all.

### 1. query Object Basic Structure

POST /shop_product/_search
{
  "query": {
    // Query conditions defined here
  }
}

### 2. Common Query Types

#### 2.1 match_all Query (Full Table Scan)

POST /shop_product/_search
{
  "query": {
    "match_all": {}
  }
}

**Functionality**:

  • Matches all documents in the index
  • Similar to SELECT * FROM table in SQL, but only the first 10 hits are returned unless size is set
  • Commonly used for getting all data or as base query

#### 2.2 match Query (Full-text Search)

POST /shop_product/_search
{
  "query": {
    "match": {
      "title": "iPhone"
    }
  }
}

**Functionality**:

  • Performs full-text search on specified field
  • Tokenizes query terms
  • Supports fuzzy matching and relevance scoring

#### 2.3 term Query (Exact Match)

POST /shop_product/_search
{
  "query": {
    "term": {
      "tags": "phone"
    }
  }
}

**Functionality**:

  • Exact match without tokenization
  • Suitable for keyword type fields
  • Better performance than match query

#### 2.4 range Query (Range Query)

POST /shop_product/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 5000
      }
    }
  }
}

**Functionality**:

  • Range query on numeric, date and other fields
  • Supports gte (greater than or equal), gt (greater than), lte (less than or equal), lt (less than)
  • Commonly used for price range, time range queries

#### 2.5 bool Query (Compound Query)

POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "phone"}},
        {"range": {"price": {"gte": 1000}}}
      ],
      "must_not": [
        {"term": {"status": "discontinued"}}
      ],
      "should": [
        {"match": {"tags": "apple"}}
      ],
      "filter": [
        {"term": {"category": "electronics"}}
      ]
    }
  }
}

**Functionality**:

  • must: must match, affects relevance scoring
  • must_not: must not match, doesn't affect scoring
  • should: should match, increases relevance scoring
  • filter: must match, but doesn't affect scoring, better performance

#### 2.6 wildcard Query (Wildcard Query)

POST /shop_product/_search
{
  "query": {
    "wildcard": {
      "title": {
        "value": "iPh*"
      }
    }
  }
}

**Functionality**:

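  • Supports * (matches zero or more characters) and ? (matches exactly one character)
  • The pattern is compared with indexed terms and is not analyzed; if title is a text field whose analyzer lowercases terms, a pattern such as "iPh*" has to be written in lowercase to match
  • Slow on large indices, especially when the pattern starts with * or ?
  • Suited to pattern matching on keyword fields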

#### 2.7 prefix Query (Prefix Query)

POST /shop_product/_search
{
  "query": {
    "prefix": {
      "title": "iPh"
    }
  }
}

**Functionality**:

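  • Matches documents containing a term that starts with the specified prefix
  • The prefix is not analyzed, so its case must match the indexed terms
  • Suitable for simple auto-complete and search suggestions
  • Usually cheaper than a general wildcard query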

#### 2.8 fuzzy Query (Fuzzy Query)

POST /shop_product/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "iphone",
        "fuzziness": "AUTO"
      }
    }
  }
}

**Functionality**:

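  • Tolerates typos by matching terms within a given edit distance of the query value
  • The fuzziness parameter sets the maximum edit distance; AUTO derives it from the length of the term
  • Suitable for search correction scenarios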
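
To make "edit distance" concrete, here is a small Go sketch. It assumes the documented default thresholds of `"fuzziness": "AUTO"` (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two) and uses plain Levenshtein distance. Elasticsearch by default also counts a swap of two adjacent characters as a single edit, which this simplified function does not. `autoFuzziness` and `levenshtein` are illustrative names, not ES APIs.

```go
package main

import "fmt"

// autoFuzziness returns the maximum number of edits allowed for a term of the
// given length under the default AUTO thresholds (assumed to be 3 and 6).
func autoFuzziness(termLen int) int {
	switch {
	case termLen <= 2:
		return 0
	case termLen <= 5:
		return 1
	default:
		return 2
	}
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein counts the single-character insertions, deletions and
// substitutions needed to turn a into b (no transposition shortcut).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(rb)]
}

func main() {
	query, indexed := "ipone", "iphone"
	dist := levenshtein(query, indexed)
	allowed := autoFuzziness(len([]rune(query)))
	fmt.Printf("distance=%d allowed=%d match=%v\n", dist, allowed, dist <= allowed)
}
```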

### 3. Query Performance Optimization

#### 3.1 Use filter Instead of query

POST /shop_product/_search
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"price": {"gte": 1000}}}
      ]
    }
  }
}

**Advantages**:

  • filter clauses skip relevance scoring, so they are cheaper to evaluate
  • Frequently used filters can be cached, which speeds up repeated queries
  • Suited to yes/no conditions such as exact matches and ranges

#### 3.2 Use _source Filtering Reasonably

POST /shop_product/_search
{
  "_source": ["title", "price"],
  "query": {
    "match_all": {}
  }
}

**Advantages**:

  • Returns only the needed fields, reducing network transfer
  • Shrinks the response and the client-side cost of parsing it
  • Lowers memory usage

#### 3.3 Use size to Control Return Quantity

POST /shop_product/_search
{
  "size": 10,
  "query": {
    "match_all": {}
  }
}

**Advantages**:

  • Avoid returning large amounts of data at once
  • Improve query response speed
  • Reduce memory consumption

### 4. Query Debugging Techniques

#### 4.1 Use explain Parameter

POST /shop_product/_search
{
  "explain": true,
  "query": {
    "match": {
      "title": "iPhone"
    }
  }
}

**Functionality**:

  • Shows scoring calculation process for each document
  • Helps understand why results are ranked this way
  • Used for query optimization and debugging

#### 4.2 Use profile Parameter

POST /shop_product/_search
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "phone"}},
        {"range": {"price": {"gte": 1000}}}
      ]
    }
  }
}

**Functionality**:

  • Shows detailed timing information of query execution
  • Helps identify performance bottlenecks
  • Used for query performance optimization

### 5. Query Best Practices

#### 5.1 Query Structure Optimization

# Good query structure
POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "phone"}}
      ],
      "filter": [
        {"range": {"price": {"gte": 1000, "lte": 5000}}},
        {"term": {"status": "active"}}
      ]
    }
  },
  "sort": [{"price": "desc"}],
  "size": 20
}

#### 5.2 Avoid Deep Pagination

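# Use search_after instead of from/size for deep pages.
# "last_doc_id" is a placeholder: pass the sort values of the last hit from the previous page.
# Sorting on _id is discouraged in the ES documentation; a keyword field holding a copy of the ID is a safer sort key.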
POST /shop_product/_search
{
  "query": {"match_all": {}},
  "size": 100,
  "sort": [{"_id": "asc"}],
  "search_after": ["last_doc_id"]
}

#### 5.3 Use Aggregations Reasonably

POST /shop_product/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {"field": "price"}
    },
    "tags_count": {
      "terms": {"field": "tags", "size": 10}
    }
  }
}

### 6. Common Query Patterns

#### 6.1 Search + Filter

POST /shop_product/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "user search term"}}
      ],
      "filter": [
        {"term": {"category": "electronics"}},
        {"range": {"price": {"gte": 1000}}}
      ]
    }
  }
}

#### 6.2 Multi-field Search

POST /shop_product/_search
{
  "query": {
    "multi_match": {
      "query": "iPhone",
      "fields": ["title^2", "description", "tags"]
    }
  }
}

#### 6.3 Nested Object Query

POST /shop_product/_search
{
  "query": {
    "nested": {
      "path": "reviews",
      "query": {
        "bool": {
          "must": [
            {"match": {"reviews.comment": "good"}},
            {"range": {"reviews.rating": {"gte": 4}}}
          ]
        }
      }
    }
  }
}

### 7. Query Performance Monitoring

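# Per-node search statistics (query counts and cumulative time)
GET /_nodes/stats/indices/search

# Cumulative query time per node.
# Slow query logs are separate: they are enabled through the index.search.slowlog.* index settings and written to log files.
GET /_nodes/stats/indices/search?filter_path=nodes.*.indices.search.query_time_in_millis

# Cluster health (node and shard status, not query metrics)
GET /_cluster/health?pretty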

### 8. Query Error Handling

#### 8.1 Handle Query Syntax Errors

# Incorrect query: the closing brace of "match" is missing
POST /shop_product/_search
{
  "query": {
    "match": {
      "title": "iPhone"
  }
}

# Error response (abridged; the exact type and reason depend on the ES version)
{
  "error": {
    "type": "parsing_exception",
    "reason": "Unexpected end-of-input"
  }
}

#### 8.2 Handle Field Not Exist Error

# Query non-existent field
POST /shop_product/_search
{
  "query": {
    "match": {
      "non_existent_field": "value"
    }
  }
}

# Response (no error, but no documents matched)
{
  "hits": {
    "total": {"value": 0, "relation": "eq"},
    "hits": []
  }
}
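
Both cases can be handled in one place on the Go side: a response either carries an `error` object or a `hits` object. The sketch below decodes the two abridged responses shown in this section; `esResponse` and `checkResponse` are hypothetical names, and real error bodies contain more fields (such as `root_cause` and `status`) that are ignored here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// esResponse covers the two shapes discussed above: an error object or hits.
type esResponse struct {
	Error *struct {
		Type   string `json:"type"`
		Reason string `json:"reason"`
	} `json:"error"`
	Hits struct {
		Total struct {
			Value    int    `json:"value"`
			Relation string `json:"relation"`
		} `json:"total"`
	} `json:"hits"`
}

// checkResponse returns the total hit count, or an error if the body
// cannot be decoded or Elasticsearch reported a failure.
func checkResponse(raw []byte) (int, error) {
	var r esResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return 0, fmt.Errorf("decode response: %w", err)
	}
	if r.Error != nil {
		return 0, fmt.Errorf("elasticsearch %s: %s", r.Error.Type, r.Error.Reason)
	}
	return r.Hits.Total.Value, nil
}

func main() {
	failed := []byte(`{"error":{"type":"parsing_exception","reason":"Unexpected end-of-input"}}`)
	empty := []byte(`{"hits":{"total":{"value":0,"relation":"eq"},"hits":[]}}`)

	if _, err := checkResponse(failed); err != nil {
		fmt.Println("query failed:", err)
	}
	if n, err := checkResponse(empty); err == nil && n == 0 {
		// Zero hits without an error can mean a misspelled field name.
		fmt.Println("no documents matched")
	}
}
```

Since a query on a non-existent field is not an error, a zero-hit result is worth logging during development.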

### 9. Query Caching Strategy

# Use query caching
POST /shop_product/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"category": "electronics"}}
      ]
    }
  }
}

# Caching is automatic: Elasticsearch decides which filter clauses to cache based on how often they are reused, so repeated queries get faster without any extra configuration

### 10. Summary

The query object is the core of an ES search request. Mastering the query types and optimization techniques covered above is key to building high-performance search applications:

  • Basic queries: match_all, match, term, range
  • Compound queries: bool query combining multiple conditions
  • Performance optimization: reasonable use of filter, _source filtering, size control
  • Debugging techniques: explain and profile parameters
  • Best practices: avoid deep pagination, reasonable aggregation use, performance monitoring

By properly using these query features, you can build efficient and accurate search systems.

Theme test article, for testing purposes only. Published by Walker. Please credit the source when reposting: https://walker-learn.xyz/archives/4783
