elasticsearch

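Python Elasticsearch Client

Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this it tries to be opinion-free and very extendable.

For a more high-level client library with a more limited scope, have a look at elasticsearch-dsl, a more Pythonic library sitting on top of elasticsearch-py.

It provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure, while exposing the whole range of the DSL from Python, either directly through defined classes or through queryset-like expressions.

It also provides an optional [persistence layer](https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#doctype) for working with documents as Python objects in an ORM-like fashion: defining mappings, retrieving and saving documents, and wrapping the document data in user-defined classes.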

Compatibility

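The library is compatible with all Elasticsearch versions since 0.90.x, but you have to use a matching major version:

For Elasticsearch 5.0 and later, use the major version 5 (5.x.y) of the library.

For Elasticsearch 2.0 and later, use the major version 2 (2.x.y) of the library.

For Elasticsearch 1.0 and later, use the major version 1 (1.x.y) of the library.

For Elasticsearch 0.90.x, use a version from the 0.4.x releases of the library.

The recommended way to set your requirements in your setup.py or requirements.txt is: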

# Elasticsearch 5.x
elasticsearch>=5.0.0,<6.0.0

# Elasticsearch 2.x
elasticsearch>=2.0.0,<3.0.0

# Elasticsearch 1.x
elasticsearch>=1.0.0,<2.0.0

# Elasticsearch 0.90.x
elasticsearch<1.0.0
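
The matching-major-version rule above can also be checked at runtime. The following is a minimal sketch, not part of the library: client_matches_server is a hypothetical helper name, and it assumes the client version is available as a tuple (as in elasticsearch.__version__) and the server version as a dotted string (as in the "number" field returned by es.info()).

```python
def client_matches_server(client_version, server_version):
    """Return True when client and server share the same major version.

    client_version: tuple such as (5, 4, 0)
    server_version: dotted string such as "5.4.1"
    """
    server_major = int(server_version.split(".")[0])
    return client_version[0] == server_major


# Usage against a live cluster (assumes one is reachable on localhost:9200):
#   import elasticsearch
#   es = elasticsearch.Elasticsearch()
#   ok = client_matches_server(elasticsearch.__version__,
#                              es.info()["version"]["number"])
print(client_matches_server((5, 4, 0), "5.4.1"))
print(client_matches_server((2, 4, 1), "5.0.0"))
```

Comparing only the first component is enough here because the compatibility table is keyed on the major version alone.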

Development happens on the master and 2.x branches.

Installation

Install the elasticsearch package with pip:

pip install elasticsearch

Run Elasticsearch in a Container

To run Elasticsearch in a container, optionally set the ES_VERSION environment variable to 5.4, 5.3 or 2.4. ES_VERSION defaults to latest. Then run ./start_elasticsearch.sh:

export ES_VERSION=5.4
./start_elasticsearch.sh

This runs a version of Elasticsearch in a Docker container suitable for running the tests. To check that Elasticsearch is running, first wait for a healthy status in docker ps:

$ docker ps
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS                   PORTS                              NAMES
955e57564e53        7d2ad83f8446               "/docker-entrypoin..."   6 minutes ago       Up 6 minutes (healthy)   0.0.0.0:9200->9200/tcp, 9300/tcp   trusting_brattain

Then you can navigate to localhost:9200 in your browser.
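
The same readiness check can be scripted instead of watching docker ps. This is a rough sketch under stated assumptions: wait_until_up is a hypothetical helper, not something shipped with the client, and it takes any zero-argument callable that returns a truthy value once the node answers (for example the client's ping method).

```python
import time


def wait_until_up(ping, attempts=30, delay=1.0, sleep=time.sleep):
    """Call ping() until it returns a truthy value or attempts run out.

    Returns True as soon as ping() succeeds, False otherwise.
    The sleep argument is injectable so the helper can be tested quickly.
    """
    for attempt in range(attempts):
        if ping():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage (assumes the container publishes port 9200 on localhost):
#   from elasticsearch import Elasticsearch
#   ready = wait_until_up(Elasticsearch().ping)
```

Passing the check as a callable keeps the helper independent of how the node is probed.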

Example use

Simple use-case:

>>> from datetime import datetime
>>> from elasticsearch import Elasticsearch

# by default we connect to localhost:9200
>>> es = Elasticsearch()

# create an index in elasticsearch, ignore status code 400 (index already exists)
>>> es.indices.create(index='my-index', ignore=400)
{u'acknowledged': True}

# datetimes will be serialized
>>> es.index(index="my-index", doc_type="test-type", id=42, body={"any": "data", "timestamp": datetime.now()})
{u'_id': u'42', u'_index': u'my-index', u'_type': u'test-type', u'_version': 1, u'ok': True}

# but not deserialized
>>> es.get(index="my-index", doc_type="test-type", id=42)['_source']
{u'any': u'data', u'timestamp': u'2013-05-12T19:45:31.804229'}
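
Because timestamps come back as plain strings, calling code has to convert them itself when it needs datetime objects. A minimal sketch follows; parse_timestamp is a hypothetical helper, and it assumes the stored value is the ISO 8601 string shown in the output above, with or without fractional seconds and without a timezone suffix.

```python
from datetime import datetime

# Formats tried in order: with microseconds first, then without.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S")


def parse_timestamp(value):
    """Convert an ISO 8601 string from a document back into a datetime."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp format: %r" % (value,))


source = {"any": "data", "timestamp": "2013-05-12T19:45:31.804229"}
source["timestamp"] = parse_timestamp(source["timestamp"])
print(source["timestamp"].year)
```

Both formats are tried because a datetime with zero microseconds serializes without the fractional part.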

Full documentation.

Features

The client's features include:

translating basic Python data types to and from JSON (datetimes are not decoded for performance reasons)
configurable automatic discovery of cluster nodes
persistent connections
load balancing (with pluggable selection strategy) across all available nodes
failed connection penalization (time based - failed connections won't be retried until a timeout is reached)
support for SSL and HTTP authentication
thread safety
pluggable architecture
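
To make the load-balancing and penalization features in the list above more concrete, here is a standalone illustration of the idea. It is not the library's implementation: PenalizingPool, its method names, and the dead_timeout default are all assumptions chosen for this sketch. Hosts are handed out round-robin, and a host marked as failed is skipped until its timeout expires.

```python
import time


class PenalizingPool:
    """Round-robin host selection with time-based penalization (illustrative only)."""

    def __init__(self, hosts, dead_timeout=60.0, clock=time.time):
        self.hosts = list(hosts)
        self.dead_timeout = dead_timeout
        self.clock = clock          # injectable so tests can control time
        self.dead_until = {}        # host -> time at which it may be retried
        self._next = 0

    def mark_dead(self, host):
        """Penalize a host: it is not returned again until the timeout passes."""
        self.dead_until[host] = self.clock() + self.dead_timeout

    def mark_live(self, host):
        """Clear any penalty after a successful request."""
        self.dead_until.pop(host, None)

    def get_host(self):
        """Return the next host whose penalty, if any, has expired."""
        now = self.clock()
        live = [h for h in self.hosts if self.dead_until.get(h, 0) <= now]
        if not live:
            raise RuntimeError("no live hosts available")
        host = live[self._next % len(live)]
        self._next += 1
        return host
```

A different selection strategy (random choice, for instance) would only change how get_host picks from the live list, which is the sense in which the strategy is pluggable.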

Example: scrolling through all results

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '192.168.200.96', 'port': 9200}])
# es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}], http_auth=('xiao', '123456'), timeout=3600)

print(es.info())

# Start a scrolling search; the scroll context is kept alive for one minute
query = es.search(index='redis_info_4', scroll='1m')

# Total number of documents matched by the query
total = query['hits']['total']
# print(total)

# First page of results
results = query['hits']['hits']

# Keep fetching pages until an empty page signals the end of the results
while results:
    for hit in results:
        print(hit['_source']['used_memory_pct'])

    # The scroll cursor may change between requests, so always use the latest one
    scroll_id = query['_scroll_id']
    # print(scroll_id)

    # The scroll parameter must be specified, otherwise an error is raised
    query = es.scroll(scroll_id=scroll_id, scroll='1m')
    results = query['hits']['hits']
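
The scrolling pattern in this example can be wrapped in a generator so that callers iterate over hits without tracking the cursor. This is a small sketch, assuming a client object that exposes search and scroll with the keyword arguments used above; scroll_hits is a hypothetical name. The library also ships elasticsearch.helpers.scan, which is the more complete option for real code.

```python
def scroll_hits(client, index, scroll="1m"):
    """Yield every hit in an index by following the scroll cursor."""
    page = client.search(index=index, scroll=scroll)
    # An empty page of hits means the scroll is exhausted.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Always pass the scroll ID from the most recent response.
        page = client.scroll(scroll_id=page["_scroll_id"], scroll=scroll)


# Usage (assumes the index from the example above exists):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
#   for hit in scroll_hits(es, "redis_info_4"):
#       print(hit["_source"]["used_memory_pct"])
```

Stopping on an empty page avoids relying on the page size or the reported total.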