elasticsearch

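Python Elasticsearch Client

The official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this, it tries to be opinion-free and very extendable.

For a higher-level client library with a more limited scope, have a look at elasticsearch-dsl - a more Pythonic library sitting on top of elasticsearch-py.

It provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure, while exposing the whole range of the DSL from Python, either through defined classes or through queryset-like expressions.

It also provides an optional [persistence layer](https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#doctype) for working with documents as Python objects in an ORM-like fashion: defining mappings, retrieving and saving documents, and wrapping the document data in user-defined classes.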

Compatibility

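The library is compatible with all Elasticsearch versions since 0.90.x, but you have to use a matching major version:

For Elasticsearch 5.0 and later, use the major version 5 (5.x.y) of the library.

For Elasticsearch 2.0 and later, use the major version 2 (2.x.y) of the library.

For Elasticsearch 1.0 and later, use the major version 1 (1.x.y) of the library.

For Elasticsearch 0.90.x, use a version from the 0.4.x releases of the library.

The recommended way to set your requirements in your setup.py or requirements.txt is: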

# Elasticsearch 5.x
elasticsearch>=5.0.0,<6.0.0

# Elasticsearch 2.x
elasticsearch>=2.0.0,<3.0.0

# Elasticsearch 1.x
elasticsearch>=1.0.0,<2.0.0

# Elasticsearch 0.90.x
elasticsearch<1.0.0
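
The version rule above can be sketched as a small helper that turns a server version string into the matching requirement line. The function name `requirement_for` is hypothetical and not part of the library; it only covers the versions listed in this section.

```python
def requirement_for(server_version):
    """Return the requirement specifier for an Elasticsearch server version.

    Hypothetical helper for illustration; it encodes only the version
    ranges listed in this README.
    """
    major = int(server_version.split(".")[0])
    if major == 0:
        # 0.90.x servers are paired with the pre-1.0 client releases
        return "elasticsearch<1.0.0"
    if major in (1, 2, 5):
        # Pin the client to the same major version as the server
        return "elasticsearch>={0}.0.0,<{1}.0.0".format(major, major + 1)
    raise ValueError("no client version listed for Elasticsearch %s" % server_version)


print(requirement_for("5.4.0"))
print(requirement_for("0.90.13"))
```

Pinning both a lower and an upper bound keeps an unrelated `pip install --upgrade` from pulling in a client built for a different major version of the server.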

Development happens on the master and 2.x branches.

Installation

Install the elasticsearch package with pip:

pip install elasticsearch

Run Elasticsearch in a Container

To run Elasticsearch in a container, optionally set the ES_VERSION environment variable to 5.4, 5.3, or 2.4; it defaults to latest. Then run ./start_elasticsearch.sh:

export ES_VERSION=5.4
./start_elasticsearch.sh

This runs a version of Elasticsearch in a Docker container suitable for running the tests. To check that Elasticsearch is running, first wait for a healthy status in docker ps:

$ docker ps
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS                   PORTS                              NAMES
955e57564e53        7d2ad83f8446               "/docker-entrypoin..."   6 minutes ago       Up 6 minutes (healthy)   0.0.0.0:9200->9200/tcp, 9300/tcp   trusting_brattain

Then you can navigate to localhost:9200 in your browser.
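
The same check can be scripted instead of done in a browser. The following is a minimal sketch using only the standard library, assuming the node answers plain HTTP on localhost:9200 with a JSON body; `fetch_json` and `wait_for_elasticsearch` are hypothetical names, not part of the client or of start_elasticsearch.sh.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=2.0) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_elasticsearch(url="http://localhost:9200", attempts=30, delay=1.0,
                           fetch=fetch_json):
    """Poll the root endpoint until it answers, then return the decoded body.

    Hypothetical helper: `fetch` is injectable so the retry loop can be
    exercised without a running container.
    """
    for _ in range(attempts):
        try:
            return fetch(url)
        except (OSError, ValueError):
            # Connection refused, timeout, or a body that is not JSON yet
            time.sleep(delay)
    raise RuntimeError("no answer from %s after %d attempts" % (url, attempts))


# Usage against a running container:
#     info = wait_for_elasticsearch()
#     print(info)
```

Catching `OSError` covers refused connections and timeouts raised by `urllib`, which is what you would expect to see while the container is still starting.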

Example use

Simple use-case:

>>> from datetime import datetime
>>> from elasticsearch import Elasticsearch

# by default we connect to localhost:9200
>>> es = Elasticsearch()

# create an index in elasticsearch, ignore status code 400 (index already exists)
>>> es.indices.create(index='my-index', ignore=400)
{u'acknowledged': True}

# datetimes will be serialized
>>> es.index(index="my-index", doc_type="test-type", id=42, body={"any": "data", "timestamp": datetime.now()})
{u'_id': u'42', u'_index': u'my-index', u'_type': u'test-type', u'_version': 1, u'ok': True}

# but not deserialized
>>> es.get(index="my-index", doc_type="test-type", id=42)['_source']
{u'any': u'data', u'timestamp': u'2013-05-12T19:45:31.804229'}
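
Because the timestamp comes back as a string, the caller has to convert it if a datetime object is needed. Here is a minimal sketch, assuming the ISO 8601 shape shown in the output above (with or without the fractional seconds); `parse_timestamp` is a hypothetical helper, not something the client provides.

```python
from datetime import datetime


def parse_timestamp(value):
    """Convert an ISO 8601 string, as seen in _source above, back to a datetime."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised timestamp: %r" % (value,))


# The _source document as returned by es.get() in the example above
source = {"any": "data", "timestamp": "2013-05-12T19:45:31.804229"}
source["timestamp"] = parse_timestamp(source["timestamp"])
print(source["timestamp"].year)
```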

Full documentation.

Features

The client's features include:

Example

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from elasticsearch import Elasticsearch

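es = Elasticsearch([{'host': '192.168.200.96', 'port': 9200}])
# es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}], http_auth=('xiao', '123456'), timeout=3600)

print(es.info())

# Start a scrolled search; the scroll context is kept alive for one minute
query = es.search(index='redis_info_4', scroll='1m')

# First page of results returned by the search
results = query['hits']['hits']

# Total number of hits matching the query
total = query['hits']['total']
# print(total)

# Cursor used to fetch the remaining pages of results
scroll_id = query['_scroll_id']
# print(scroll_id)

# Keep scrolling until a page comes back empty, so the loop does not
# depend on the page size and the first page is printed as well
while results:
    for hit in results:
        print(hit['_source']['used_memory_pct'])

    # The scroll parameter must be specified, otherwise the request fails
    query = es.scroll(scroll_id=scroll_id, scroll='1m')
    scroll_id = query['_scroll_id']
    results = query['hits']['hits']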
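
The scroll loop can also be wrapped in a generator so that callers never deal with page counts or scroll ids. The sketch below assumes only that the client object exposes `search` and `scroll` with the keyword arguments used in the example above. `iter_scroll_hits` and `FakeClient` are hypothetical names; the fake client exists only so the sketch runs without a cluster.

```python
def iter_scroll_hits(client, index, scroll="1m", **search_kwargs):
    """Yield every hit of a scrolled search, page by page."""
    page = client.search(index=index, scroll=scroll, **search_kwargs)
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Always continue from the scroll id of the most recent response
        page = client.scroll(scroll_id=page["_scroll_id"], scroll=scroll)


class FakeClient(object):
    """Hypothetical in-memory stand-in for Elasticsearch, for illustration only."""

    def __init__(self, docs, page_size):
        self._pages = [docs[i:i + page_size] for i in range(0, len(docs), page_size)]
        self._next = 0

    def _page(self):
        # Serve the next page, or an empty page once the data is exhausted
        hits = self._pages[self._next] if self._next < len(self._pages) else []
        self._next += 1
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": hits}}

    def search(self, index, scroll, **kwargs):
        self._next = 0
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()


docs = [{"_source": {"used_memory_pct": n}} for n in range(25)]
values = [hit["_source"]["used_memory_pct"]
          for hit in iter_scroll_hits(FakeClient(docs, page_size=10), "redis_info_4")]
print(len(values))
```

With a real cluster, an `Elasticsearch` instance would be passed in place of `FakeClient`. The library also ships `elasticsearch.helpers.scan`, which wraps a similar search-and-scroll loop.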