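Python Elasticsearch Client
Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this it tries to be opinion-free and very extendable.
For a higher-level client library with a more limited scope, have a look at elasticsearch-dsl, a more Pythonic library sitting on top of elasticsearch-py.
It provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure, while exposing the whole range of the DSL from Python, either directly using defined classes or through queryset-like expressions.
It also provides an optional [persistence layer](https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#doctype) for working with documents as Python objects in an ORM-like fashion: defining mappings, retrieving and saving documents, and wrapping the document data in user-defined classes.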
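The library is compatible with all Elasticsearch versions since 0.90.x, but you have to use a matching major version:
For Elasticsearch 5.0 and later, use the major version 5 (5.x.y) of the library.
For Elasticsearch 2.0 and later, use the major version 2 (2.x.y) of the library.
For Elasticsearch 1.0 and later, use the major version 1 (1.x.y) of the library.
For Elasticsearch 0.90.x, use a version from the 0.4.x releases of the library.
The recommended way to set your requirements in your setup.py or requirements.txt is: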
# Elasticsearch 5.x
elasticsearch>=5.0.0,<6.0.0
# Elasticsearch 2.x
elasticsearch>=2.0.0,<3.0.0
# Elasticsearch 1.x
elasticsearch>=1.0.0,<2.0.0
# Elasticsearch 0.90.x
elasticsearch<1.0.0
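The version-matching rule above can be written down as a small helper. This is only an illustrative sketch: the function name requirement_for is hypothetical and not part of elasticsearch-py; it simply maps a server version string to the requirement specifiers listed above.

```python
def requirement_for(es_version):
    """Return the pip requirement specifier for an Elasticsearch server version.

    Hypothetical helper for illustration; it only encodes the list above.
    """
    major = int(es_version.split(".")[0])
    if major == 0:
        # 0.90.x servers are served by the 0.4.x client releases
        return "elasticsearch<1.0.0"
    if major in (1, 2, 5):
        # Pin the client to the same major version as the server
        return "elasticsearch>={0}.0.0,<{1}.0.0".format(major, major + 1)
    raise ValueError("no client version listed for Elasticsearch %s" % es_version)
```

For example, requirement_for("2.4") yields the same line as the "Elasticsearch 2.x" entry above.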
Development happens on the master and 2.x branches.
Install the elasticsearch package with pip:
pip install elasticsearch
To run Elasticsearch in a container, optionally set the ES_VERSION environment variable to 5.4, 5.3 or 2.4. ES_VERSION defaults to latest. Then run ./start_elasticsearch.sh:
export ES_VERSION=5.4
./start_elasticsearch.sh
This will run a version of Elasticsearch in a Docker container suitable for running the tests. To check that Elasticsearch is running, first wait for a healthy status in docker ps:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
955e57564e53 7d2ad83f8446 "/docker-entrypoin..." 6 minutes ago Up 6 minutes (healthy) 0.0.0.0:9200->9200/tcp, 9300/tcp trusting_brattain
Then you can navigate to localhost:9200 in your browser.
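The same check can be scripted instead of done in a browser. The sketch below assumes Python 3 and uses only the standard library; the function name wait_for_elasticsearch and its opener parameter are hypothetical conveniences (the latter makes the function testable without a running node), not part of the client.

```python
import json
import time
from urllib.request import urlopen


def wait_for_elasticsearch(url="http://localhost:9200", timeout=60.0,
                           interval=2.0, opener=urlopen):
    """Poll `url` until it answers with JSON, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            response = opener(url)
            try:
                # The root endpoint returns a JSON document describing the node
                return json.loads(response.read().decode("utf-8"))
            finally:
                response.close()
        except (OSError, ValueError):
            # Connection refused or a non-JSON answer: the node is not ready yet
            if time.monotonic() >= deadline:
                raise RuntimeError("Elasticsearch did not respond at %s" % url)
            time.sleep(interval)
```

Calling wait_for_elasticsearch() after ./start_elasticsearch.sh blocks until the container answers on port 9200 and then returns the parsed response.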
Simple use-case:
>>> from datetime import datetime
>>> from elasticsearch import Elasticsearch
# by default we connect to localhost:9200
>>> es = Elasticsearch()
# create an index in elasticsearch, ignore status code 400 (index already exists)
>>> es.indices.create(index='my-index', ignore=400)
{u'acknowledged': True}
# datetimes will be serialized
>>> es.index(index="my-index", doc_type="test-type", id=42, body={"any": "data", "timestamp": datetime.now()})
{u'_id': u'42', u'_index': u'my-index', u'_type': u'test-type', u'_version': 1, u'ok': True}
# but not deserialized
>>> es.get(index="my-index", doc_type="test-type", id=42)['_source']
{u'any': u'data', u'timestamp': u'2013-05-12T19:45:31.804229'}
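Because datetimes come back as plain strings, the caller has to parse them when a datetime object is needed. A minimal sketch, assuming the timestamp was stored in the ISO format shown in the output above (the helper name parse_es_timestamp is hypothetical):

```python
from datetime import datetime


def parse_es_timestamp(value):
    """Parse an ISO 8601 string like the one shown above into a datetime."""
    # Try the format with microseconds first, then the one without
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: %r" % (value,))


# The _source document from the example above, with the timestamp restored
source = {"any": "data", "timestamp": "2013-05-12T19:45:31.804229"}
source["timestamp"] = parse_es_timestamp(source["timestamp"])
```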
The client's features include:
- translating basic Python data types to and from JSON (datetimes are not decoded for performance reasons)
- configurable automatic discovery of cluster nodes
- persistent connections
- load balancing (with pluggable selection strategy) across all available nodes
- failed connection penalization (time based - failed connections won't be retried until a timeout is reached)
- support for SSL and HTTP authentication
- thread safety
- pluggable architecture
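To make the load-balancing and penalization items more concrete, here is a toy model of the idea. It is not the client's actual implementation: the class RoundRobinPool and all of its methods are hypothetical, and it only illustrates round-robin selection combined with a time-based penalty for failed nodes.

```python
import time


class RoundRobinPool(object):
    """Toy pool: round-robin over nodes, skipping those that failed recently."""

    def __init__(self, nodes, dead_timeout=60.0, clock=time.monotonic):
        self.nodes = list(nodes)
        self.dead_timeout = dead_timeout
        self.clock = clock
        self.dead_until = {}  # node -> time at which it may be retried
        self.position = -1

    def mark_dead(self, node):
        # Penalize a failed node: do not hand it out until the timeout expires
        self.dead_until[node] = self.clock() + self.dead_timeout

    def mark_live(self, node):
        self.dead_until.pop(node, None)

    def get_node(self):
        now = self.clock()
        # Visit each node at most once, starting after the last one returned
        for _ in range(len(self.nodes)):
            self.position = (self.position + 1) % len(self.nodes)
            node = self.nodes[self.position]
            if node not in self.dead_until or self.dead_until[node] <= now:
                return node
        raise RuntimeError("all nodes are currently penalized")
```

A caller would invoke mark_dead when a request to a node fails and mark_live when one succeeds; swapping get_node for another strategy is what a pluggable selection strategy amounts to in this simplified picture.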
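#!/usr/bin/env python
# -*- coding: utf-8 -*-
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '192.168.200.96', 'port': 9200}])
# es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}], http_auth=('xiao', '123456'), timeout=3600)
print(es.info())

# Start a scroll whose context is kept alive for one minute between requests
query = es.search(index='redis_info_4', scroll='1m')
# First page of the search results
results = query['hits']['hits']
# Total number of matching documents
total = query['hits']['total']
# print(total)
# Scroll ID (cursor) used to fetch all remaining results
scroll_id = query['_scroll_id']
# print(scroll_id)

# Keep fetching pages until one comes back empty, instead of assuming 10 hits per page
while results:
    for hit in results:
        print(hit['_source']['used_memory_pct'])
    # The scroll parameter must be specified, otherwise an error is raised
    page = es.scroll(scroll_id=scroll_id, scroll='1m')
    # A response may carry a new scroll ID, so always continue with the latest one
    scroll_id = page['_scroll_id']
    results = page['hits']['hits']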
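The scroll loop above can also be wrapped in a reusable generator. This is a sketch rather than a library feature: the function name iter_scroll_hits is hypothetical, and it assumes a client object exposing search, scroll and clear_scroll with the keyword arguments used here, as the elasticsearch-py client does.

```python
def iter_scroll_hits(es, index, scroll="1m"):
    """Yield every hit for `index`, following the scroll cursor page by page."""
    page = es.search(index=index, scroll=scroll)
    scroll_id = page.get("_scroll_id")
    try:
        # An empty page means the scroll is exhausted
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit
            page = es.scroll(scroll_id=scroll_id, scroll=scroll)
            # Always continue with the most recent scroll ID
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Release the server-side scroll context
            es.clear_scroll(scroll_id=scroll_id)
```

With a connected client, the values printed by the script could then be produced with a loop such as: for hit in iter_scroll_hits(es, 'redis_info_4'): print(hit['_source']['used_memory_pct']).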