Graylog

graylog-logo

Introduction

Graylog is an open-source tool for log aggregation, analysis, auditing, visualization, and alerting. Graylog is written entirely in Java and requires a JDK at runtime.

Graylog advantages

Zero development: covers the complete pipeline from collection -> storage -> analysis -> presentation.
Simple to deploy and maintain: an all-in-one solution, unlike ELK, which integrates three separate systems.
Multiple log sources: syslog, Filebeat, Log4j, Logstash, and more.
Multiple ingest protocols: UDP, TCP, HTTP, AMQP.
Custom dashboards: a rich set of chart types including line charts, pie charts, and world maps.
Full-text search: filter and search across all logs with a query syntax.
Alerting: a log analysis platform with built-in alerting.
Permission management: flexible permission assignment and administration.
Clustering: scale platform performance as the workload grows.

Graylog components

Graylog: provides the external interfaces and serves the user interface;
Elasticsearch: persistent storage and retrieval of log data;
MongoDB: stores metadata and configuration data;
Prometheus: log alerting.

Hardware sizing

Some rules of thumb when allocating resources for Graylog:
Graylog nodes should focus on CPU power. They also serve the user interface to browsers.
Elasticsearch nodes should have as much RAM as possible and the fastest disks you can get; everything here depends on I/O speed.
MongoDB stores metadata and configuration data and does not need many resources.

Minimal setup

This is a minimal Graylog installation, suitable for smaller, non-critical, or test setups. It has no extra components, so it is simple and quick to install.
graylog_architec_small_setup

Cluster setup

This is the setup for larger production environments. It runs several Graylog nodes behind a load balancer that distributes the processing load.
The load balancer can ping the Graylog nodes over HTTP via the Graylog REST API (Graylog exposes /api/system/lbstatus for exactly this) to check whether they are still alive and remove dead nodes from the cluster.
graylog_architec_bigger_setup

Graylog data flow

Besides the Graylog server, a Graylog logging system includes the agent (graylog sidecar) and third-party log collectors (e.g. filebeat, syslog). The main data flow is shown in the figure below

Graylog internal processing flow

When a log collector sends a log message to Graylog, it goes through the following processing steps:

Installation

github
Graylog's version requirements for the underlying software are as follows:

Install Java

sudo yum install java-1.8.0-openjdk-headless.x86_64

Install MongoDB

/etc/yum.repos.d/mongodb-org.repo

[mongodb-org-4.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/4.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.0.asc

sudo yum install mongodb-org

sudo systemctl daemon-reload
sudo systemctl enable mongod.service
sudo systemctl restart mongod.service
sudo systemctl status mongod.service
sudo systemctl stop mongod.service

sudo chkconfig --add mongod
sudo chkconfig mongod on
sudo service mongod restart
sudo service mongod status
sudo service mongod stop

Install Elasticsearch

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
vi /etc/yum.repos.d/elasticsearch.repo

[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/oss-6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

An alternative mirror (Tsinghua): https://mirrors.tuna.tsinghua.edu.cn/elasticstack/

[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://mirrors.tuna.tsinghua.edu.cn/elasticstack/7.x/yum
gpgcheck=0
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

sudo yum install elasticsearch-oss

Edit the configuration

vi /etc/elasticsearch/elasticsearch.yml

Cluster name setting

A node can only join a cluster when it shares its cluster.name with all the other nodes in the cluster. The default name is elasticsearch, but you should change it to an appropriate name that describes the purpose of the cluster.

cluster.name: logging-prod

Do not reuse the same cluster names in different environments. Otherwise, nodes might join the wrong cluster.


Node name setting

Elasticsearch uses node.name as a human-readable identifier for a particular instance of Elasticsearch. This name is included in the response of many APIs. The node name defaults to the hostname of the machine when Elasticsearch starts, but can be configured explicitly in elasticsearch.yml:

node.name: prod-data-2


Network host setting

By default, Elasticsearch only binds to loopback addresses such as 127.0.0.1 and [::1]. This is sufficient to run a cluster of one or more nodes on a single server for development and testing, but a resilient production cluster must involve nodes on other servers. There are many network settings but usually all you need to configure is network.host:

network.host: 192.168.1.10

★★★When you provide a value for network.host, Elasticsearch assumes that you are moving from development mode to production mode, and upgrades a number of system startup checks from warnings to exceptions. See the differences between development and production modes.

★★★These exceptions will prevent your Elasticsearch node from starting. This is an important safety measure to ensure you do not lose data because of a misconfigured server.

network.bind_host: 192.168.0.1
The address to bind to; IPv4 or IPv6, default 0.0.0.0.

network.publish_host: 192.168.0.1
The address other nodes use to communicate with this node. If unset it is detected automatically; the value must be a real IP address.

network.host: 192.168.0.1
Sets both bind_host and publish_host at once.


Discovery and cluster formation settings

Configure two important discovery and cluster formation settings before going to production so that nodes in the cluster can discover each other and elect a master node.

★★★discovery.seed_hosts and cluster.initial_master_nodes are configured on master-eligible nodes only; omit them on pure data nodes.

discovery.seed_hosts
(replaces discovery.zen.ping.unicast.hosts from earlier versions)

Out of the box, without any network configuration, Elasticsearch will bind to the available loopback addresses and scan local ports 9300 to 9305 to connect with other nodes running on the same server. This behavior provides an auto-clustering experience without having to do any configuration.

When you want to form a cluster with nodes on other hosts, use the static discovery.seed_hosts setting. This setting provides a list of other nodes in the cluster that are master-eligible and likely to be live and contactable to seed the discovery process. This setting accepts a YAML sequence or array of the addresses of all the master-eligible nodes in the cluster. Each address can be either an IP address or a hostname that resolves to one or more IP addresses via DNS.

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com
   - [0:0:0:0:0:ffff:c0a8:10c]:9301

discovery.seed_hosts: ["es-01", "es-02"]

discovery.seed_hosts: ["10.10.100.119", "10.10.100.120"]

If hostnames are configured, add them to /etc/hosts:
10.10.100.119   es-01
10.10.100.120   es-02

The port is optional and defaults to 9300, but can be overridden.

If a hostname resolves to multiple IP addresses, the node will attempt to discover other nodes at all resolved addresses.

IPv6 addresses must be enclosed in square brackets.

If your master-eligible nodes do not have fixed names or addresses, use an alternative hosts provider to find their addresses dynamically.

cluster.initial_master_nodes
(a new setting introduced in 7.x)

When you start an Elasticsearch cluster for the first time, a cluster bootstrapping step determines the set of master-eligible nodes whose votes are counted in the first election. In development mode, with no discovery settings configured, this step is performed automatically by the nodes themselves.

Because auto-bootstrapping is inherently unsafe, when starting a new cluster in production mode, you must explicitly list the master-eligible nodes whose votes should be counted in the very first election. You set this list using the cluster.initial_master_nodes setting.

After the cluster forms successfully for the first time, remove the cluster.initial_master_nodes setting from each node's configuration. Do not use this setting when restarting a cluster or adding a new node to an existing cluster.
This setting is only consumed the first time the cluster starts and can be dropped afterwards (see case 5); for tidy configuration management, though, it is fine to leave it in place unchanged.

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com
   - [0:0:0:0:0:ffff:c0a8:10c]:9301
cluster.initial_master_nodes:
   - master-node-a
   - master-node-b
   - master-node-c

Identify the initial master nodes by their node.name, which defaults to their hostname. Ensure that the value in cluster.initial_master_nodes matches the node.name exactly. If you use a fully-qualified domain name (FQDN) such as master-node-a.example.com for your node names, then you must use the FQDN in this list. Conversely, if node.name is a bare hostname without any trailing qualifiers, you must also omit the trailing qualifiers in cluster.initial_master_nodes.

See bootstrapping a cluster and discovery and cluster formation settings.


index.number_of_shards: 5

Default number of primary shards per index (5 was the default through Elasticsearch 6.x; since 7.0 the default is 1).

index.number_of_replicas: 1

Default number of replica copies per index; the default is 1.

Note: since Elasticsearch 5.0, index-level settings such as these can no longer be placed in elasticsearch.yml; set them via index templates or per index instead.


#ES locks its memory to avoid swapping and improve performance, but CentOS 6 does not support SecComp and startup fails there, so set this to false
bootstrap.memory_lock: false

#When monitoring the cluster with elasticsearch-head, enable cross-origin requests
# whether to allow CORS
http.cors.enabled: true

# "*" allows all origins
http.cors.allow-origin: "*"

Start the service

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl restart elasticsearch.service
sudo systemctl status elasticsearch.service
sudo systemctl stop elasticsearch.service

sudo chkconfig --add elasticsearch
sudo chkconfig elasticsearch on
sudo service elasticsearch restart
sudo service elasticsearch status
sudo service elasticsearch stop

Install Kibana

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

vi /etc/yum.repos.d/kibana.repo

[kibana-7.x]
name=Kibana repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

The repo configuration is the same as for Elasticsearch.

sudo yum install kibana

Edit the configuration

vi /etc/kibana/kibana.yml
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
i18n.locale: "zh-CN"

Start the service

sudo systemctl daemon-reload
sudo systemctl enable kibana.service
sudo systemctl restart kibana.service
sudo systemctl status kibana.service
sudo systemctl stop kibana.service

sudo chkconfig --add kibana
service kibana restart
service kibana status
service kibana stop

Install Logstash

The repo configuration is the same as for Elasticsearch.

sudo yum install logstash

sudo systemctl daemon-reload
sudo systemctl enable logstash.service
sudo systemctl restart logstash.service
sudo systemctl status logstash.service
sudo systemctl stop logstash.service

Install Graylog 3

sudo rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-3.3-repository_latest.rpm
sudo yum update && sudo yum install graylog-server graylog-enterprise-plugins graylog-integrations-plugins graylog-enterprise-integrations-plugins

sudo yum install graylog-server

Edit the configuration

vi /etc/graylog/server/server.conf
password_secret: salt used for password hashing; every server in the cluster must use the same value; required. Generate one with: pwgen -N 1 -s 96
root_password_sha2: SHA-256 hash of the default admin user's web login password; required. Generate it with: echo -n yourpassword | sha256sum
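As a sketch of generating both values with coreutils alone (the first command is a stand-in for `pwgen -N 1 -s 96` in case pwgen is not installed; "yourpassword" is a placeholder):

```shell
# Stand-in for `pwgen -N 1 -s 96`: 96 random alphanumeric characters
secret=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 96)
echo "password_secret = $secret"

# root_password_sha2: hash the web login password
# (echo -n matters: a trailing newline would change the hash)
hash=$(echo -n yourpassword | sha256sum | awk '{print $1}')
echo "root_password_sha2 = $hash"
```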

Change the default admin username:
root_username = jzinfograylog

Set the timezone to China (PRC):
root_timezone = PRC

Set the bind address:
http_bind_address = 0.0.0.0:9000

Set the externally published address; it defaults to http_bind_address, but since an nginx proxy is in front, point it at the right interface:
http_publish_uri = http://192.168.103.80:9000/

Set the Elasticsearch host address:
elasticsearch_hosts = http://192.168.200.96:9200

Set the MongoDB connection address:
mongodb_uri = mongodb://192.168.203.80:27017/graylog
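Putting the settings above together, a minimal server.conf sketch looks like this (addresses are the examples from this section; the two secrets are placeholders to be replaced with your generated values):

```ini
password_secret = <output of pwgen -N 1 -s 96>
root_password_sha2 = <output of echo -n yourpassword | sha256sum>
root_username = jzinfograylog
root_timezone = PRC
http_bind_address = 0.0.0.0:9000
http_publish_uri = http://192.168.103.80:9000/
elasticsearch_hosts = http://192.168.200.96:9200
mongodb_uri = mongodb://192.168.203.80:27017/graylog
```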

Start the service

sudo systemctl daemon-reload
sudo systemctl enable graylog-server.service
sudo systemctl restart graylog-server.service
sudo systemctl status graylog-server.service
sudo systemctl stop graylog-server.service

sudo chkconfig --add graylog-server
sudo chkconfig graylog-server on
sudo service graylog-server restart
sudo service graylog-server status
sudo service graylog-server stop

Install graylog-sidecar

Edit /etc/graylog/sidecar/sidecar.yml
# set server_url and server_api_token

sudo graylog-sidecar -service install
sudo systemctl enable graylog-sidecar
sudo systemctl restart graylog-sidecar
sudo systemctl status graylog-sidecar
sudo systemctl stop graylog-sidecar

sudo graylog-sidecar -service install
chkconfig graylog-sidecar on
service graylog-sidecar restart
service graylog-sidecar status
service graylog-sidecar stop

Install Filebeat

filebeat does not need to be started manually (graylog-sidecar manages it)

#sudo filebeat -service install
#sudo systemctl enable filebeat
#sudo systemctl restart filebeat
sudo systemctl status filebeat
#sudo systemctl stop filebeat

#sudo filebeat -service install
#chkconfig filebeat on
#service filebeat restart
service filebeat status
#service filebeat stop

Configuration file locations

/etc/mongod.conf

/etc/graylog/sidecar/sidecar.yml
C:\Program Files\Graylog\sidecar\sidecar.yml


/var/lib/graylog-sidecar/generated/filebeat.conf

Log in to the Graylog web interface

Open http://<external-ip>:9000/ in a browser. The default username is admin; the password is the one generated in the "Install Graylog 3" step.

Firewall (firewalld)

# Graylog REST API:
firewall-cmd --permanent --zone=public --add-port=9000/tcp

# Graylog inputs:
firewall-cmd --permanent --zone=public --add-port=5044/tcp
firewall-cmd --permanent --zone=public --add-port=5045/tcp
firewall-cmd --permanent --zone=public --add-port=5046/tcp
firewall-cmd --permanent --zone=public --add-port=5047/tcp
firewall-cmd --permanent --zone=public --add-port=5047/udp

firewall-cmd --permanent --zone=public --add-port=1514/tcp
firewall-cmd --permanent --zone=public --add-port=1515/tcp

# mongodb:
firewall-cmd --permanent --zone=public --add-port=27017/tcp

# Elasticsearch:

#http.port: HTTP port for client traffic
firewall-cmd --permanent --zone=public --add-port=9200/tcp

#transport.tcp.port: TCP port for inter-node communication
firewall-cmd --permanent --zone=public --add-port=9300/tcp

# Python2
firewall-cmd --permanent --zone=public --add-port=8000/tcp
# PrometheusAlert
firewall-cmd --permanent --zone=public --add-port=8080/tcp

#reload to apply the updated firewall rules
firewall-cmd --reload

#list the open ports
firewall-cmd --list-ports

Firewall (iptables)

iptables -I INPUT -i eth0 -p tcp --dport 9000 -j ACCEPT

iptables -I INPUT -i eth0 -p tcp --dport 5044 -j ACCEPT
iptables -I INPUT -i eth0 -p tcp --dport 5045 -j ACCEPT
iptables -I INPUT -i eth0 -p tcp --dport 5046 -j ACCEPT

iptables -I INPUT -i eth0 -p tcp --dport 1514 -j ACCEPT
iptables -I INPUT -i eth0 -p tcp --dport 1515 -j ACCEPT

iptables -I INPUT -i eth0 -p tcp --dport 27017 -j ACCEPT

iptables -I INPUT -i eth0 -p tcp --dport 9200 -j ACCEPT
iptables -I INPUT -i eth0 -p tcp --dport 9300 -j ACCEPT


service iptables save
iptables -n -L INPUT


vi /etc/sysconfig/iptables

-A INPUT -p tcp -m state --state NEW -m tcp --dport 9000 -j ACCEPT

-A INPUT -p tcp -m state --state NEW -m tcp --dport 5044 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 5045 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 5046 -j ACCEPT

-A INPUT -p tcp -m state --state NEW -m tcp --dport 1514 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 1515 -j ACCEPT

-A INPUT -p tcp -m state --state NEW -m tcp --dport 27017 -j ACCEPT

-A INPUT -p tcp -m state --state NEW -m tcp --dport 9200 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 9300 -j ACCEPT

service iptables reload
iptables -n -L INPUT

Troubleshooting

Non-master nodes report errors:

[root@jz-Test-187 ~]# tail -f /var/log/graylog-server/server.log
2020-07-22T14:22:29.905+08:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
2020-07-22T14:22:30.905+08:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.

Fix: the server clocks were not synchronized; they must be kept in sync (e.g. with NTP).

Errors on CentOS 6

[root@jz-Test-187 ~]# sudo service elasticsearch restart
Stopping elasticsearch:                                    [FAILED]
Starting elasticsearch: warning: Falling back to java on path. This behavior is deprecated. Specify JAVA_HOME
(this warning is harmless)
                                                           [FAILED]
The FAILED status is a real problem; check the logs:

[root@jz-Test-187 ~]#
[root@jz-Test-187 ~]# tail -f /var/log/elasticsearch/elasticsearch.log
...
[2020-07-22T12:49:45,462][ERROR][o.e.b.Bootstrap          ] [K2oa3PJ] node validation exception
[2] bootstrap checks failed
[1]: max number of threads [1024] for user [elasticsearch] is too low, increase to at least [4096]
[2]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2020-07-22T12:49:45,463][INFO ][o.e.n.Node               ] [K2oa3PJ] stopping ...
[2020-07-22T12:49:45,472][INFO ][o.e.n.Node               ] [K2oa3PJ] stopped
[2020-07-22T12:49:45,472][INFO ][o.e.n.Node               ] [K2oa3PJ] closing ...
[2020-07-22T12:49:45,479][INFO ][o.e.n.Node               ] [K2oa3PJ] closed

Issue 1: max number of threads [1024] for user [elasticsearch] is too low, increase to at least [4096]
vi /etc/security/limits.d/90-nproc.conf

*          soft    nproc     4096
root       soft    nproc     unlimited


Issue 2: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

Cause:
CentOS 6 does not support SecComp, while ES 5.2.0 defaults bootstrap.system_call_filter to true and checks for it, so the check fails and ES refuses to start.
Fix:
Set bootstrap.system_call_filter to false in elasticsearch.yml, below the Memory settings:
bootstrap.memory_lock: false
bootstrap.system_call_filter: false

Elasticsearch startup timeout

[root@localhost elasticsearch7.12.0]# sudo systemctl restart elasticsearch.service
Job for elasticsearch.service failed because a timeout was exceeded. See "systemctl status elasticsearch.service" and "journalctl -xe" for details.

Note: timeout was exceeded.

vi /usr/lib/systemd/system/elasticsearch.service
TimeoutStartSec=750
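Note that edits to the packaged unit file are lost on package upgrades; a drop-in override (created with `systemctl edit elasticsearch.service`, then `systemctl daemon-reload`) survives them. A sketch of the override file:

```ini
# /etc/systemd/system/elasticsearch.service.d/override.conf
[Service]
TimeoutStartSec=750
```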

SSE4.2

[2021-09-27T06:25:24,677][ERROR][o.e.b.Bootstrap          ] [localhost.localdomain] Exception
org.elasticsearch.ElasticsearchException: Failure running machine learning native code. This could be due to running on an unsupported OS or distribution, missing OS libraries, or a problem with the temp directory. To bypass this problem by running Elasticsearch without machine learning functionality set [xpack.ml.enabled: false].

Note: To bypass this problem by running Elasticsearch without machine learning functionality set [xpack.ml.enabled: false].

Machine learning is enabled by default.
Note: machine learning uses SSE4.2 instructions, so it only runs on CPUs that support SSE4.2. On older hardware, set xpack.ml.enabled to false.

vi /etc/elasticsearch/elasticsearch.yml
xpack.ml.enabled: false

Elasticsearch node disk full

tail -f /var/log/graylog-server/server.log

2021-01-07T10:55:17.643+08:00 WARN  [MessagesAdapterES6] Failed to index message: index=<win_sys_11> id=<b2aa9370-5093-11eb-8dd7-525400aa6619> error=<{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"}>
2021-01-07T10:55:17.643+08:00 WARN  [MessagesAdapterES6] Failed to index message: index=<win_sys_11> id=<b2aaba80-5093-11eb-8dd7-525400aa6619> error=<{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"}>
2021-01-07T10:55:17.643+08:00 WARN  [MessagesAdapterES6] Failed to index message: index=<pgsql_1> id=<b2aaba81-5093-11eb-8dd7-525400aa6619> error=<{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"}>
2021-01-07T10:55:18.427+08:00 WARN  [IndexRotationThread] Deflector is pointing to [graylog_11], not the newest one: [graylog_12]. Re-pointing.
2021-01-07T10:55:18.431+08:00 ERROR [IndexRotationThread] Couldn't point deflector to a new index
org.graylog2.indexer.ElasticsearchException: Couldn't switch alias graylog_deflector from index graylog_11 to index graylog_12

blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];
        at org.graylog.storage.elasticsearch6.jest.JestUtils.specificException(JestUtils.java:110) ~[?:?]
        at org.graylog.storage.elasticsearch6.jest.JestUtils.execute(JestUtils.java:60) ~[?:?]
        at org.graylog.storage.elasticsearch6.jest.JestUtils.execute(JestUtils.java:65) ~[?:?]
        at org.graylog.storage.elasticsearch6.IndicesAdapterES6.cycleAlias(IndicesAdapterES6.java:580) ~[?:?]
        at org.graylog2.indexer.indices.Indices.cycleAlias(Indices.java:318) ~[graylog.jar:?]
        at org.graylog2.indexer.MongoIndexSet.pointTo(MongoIndexSet.java:357) ~[graylog.jar:?]
        at org.graylog2.periodical.IndexRotationThread.checkAndRepair(IndexRotationThread.java:166) ~[graylog.jar:?]
        at org.graylog2.periodical.IndexRotationThread.lambda$doRun$0(IndexRotationThread.java:76) ~[graylog.jar:?]
        at java.lang.Iterable.forEach(Iterable.java:75) [?:1.8.0_172]
        at org.graylog2.periodical.IndexRotationThread.doRun(IndexRotationThread.java:73) [graylog.jar:?]
        at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:77) [graylog.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_172]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_172]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_172]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_172]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_172]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_172]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]

This is caused by running out of disk space. Per the official documentation, when disk usage exceeds 95%, Elasticsearch automatically switches indices to read-only to keep the node from exhausting its disk.

Even after adding a disk to expand the space, the error persists.

Once disk space is confirmed to be sufficient, remove the read_only flag from the ES indices to resolve it:

curl -XPUT -H "Content-Type: application/json" 'http://'${es_ip}'/_all/_settings' -d '{"index.blocks.read_only_allow_delete": null}'
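The thresholds involved are Elasticsearch's disk-based shard allocation watermarks; they can be tuned in elasticsearch.yml if needed (a sketch showing the default values):

```yaml
# Disk-based shard allocation watermarks (defaults shown)
cluster.routing.allocation.disk.watermark.low: 85%          # stop allocating new shards to the node
cluster.routing.allocation.disk.watermark.high: 90%         # start relocating shards away
cluster.routing.allocation.disk.watermark.flood_stage: 95%  # indices become read-only (the error above)
```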

Miscellaneous

Querying sidecar nodes in MongoDB

//query Windows servers
db.getCollection('sidecars').find({"node_details.operating_system":'Windows'})
//query Linux servers
db.getCollection('sidecars').find({"node_details.operating_system":'Linux'})

//fuzzy-match on node_name
db.getCollection('sidecars').find({"node_name": /^.*80.*$/})

Deleting offline sidecar nodes in MongoDB

//list all sidecar nodes
db.getCollection('sidecars').find({})

Find the node_id of the offline node and delete that document, e.g. db.getCollection('sidecars').remove({"node_id": "<node_id>"}), where <node_id> is the value found above.

Filebeat collects no logs and reports no errors

In a Kubernetes cluster, filebeat on some nodes stopped collecting logs. After a restart the filebeat logs showed no errors, but still nothing was collected.
The filebeat logs showed the Configured paths were never actually read. Deleting filebeat's data directory (which holds the registry) resolved it.

kubectl logs --tail=100 filebeat-gf546

2023-06-07T03:39:01.533Z        INFO    log/input.go:157        Configured paths: [/opt/log/stash/*/*.log /opt/log/stash/*/*/*.log /opt/log/stash/*/*/*/*.log /var/log/nginx/access.log]

# Fix
rm -rf /usr/share/filebeat/*
rm -rf /usr/share/filebeat/.build_hash.txt