I have a web crawler that crawls thousands of websites and stores the results in a persistent volume on Kubernetes.
After that pod terminates, I want to ingest the .json files inside the PV into ECK. I already have Elasticsearch and Kibana running, and I am following the quickstart guide.
Filebeat will do this for you - https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html and https://www.elastic.co/guide/en/beats/filebeat/7.14/decode-json-fields.html
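For example, a minimal standalone filebeat.yml sketch, assuming the PV is mounted into the Filebeat pod at /crawler-data (an illustrative path) and each .json file contains one JSON document per line:

filebeat.inputs:
  - type: log
    # Illustrative mount point of the crawler's PV inside the Filebeat pod.
    paths:
      - /crawler-data/*.json
    processors:
      # Parse the JSON in each line into top-level event fields.
      - decode_json_fields:
          fields: ["message"]
          target: ""
          overwrite_keys: true

output.elasticsearch:
  # Adjust host, credentials, and TLS settings to your ECK-managed cluster
  # (the ECK quickstart names its HTTP service quickstart-es-http).
  hosts: ["https://quickstart-es-http:9200"]
  username: "elastic"
  password: "<your-elastic-password>"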
P.S. It's Elasticsearch, not elasticsearch ;)
You can achieve this with a Filebeat DaemonSet, which you can download from:
curl -L -O https://raw.githubusercontent.com/elastic/beats/7.14/deploy/kubernetes/filebeat-kubernetes.yaml
Then, in the downloaded manifest's ConfigMap, enable hints-based autodiscover:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
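Assuming you keep the file name from the curl above, applying the manifest and checking the pods is straightforward:

kubectl apply -f filebeat-kubernetes.yaml
kubectl get pods -n kube-system -l k8s-app=filebeat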
The Filebeat DaemonSet reads from the containers' stdout/stderr, so you can take one of the following two approaches:

1) Symlink your log files to stdout/stderr, as the official nginx image does:
# forward request and error logs to docker log collector
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
    && ln -sf /dev/stderr /var/log/nginx/error.log
2) Use a streaming sidecar, as described in the Kubernetes documentation: https://kubernetes.io/docs/concepts/cluster-administration/logging/#streaming-sidecar-container
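A minimal sketch of that sidecar pattern, assuming the crawler writes /app/path/jsonfile.json to a shared emptyDir (image names and paths are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: crawler
spec:
  containers:
    - name: crawler
      image: my-crawler:latest   # hypothetical crawler image
      volumeMounts:
        - name: output
          mountPath: /app/path
    - name: json-streamer
      # Sidecar that streams the JSON file to its own stdout,
      # where the node's container runtime (and Filebeat) can read it.
      image: busybox:1.36
      args: ['/bin/sh', '-c', 'tail -n+1 -F /app/path/jsonfile.json']
      volumeMounts:
        - name: output
          mountPath: /app/path
  volumes:
    - name: output
      emptyDir: {}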
Example Dockerfile (approach 1 applied to your app):
FROM node:13.3.0-alpine
WORKDIR /app
COPY package.json .
RUN npm install --only=prod
COPY . .
# Redirect the crawler's JSON output file to stdout so the container
# runtime captures it as container logs.
RUN ln -sf /dev/stdout /app/path/jsonfile.json
CMD ["npm", "start"]
Example Filebeat DaemonSet:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    # filebeat.inputs:
    # - type: container
    #   paths:
    #     - /var/log/containers/*.log
    #   json.keys_under_root: true
    #   json.add_error_key: true
    #   json.message_key: message
    #   processors:
    #     - add_kubernetes_metadata:
    #         host: ${NODE_NAME}
    #         matchers:
    #         - logs_path:
    #             logs_path: "/var/log/containers/"

    # To enable hints based autodiscover, remove `filebeat.inputs` configuration and uncomment this:
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints.enabled: true
          hints.default_config:
            type: container
            paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log

    processors:
      - add_cloud_metadata:
      - add_host_metadata:
      # - add_kubernetes_metadata:

    cloud.id: ${ELASTIC_CLOUD_ID}
    cloud.auth: ${ELASTIC_CLOUD_AUTH}

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:7.13.3
          args: ["-c", "/etc/filebeat.yml", "-e"]
          env:
            - name: ELASTICSEARCH_HOST
              value: "elasticsearch"
            - name: ELASTICSEARCH_PORT
              value: "9200"
            - name: ELASTICSEARCH_USERNAME
              value: "elastic"
            - name: ELASTICSEARCH_PASSWORD
              value: "password"
            - name: ELASTIC_CLOUD_ID
              value:
            - name: ELASTIC_CLOUD_AUTH
              value:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            #privileged: true
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: config
              mountPath: /etc/filebeat.yml
              readOnly: true
              subPath: filebeat.yml
            - name: data
              mountPath: /usr/share/filebeat/data
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: config
          configMap:
            defaultMode: 0640
            name: filebeat-config
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
        - name: data
          hostPath:
            # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
            path: /var/lib/filebeat-data
            type: DirectoryOrCreate
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - namespaces
      - pods
      - nodes
    verbs:
      - get
      - watch
      - list
  - apiGroups: ["apps"]
    resources:
      - replicasets
    verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
---
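Once the DaemonSet is running, you can confirm events are flowing with standard commands (credentials as configured in the env block above; run the curl from inside the cluster, or via kubectl port-forward):

kubectl logs -n kube-system -l k8s-app=filebeat --tail=20
curl -u elastic:password 'http://elasticsearch:9200/_cat/indices/filebeat-*?v'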
A similar question on the Elastic discussion forum: https://discuss.elastic.co/t/kubernetes-autodiscovery-pod-with-multiple-log-files/207405