Solution: Logging Operator - An Elegant Cloud-Native Log Management Solution (Part 1)

Published by 优采云: 2022-11-08 11:19

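Logging Operator is BanzaiCloud's open-source log collection solution for cloud-native environments. I once reposted an article by Mr. Cui that introduced it, but because I had always felt that running both Fluent Bit and Fluentd in a single k8s cluster was architecturally bloated, it left me with a first impression of "not applicable". Later, while designing log management for k8s clusters in a multi-tenant scenario, I found that the traditional approach of managing log configuration centrally is not very flexible. Typically, operations staff take a global view and try to turn the log configuration into a template that fits every workload. Over time the template grows large and bloated.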

Only after studying Logging Operator recently did I realize how comfortable it is to manage logs the Kubernetes way. Before we start, let's look at its architecture.

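In its architecture, Logging Operator uses CRDs to control log configuration, routing, and output, starting from the point of collection. Under the hood, it deploys Fluent Bit as a DaemonSet and Fluentd as a StatefulSet. Fluent Bit collects the container logs, performs preliminary processing, and forwards them to Fluentd for further parsing and routing. Fluentd then forwards the results to the different destination services.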

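So once a service is containerized, the question of whether its logs should go to standard output or to a file becomes something we can actually discuss.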

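Besides managing the log workflow, Logging Operator lets administrators enable TLS to encrypt log traffic inside the cluster, and it integrates ServiceMonitor by default to expose the state of the log collectors. Most importantly, because everything is configured through CRDs, our logging policy can finally be managed per tenant within the cluster.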

1. Logging Operator CRDs

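Logging Operator has only 5 core CRDs: Logging, Flow, ClusterFlow, Output, and ClusterOutput.

With these 5 CRDs we can customize the container log flow of every namespace in a Kubernetes cluster.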
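To make the relationship between these resources more concrete, here is a minimal sketch of a namespaced Flow that routes logs to an Output. The resource names, the tenant-a namespace, the app: nginx label, and the choice of the nullout output are illustrative assumptions rather than anything from the original setup.

```yaml
# Output: where the logs go. nullout simply discards records and
# stands in for a real backend in this sketch.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: example-output          # hypothetical name
  namespace: tenant-a           # hypothetical tenant namespace
spec:
  nullout: {}
---
# Flow: which logs to pick up in this namespace and which Output receives them.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: example-flow            # hypothetical name
  namespace: tenant-a
spec:
  match:
    - select:
        labels:
          app: nginx            # hypothetical pod label
  localOutputRefs:
    - example-output
```

ClusterFlow and ClusterOutput play the same roles at cluster scope, and the Logging resource described in section 3 provides the Fluent Bit and Fluentd instances that execute these flows.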

2. Installing Logging Operator

Logging Operator requires Kubernetes 1.14 or later and can be installed in two ways: with Helm or from manifests.

$ helm repo add banzaicloud-stable https://kubernetes-charts.banzaicloud.com
$ helm repo update
$ helm upgrade --install --wait --create-namespace --namespace logging logging-operator banzaicloud-stable/logging-operator \
    --set createCustomResource=false

$ kubectl create ns logging

# RBAC
$ kubectl -n logging create -f https://raw.githubusercontent.com/banzaicloud/logging-operator-docs/master/docs/install/manifests/rbac.yaml

# CRD
$ kubectl -n logging create -f https://raw.githubusercontent.com/banzaicloud/logging-operator/master/config/crd/bases/logging.banzaicloud.io_clusterflows.yaml
$ kubectl -n logging create -f https://raw.githubusercontent.com/banzaicloud/logging-operator/master/config/crd/bases/logging.banzaicloud.io_clusteroutputs.yaml
$ kubectl -n logging create -f https://raw.githubusercontent.com/banzaicloud/logging-operator/master/config/crd/bases/logging.banzaicloud.io_flows.yaml
$ kubectl -n logging create -f https://raw.githubusercontent.com/banzaicloud/logging-operator/master/config/crd/bases/logging.banzaicloud.io_loggings.yaml
$ kubectl -n logging create -f https://raw.githubusercontent.com/banzaicloud/logging-operator/master/config/crd/bases/logging.banzaicloud.io_outputs.yaml

# Operator
$ kubectl -n logging create -f https://raw.githubusercontent.com/banzaicloud/logging-operator-docs/master/docs/install/manifests/deployment.yaml

After installation, verify the state of the services:

# Operator status
$ kubectl -n logging get pods
NAME                                        READY   STATUS    RESTARTS   AGE
logging-logging-operator-599c9cf846-5nw2n   1/1     Running   0          52s

# CRD status
$ kubectl get crd | grep banzaicloud.io
NAME                                    CREATED AT
clusterflows.logging.banzaicloud.io     2021-03-25T08:49:30Z
clusteroutputs.logging.banzaicloud.io   2021-03-25T08:49:30Z
flows.logging.banzaicloud.io            2021-03-25T08:49:30Z
loggings.logging.banzaicloud.io         2021-03-25T08:49:30Z
outputs.logging.banzaicloud.io          2021-03-25T08:49:30Z

3. Logging Operator Configuration

3.1 Logging

LoggingSpec

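LoggingSpec defines the logging infrastructure services that collect and ship log messages; it contains the configuration for both Fluentd and Fluent Bit. Both are deployed in the namespace named by controlNamespace. A simple example: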

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging-simple  # example name; the original snippet omits it
  namespace: logging
spec:
  fluentd: {}
  fluentbit: {}
  controlNamespace: logging

This example tells the Operator to create a logging service with default settings in the logging namespace, consisting of Fluent Bit and Fluentd.

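Of course, in production we would not deploy Fluent Bit and Fluentd with only the default configuration. We usually have to consider many aspects, such as persistence, resource limits, monitoring, and security, all of which are covered below.

Fortunately, LoggingSpec supports these fairly comprehensively, and we can consult the documentation to customize our own services.

I have picked a few important fields and explain how they are used: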

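watchNamespaces: specifies the namespaces in which the Operator watches Flow and Output resources. In a multi-tenant scenario where each tenant uses its own Logging resource to define its logging architecture, watchNamespaces can be bound to the tenant's namespaces to narrow the scope of resources that are considered.

allowClusterResourcesFromAllNamespaces: cluster-wide resources such as ClusterOutput and ClusterFlow take effect by default only in the namespace referenced by controlNamespace. If they are defined in any other namespace they are ignored, unless allowClusterResourcesFromAllNamespaces is set to true.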
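As an illustration of these two fields, the sketch below shows a Logging resource scoped to one tenant. The resource name and the tenant-a namespaces are hypothetical; only the field names come from the description above.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: tenant-a-logging        # hypothetical name
spec:
  fluentd: {}
  fluentbit: {}
  controlNamespace: logging
  # Only Flow and Output resources in these namespaces are watched.
  watchNamespaces:
    - tenant-a                  # hypothetical tenant namespaces
    - tenant-a-staging
  # Keep ClusterFlow/ClusterOutput restricted to controlNamespace.
  allowClusterResourcesFromAllNamespaces: false
```

Leaving allowClusterResourcesFromAllNamespaces at false keeps cluster-wide routing rules in the hands of whoever controls the logging namespace, which is usually what a multi-tenant setup wants.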

LoggingSpec reference documentation:

　　FluentbitSpec

filterKubernetes: the plugin that enriches logs with Kubernetes metadata. Example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentd: {}
  fluentbit:
    filterKubernetes:
      Kube_URL: "https://kubernetes.default.svc:443"
      Match: "kube.*"
  controlNamespace: logging

You can also turn this feature off with disableKubernetesFilter, for example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentd: {}
  fluentbit:
    disableKubernetesFilter: true
  controlNamespace: logging

filterKubernetes reference documentation: #filterkubernetes

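inputTail: defines Fluent Bit's tail input configuration for log collection. It has many detailed parameters, so I will simply show the configuration I use: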

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    inputTail:
      Skip_Long_Lines: "true"
      #Parser: docker
      Parser: cri
      Refresh_Interval: "60"
      Rotate_Wait: "5"
      Mem_Buf_Limit: "128M"
      #Docker_Mode: "true"
      Docker_Mode: "false"

If the cluster's container runtime is containerd or another CRI runtime, change Parser to cri and disable Docker_Mode.
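For comparison, a cluster that still runs Docker as its container runtime would presumably use the two settings that are commented out in the example. The sketch below only flips those two values and assumes the rest of the inputTail block stays unchanged.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    inputTail:
      # Docker writes JSON log lines, so use the docker parser
      # and let Fluent Bit rejoin lines that Docker split.
      Parser: docker
      Docker_Mode: "true"
```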

inputTail reference documentation: #inputtail

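bufferStorage: defines Fluent Bit's buffer settings, which matters quite a lot. Because Fluent Bit is deployed in the cluster as a DaemonSet, we can mount a hostPath volume directly to persist its buffer data, for example: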

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    bufferStorage:
      storage.backlog.mem_limit: 10M
      storage.path: /var/log/log-buffer
    bufferStorageVolume:
      hostPath:
        path: "/var/log/log-buffer"

bufferStorage reference documentation: #bufferstorage

positiondb: defines where Fluent Bit keeps the file position information for the logs it collects. Again, a hostPath volume can back it. Example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    positiondb:
      hostPath:
        path: "/var/log/positiondb"

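image: specifies a custom Fluent Bit image. I strongly recommend an image from Fluent Bit 1.7.3 or later, which fixes many network connection timeout problems on the collection side. Example: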

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    image:
      repository: fluent/fluent-bit
      tag: 1.7.3
      pullPolicy: IfNotPresent

metrics: defines the port on which Fluent Bit exposes its monitoring metrics and the integrated ServiceMonitor scrape definition. Example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    metrics:
      interval: 60s
      path: /api/v1/metrics/prometheus
      port: 2020
      serviceMonitor: true

resources: defines Fluent Bit's resource requests and limits, for example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 200m
        memory: 128Mi

security: defines the security settings Fluent Bit runs with, including PSP, RBAC, securityContext, and podSecurityContext. Together they control the permissions inside the Fluent Bit container. Example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    security:
      podSecurityPolicyCreate: true
      roleBasedAccessControlCreate: true
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
      podSecurityContext:
        fsGroup: 101

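The following fields define some of Fluent Bit's performance-related parameters:

1. Require acknowledgement responses from the upstream when forwarding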

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    forwardOptions:
      Require_ack_response: true

2. TCP connection parameters

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    network:
      connectTimeout: 30
      keepaliveIdleTimeout: 60

3. Enable load-balancing mode

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    enableUpstream: true

4. Taint tolerations for scheduling

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentbit:
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
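All of the fragments in this subsection are fields of the same spec.fluentbit object, so in practice they end up in one Logging resource. The sketch below merges several of them using the values shown earlier; the resource name is hypothetical, and the selection of fields is only an illustration, not a recommended production profile.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: example-logging         # hypothetical name
spec:
  controlNamespace: logging
  fluentd: {}
  fluentbit:
    image:
      repository: fluent/fluent-bit
      tag: 1.7.3
      pullPolicy: IfNotPresent
    inputTail:
      Parser: cri               # containerd / CRI runtime
      Docker_Mode: "false"
      Mem_Buf_Limit: "128M"
    bufferStorage:
      storage.path: /var/log/log-buffer
    bufferStorageVolume:        # persists the buffer on the node
      hostPath:
        path: "/var/log/log-buffer"
    positiondb:                 # persists tail positions on the node
      hostPath:
        path: "/var/log/positiondb"
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
      requests:
        cpu: 200m
        memory: 128Mi
    forwardOptions:
      Require_ack_response: true
    enableUpstream: true
    tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
```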

FluentdSpec

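bufferStorageVolume: this is where persistence of Fluentd's buffer data is configured. Because Fluentd is deployed as a StatefulSet, hostPath is not a good fit. Instead, we should use a PersistentVolumeClaim template so that each Fluentd instance gets its own dedicated buffer volume. Example: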

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentd:
    bufferStorageVolume:
      pvc:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
          storageClassName: csi-rbd
          volumeMode: Filesystem

If storageClassName is not specified here, the Operator creates the PVC through the storage plugin of the default StorageClass.

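fluentOutLogrotate: redirects Fluentd's standard output to a file. The main purpose is to avoid a chain reaction when Fluentd hits an error: otherwise the error message is fed back into the system as a log record and triggers yet another error. Example: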

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentd:
    fluentOutLogrotate:
      enabled: true
      path: /fluentd/log/out
      age: 10
      size: 10485760

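This redirects Fluentd's own log to /fluentd/log/out, keeps 10 rotated files (a numeric age in Fluentd's log rotation counts generations of files rather than days), and caps each file at 10 MiB (10485760 bytes).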

FluentOutLogrotate reference documentation: #fluentoutlogrotate

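scaling: mainly defines the number of Fluentd replicas. If Fluent Bit has upstream support enabled (enableUpstream), changing the number of Fluentd replicas triggers a rolling update of Fluent Bit. Example: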

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentd:
    scaling:
      replicas: 4
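The interaction between replicas and upstream mode is easier to see when both settings sit side by side. The following sketch simply combines the two fields already shown in this article; the resource name is hypothetical.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: example-logging         # hypothetical name
spec:
  controlNamespace: logging
  fluentbit:
    # Fluent Bit balances across the Fluentd instances in upstream mode.
    enableUpstream: true
  fluentd:
    scaling:
      # Changing this value while enableUpstream is true also
      # rolls the Fluent Bit DaemonSet, as described above.
      replicas: 4
```

With this combination it is worth planning replica changes deliberately, since every adjustment restarts the collectors on all nodes.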

scaling reference documentation: #fluentdscaling

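workers: defines the number of Fluentd workers. Because of Ruby's limitations, Fluentd by default still processes the log workflow in a single process. Increasing the number of workers can significantly improve Fluentd's concurrency. Example: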

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
spec:
  fluentd:
    workers: 2

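When the number of workers is greater than 1, Operator versions before 3.9.2 do not handle persistent storage of Fluentd's buffer data well, which may cause the Fluentd container to crash.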

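image: defines the Fluentd image. The image customized for Logging Operator must be used here, although the image version can be chosen. The structure is similar to Fluent Bit's.

security: defines the security settings Fluentd runs with, including PSP, RBAC, securityContext, and podSecurityContext. The structure is similar to Fluent Bit's.

metrics: defines the port on which Fluentd exposes its monitoring metrics and the integrated ServiceMonitor scrape definition. The structure is similar to Fluent Bit's.

resources: defines Fluentd's resource requests and limits. The structure is similar to Fluent Bit's.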
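Since these Fluentd fields mirror their Fluent Bit counterparts, a Fluentd configuration might look roughly like the sketch below. The resource name and all values are illustrative assumptions, not recommendations. The image block is left out on purpose because the article does not give the repository or tag of the Operator's Fluentd image.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: example-logging         # hypothetical name
spec:
  controlNamespace: logging
  fluentbit: {}
  fluentd:
    resources:                  # illustrative sizes only
      limits:
        cpu: "2"
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 512Mi
    metrics:
      interval: 60s
      serviceMonitor: true      # port and path are left at their defaults
    security:
      roleBasedAccessControlCreate: true
      securityContext:
        allowPrivilegeEscalation: false
```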

Summary of Part 1

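This article introduced Logging Operator's architecture, deployment, and CRDs, and described the Logging resource and its important parameters in detail. These parameters become very important once the Operator is used to collect logs in production, so please consult the documentation before using them.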

Because there is a lot to cover in Logging Operator, the next few installments will deal with Flow, ClusterFlow, Output, ClusterOutput, and the various plugins. Stay tuned.


