云端内容采集(上海驻云自大数据统一分析平台-DataFluxStudio实时数据洞察平台 )
优采云 发布时间: 2022-02-01 11:18云端内容采集(上海驻云自大数据统一分析平台-DataFluxStudio实时数据洞察平台
)
DataFlux是上海住云自主研发的一套统一的大数据分析平台,可以通过对任意来源、任意类型、任意规模的实时数据进行监测、分析和处理,释放数据价值。
DataFlux 包括五个功能模块:
- 数据包 采集器
- Dataway 数据网关
- DataFlux Studio 实时数据洞察平台
- DataFlux Admin Console 管理后台
- DataFlux.f(x) 实时数据处理开发平台
为企业提供全场景数据洞察分析能力,具有实时性、灵活性、易扩展性、易部署性。
安装 DataKit
PS:以Linux系统为例
第一步:执行安装命令
DataKit 安装命令:
DK_FTDATAWAY=[你的 DataWay 网关地址] bash -c "$(curl https://static.dataflux.cn/datakit/install.sh)"
在安装命令中添加DataWay网关地址,然后将安装命令复制到主机执行。
例如:如果DataWay网关的IP地址为1.2.3.4,端口为9528(9528为默认端口),则网关地址为
DK_FTDATAWAY=http://1.2.3.4:9528/v1/write/metrics bash -c "$(curl https://static.dataflux.cn/datakit/install.sh)"
安装完成后DataKit会默认自动运行,并会在终端提示DataKit状态管理命令
Hadoop 监控指标采集
前提
配置
打开DataKit采集源码配置文件夹(默认路径是DataKit安装目录的conf.d文件夹),找到jolokia2_agent文件夹,打开里面的jolokia2_agent.conf。
设置:
[[inputs.jolokia2_agent]]
urls = ["http://localhost:8778/jolokia"]
name_prefix = "hadoop.hdfs.namenode."
[[inputs.jolokia2_agent.metric]]
name = "FSNamesystem"
mbean = "Hadoop:name=FSNamesystem,service=NameNode"
paths = ["CapacityTotal", "CapacityRemaining", "CapacityUsedNonDFS", "NumLiveDataNodes", "NumDeadDataNodes", "NumInMaintenanceDeadDataNodes", "NumDecomDeadDataNodes"]
[[inputs.jolokia2_agent.metric]]
name = "FSNamesystemState"
mbean = "Hadoop:name=FSNamesystemState,service=NameNode"
paths = ["VolumeFailuresTotal", "UnderReplicatedBlocks", "BlocksTotal"]
[[inputs.jolokia2_agent.metric]]
name = "OperatingSystem"
mbean = "java.lang:type=OperatingSystem"
paths = ["ProcessCpuLoad", "SystemLoadAverage", "SystemCpuLoad"]
[[inputs.jolokia2_agent.metric]]
name = "jvm_runtime"
mbean = "java.lang:type=Runtime"
paths = ["Uptime"]
[[inputs.jolokia2_agent.metric]]
name = "jvm_memory"
mbean = "java.lang:type=Memory"
paths = ["HeapMemoryUsage", "NonHeapMemoryUsage", "ObjectPendingFinalizationCount"]
[[inputs.jolokia2_agent.metric]]
name = "jvm_garbage_collector"
mbean = "java.lang:name=*,type=GarbageCollector"
paths = ["CollectionTime", "CollectionCount"]
tag_keys = ["name"]
[[inputs.jolokia2_agent.metric]]
name = "jvm_memory_pool"
mbean = "java.lang:name=*,type=MemoryPool"
paths = ["Usage", "PeakUsage", "CollectionUsage"]
tag_keys = ["name"]
tag_prefix = "pool_"
################
# DATANODE #
################
[[inputs.jolokia2_agent]]
urls = ["http://localhost:7778/jolokia"]
name_prefix = "hadoop.hdfs.datanode."
[[inputs.jolokia2_agent.metric]]
name = "FSDatasetState"
mbean = "Hadoop:name=FSDatasetState,service=DataNode"
paths = ["Capacity", "DfsUsed", "Remaining", "NumBlocksFailedToUnCache", "NumBlocksFailedToCache", "NumBlocksCached"]
[[inputs.jolokia2_agent.metric]]
name = "OperatingSystem"
mbean = "java.lang:type=OperatingSystem"
paths = ["ProcessCpuLoad", "SystemLoadAverage", "SystemCpuLoad"]
[[inputs.jolokia2_agent.metric]]
name = "jvm_runtime"
mbean = "java.lang:type=Runtime"
paths = ["Uptime"]
[[inputs.jolokia2_agent.metric]]
name = "jvm_memory"
mbean = "java.lang:type=Memory"
paths = ["HeapMemoryUsage", "NonHeapMemoryUsage", "ObjectPendingFinalizationCount"]
[[inputs.jolokia2_agent.metric]]
name = "jvm_garbage_collector"
mbean = "java.lang:name=*,type=GarbageCollector"
paths = ["CollectionTime", "CollectionCount"]
tag_keys = ["name"]
[[inputs.jolokia2_agent.metric]]
name = "jvm_memory_pool"
mbean = "java.lang:name=*,type=MemoryPool"
paths = ["Usage", "PeakUsage", "CollectionUsage"]
tag_keys = ["name"]
tag_prefix = "pool_"
配置好后重启DataKit生效
验证数据报告
完成数据采集操作后,我们需要验证数据采集是否成功并上报给DataWay,以便日后可以正常分析和展示数据。
操作步骤:登录DataFlux-数据管理-指标浏览-验证数据采集是否成功
Hadoop 性能指标:
DataFlux 的数据洞察力
根据获得的指标进行数据洞察设计,如:
Hadoop 性能监控视图
基于自研DataKit数据(采集器),DataFlux现在可以对接200多种数据协议,包括:云数据采集、应用数据采集、日志数据采集,时序数据上报和常用数据库的数据聚合,帮助企业实现最便捷的IT统一监控。