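Please help me with the following problem.

I have a backend service on Node.js deployed on a GCE VM. It works fine, but after installing the logging and monitoring agents I see very strange entries in the Logs Viewer. I checked the payload to find out what generates them: it is the Stackdriver agent (stackdriver-agent).

Here they are: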

A 2020-05-15T22:45:26Z write_gcm: can not take infinite value
A 2020-05-15T22:45:26Z write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing. 
A 2020-05-15T22:45:26Z write_gcm: can not take infinite value 
A 2020-05-15T22:45:26Z write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing. 
A 2020-05-15T22:45:26Z write_gcm: can not take infinite value 
A 2020-05-15T22:45:26Z write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing. 
A 2020-05-15T22:45:28Z write_gcm: Server response (CollectdTimeseriesRequest) contains errors:#012{#012  "payloadErrors": [#012    {#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 5,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 10,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 15,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 20,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 25 
A 2020-05-15T22:45:29Z write_gcm: Server response (CollectdTimeseriesRequest) contains errors:#012{#012  "payloadErrors": [#012    {#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    }#012  ]#012} 
A 2020-05-15T22:45:29Z write_gcm: Unsuccessful HTTP request 400: {#012  "error": {#012    "code": 400,#012    "message": "Field timeSeries[3].points[0].interval.start_time had an invalid value of \"2020-05-15T15:45:27.348251-07:00\": The start time must be before the end time (2020-05-15T15:45:27.348251-07:00) for the non-gauge metric 'agent.googleapis.com/agent/api_request_count'.",#012    "status": "INVALID_ARGUMENT"#012  }#012} 
A 2020-05-15T22:45:29Z write_gcm: Error talking to the endpoint. 
A 2020-05-15T22:45:29Z write_gcm: wg_transmit_unique_segment failed. 
A 2020-05-15T22:45:29Z write_gcm: wg_transmit_unique_segments failed. Flushing. 

So I see logs like these appear every minute. When I stop the stackdriver-agent service, they disappear. I have 4 VMs in my project, and only two of them show this problem: a CentOS 7 VM and an Ubuntu 18 VM.

1 Answer

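So far there are 2 PITs (Public Issue Tracker entries) for this:

The latest one has a Google engineer's explanation of the 400 error:

These messages are annoying but harmless. You are not losing any metrics. You can safely ignore these logs.

The root cause is a server-side configuration change, and it affects all agents. The change only affects the verbosity of the response, not how requests are processed. Some incoming metrics were silently dropped before the change and are now dropped noisily.

These metrics are sent by default by the upstream collectd plugins, and we do not have any control that would completely prevent them from being sent. The log spam comes from collectd's internal handling of these metrics.

If you want to filter out all the noisy logs you are seeing, you can create a Log Exclusion[1][2] or a Log Sink[3][4]. A log exclusion matches logs against a specified filter and drops them before they reach the Logs Viewer, whereas a log sink takes the logs and routes them to a storage bucket, a BigQuery table, or a Pub/Sub topic.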
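As a rough illustration of the filter an exclusion would need, here is a minimal Python sketch that builds a Logging query string matching the noisy messages. It assumes the agent's lines arrive in the `textPayload` field of `gce_instance` log entries, which may not hold for every setup; the helper name `build_exclusion_filter` and the marker strings (copied from the logs in the question) are illustrative only.

```python
def build_exclusion_filter(markers, resource_type="gce_instance"):
    """Build a Logging query that matches entries containing any marker.

    Assumption: the agent messages are stored in `textPayload`.
    The `:` operator is the substring ("has") match of the query language.
    """
    if not markers:
        raise ValueError("at least one marker is required")
    clauses = []
    for marker in markers:
        # Escape backslashes and double quotes so the marker stays one literal.
        escaped = marker.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('textPayload:"{}"'.format(escaped))
    return 'resource.type="{}" AND ({})'.format(resource_type, " OR ".join(clauses))


# Marker strings taken from the log lines shown in the question.
NOISY_MARKERS = [
    "write_gcm: can not take infinite value",
    "Unsupported collectd plugin/type combination",
]

if __name__ == "__main__":
    print(build_exclusion_filter(NOISY_MARKERS))
```

It is probably worth pasting the resulting string into the Logs Viewer query box first, to confirm it matches only the unwanted entries before using it in an exclusion.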

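As for swap, there is a blog post about it:

This error occurs because the VM instance has no swap memory, so the metric plugin ends up dividing by 0.

To fix it, remove this configuration and restart stackdriver-agent.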
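The arithmetic behind that explanation can be sketched as follows. This is not the collectd source, only a hypothetical Python model of a percentage computation: with no swap configured, every component is 0, so the total is 0 and the ratio has no finite value (in C floating point, 0.0/0.0 yields NaN), which is the kind of value the "can not take infinite value" message refuses. The function names are made up for illustration.

```python
import math


def swap_percentages(used, free, cached):
    """Return each swap component as a percentage of the total.

    Returns None when there is no swap at all (total == 0), because the
    ratio is undefined in that case.
    """
    total = used + free + cached
    if total == 0:
        return None
    parts = (("used", used), ("free", free), ("cached", cached))
    return {name: 100.0 * value / total for name, value in parts}


def is_reportable(value):
    """A metric value is only worth sending if it is a finite number."""
    return value is not None and math.isfinite(value)


if __name__ == "__main__":
    print(swap_percentages(1, 3, 0))          # VM with swap: finite percentages
    print(swap_percentages(0, 0, 0))          # VM without swap: nothing to report
    print(is_reportable(float("nan")))        # what an unguarded 0.0/0.0 would produce in C
```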
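The answer does not show which configuration is meant, so the following is only a sketch under assumptions: it presumes the configuration is a collectd-style swap plugin section, and the sample text below is hypothetical rather than copied from a real agent file. The helper `remove_plugin` strips a plugin's `LoadPlugin` line and its `<Plugin ...>` block from configuration text.

```python
import re


def remove_plugin(config_text, plugin):
    """Remove a plugin's LoadPlugin line and <Plugin "name"> block.

    Assumption: collectd-style syntax, with the block's opening and
    closing tags each on their own line.
    """
    name = re.escape(plugin)
    block = re.compile(
        r'^[ \t]*<Plugin[ \t]+"?' + name + r'"?[ \t]*>.*?^[ \t]*</Plugin>[ \t]*(?:\n|\Z)',
        re.DOTALL | re.MULTILINE | re.IGNORECASE,
    )
    load = re.compile(
        r'^[ \t]*LoadPlugin[ \t]+"?' + name + r'"?[ \t]*(?:\n|\Z)',
        re.MULTILINE | re.IGNORECASE,
    )
    return load.sub("", block.sub("", config_text))


# Hypothetical sample, not taken from the original post.
SAMPLE = """LoadPlugin cpu
LoadPlugin swap
<Plugin "swap">
  ValuesPercentage true
</Plugin>
LoadPlugin memory
"""

if __name__ == "__main__":
    print(remove_plugin(SAMPLE, "swap"))
```

Keeping a backup of the original file before applying any such change seems prudent, and the agent still has to be restarted afterwards, as the answer says.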

Answered on 2020-10-25T08:16:34.013