
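Provisioning a DSE cluster with Lifecycle Manager always fails. The master node (which is also the one running OpsCenter) installs correctly. On every other node, the install (and configure) job fails. I have double-checked the SSH credentials and ports. Any ideas on how to investigate further and resolve the problem would be great.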

Apologies for the length; I am trying to include all the relevant information.

Ubuntu 14.04.4, JRE: 1.8.0.91, DSE 5.0.0

Job events:

   ...
    "results": [
        {
            "event-subtype": "start",
            "event-type": "milestone",
            "message": "job started...",
            ...
        },
        {
            "event-subtype": "invocation",
            "event-type": "shell-command",
            "message": "Invoked command: if [ -x $(which yum) ] && [ -f /etc/redhat-release -o -f /etc/SuSE-release ]; then echo -n yum; elif [ -x $(which apt-get) ]; then echo -n apt; fi"
            ...
        },
        {
            "event-subtype": "uploaded-facts",
            "event-type": "milestone",
            "message": "Uploaded facts to OpsCenter server",
            ...
        },
        {
            "event-subtype": "meld-error",
            "event-type": "error",
            "message": "Unexpected error executing meld",
            ...
        },
        {
            "event-subtype": "MeldError",
            "event-type": "error",
            "message": "Meld failed on: name=\"NODE-2\" ssh-management-address=\"<IP>\" node-id=\"<node-id>\" job-id=\"<job-id>\" stdout=\"\r\n\" stderr=\"\"",
            ...
        }
    ]

opscenterd.log

/var/log/opscenter/opscenterd.log-2016-07-02 16:34:16,848 [opscenterd]  INFO: Install job started for node name="NODE-2" ssh-management-address="<IP>" node-id="<node-id>" (async-thread-macro-53)
/var/log/opscenter/opscenterd.log-2016-07-02 16:34:16,850 [opscenterd]  INFO: using ssh-private-key (async-thread-macro-53)
/var/log/opscenter/opscenterd.log-2016-07-02 16:34:18,478 [opscenterd]  INFO: Received milestone from node name="NODE-2" ssh-management-address="<IP>" node-id="<node-id>" message="Uploaded facts to OpsCenter server" job-id="a630c081-6ac1-4b00-ac08-18fef320e0d5" (MainThread)
/var/log/opscenter/opscenterd.log:2016-07-02 16:34:18,675 [opscenterd] ERROR: Received error from node event-subtype="meld-error" job-id="a630c081-6ac1-4b00-ac08-18fef320e0d5" name="NODE-2" traceback="Traceback (most recent call last):
/var/log/opscenter/opscenterd.log:  File \"meld.py\", line 3313, in run
/var/log/opscenter/opscenterd.log-    rc = engine.go()
/var/log/opscenter/opscenterd.log:  File \"meld.py\", line 2991, in go
/var/log/opscenter/opscenterd.log-    self.file_manager.get_config_files()
/var/log/opscenter/opscenterd.log:  File \"meld.py\", line 1280, in get_config_files
/var/log/opscenter/opscenterd.log-    {\"accept\": \"application/json\"})
/var/log/opscenter/opscenterd.log:  File \"meld.py\", line 598, in get
/var/log/opscenter/opscenterd.log-    return json.loads(response.read())
/var/log/opscenter/opscenterd.log-  File \"/usr/lib/python2.7/socket.py\", line 351, in read
/var/log/opscenter/opscenterd.log-    data = self._sock.recv(rbufsize)
/var/log/opscenter/opscenterd.log-  File \"/usr/lib/python2.7/httplib.py\", line 549, in read
/var/log/opscenter/opscenterd.log-    return self._read_chunked(amt)
/var/log/opscenter/opscenterd.log-  File \"/usr/lib/python2.7/httplib.py\", line 609, in _read_chunked
/var/log/opscenter/opscenterd.log-    value.append(self._safe_read(amt))
/var/log/opscenter/opscenterd.log-  File \"/usr/lib/python2.7/httplib.py\", line 666, in _safe_read
/var/log/opscenter/opscenterd.log-    raise IncompleteRead(''.join(s), amt)
/var/log/opscenter/opscenterd.log:IncompleteRead: IncompleteRead(4153 bytes read, 4039 more expected)" ssh-management-address="<IP>" node-id="<node-id>" event-type="error" message="Unexpected error executing meld" (MainThread)
/var/log/opscenter/opscenterd.log-2016-07-02 16:34:18,892 [opscenterd] ERROR: Install job a630c081-6ac1-4b00-ac08-18fef320e0d5 failed! (async-thread-macro-54)
/var/log/opscenter/opscenterd.log:2016-07-02 16:34:19,105 [opscenterd] ERROR: Meld failed on: name="NODE-2" ssh-management-address="<IP>" node-id="<node-id>" job-id="a630c081-6ac1-4b00-ac08-18fef320e0d5" stdout="
/var/log/opscenter/opscenterd.log-" stderr="" (async-thread-macro-53)
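
The `IncompleteRead` line in the traceback carries two numbers that are worth adding up. The snippet below is a minimal sketch for doing that; the function name `chunk_shortfall` and the regular expression are illustrative and not part of OpsCenter or `meld.py`.

```python
import re

# Matches the message format seen in the traceback above, e.g.
# "IncompleteRead(4153 bytes read, 4039 more expected)".
INCOMPLETE_READ = re.compile(
    r"IncompleteRead\((\d+) bytes read, (\d+) more expected\)"
)


def chunk_shortfall(log_line):
    """Return (received, missing, declared_size), or None if the line has no match."""
    match = INCOMPLETE_READ.search(log_line)
    if match is None:
        return None
    received = int(match.group(1))
    missing = int(match.group(2))
    return received, missing, received + missing


line = "IncompleteRead: IncompleteRead(4153 bytes read, 4039 more expected)"
print(chunk_shortfall(line))
```

For the logged values, 4153 + 4039 = 8192. Since the exception is raised from `_read_chunked`, this suggests the server announced an 8192-byte chunk and the connection stopped delivering data partway through it.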

Thanks

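Edit: I captured the HTTP traffic between NODE-2 and the master. The error occurs while the configuration files are being transferred. For some reason, one of them is not transferred completely. The JSON looks sensible until it turns into garbage: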

 {"filename": "dse.yaml", "contents": {"internode_messaging_options": {"client_worker_threads": 16, "port": 8609, "server_worker_threads": 16, "server_acceptor_thread

Yvatv+~UK{.kMI4^QOrqQTDX_3"DPm,v!"H&M$!1M7

LRYCs{l>-df;cj

W6C9dq

The configuration files are valid and work on the master node. Only the copy fails.
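
A truncated transfer like this can also be observed on the client side without a packet capture. The following is a minimal sketch using Python 3's `http.client` (the successor of the Python 2.7 `httplib` module in the traceback). It does not talk to OpsCenter: `FakeSocket`, `truncated_response`, and `read_body` are hypothetical helpers that replay a canned response whose `Content-Length` promises more bytes than actually arrive.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket: replays canned bytes as the HTTP stream."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def truncated_response():
    """Build a response that announces 20 body bytes but delivers only 13."""
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 20\r\n"
        b"\r\n"
        b'{"filename": '
    )
    response = http.client.HTTPResponse(FakeSocket(raw))
    response.begin()  # parse the status line and headers
    return response


def read_body(response):
    """Return (body, missing_bytes); missing_bytes is 0 for a complete read."""
    try:
        return response.read(), 0
    except http.client.IncompleteRead as err:
        # err.partial holds what did arrive; err.expected is the shortfall.
        return err.partial, err.expected


body, missing = read_body(truncated_response())
print(body, missing)
```

Logging the partial body and the shortfall this way shows exactly where the payload stops, which is the same information the capture above provides.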


2 Answers


OpsCenter LCM developer here. Your problem is caused by OPSC-8851, which is on the LCM known-issues list: http://docs.datastax.com/en/opscenter/6.0/opsc/release_notes/opscReleaseNotes600.html

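It is only triggered under certain network conditions, and it was discovered too close to the release to be fixed in 6.0.0. It is a high priority, though, and will be fixed in a follow-up release soon. Unfortunately, I don't think there is anything you can do to work around it. If you are a DataStax customer, you can contact support and may be able to get a patch right away. Otherwise, the only thing I can suggest is to watch the upcoming release notes.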

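Edit: I should also note that in our testing the problem was intermittent. LCM is designed so that failed jobs can safely be re-run (that is, they are idempotent), so in all but the most extreme cases you can also work around this by re-running your job.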
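
Because re-running is safe, the retry can be scripted. The sketch below shows only the generic retry pattern and assumes nothing about the LCM API: `run_job` is a hypothetical callable that you would supply, which triggers the job by whatever means you use and returns `True` on success.

```python
import time


def rerun_until_success(run_job, max_attempts=3, delay_seconds=0.0):
    """Call run_job() until it returns True.

    Returns the attempt number that succeeded, or None if every attempt failed.
    run_job is a hypothetical, caller-supplied callable; it must be idempotent.
    """
    for attempt in range(1, max_attempts + 1):
        if run_job():
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # pause before the next attempt
    return None


# Example with a stand-in job that fails twice, then succeeds.
outcomes = iter([False, False, True])
print(rerun_until_success(lambda: next(outcomes), max_attempts=5))
```

A bounded number of attempts keeps the loop from hiding a failure that is not intermittent.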

Answered 2016-07-05T14:17:13.677

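You can specify the private IP for the listen address and 0.0.0.0 for the broadcast address, and LCM should be able to configure things appropriately.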

Answered 2016-07-07T00:17:37.923