adobe - AEM 6.2 Mongo 副本集自动故障转移不起作用

Question

安装 AEM-author 和 Mongo 副本集后似乎运行良好。我安装的 AEM 版本是 6.2

所以我尝试通过以下方法检查自动故障转移能力。1. 停止当前 Primary 的 mongod 实例 2. 通过发出 rs.status() mongo 命令检查 Secondary 是否成为 Primary 3. 并检查 AEM-author 的 logs/error.log

Mongo 副本集似乎正确地进行了故障转移。但是 AEM-author 因显示以下错误而崩溃。

/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error__5.log:01.11.2016 12:36:06.386 *ERROR* [pool-44-thread-1] org.apache.sling.serviceusermapping.impl.ServiceUserMapperImpl cannot unregister ServiceUserMapped Mapping [serviceName=com.adobe.cq.social.cq-social-messaging, subServiceName=utility-reader, userName=communities-utility-reader]
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error__5.log:01.11.2016 12:36:06.386 *ERROR* [pool-44-thread-1] org.apache.sling.serviceusermapping.impl.ServiceUserMapperImpl cannot unregister ServiceUserMapped Mapping [serviceName=com.adobe.cq.social.cq-social-messaging, subServiceName=acl-manager, userName=communities-acl-manager]
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error__5.log:01.11.2016 12:36:06.964 *ERROR* [FelixDispatchQueue] org.apache.felix.http.jetty FrameworkEvent ERROR (org.osgi.framework.BundleException: Activator stop error in bundle org.apache.felix.http.jetty [36].)
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:27:59.516 *ERROR* [DocumentDiscoveryLiteService-BackgroundWorker-[2]] org.apache.jackrabbit.oak.plugins.document.DocumentDiscoveryLiteService doRun: got an exception: com.mongodb.MongoTimeoutException: Timed out after 10000 ms while waiting for a server that matches {serverSelectors=[ReadPreferenceServerSelector{readPreference=primary}, LatencyMinimizingServerSelector{acceptableLatencyDifference=15 ms}]}. Client view of cluster state is {type=ReplicaSet, servers=[{address=172.18.8.248:27017, type=ReplicaSetArbiter, averageLatency=1.0 ms, state=Connected}, {address=SERVW0014:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.SocketException: Connection reset}}, {address=SERVW0015:27017, type=ReplicaSetSecondary, averageLatency=1.3 ms, state=Connected}]
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.343 *ERROR* [DocumentNodeStore background read thread (2)] org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo This oak instance failed to update the lease in time and can therefore no longer access this DocumentNodeStore. (leaseEndTime: 1477974601170, leaseTime: 120000, leaseFailureMargin: 20000, lease check end time (leaseEndTime-leaseFailureMargin): 1477974581170, now: 1477974585328, remaining: -4158) Need to stop oak-core/DocumentNodeStoreService.
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.343 *ERROR* [LeaseFailureHandler-Thread] org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService handleLeaseFailure: stopping oak-core...
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.422 *ERROR* [LeaseFailureHandler-Thread] org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo This oak instance failed to update the lease in time and can therefore no longer access this DocumentNodeStore.
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.422 *ERROR* [LeaseFailureHandler-Thread] org.apache.sling.discovery.oak [org.apache.sling.discovery.oak.OakDiscoveryService(256)] The updatedPropertyProvider method has thrown an exception (com.google.common.util.concurrent.ExecutionError: java.lang.AssertionError: This oak instance failed to update the lease in time and can therefore no longer access this DocumentNodeStore.)
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.453 *ERROR* [LeaseFailureHandler-Thread] org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo This oak instance failed to update the lease in time and can therefore no longer access this DocumentNodeStore.
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.453 *ERROR* [LeaseFailureHandler-Thread] com.adobe.cq.social.cq-social-scf-impl [com.adobe.cq.social.scf.impl.SocialComponentFactoryManagerImpl(2527)] The unbindFactories method has thrown an exception (com.google.common.util.concurrent.ExecutionError: java.lang.AssertionError: This oak instance failed to update the lease in time and can therefore no longer access this DocumentNodeStore.)
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.500 *ERROR* [LeaseFailureHandler-Thread] org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo This oak instance failed to update the lease in time and can therefore no longer access this DocumentNodeStore.
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.500 *ERROR* [LeaseFailureHandler-Thread] com.adobe.cq.dtm.impl.DTMJobsInitializer Could not obtain a resource resolver.
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.625 *ERROR* [LeaseFailureHandler-Thread] org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo This oak instance failed to update the lease in time and can therefore no longer access this DocumentNodeStore.
/home/vagrant/mounts/author1/aem/22_crx-quickstart/logs/error_6.log:01.11.2016 13:29:46.625 *ERROR* [LeaseFailureHandler-Thread] org.apache.sling.discovery.oak [org.apache.sling.discovery.oak.OakDiscoveryService(256)] The updatedPropertyProvider method has thrown an exception (com.google.common.util.concurrent.ExecutionError: java.lang.AssertionError: This oak instance failed to update the lease in time and can therefore no longer access this DocumentNodeStore.)

我试图根据adobe论坛解决问题，但我无法解决问题。

http://help-forums.adobe.com/content/adobeforums/en/experience-manager-forum/adobe-experience-manager.topic.html/forum__r93i-hi_friends_icam.html

有人可以帮助我为什么会出现此问题并让我知道如何解决此问题吗？

问候

score 0 · Accepted Answer

您的问题是您无法连接到新的主 mongodb 实例（至少不是在所需的时间内）.. 我建议在您的问题中添加 mongodb 的标签，因为该问题与 mongodb 相关并且有更多用户谁比jackrabbit-oak更了解mongodb。回到问题：你能从运行你的jackrabbit Oak实例的机器上ping你的新主节点吗？你的副本集需要多长时间来选举新的主节点？如果这超过 10 秒，您将需要更改一些 mongo db 配置设置。你能发布 rs.status() 的结果吗？

score 0 · Accepted Answer

感谢您的评论和建议。

我已经自己解决了这个问题。我接近的方式也许是正确的。

此问题是在 AEM 中传递给 MongoDBDriver 的 WriteConcern 参数。我将 mongi.uri 更改为关注，所以这个问题得到了解决。

-Doak.mongo.uri=mongodb://PrimaryHost:27017,SecondoryHost:27017/?replicaSet=rs0&readPreference=nearest
↓
-Doak.mongo.uri=mongodb://PrimaryHost:27017,SecondoryHost:27017/?replicaSet=rs0&readPreference=nearest&w=1&j=1

我忘记了关于我的副本集成员的帖子。我们的副本集由 Primary、Secondary 和 Arbiter 组成。

当我检查 Oak.jackrabit API 时，MongoDiver 的默认 WriteConcern 是“多数” https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/document/util/MongoConnection.html# getDefaultWriteConcern(com.mongodb.DB)

当副本集的一个成员（不包括仲裁器）关闭时，AEM 无法确认写操作，因为写操作无法传播到大多数成员。

当我将 WriteConcern 更改为 w=1 时，写入操作得到确认，并且 AEM 仍然可以正常工作。

你怎么这么认为？你有什么顾虑吗？

adobe - AEM 6.2 Mongo 副本集自动故障转移不起作用

2 回答 2

Related

Reference