
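I have a website running on Amazon Web Services. It is deployed with Elastic Beanstalk and runs on at least two EC2 micro instances. An Auto Scaling policy scales it up and down according to the site's traffic. Because of this Auto Scaling policy, I want to avoid sticky sessions, so I use memcached-session-manager. For the memcached server, I use Amazon ElastiCache (a small instance).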

The configuration in context.xml is as follows:

<Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
    memcachedNodes="sessions.myinstancecode.0001.use1.cache.amazonaws.com:11211"
    sticky="false"
    sessionBackupAsync="false"
    lockingMode="none"
    transcoderFactoryClass="de.javakaffee.web.msm.serializer.kryo.KryoTranscoderFactory" />

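This works fine when traffic is low (i.e. fewer than 10 users online), but sometimes it causes the EC2 instances to restart. As you can imagine, this is a big problem: if the site is currently running on two instances and both decide to restart at the same time, the site becomes unreachable. These are the last lines of tail_catalina.log, as rotated to Amazon S3, before the EC2 instance decided to restart: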

Jun 13, 2012 12:32:27 AM de.javakaffee.web.msm.BackupSessionTask handleException
WARNING: Could not store session 42F9761AC24F826E1FC3F2A834FBF442 in memcached.
Note that this session was relocated to this node because the original node was not available.
net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: sessions.myinstancecode.0001.use1.cache.amazonaws.com/10.194.23.99:11211
    at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:73)
    at de.javakaffee.web.msm.BackupSessionTask.storeSessionInMemcached(BackupSessionTask.java:230)
    at de.javakaffee.web.msm.BackupSessionTask.doBackupSession(BackupSessionTask.java:195)
    at de.javakaffee.web.msm.BackupSessionTask.call(BackupSessionTask.java:120)
    at de.javakaffee.web.msm.BackupSessionTask.call(BackupSessionTask.java:51)
    at de.javakaffee.web.msm.BackupSessionService$SynchronousExecutorService.submit(BackupSessionService.java:339)
    at de.javakaffee.web.msm.BackupSessionService.backupSession(BackupSessionService.java:198)
    at de.javakaffee.web.msm.MemcachedSessionService.backupSession(MemcachedSessionService.java:967)
    at de.javakaffee.web.msm.SessionTrackerValve.backupSession(SessionTrackerValve.java:226)
    at de.javakaffee.web.msm.SessionTrackerValve.invoke(SessionTrackerValve.java:128)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:680)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Jun 13, 2012 12:32:28 AM de.javakaffee.web.msm.LockingStrategy onAfterBackupSession
WARNING: An error occurred during onAfterBackupSession.
net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: sessions.myinstancecode.0001.use1.cache.amazonaws.com/10.194.23.99:11211
    at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:73)
    at de.javakaffee.web.msm.LockingStrategy.onAfterBackupSession(LockingStrategy.java:287)
    at de.javakaffee.web.msm.MemcachedSessionService.backupSession(MemcachedSessionService.java:970)
    at de.javakaffee.web.msm.SessionTrackerValve.backupSession(SessionTrackerValve.java:226)
    at de.javakaffee.web.msm.SessionTrackerValve.invoke(SessionTrackerValve.java:128)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:680)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

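It looks as if the Amazon ElastiCache node is failing. However, Amazon CloudWatch shows that its CPU utilization never goes above 8%. Is there any reason an Amazon ElastiCache node would fail even though it is not under much load? Also, why does Amazon decide to restart the EC2 instance (or rather, terminate it and launch a new one) when the Amazon ElastiCache node fails?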

Any help is greatly appreciated.

Thanks!


1 Answer


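You should increase memcached-session-manager's sessionBackupTimeout. From the documentation:

sessionBackupTimeout (optional, default 100)

The timeout in milliseconds after which a session backup is considered to have failed. This property is only evaluated when sessions are stored synchronously (configured via sessionBackupAsync). The default value is 100 milliseconds.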
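The question's configuration sets sessionBackupAsync="false", so backups are synchronous and this attribute applies. Below is a minimal sketch of that same Manager element with the timeout raised. The value 1000 ms is an assumed example, not a figure from the answer or the msm documentation. Tune it against the memcached latency you actually observe.

```xml
<!-- Same non-sticky msm setup as in the question; only sessionBackupTimeout is added.
     1000 ms is an illustrative assumption (the documented default is 100 ms). -->
<Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
    memcachedNodes="sessions.myinstancecode.0001.use1.cache.amazonaws.com:11211"
    sticky="false"
    sessionBackupAsync="false"
    sessionBackupTimeout="1000"
    lockingMode="none"
    transcoderFactoryClass="de.javakaffee.web.msm.serializer.kryo.KryoTranscoderFactory" />
```

Because the backup is synchronous, a larger timeout lets a slow memcached response delay the request for up to that long. The trade-off is fewer spurious "Could not store session" failures in exchange for potentially slower responses when the node is sluggish.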

Answered 2012-06-13T13:51:02.690