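We have a Scala server that uses the Java MongoDB driver, wrapped by Casbah. Recently we switched its database from an actual MongoDB to Azure CosmosDB, using the Mongo API. This generally works fine, but every once in a while a call to Cosmos fails with a MongoSocketWriteException (stack trace below).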

We create the client as

import com.mongodb.casbah.Imports._

val mongoUrl = "mongodb://username:password@host.documents.azure.com:10255/?ssl=true&replicaSet=globaldb"

val client = MongoClient(MongoClientURI(mongoUrl))
val collection: MongoCollection = client("mongoDatabase")("mongoCollection")

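A seemingly similar error is discussed in "How to resolve MongoError: pool destroyed while connecting to CosmosDB". Following the workaround suggested there, we tried removing &replicaSet=globaldb from the connection URI, but that did not fix the problem.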

Stack trace:

com.mongodb.MongoSocketWriteException: Exception sending message
    at com.mongodb.connection.InternalStreamConnection.translateWriteException(InternalStreamConnection.java:462)
    at com.mongodb.connection.InternalStreamConnection.sendMessage(InternalStreamConnection.java:205)
    at com.mongodb.connection.UsageTrackingInternalConnection.sendMessage(UsageTrackingInternalConnection.java:95)
    at com.mongodb.connection.DefaultConnectionPool$PooledConnection.sendMessage(DefaultConnectionPool.java:424)
    at com.mongodb.connection.CommandProtocol.sendMessage(CommandProtocol.java:209)
    at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:111)
    at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
    at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:286)
    at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:173)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:215)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:206)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:112)
    at com.mongodb.operation.CountOperation$1.call(CountOperation.java:210)
    at com.mongodb.operation.CountOperation$1.call(CountOperation.java:206)
    at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:230)
    at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:203)
    at com.mongodb.operation.CountOperation.execute(CountOperation.java:206)
    at com.mongodb.operation.CountOperation.execute(CountOperation.java:53)
    at com.mongodb.Mongo.execute(Mongo.java:772)
    at com.mongodb.Mongo$2.execute(Mongo.java:759)
    at com.mongodb.DBCollection.getCount(DBCollection.java:962)
    at com.mongodb.DBCursor.count(DBCursor.java:670)
    at com.mongodb.casbah.MongoCollectionBase.getCount(MongoCollection.scala:496)
    at com.mongodb.casbah.MongoCollectionBase.getCount$(MongoCollection.scala:488)
    at com.mongodb.casbah.MongoCollection.getCount(MongoCollection.scala:1106)
    at com.mongodb.casbah.MongoCollectionBase.count(MongoCollection.scala:897)
    at com.mongodb.casbah.MongoCollectionBase.count$(MongoCollection.scala:894)
    at com.mongodb.casbah.MongoCollection.count(MongoCollection.scala:1106)
    [snip]
Caused by: java.net.SocketException: Broken pipe (Write failed)
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
    at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
    at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:876)
    at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:847)
    at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
    at com.mongodb.connection.SocketStream.write(SocketStream.java:75)
    at com.mongodb.connection.InternalStreamConnection.sendMessage(InternalStreamConnection.java:201)
    ... 38 common frames omitted

(I am posting this along with an answer because I hope the solution will be useful to others, and because I welcome any further insight.)

2 Answers

The problem went away after we added &maxIdleTimeMS=1500000 to the connection URI, which sets the maximum connection idle time to 25 minutes.
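
As a sanity check on the arithmetic, here is a minimal sketch that derives the millisecond value from a duration and appends it to the connection URI. The object and method names (`IdleTimeoutUri`, `withMaxIdleTime`) are hypothetical helpers for illustration, not part of Casbah or the Java driver, and the sketch assumes the URI is a plain string that may or may not already carry a `?` options section.

```scala
import scala.concurrent.duration._

object IdleTimeoutUri {
  // Assumption taken from this answer: Azure drops connections idle for about 30 minutes.
  val assumedServerIdleTimeout: FiniteDuration = 30.minutes
  // Client-side limit, chosen to be comfortably below the assumed server timeout.
  val clientMaxIdle: FiniteDuration = 25.minutes

  // Appends maxIdleTimeMS (in milliseconds) to a MongoDB connection URI string.
  def withMaxIdleTime(baseUri: String, maxIdle: FiniteDuration): String = {
    val separator = if (baseUri.contains("?")) "&" else "?"
    s"$baseUri${separator}maxIdleTimeMS=${maxIdle.toMillis}"
  }
}
```

Applied to the URI from the question, `IdleTimeoutUri.withMaxIdleTime(mongoUrl, IdleTimeoutUri.clientMaxIdle)` yields the original string followed by `&maxIdleTimeMS=1500000`, which can then be passed to `MongoClientURI` as before.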

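The cause appears to be that idle connections time out after 30 minutes on the Azure server side, while by default the Mongo client has no idle timeout at all. The server does not tell the client that it is dropping an idle connection, so the next attempt to use that connection fails with the error above. Setting the maximum connection idle time to a value below 30 minutes makes our server close idle connections before the Azure server kills them. Some kind of keep-alive, or a check before a connection is used, would probably also work.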
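
A different mitigation, which is not what we deployed, is to retry a call once when it fails on a dead socket. The sketch below is only an illustration under assumptions: `StaleConnectionRetry` and its methods are hypothetical names, and it assumes the driver discards the broken connection after the failure so that the second attempt gets a fresh one. It recognises the failure by looking for a `java.net.SocketException` in the cause chain, as in the stack trace above.

```scala
import java.net.SocketException
import scala.annotation.tailrec

object StaleConnectionRetry {
  // True if the throwable, or anything in its cause chain, is a SocketException.
  @tailrec
  def causedBySocketError(t: Throwable): Boolean =
    if (t == null) false
    else if (t.isInstanceOf[SocketException]) true
    else causedBySocketError(t.getCause)

  // Runs `op`; on a socket-related failure, re-runs it up to `retriesLeft` more times.
  // Any other exception, or a socket failure with no retries left, propagates unchanged.
  def retryOnSocketError[A](retriesLeft: Int)(op: => A): A =
    try op
    catch {
      case e: Exception if retriesLeft > 0 && causedBySocketError(e) =>
        retryOnSocketError(retriesLeft - 1)(op)
    }
}
```

With the collection from the question, a call would look like `StaleConnectionRetry.retryOnSocketError(1) { collection.count() }`. This is only safe for operations that can be repeated without side effects, which is one reason the idle-timeout setting is the simpler fix.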

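I have not actually found any documentation of this issue, or any other reference to it, for CosmosDB. It may be caused by, or related to, the 30-minute idle timeout on TCP connections through Azure's internal load balancers (see e.g. https://feedback.azure.com/forums/217313-networking/suggestions/18823588-increase-idle-timeout-on-internal-load-balancers-t).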

answered 2018-01-26T15:30:37.147

You can set the timeouts using

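import com.mongodb.{MongoClientOptions, MongoClientURI}

// Both durations are in milliseconds; uri is the connection string.
val options = new MongoClientOptions.Builder()
  .socketKeepAlive(true)         // enable TCP keep-alive on the sockets
  .heartbeatFrequency(1000)      // server monitor heartbeat every 1 second
  .maxConnectionIdleTime(18000)  // close pooled connections idle for 18 seconds
val clientUri = new MongoClientURI(uri, options)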

Try this.

answered 2020-01-08T13:54:54.357