c++ - Why does select() sometimes time out while the client is busy receiving data

Question

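I wrote a simple client/server application to test the behavior of non-blocking sockets. Here is some brief information about the server and the client: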

//On Linux, the server thread will send
//a file to the client using a non-blocking socket
void *SendFileThread(void *param){
    CFile* theFile = (CFile*) param;
    int sockfd = theFile->GetSocket();
    set_non_blocking(sockfd);
    set_sock_sndbuf(sockfd, 1024 * 64); //set the send buffer to 64K

    //get the total packets count of target file
    int PacketCount = theFile->GetFilePacketsCount();
    int currPacket = 0;
    while (currPacket < PacketCount){
        char buffer[512];
        int len = 0;

        //get packet data by packet no.
        GetPacketData(currPacket, buffer, len); 

        //send_non_blocking_sock_data will loop and send
        //data into buffer of sockfd until there is error
        int ret = send_non_blocking_sock_data(sockfd, buffer, len);
        if (ret < 0 && errno == EAGAIN){
            continue;
        } else if (ret < 0 || ret == 0){
            break;
        } else {
            currPacket++;
        }


         ......
     }
 }

//On Windows, the client thread will do something like below
//to receive the file data sent by the server via a blocking socket
void *RecvFileThread(void *param){
    int sockfd = (int) param; //blocking socket
    set_sock_rcvbuf(sockfd, 1024 * 256); //set the receive buffer to 256K

    while (1){
        struct timeval timeout;
        timeout.tv_sec = 1;
        timeout.tv_usec = 0;

        fd_set rds;
        FD_ZERO(&rds);
        FD_SET(sockfd, &rds);

        //actually, the first parameter of select() is 
        //ignored on windows, though on linux this parameter
        //should be (maximum socket value + 1)
        int ret = select(sockfd + 1, &rds, NULL, NULL, &timeout );
        if (ret == 0){
            // log that timer expires
            CLogger::log("RecvFileThread---Calling select() timeouts\n");
        } else if (ret > 0) {
            //log the number of bytes received
            char buffer[1024 * 256];
            int len = recv(sockfd, buffer, sizeof(buffer), 0);
            // handle error
            process_tcp_data(buffer, len);
        } else {
            //handle and break;
            break;
        }

    }
}

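To my surprise, the server thread fails very often because the socket buffer is full. For example, when sending a 14M file it reports 50000 failures with errno = EAGAIN. In addition, through logging I observed dozens of timeouts during the transfer. The flow looks like this:

On the Nth loop, select() succeeds and 256K of data is read successfully.
On the (N+1)th loop, select() fails with a timeout.
On the (N+2)th loop, select() succeeds and 256K of data is read successfully.

Why are timeouts interleaved with successful reads during receiving? Can anyone explain this behavior?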

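[Update]
1. Uploading a 14M file to the server takes only 8 seconds.
2. Using the same file as in 1), the server takes about 30 seconds to send all the data to the client.
3. All sockets used by the client are blocking. All sockets used by the server are non-blocking.

Regarding #2, I think the timeouts are the reason #2 takes more time than #1, and I would like to know why there are so many timeouts while the client is busy receiving data.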

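[Update 2]
Thanks to @Duck, @ebrobe, @EJP and @ja_mesa for the comments. I will investigate further today and then update this post.
As for why I send 512 bytes per loop in the server thread: I found that the server thread sends data much faster than the client thread receives it. I am confused about why the timeouts happen in the client thread.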

score 2 · Accepted Answer

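I think this is more of a long comment than an answer, but as several people have pointed out, the network is orders of magnitude slower than your processor. The point of non-blocking I/O is that the difference is so great that you can use the waiting time to do real work instead of blocking. Here you are just pounding on the elevator button and hoping it makes a difference.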

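I'm not sure how much of your code is real and how much was chopped up for posting, but in the server you don't account for (ret == 0), i.e. a normal shutdown by the peer.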
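To make the return-value cases concrete on the receiving side, here is a minimal sketch for POSIX stream sockets. `RecvStatus` and `classify_recv` are hypothetical names made up for illustration, not part of the posted code. It assumes the caller asked for a nonzero number of bytes, in which case a `recv()` result of 0 means the peer performed an orderly shutdown.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cassert>

// Hypothetical result type, introduced only for this sketch.
enum class RecvStatus { Data, PeerClosed, WouldBlock, Error };

// Interprets the return value n of recv() together with the errno value
// captured immediately after the call.
// Assumption: a stream socket and a nonzero requested length, so n == 0
// can only mean that the peer shut the connection down in an orderly way.
RecvStatus classify_recv(ssize_t n, int err) {
    if (n > 0) return RecvStatus::Data;
    if (n == 0) return RecvStatus::PeerClosed;
    // n < 0: distinguish "no data yet" on a non-blocking socket from real errors.
    if (err == EAGAIN || err == EWOULDBLOCK) return RecvStatus::WouldBlock;
    return RecvStatus::Error;
}
```

A caller would typically write `ssize_t n = recv(fd, buf, sizeof(buf), 0);` followed by `classify_recv(n, errno)`, and leave its loop on both `PeerClosed` and `Error`.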

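The select in the client is wrong. Again, I'm not sure whether this is sloppy editing, but if it isn't, the number of parameters is wrong and, more worryingly, the first parameter (which should be the highest file descriptor select has to look at, plus one) is zero. Depending on the implementation of select, I wonder whether that in fact just turns select into a fancy sleep statement.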
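For reference, a minimal POSIX sketch of a correctly formed five-argument select() call for a single descriptor could look like the following. `wait_readable` is a hypothetical helper name, not something from the posted code. Note that Winsock ignores the first parameter, as the comment in the question says, so the `fd + 1` rule only matters on POSIX systems.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cassert>

// Hypothetical helper: waits up to timeout_ms for fd to become readable.
// Returns 1 if readable, 0 on timeout, -1 on error (errno is set).
int wait_readable(int fd, int timeout_ms) {
    // fd_set can only hold descriptors below FD_SETSIZE.
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EBADF;
        return -1;
    }

    fd_set rds;
    FD_ZERO(&rds);
    FD_SET(fd, &rds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    // First argument: highest descriptor in any of the sets, plus one.
    // Passing 0 here would make select() watch nothing and just sleep.
    int ret = select(fd + 1, &rds, nullptr, nullptr, &tv);
    if (ret > 0) return FD_ISSET(fd, &rds) ? 1 : 0;
    return ret;  // 0 = timeout, -1 = error
}
```

Calling `wait_readable(sockfd, 1000)` corresponds to the one-second timeout used in the question.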

score 0 · Accepted Answer

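You should call recv() first, and then call select() only if recv() tells you to do so. Don't call select() first; that is a waste of processing. recv() knows whether data is immediately available or whether it has to wait for data to arrive: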

void *RecvFileThread(void *param){
    int sockfd = (int) param; //blocking socket
    set_sock_rcvbuf(sockfd, 1024 * 256); //set the receive buffer to 256K

    char buffer[1024 * 256];

    while (1){

        int ret = 0;
        int len = recv(sockfd, buffer, sizeof(buffer), 0);
        if (len == -1) {
            if (WSAGetLastError() != WSAEWOULDBLOCK) {
                //handle error
                break;
            }

            struct timeval timeout;
            timeout.tv_sec = 1;
            timeout.tv_usec = 0;

            fd_set rds;
            FD_ZERO(&rds);
            FD_SET(sockfd, &rds);

            //actually, the first parameter of select() is 
            //ignored on windows, though on linux this parameter
            //should be (maximum socket value + 1)
            int ret = select(sockfd + 1, &rds, NULL, NULL, &timeout);
            if (ret == -1) { 
                // handle error
                break;
            }

            if (ret == 0) {
                // log that timer expires
                break;
            }

            // socket is readable so try read again
            continue;
        }

        if (len == 0) {
            // handle graceful disconnect
            break;
        }

        //log the number of data it received
        process_tcp_data(buffer, len);
    }
}

Do something similar on the sending side. Call send() first, and then call select() to wait for writability only if send() tells you to do so.
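A minimal sketch of that sending pattern for the Linux server might look like this. `set_non_blocking` and `send_all` are hypothetical helpers written for illustration; the question's own `set_non_blocking` and `send_non_blocking_sock_data` were not posted, so their real signatures may differ. `MSG_NOSIGNAL` assumes Linux.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <cassert>

// Hypothetical helper: switches fd to non-blocking mode.
// Returns 0 on success, -1 on error.
int set_non_blocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

// Hypothetical helper: sends all len bytes on a non-blocking socket.
// Calls send() first and only waits in select() when the send buffer is full.
// Returns 0 when everything was sent, -1 on error or when the socket
// did not become writable within timeout_sec (errno = ETIMEDOUT).
int send_all(int fd, const char* data, size_t len, int timeout_sec) {
    size_t sent = 0;
    while (sent < len) {
        // MSG_NOSIGNAL (Linux assumption): report a closed peer as EPIPE
        // instead of raising SIGPIPE.
        ssize_t n = send(fd, data + sent, len - sent, MSG_NOSIGNAL);
        if (n > 0) {
            sent += static_cast<size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR) continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) return -1;

        // Send buffer is full: sleep until the kernel reports free space
        // instead of retrying send() in a busy loop.
        fd_set wds;
        FD_ZERO(&wds);
        FD_SET(fd, &wds);
        struct timeval tv;
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int ret = select(fd + 1, nullptr, &wds, nullptr, &tv);
        if (ret == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (ret < 0 && errno != EINTR) return -1;
        // Writable (or interrupted): go back and try send() again.
    }
    return 0;
}
```

With something like this, the server loop would call `send_all(sockfd, buffer, len, 1)` once per packet rather than spinning on EAGAIN, so the thread sleeps while the client drains its receive buffer.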
