I'm seeing different epoll and select behavior in two different binaries and was hoping for some debugging help. In the following, epoll_wait and select will be used interchangeably.
I have two processes, one writer and one reader, that communicate over a fifo. The reader performs an epoll_wait to be notified of writes. I would also like to know when the writer closes the fifo, and it appears that epoll_wait should notify me of this as well. The following toy program, which behaves as expected, illustrates what I'm trying to accomplish:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <unistd.h>
int
main(int argc, char** argv)
{
    const char* filename = "tempfile";
    char buf[1024];
    memset(buf, 0, sizeof(buf));
    struct stat statbuf;
    if (!stat(filename, &statbuf))
        unlink(filename);
    mkfifo(filename, S_IRUSR | S_IWUSR);
    pid_t pid = fork();
    if (!pid) {
        /* Child: open the write end, hold it for 3 seconds, then close it. */
        int fd = open(filename, O_WRONLY);
        printf("Opened %d for writing\n", fd);
        sleep(3);
        close(fd);
    } else {
        /* Parent: open the read end and wait on it with epoll. */
        int fd = open(filename, O_RDONLY);
        printf("Opened %d for reading\n", fd);
        static const int MAX_LENGTH = 1;
        struct epoll_event init;
        struct epoll_event evs[MAX_LENGTH];
        int efd = epoll_create(MAX_LENGTH);
        int i;
        for (i = 0; i < MAX_LENGTH; ++i) {
            init.data.u64 = 0;
            init.data.fd = fd;
            /* Assign rather than OR into the uninitialized struct. */
            init.events = EPOLLIN | EPOLLPRI | EPOLLHUP;
            epoll_ctl(efd, EPOLL_CTL_ADD, fd, &init);
        }
        while (1) {
            int nfds = epoll_wait(efd, evs, MAX_LENGTH, -1);
            printf("%d fds ready\n", nfds);
            /* Leave room for a terminating NUL so buf is safe to print with %s. */
            int nread = read(fd, buf, sizeof(buf) - 1);
            if (nread < 0) {
                perror("read");
                exit(1);
            } else if (!nread) {
                /* read() returning 0 means every writer has closed the fifo. */
                printf("Child %d closed the pipe\n", pid);
                break;
            }
            buf[nread] = '\0';
            printf("Reading: %s\n", buf);
        }
    }
    return 0;
}
However, when I do this with another reader (whose code I'm not at liberty to post, but which makes the exact same calls; the toy program is modeled on it), the process does not wake when the writer closes the fifo. The toy reader also gives the desired semantics with select. The real reader, when configured to use select, also fails to wake.
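For reference, the select variant of the toy reader loop looks roughly like the sketch below. The helper name `drain_with_select` is made up for this post, and it assumes `fd` is the already-opened read end of the fifo. It blocks in select until the descriptor is readable, reads, and stops when read returns 0, which is how the toy reader learns that the writer has closed.

```c
#include <stdio.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (name invented for illustration): wait on fd with
 * select(), read whatever is available, and repeat until read() returns 0,
 * i.e. until every writer has closed its end.
 * Returns the total number of bytes read, or -1 on error. */
long drain_with_select(int fd)
{
    char buf[1024];
    long total = 0;

    for (;;) {
        fd_set rfds;

        /* select() modifies the set, so rebuild it on every iteration. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* NULL timeout: block until fd is readable (data or end-of-file). */
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0) {
            perror("select");
            return -1;
        }

        ssize_t nread = read(fd, buf, sizeof(buf));
        if (nread < 0) {
            perror("read");
            return -1;
        }
        if (nread == 0)
            return total;   /* all writers closed the fifo */
        total += nread;
    }
}
```

In the toy program this would replace the `while (1)` loop in the parent, called as `drain_with_select(fd)`.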
What might account for the different behavior of the two? For any provided hypotheses, how can I verify them? I'm running Linux 2.6.38.8.
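One check I can apply to both readers is to look at the event mask that epoll_wait actually reports instead of inferring the close from read returning 0. A minimal sketch follows; `wait_for_events` is a hypothetical helper written for this post, not something the real reader contains. Per the epoll_ctl man page, EPOLLHUP is always reported and does not need to be requested in `events`.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (name invented for illustration): register fd with a
 * fresh epoll instance, block in a single epoll_wait(), and return the event
 * mask that was reported. Returns 0 on error. */
uint32_t wait_for_events(int fd)
{
    struct epoll_event ev, out;
    uint32_t mask = 0;

    int efd = epoll_create(1);
    if (efd < 0) {
        perror("epoll_create");
        return 0;
    }

    /* Zero the struct first so no stray bits end up in ev.events. */
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;    /* EPOLLHUP is reported even when not requested */
    ev.data.fd = fd;

    if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) == 0 &&
        epoll_wait(efd, &out, 1, -1) == 1)
        mask = out.events;

    close(efd);
    return mask;
}

/* Print which of the flags of interest are set in mask. */
void print_events(uint32_t mask)
{
    printf("events:%s%s%s\n",
           (mask & EPOLLIN)  ? " EPOLLIN"  : "",
           (mask & EPOLLHUP) ? " EPOLLHUP" : "",
           (mask & EPOLLERR) ? " EPOLLERR" : "");
}
```

Calling `print_events(wait_for_events(fd))` in the reader after the writer exits would show whether a hang-up is being delivered at all.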