它来了:我有一个在 RedHat 服务器上运行的 Rails 应用程序。我使用Passenger Standalone v3.0.19 来服务它。数据库是 MySQL。最近我发现一些Passenger Application实例每分钟消耗越来越多的内存,而这些实例实际上并没有什么用处。(当我请求一个页面时,Passenger 会生成一个新实例而不是使用这个实例)我试图手动杀死这些实例(我知道这不是处理问题的正确方法),但几个小时后它又发生了。
这是我通过乘客记忆统计得到的:
------- Apache processes --------
### Processes: 0
### Total private dirty RSS: 0.00 MB
---------- Nginx processes ----------
PID PPID VMSize Private Name
-------------------------------------
48881 1 27.6 MB 0.1 MB nginx: master process /home/myuser/.passenger/standalone/3.0.19-x86_64-ruby1.9.3-linux-gcc4.4.7-1002/nginx-1.2.6/sbin/nginx -c /tmp/passenger-standalone.48333/config -p /tmp/passenger-standalone.48333/
48882 48881 28.5 MB 0.7 MB nginx: worker process
### Processes: 2
### Total private dirty RSS: 0.78 MB
------- Passenger processes -------
PID VMSize Private Name
-----------------------------------
48841 207.3 MB 0.1 MB PassengerWatchdog
48844 1822.7 MB 1.7 MB PassengerHelperAgent
48846 247.2 MB 14.3 MB Passenger spawn server
48850 143.0 MB 0.5 MB PassengerLoggingAgent
63729 10006.3 MB 9097.2 MB Rack: /home/myuser/webapp/current
76631 999.0 MB 92.3 MB Rack: /home/myuser/webapp/current
这是乘客状态输出:
----------- General information -----------
max = 6
count = 1
active = 0
inactive = 1
Waiting on global queue: 0
----------- Application groups -----------
/home/myuser/webapp/current:
App root: /home/myuser/webapp/current
* PID: 76631 Sessions: 0 Processed: 15 Uptime: 51m 7s
显然,进程 63729 不再由乘客管理。
我还使用了 strace,通过运行:
strace -f -p 63729
我有:
[pid 63729] read(14, "\1\0\0\1\1\36\0\0\2\3def\0\0\0\10COUNT(*)\0\f?\0\25\0\0"..., 16384) = 63
[pid 63729] brk(0x234f2c000) = 0x234f2c000
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] poll([{fd=14, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 63729] write(14, "\221\0\0\0\3SELECT COUNT(*) FROM `test_"..., 149) = 149
[pid 63729] ppoll([{fd=14, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=14, revents=POLLIN}])
[pid 63729] read(14, "\1\0\0\1\1\36\0\0\2\3def\0\0\0\10COUNT(*)\0\f?\0\25\0\0"..., 16384) = 63
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] poll([{fd=14, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 63729] write(14, "\221\0\0\0\3SELECT COUNT(*) FROM `test_"..., 149) = 149
[pid 63729] ppoll([{fd=14, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=14, revents=POLLIN}])
[pid 63729] read(14, "\1\0\0\1\1\36\0\0\2\3def\0\0\0\10COUNT(*)\0\f?\0\25\0\0"..., 16384) = 63
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] poll([{fd=14, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 63729] write(14, "\221\0\0\0\3SELECT COUNT(*) FROM `test_"..., 149) = 149
并且相同的模式一直存在。我用代码检查了 fd 14
ls -l /proc/63729/fd
这就是我得到的:
lr-x------ 1 myuser myuser 64 May 9 05:38 0 -> /dev/null
l-wx------ 1 myuser myuser 64 May 9 05:38 1 -> /home/myuser/webapp/shared/log/passenger.log
lrwx------ 1 myuser myuser 64 May 9 05:38 10 -> socket:[430845495]
lrwx------ 1 myuser myuser 64 May 9 05:38 11 -> socket:[430860531]
lrwx------ 1 myuser myuser 64 May 9 05:38 12 -> socket:[430845847]
lrwx------ 1 myuser myuser 64 May 9 05:38 13 -> socket:[430845850]
lrwx------ 1 myuser myuser 64 May 9 05:38 14 -> socket:[430845864]
l-wx------ 1 myuser myuser 64 May 9 05:38 2 -> /home/myuser/webapp/shared/log/passenger.log
l-wx------ 1 myuser myuser 64 May 9 05:38 3 -> /home/myuser/webapp/shared/log/production.log
lr-x------ 1 myuser myuser 64 May 9 05:38 4 -> pipe:[430845856]
lr-x------ 1 myuser myuser 64 May 9 05:38 5 -> pipe:[430845845]
l-wx------ 1 myuser myuser 64 May 9 05:38 6 -> pipe:[430845845]
lr-x------ 1 myuser myuser 64 May 9 05:38 7 -> pipe:[430845846]
l-wx------ 1 myuser myuser 64 May 9 05:38 8 -> pipe:[430845856]
lrwx------ 1 myuser myuser 64 May 9 05:38 9 -> socket:[430845494]
这就是我现在卡住的地方。我不知道如何从socket id获取请求,或者从strace获取SQL命令,但我猜问题可能是SQL或ActiveRecord的滥用。有人可以帮我吗?我不知道是否缺少任何信息,但我会按需提供。任何事情都会受到赞赏。谢谢!