0

它来了:我有一个在 RedHat 服务器上运行的 Rails 应用程序。我使用Passenger Standalone v3.0.19 来服务它。数据库是 MySQL。最近我发现一些Passenger Application实例每分钟消耗越来越多的内存,而这些实例实际上并没有什么用处。(当我请求一个页面时,Passenger 会生成一个新实例而不是使用这个实例)我试图手动杀死这些实例(我知道这不是处理问题的正确方法),但几个小时后它又发生了。

这是我通过乘客记忆统计得到的:

------- Apache processes --------

### Processes: 0
### Total private dirty RSS: 0.00 MB


---------- Nginx processes ----------
PID    PPID   VMSize   Private  Name
-------------------------------------
48881  1      27.6 MB  0.1 MB   nginx: master process    /home/myuser/.passenger/standalone/3.0.19-x86_64-ruby1.9.3-linux-gcc4.4.7-1002/nginx-1.2.6/sbin/nginx -c /tmp/passenger-standalone.48333/config -p /tmp/passenger-standalone.48333/
48882  48881  28.5 MB  0.7 MB   nginx: worker process
### Processes: 2
### Total private dirty RSS: 0.78 MB

------- Passenger processes -------
PID    VMSize      Private    Name
-----------------------------------
48841  207.3 MB    0.1 MB     PassengerWatchdog
48844  1822.7 MB   1.7 MB     PassengerHelperAgent
48846  247.2 MB    14.3 MB    Passenger spawn server
48850  143.0 MB    0.5 MB     PassengerLoggingAgent
63729  10006.3 MB  9097.2 MB  Rack: /home/myuser/webapp/current
76631  999.0 MB    92.3 MB    Rack: /home/myuser/webapp/current

这是乘客状态输出:

----------- General information -----------
max      = 6
count    = 1
active   = 0
inactive = 1
Waiting on global queue: 0

----------- Application groups -----------
/home/myuser/webapp/current:
  App root: /home/myuser/webapp/current
  * PID: 76631   Sessions: 0    Processed: 15      Uptime: 51m 7s

显然,进程 63729 不再由乘客管理。

我还使用了 strace,通过运行:

strace -f -p 63729

我有:

[pid 63729] read(14, "\1\0\0\1\1\36\0\0\2\3def\0\0\0\10COUNT(*)\0\f?\0\25\0\0"..., 16384) = 63
[pid 63729] brk(0x234f2c000)            = 0x234f2c000
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] poll([{fd=14, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 63729] write(14, "\221\0\0\0\3SELECT COUNT(*) FROM `test_"..., 149) = 149
[pid 63729] ppoll([{fd=14, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=14, revents=POLLIN}])
[pid 63729] read(14, "\1\0\0\1\1\36\0\0\2\3def\0\0\0\10COUNT(*)\0\f?\0\25\0\0"..., 16384) = 63
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] poll([{fd=14, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 63729] write(14, "\221\0\0\0\3SELECT COUNT(*) FROM `test_"..., 149) = 149
[pid 63729] ppoll([{fd=14, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=14, revents=POLLIN}])
[pid 63729] read(14, "\1\0\0\1\1\36\0\0\2\3def\0\0\0\10COUNT(*)\0\f?\0\25\0\0"..., 16384) = 63
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
[pid 63729] poll([{fd=14, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 63729] write(14, "\221\0\0\0\3SELECT COUNT(*) FROM `test_"..., 149) = 149

并且相同的模式一直存在。我用代码检查了 fd 14

ls -l /proc/63729/fd

这就是我得到的:

lr-x------ 1 myuser myuser 64 May  9 05:38 0 -> /dev/null
l-wx------ 1 myuser myuser 64 May  9 05:38 1 -> /home/myuser/webapp/shared/log/passenger.log
lrwx------ 1 myuser myuser 64 May  9 05:38 10 -> socket:[430845495]
lrwx------ 1 myuser myuser 64 May  9 05:38 11 -> socket:[430860531]
lrwx------ 1 myuser myuser 64 May  9 05:38 12 -> socket:[430845847]
lrwx------ 1 myuser myuser 64 May  9 05:38 13 -> socket:[430845850]
lrwx------ 1 myuser myuser 64 May  9 05:38 14 -> socket:[430845864]
l-wx------ 1 myuser myuser 64 May  9 05:38 2 -> /home/myuser/webapp/shared/log/passenger.log
l-wx------ 1 myuser myuser 64 May  9 05:38 3 ->     /home/myuser/webapp/shared/log/production.log
lr-x------ 1 myuser myuser 64 May  9 05:38 4 -> pipe:[430845856]
lr-x------ 1 myuser myuser 64 May  9 05:38 5 -> pipe:[430845845]
l-wx------ 1 myuser myuser 64 May  9 05:38 6 -> pipe:[430845845]
lr-x------ 1 myuser myuser 64 May  9 05:38 7 -> pipe:[430845846]
l-wx------ 1 myuser myuser 64 May  9 05:38 8 -> pipe:[430845856]
lrwx------ 1 myuser myuser 64 May  9 05:38 9 -> socket:[430845494]

这就是我现在卡住的地方。我不知道如何从socket id获取请求,或者从strace获取SQL命令,但我猜问题可能是SQL或ActiveRecord的滥用。有人可以帮我吗?我不知道是否缺少任何信息,但我会按需提供。任何事情都会受到赞赏。谢谢!

4

1 回答 1

1

我已经通过检查 mysql 日志解决了这个问题。slow_query_log 没有任何问题。但是general_log 显示一系列查询一直在寻找较旧的记录。这是由于我为活动记录编写的愚蠢代码。我想我现在要结束这个问题了。

于 2013-05-10T06:40:30.893 回答