I created a script for my website that is supposed to erase the oldest entry in the cache when a new item needs to be cached. My website is very large, with 500,000 photos on it, and the cache space is set to 2 GB.

These functions are what cause the trouble:

function cache_tofile($fullf, $c)
{
    error_reporting(0);
    if(strpos($fullf, "/") === FALSE)
    {
        $fullf = "./".$fullf;
    }
    $lp = strrpos($fullf, "/");
    $fp = substr($fullf, $lp + 1);
    $dp = substr($fullf, 0, $lp);
    $sz = strlen($c);
    cache_space_make($sz);
    mkdir($dp, 0755, true);
    cache_space_make($sz);
    if(!file_exists($fullf))
    {
        $h = @fopen($fullf, "w");
        if(flock($h, LOCK_EX))
        {
            ftruncate($h, 0);
            rewind($h);
            $tmo = 1000;
            $cc = 1;
            $i = fputs($h, $c);
            while($i < strlen($c) || $tmo-- > 1)
            {
                $c = substr($c, $i);
                $i = fwrite($h, $c);
            }
            flock($h, LOCK_UN);
            fclose($h);
        }
    }
    error_reporting(7);
}

function cache_space_make($sz)
{
    $ct = 0;
    $cf = cachefolder();
    clearstatcache();
    $fi = shell_exec("df -i ".$cf." | tail -1 | awk -F\" \" '{print \$4}'");
    if($fi < 1)
    {
        return;
    }
    if(($old = disk_free_space($cf)) === false)
    {
        return;
    }
    while($old < $sz)
    {
        $ct++;
        if($ct > 10000)
        {
            error_log("Deleted over 10,000 files. Is disk screwed up?");
            break;
        }
        $fi = shell_exec("rm \$(find ".$cf."cache -type f -printf '%T+ %p\n' | sort | head -1 | awk -F\" \" '{print \$2}');");
        clearstatcache();
        $old = disk_free_space($cf);
    }
}

cachefolder() is a function that returns the correct folder name with a / appended to it.

When the functions are executed, CPU usage for Apache sits between 95% and 100%, and other services on the server are extremely slow to access during that time. I also noticed in WHM that the cache disk usage is at 100% and refuses to drop until I clear the cache. I was expecting it to stay at more like 90% or so.

What I am trying to do in cache_tofile is free disk space before creating the folder, and then free disk space again before creating the cache file. The cache_space_make function takes one parameter representing the amount of disk space to free up.

In that function I use shell commands to find the oldest file in the entire cache directory tree, because I was unable to find native PHP functions to do so.

The cache file format is as follows:

/cacherootfolder/requestedurl

For example, if one requests http://www.example.com/abc/def then, for both functions, the folder that is supposed to be created is abc and the file is def, so the full path in the filesystem will be:

/cacherootfolder/abc/def

If one requests http://www.example.com/111/222, then the folder 111 is created and the file 222 is created inside it:

/cacherootfolder/111/222

In both cases the file contains the same content the user would receive for that URL (for example, /cacherootfolder/111/222 contains the same content one would see when viewing the source of http://www.example.com/111/222).
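
Roughly, the mapping from the requested URL to the cache file path looks like this simplified sketch (not my exact code; cachefolder() is the helper described above, and using $_SERVER['REQUEST_URI'] here is just for illustration):

// Simplified illustration of the URL-to-path mapping
function cache_path_for_request()
{
    $cf   = cachefolder();                                     // e.g. "/cacherootfolder/"
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);  // e.g. "/abc/def"
    return $cf.ltrim($path, "/");                              // e.g. "/cacherootfolder/abc/def"
}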

The intent of the caching system is to deliver all web pages at optimal speed.

My question, then, is: how do I prevent the system from locking up when the cache is full? Is there better code I can use than what I provided?

1 Answer

I would start by replacing the || in your code with &&, which is most likely what was intended.
As it stands, the loop will always run at least 1000 times - I very much hope the intent was to stop retrying after 1000 attempts.

Also, drop the ftruncate and rewind.
From the PHP manual on fopen (emphasis mine):

    'w'  Open for writing only; place the file pointer at the beginning of the file and *truncate the file to zero length*. If the file does not exist, attempt to create it.

So your truncate is redundant, and so is your rewind.
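
With both changes applied (&& in the loop condition, no ftruncate/rewind), the write section of cache_tofile would look roughly like this sketch (only that block, not the whole function):

$h = @fopen($fullf, "w"); // "w" already truncates the file and puts the pointer at the start
if($h && flock($h, LOCK_EX)) // checking $h is a small extra safeguard in case fopen fails
{
    $tmo = 1000;
    $i = fputs($h, $c);
    // && instead of ||: keep retrying only while data remains AND retries are left
    while($i < strlen($c) && $tmo-- > 1)
    {
        $c = substr($c, $i);
        $i = fwrite($h, $c);
    }
    flock($h, LOCK_UN);
    fclose($h);
}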

Next, take a look at your shell_exec calls.
The one outside the loop doesn't seem to be too big a bottleneck to me, but the one inside the loop... Say you have 1,000,000 files in that cache folder. find will happily list all of them for you, no matter how long that takes. Then you sort that list. Then you flush 999,999 entries of that list down the toilet and keep only the first one. Then you do some awk business I won't even get into, and then you delete the file. On the next iteration you only have to go through 999,999 files, of which you discard only 999,998. See where this is going?

I consider shelling out purely for convenience bad practice anyway, but if you do it, at least do it as efficiently as possible! Run one shell_exec without the head -1, store the resulting list in a variable, and iterate over it. Better yet, drop shell_exec altogether and write the corresponding routine in PHP (one might argue that find and rm are machine code and therefore faster at the same task than code written in PHP, but there is certainly a lot of overhead in all that IO redirection).
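
If you do go the pure-PHP route, a sketch of the idea could look something like this (assuming the cache tree lives under cachefolder()."cache", as in your shell command; cache_files_oldest_first is just a name I made up):

// Walk the cache tree once and return all file paths, oldest modification time first
function cache_files_oldest_first($dir)
{
    $files = array();
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach($it as $file)
    {
        if($file->isFile())
        {
            $files[$file->getPathname()] = $file->getMTime();
        }
    }
    asort($files); // sort by mtime, oldest first
    return array_keys($files);
}

The loop in cache_space_make can then delete from the front of that list instead of re-running find on every iteration:

$victims = cache_files_oldest_first($cf."cache");
clearstatcache();
while(($old = disk_free_space($cf)) !== false && $old < $sz && count($victims) > 0)
{
    unlink(array_shift($victims)); // delete the oldest remaining cache file
    clearstatcache();
}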

Do all of that, then see how it performs.
If the results are still unacceptable, I suggest adding some code to measure how long certain parts of those functions take (hint: microtime(true)), or using a profiler such as XDebug, to see where exactly most of the time is being spent.
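
For example, a quick-and-dirty timing of one suspect block could be as simple as this sketch:

$t0 = microtime(true);
cache_space_make($sz); // or whichever block you suspect
error_log(sprintf("cache_space_make took %.3f seconds", microtime(true) - $t0));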

Also, why do you turn off error reporting for that block? That looks more than a little suspicious to me.

As a small bonus, you can get rid of $cc, since you never use it anywhere.

answered 2015-07-29T18:58:23.623