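I need to optimize a heavy script. It currently takes about 5 hours to run.

The script finds duplicates in the files table by md5 hash, walking through the folders table.

DB: MySQL, server: local.
PC: ASRock Z77 Pro4, Intel Core i7 3770, 12 GB RAM.

Code: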

// find file-duplicates by md5-hash
$current_folder_id = 1;
$select_ids_files = array();

$folders = $this->db->query("
    SELECT `folder_id`
    FROM `sc_folders`
")->result();

if (!$folders)
    exit('folders not found');

$current_files_data = $this->get_files_by_folder_id($current_folder_id);
if (!$current_files_data) {
    exit('!current_files_data');
}

foreach ($folders as $folder) {
    $files = (object)array();
    $files = $this->get_files_by_folder_id($folder->folder_id);

    if (!$files)
        continue;

    if (count($files) > count($current_files_data)) {
        $gl_arr = &$files;
        $nogl_arr = &$current_files_data;
    } else {
        $gl_arr = &$current_files_data;
        $nogl_arr = &$files;
    }

    foreach ($gl_arr as $key => $value) {
        foreach ($nogl_arr as $k => &$v) {
            if ($value->file_hash == $v->file_hash && $value->file_id != $v->file_id) { // an important place for optimize
                $select_ids_files[] = $v->file_id;
            }
        }
    }
}

print_r($select_ids_files);exit; // id duplicates records

Table folders: folder_id, folder_name (about 45 records).
Table files: file_id, file_hash, file_folder_id, file_name (about 1,400,000 records).

2 Answers

First, it would help a lot to state what you are actually trying to achieve.

From what I can read from the source code:

  • You have a database table containing links to the files and their hashes.
  • You want to (periodically) check whether a file has been inserted, changed or removed?

The first question that arises: HOW are files inserted, removed or edited? Are users simply accessing the folder directly, or does it happen through some kind of application?

IF it happens through an application, you should hook in at THAT point and flag any outdated entry in the database. Something like UPDATE files SET requires_approval = 1 WHERE file_name = '{$current_changed_file}' (escape or bind the file name rather than interpolating it raw).

If that is NOT the case (users are editing, deleting and inserting files at the file-system level), you could optimize your check by doing the following:

  • Save a timestamp (i.e. the newest modification date of ANY file) in your database.
  • When checking for changes, ONLY take files with a later modification date into account.

Something like:

foreach ($files as $file){
   if (filemtime($file) > $my_stored_modification_time){
      //refresh the data-row
   }
}

(To recognize a deletion, you could iterate over all file entries in the database and use is_file. For deletions you don't need to care about file hashes, because you can't even generate them.)
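
The two checks described above could be combined roughly as follows. This is only a sketch under assumptions: the function name classify_files, the shape of its arguments (a plain array of paths read from the database, plus the stored timestamp as a Unix time) and the returned array layout are hypothetical and not part of the question's code.

```php
<?php
// Hypothetical helper (not from the original post): split the paths known to the
// database into "deleted" and "modified since the last check".
// $knownPaths: file paths as stored in the database (assumed layout).
// $lastCheck:  Unix timestamp saved after the previous run (assumed to be stored somewhere).
function classify_files(array $knownPaths, int $lastCheck): array
{
    $deleted = [];
    $modified = [];
    foreach ($knownPaths as $path) {
        if (!is_file($path)) {
            // The file is gone: no hash can be computed, the row only needs removing.
            $deleted[] = $path;
        } elseif (filemtime($path) > $lastCheck) {
            // Changed after the last run: only these rows need a fresh hash.
            $modified[] = $path;
        }
    }
    return ['deleted' => $deleted, 'modified' => $modified];
}
```

Only the paths in the 'modified' list would then need to be re-hashed, for example with md5_file(). Files that have no database row yet are not covered by this sketch and would still require a directory scan.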

answered 2013-08-16T17:29:20.083

Don't use foreach { foreach {} }. Use foreach { in_array() }.

That takes 50% off the running time.
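
A related variant, shown here only as a sketch and not as the answerer's exact code: in_array() still scans the whole array on every call, so this version swaps it for an isset() lookup on an array keyed by file_hash. The function name find_duplicate_ids is hypothetical; the rows are assumed to be objects with file_id and file_hash, as in the question.

```php
<?php
// Hypothetical helper (not from the original post).
// Returns the file_id of every row in $files whose hash also occurs in
// $currentFiles under a different file_id.
function find_duplicate_ids(array $currentFiles, array $files): array
{
    // Index the reference folder once: file_hash => list of file_ids.
    $idsByHash = [];
    foreach ($currentFiles as $row) {
        $idsByHash[$row->file_hash][] = $row->file_id;
    }

    $duplicates = [];
    foreach ($files as $row) {
        if (!isset($idsByHash[$row->file_hash])) {
            continue; // no file with this hash in the reference folder
        }
        // Same condition as the nested loops: equal hash, different file_id.
        foreach ($idsByHash[$row->file_hash] as $otherId) {
            if ($otherId != $row->file_id) {
                $duplicates[] = $row->file_id;
                break; // report each file only once
            }
        }
    }
    return $duplicates;
}
```

Unlike the nested loops in the question, this reports each matching file once and always takes the id from the second argument, so the results may need adapting if the original output format matters.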

answered 2013-08-21T13:29:36.683