1

我有一个存储旧文件的存储库,例如存档。
用户使用一个简单的 Web 应用程序来获取这些文件。
我在运行我的 Web 应用程序的服务器上维护一个简单的文件系统缓存。
至少它看起来很简单,而这只是一个想法:)

我需要以一种方式同步该缓存中的文件创建,即一次只允许一个线程从存档中获取相同的文件。

碰巧需要该文件的所有其他线程必须等到第一个线程完成将其写入缓存,然后从那里获取它。
起初我使用 File.exists() 方法,但这并不好,因为它在线程(锁所有者)创建一个空文件后立即返回 true(因此它可以开始从存储库流写入它)。

我不确定这是否是正确的方法,但我正在使用静态 Map(将 file_ID 映射到 syncDummyObject)来跟踪当前正在获取哪些文件。
然后我(试图)在该 syncDummyObject 上同步文件获取。

这是正确的方法吗?该代码正在运行,但在我将其投入生产之前,我需要确保它是好的。

我考虑使用暂存目录,在其中创建文件并在完成后将它们传输到缓存中,但这会带来另一组问题......

为了更好的可读性,我删除了错误处理的日志记录和不相关的部分。

谢谢!

public class RepoFileFetcher{

    private static volatile ConcurrentHashMap<String, Object> syncStrings = new ConcurrentHashMap<String, Object>();    
    private static final Object mapSync = new Object(); // map access sync
    private Boolean isFileBeingCreated = new Boolean(false);
    private Boolean isFileReadyInCache = new Boolean(false);


    public File getFileById(MikFileIdentifier cxfi){        
        File theFile = null; // file I'm going to return in the end

        try{
            Object syncObject = null;

            // sync map access
            synchronized(mapSync){

                if(syncStrings.containsKey(cxfi.getFilePath())){
                    // if the key exists in the map it means that
                    // it's being created by another thread
                    // fetch the object from the map 
                    // and use it to wait until file is created in cache
                    syncObject = syncStrings.get(cxfi.getFilePath());

                    isFileBeingCreated = true;


                }else if(!(new File(cxfi.getFilePath())).exists()){
                    // if it doesn't exist in map nor in cache it means that
                    // I'm the first one that fetches it from repo
                    // create new dummyLockObject and put it in the map
                    syncObject = new Object();
                    syncStrings.put(cxfi.getFilePath(), syncObject);

                }else{
                    // if it's not being created and exists in cache
                    // set flag so I can fetch if from the cache bellow
                    isFileReadyInCache = true;
                }
            }


            // potential problem that I'm splitting the critical section in half,
            // but I don't know how to avoid locking the whole fetching process
            // I want to lock only on the file that's being fetched, not fetching of all files (which I'd get if the mapSync was still locked)
            // What if, at this very moment, some other thread starts fetching the file and isFileBeingCreated becomes stale? Is it enough to check whether I succeeded renaming it and if not then fetch from cache? 


            if(!isFileBeingCreated && !isFileReadyInCache){

                // skip fetching from repo if another thread is currently fetching it
                // sync only on that file's map object

                synchronized(syncObject){
                    File pFile = new File(cxfi.getFilePath());
                    pFile.createNewFile();

                    // ...
                    // ... the part where I write to pFile from repo stream
                    // ...

                    if(!pFile.renameTo(theFile)){
                        // file is created by someone else 
                        // fetch it from cache
                        theFile = fetchFromCache(cxfi, syncObject);
                    }

                    syncStrings.remove(cxfi.getFilePath());

                    // notify all threads in queue that the file creation is over
                    syncObject.notifyAll();
                }//sync

            }else{

                theFile = fetchFromCache(cxfi, syncObject);
            }

            return theFile;


        }catch(...{
            // removed for better readability
        }finally{
            // remove from the map, otherwise I'll lock that file indefinitely
            syncStrings.remove(cxfi.getFilePath());
        }

        return null;
    }


    /**
     * Fetches the file from cache
     * @param cxfi File identification object
     * @param syncObject Used to obtain lock on file
     * @return File from cache
     * @throws MikFileSynchronizationException
     * @author mbonaci
     */
    private File fetchFromCache(FileIdentifier cxfi, Object syncObject)
            throws MikFileSynchronizationException{

        try{
            // wait till lock owner finishes creating the file
            // then fetch it from the cache

            synchronized(syncObject){   

                // wait until lock owner removes dummyObject from the map
                // while(syncStrings.containsKey(cxfi.getFilePath()))
                // syncObject.wait();                   

                File existingFile = new File(cxfi.getFilePath());
                if(existingFile.exists()){
                    return existingFile;
                }else{
                    // this should never happen
                    throw new MikFileSynchronizationException();
                }
            }

        }catch(InterruptedException ie){
            logger.error("Synchronization error", ie);
        }
        return null;
    }

编辑我: 谢谢大家的帮助。关于在 CHM 上使用 putIfAbsent() 的建议是关键。我最终这样做了(欢迎任何其他评论):

编辑二:在类 的其他分支中添加了 CHM 元素删除if(因为现在我将元素放在地图中,即使我不需要)。

编辑三: 在变量中移动了上面文件存在的检查isFileInCache

public class RepoFileFetcher{               
    private static volatile ConcurrentHashMap<String, Object> syncStrings = new ConcurrentHashMap<String, Object>();    

    // save some time so I can lock syncObject earlier 
    private boolean isFileInCache = false;

    // remember whether we put the elem in the map or not 
    // so we know whether to remove it later 
    private boolean insertedMapElem = false; // added in EDIT II 

    /**
     * Fetches the file from repository (anc caches it) or directly from cache if available
     * @param cxfi File identification object
     * @return File
     * @author mbonaci
     */
    public File getFileById(FileIdentifier cxfi){
        String fileId = cxfi.getFileId();
        String fileName = cxfi.getOnlyPath() + fileId;
        File theFile = null; // file I'm going to return in the end

        try{
            Object syncObject = null;
            Object dummyObject = new Object();                  
            isFileInCache = (new File(fileName)).exists();
            syncObject = syncStrings.putIfAbsent(fileId, dummyObject);

            if(syncObject == null){ // wasn't in the map                            
                insertedMapElem = true; // we put the new object in

                if(!isFileInCache){ // not in cache

                    // if it doesn't exist in map nor in cache it means that
                    // I'm the first one that fetches it from repo (or cache was deleted)
                    // syncObject = new lock object I placed in the map
                    syncObject = dummyObject;

                    synchronized(syncObject){
                        File pFile = new File(cxfi.getFilePath());
                        pFile.createNewFile();

                        // ...
                        // ... the part where I write to pFile from repo stream
                        // ...

                        pFile.renameTo(theFile)
                        theFile = pFile;

                        syncStrings.remove(cxfi.getFilePath());

                        // notify all threads in queue that the file is now ready to be fetched from cache
                        syncObject.notifyAll();
                    }//sync

                }else{
                    // if it's not being created and exists in cache it means that it's complete
                    // fetch it from cache without blocking (only reading)
                    syncStrings.remove(cxfi.getFilePath()); // added in EDIT II
                    theFile = new File(fileName);
                }

            }else{
                // if the key exists in the map it means that
                // it's being created by another thread
                // fetch the object from the map 
                // and use it to wait until file is created in cache
                // don't touch the map (I haven't added anything)
                // the lock owner will remove the elem
                // syncObject = the object that what was in the map when I called putIfAbsent()
                theFile = fetchFromCache(cxfi, syncObject);
            }

            return theFile;

        }catch(...{
            // removed for better readability
        }finally{ 
            // no good cuz' this way I'd sometimes remove the map elem
            // while lock owner still writes to a file
            // only the one who placed the elem in the map should remove it
            // remove from the map, otherwise I'll lock that file indefinitely
            // syncStrings.remove(fileId); // commented out in EDIT II
        }
        // remove in case of exception (but only if we added it)
        if(insertedMapElem)
            syncStrings.remove(fileId);
        return null;
    }


    /**
     * Fetches the file from cache after it obtains lock on <code>syncObject</code>
     * @param cxfi File identification object
     * @param syncObject Used to obtain lock on file
     * @return File from cache
     * @author mbonaci
     */
    private File fetchFromCache(FileIdentifier cxfi, Object syncObject){
        String fileId = cxfi.getFileId();
        String fileName =  fileId + ".tif";

        synchronized(syncObject){

            File existingFile = new File(cxfi.getAbsPath() + fileName);

            if(existingFile.exists()){
                return existingFile;
            }
        }
    }
4

1 回答 1

3

我可以建议几个调整:

  1. 您已经在使用ConcurrentHashMap,不需要额外的锁。
  2. 我会将“文件”包装在一个更智能的对象中,它有自己的同步。因此,您可以执行以下操作:

    • putIfAbsent()使用路径和包装文件的“智能”对象调用地图。
    • 以上将返回值(如果路径不存在,则返回新值,或者现有包装器)
    • 在包装器中有状态,它知道它是否已经被缓存
    • 调用cache()它检查它是否已被缓存,如果是,则不执行任何操作,否则缓存
    • 然后从包装器中返回“文件”(比如一个getFile()方法)

然后确保在公共函数的包装器内使用锁,这意味着在cache()同时发生时会阻塞。

这是一个草图:

class CachedFile
{
  File realFile;
  // Initially not cached
  boolean cached = false;

  // Construct with file

  public synchronized boolean isCached()
  { return cached; }

  public synchronized void cache()
  {
    if (!cached)
    {
      // now load - safe in the knowledge that no one can get the file (or cache())
      ..
      cached = true; // done
    }
  }

  public synchronized <File> getFile()
  {
    // return the "file"
  }
}

现在你的代码变成了这样:

ConcurrentHashMap<String, CachedFile> myCache = ConcurrentHashMap<>();
CachedFile newFile = new CachedFile(<path>);
CachedFile file = myCache.putIfAbsent(<path>, newFile);
// Use the new file if it did not exist
if (file == null) file = newFile;
// This will be no-op if already cached, or will block is someone is caching this file.
file.cache();
// Now return the cached file.
return file.getFile();

我的建议有意义吗?

于 2012-11-07T11:26:53.613 回答