3

I have a ConcurrentHashMap that exhibits strange behavior on occasion.

When my app first starts up, I read a directory from the file system and load contents of each file into the ConcurrentHashMap using the filename as the key. Some files may be empty, in which case I set the value to "empty".

Once all files have been loaded, a pool of worker threads will wait for external requests. When a request comes in, I call the getData() function where I check if the ConcurrentHashMap contains the key. If the key exists I get the value and check if the value is "empty". If value.contains("empty"), I return "file not found". Otherwise, the contents of the file is returned. When the key does not exist, I try to load the file from the file system.

private String getData(String name) {
    String reply = null;
    if (map.containsKey(name)) {
        reply = map.get(name);
    } else {
        reply = getDataFromFileSystem(name);
    }

    if (reply != null && !reply.contains("empty")) {
        return reply;
    }

    return "file not found";
}

On occasion, the ConcurrentHashMap will return the contents of a non-empty file (i.e. value.contains("empty") == false), however the line:

if (reply != null && !reply.contains("empty")) 

returns FALSE. I broke down the IF statement into two parts: if (reply != null) and if (!reply.contains("empty")). The first part of the IF statement returns TRUE. The second part returns FALSE. So I decided to print out the variable "reply" in order to determine if the contents of the string does in fact contain "empty". This was NOT the case i.e. the contents did not contain the string "empty". Furthermore, I added the line

int indexOf = reply.indexOf("empty");

Since the variable reply did not contain the string "empty" when I printed it out, I was expecting indexOf to return -1. But the function returned a value approx the length of the string i.e. if reply.length == 15100, then reply.indexOf("empty") was returning 15099.

I experience this issue on a weekly basis, approx 2-3 times a week. This process is restarted on a daily basis therefore the ConcurrentHashMap is re-generated regularly.

Has anyone seen such behavior when using Java's ConcurrentHashMap?

EDIT

private String getDataFromFileSystem(String name) {
    String contents = "empty";
    try {
        File folder = new File(dir);

        File[] fileList = folder.listFiles();
        for (int i = 0; i < fileList.length; i++) {
            if (fileList[i].isFile() && fileList[i].getName().contains(name)) {
                String fileName = fileList[i].getAbsolutePath();

                FileReader fr = null;
                BufferedReader br = null;

                try {
                    fr = new FileReader(fileName);
                    br = new BufferedReader(fr);
                    String sCurrentLine;
                    while ((sCurrentLine = br.readLine()) != null) {
                        contents += sCurrentLine.trim();
                    }
                    if (contents.equals("")) {
                        contents = "empty";
                    }

                    return contents;
                } catch (Exception e) {
                    e.printStackTrace();

                    if (contents.equals("")) {
                        contents = "empty";
                    }
                    return contents;
                } finally {
                    if (fr != null) {
                        try {
                            fr.close();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }

                    if (br != null) {
                        try {
                            br.close();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }

                    if (map.containsKey(name)) {
                        map.remove(name);
                    }

                    map.put(name, contents);
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();

        if (contents.equals("")) {
            contents = "empty";
        }
        return contents;
    }
    return contents;
}
4

3 回答 3

4

I think your problem is that some of your operations should be atomic and they aren't.

For example, one possible thread interleaving scenario is the following:

  • Thread 1 reads this line in the getData method:

    if (map.containsKey(name)) // (1)
    
  • the result is false and Thread 1 goes to

    reply = getDataFromFileSystem(name); // (2)
    
  • in getDataFromFileSystem, you have the following code:

    if (map.containsKey(name)) { // (3)
        map.remove(name);  // (4)
    }
    map.put(name, contents); // (5)
    
  • imagine that another thread (Thread 2) arrives at (1) while Thread 1 is between (4) and (5): name is not in the map, so thread 2 goes to (2) again

Now that does not explain the specific issue you are observing but it illustrates the fact that when you let many threads run concurrently in a section of code without synchronization, weird things can and do happen.

As it stands, I can't find an explanation for the scenario you describe, unless you call reply = map.get(name) more than once in your tests, in which case it is very possible that the 2 calls don't return the same result.

于 2012-07-10T09:53:27.367 回答
2

First off, don't even think that there is a bug in ConcurrentHashMap. JDK faults are very rare and even entertaining the idea will pull you away from properly debugging your code.

I think your bug is as follows. Since you are using contains("empty") what happens if the line from the file has the word "empty" in it? Isn't that going to screw things up?

Instead of using contains("empty") I would use ==. Make the "empty" a private static final String then you can use equality on it.

private final static String EMPTY_STRING_REFERENCE = "empty";
...
if (reply != null && reply != EMPTY_STRING_REFERENCE) {
    return reply;
}
...
String contents = EMPTY_STRING_REFERENCE;
...
// really this should be if (contents.isEmpty())
if (contents.equals("")) {
    contents = EMPTY_STRING_REFERENCE;
}

This is, btw, the only time you should be using == to compare strings. In this case you want to test it by reference and not by contents since lines from your files could actually contain the magic string.

Here are some other points:

  • In general, whenever you are using the same String in multiple places in your program, it should be pulled up to a static final field. Java will probably do this for you anyway but it makes the code a lot cleaner as well.
  • @assylias is spot on about race conditions when you make 2 calls to ConcurrentHashMap. For example, instead of doing:

    if (map.containsKey(name)) {
        reply = map.get(name);
    } else {
    

    You should do the following so you do only one.

    reply = map.get(name);
    if (reply == null) {
    
  • In your code you do this:

    if (map.containsKey(name)) {
         map.remove(name);
    }
    map.put(name, contents);
    

    That should be rewritten as the following. There is no need to remove before the put which introduces race conditions as @assylias mentioned.

    map.put(name, contents);
    
  • You said:

    if reply.length == 15100, then reply.indexOf("empty") was returning 15099.

    This is not possible with the same reply String. I suspect that you were looking at different threads or in some other way misinterpreting the output. Again, don't be fooled into thinking that there are bugs in java.lang.String.

于 2012-07-10T14:16:26.723 回答
0

First, using ConcurrentHashMap does not protect you if you call its methods from multiple threads in sequence. If you call containsKey and get afterwards and another thread calls remove in between you will have a null result. Be sure to call only get and check for null instead of containsKey/get. It's also better regarding performance, because both methods nearly have the same cost.

Second, the weird indexOf call result is either due to a programming error, or points to memory corruption. Is there any native code involved in your application? What are you doing in getDataFromFileSystem? I observed memory corruption when using FileChannel objects from multiple threads.

于 2012-07-09T19:15:13.930 回答