SourceForge.net Logo
Main Overview Wiki Issues Forum Build Fisheye
Issue Details (XML | Word | Printable)

Key: CMP-673
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Shay Banon
Reporter: Stefan Fussenegger
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Compass

Incorrect synchronization in TerracottaDirectory causing NPE in Lucene core

Created: 09/Jul/08 07:16 AM   Updated: 10/Jul/08 02:24 AM
Component/s: Compass::Needle
Affects Version/s: None
Fix Version/s: 2.0.2, 2.1.0 M2

File Attachments: 1. Java Source File TerracottaDirectory.patched.java (8 kB)

Environment: any version of Terracotta, org.compass.needle.terracotta.TerracottaDirectory from trunk


 Description  « Hide
/**
  • Returns an array of strings, one for each file in the directory.
    */
    public final String[] list() { Set<String> fileNames = fileMap.keySet(); return fileNames.toArray(new String[fileNames.size()]); }

above method is not correctly synchronized. If size of fileMap (a ConcurrentHashMap) changes between fileNames.size() and fileName.toArray(...), the returned Array contains null values, causing NPE exceptions later in Lucene core:

from org.apache.lucene.index.SegementInfo.java:

public static long getCurrentSegmentGeneration(String[] files) {
if (files == null) { return -1; }
long max = -1;
for (int i = 0; i < files.length; i++) {
String file = files[i];
if (file.startsWith(IndexFileNames.SEGMENTS) && !file.equals(IndexFileNames.SEGMENTS_GEN)) {
long gen = generationFromSegmentsFileName(file);
if (gen > max) { max = gen; }
}
}
return max;
}

I fixed this bug using a regular HashMap together with a ReentrantReadWriteLock. This has also proven to be faster than a ConcurrentHashMap. Subsegment locking seems to be overhead for the number of files contained in a Lucene directory.



 All   Comments   Change History      Sort Order: Ascending order - Click to sort in descending order
Stefan Fussenegger added a comment - 09/Jul/08 07:16 AM
attached patched version

Shay Banon added a comment - 09/Jul/08 08:19 AM
Mmm, I see the problem. I am wondering if having a global lock won't actually cause a bottleneck problem. Why do you think that subsegment locking is an overhead? The size method is a bit of a nasty one in ConcurrentHashMap, so the list method can actually do this:

public final String[] list() { Set<String> fileNames = fileMap.keySet(); return fileNames.toArray(new String[0]); }

This will not cause size to be called, and will traverse the keyset and add it internally (within the concurrent hash map) into an array list, which will then be turned into an array.


Stefan Fussenegger added a comment - 09/Jul/08 08:45 AM
Using toArray(new String[0]) should fix the problem, right.

From my experience, batch indexing 800.000 (rather small) documents with concurrent threads was about 15% (don't have concrete numbers, just from my memory) faster than using a ConcurrentHashMap. I can't provide any evident reason, but probably because ConcurrentHashMap does not use read-write locking. Furthermore, regarding the rather small number of files in a directory (where small is everything < 1000), locking the whole directory for a full iteration does not seem to be a high price to pay for faster (and concurrent) reads. However, I'll try to come up with some concrete numbers, to support that.

btw: I'd really like to see this implementation becoming part of Terracotta Forge, as I am not using Compass itself - sorry about that I haven't done that before, but I'll look into it in the near future (see also http://jira.terracotta.org/jira/browse/CDV-338)

Best regards


Shay Banon added a comment - 09/Jul/08 10:32 AM
I fixed to new String[0].

I think that the benefits of concurrent hash map if the fact that terractotta can handle its fetching and discarding of "files" from the client better (when the whole directory does not fit into the JVM memory).

But, if it turns out to the better in terms of performance, lets open another issue, and we can discuss it there. I will ask the terracotta fellows as well.


Stefan Fussenegger added a comment - 10/Jul/08 02:24 AM
Wow, that was quick!

" I will ask the terracotta fellows as well": shall we start a public discussion in their forums or do you have other channels to use?