Key: CMP-795
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Shay Banon
Reporter: Ben Boggess
Votes: 0
Watchers: 0
Compass

IOException EOF when getting term vectors from LuceneHelper with multiple aliases and using non-null first-level cache

Created: 10/Dec/08 02:05 PM   Updated: 08/Jan/09 11:46 AM
Component/s: Compass::Core
Affects Version/s: 2.1.0 M4
Fix Version/s: 2.1.1, 2.2.0 M1

File Attachments: 1. multisub_eof_test.zip (2 kB)



Description
An IOException occurs because of an EOF when trying to get the term vectors from the hits. I am using multiple aliases. The problem appears to be that org.compass.core.lucene.engine.DefaultLuceneSearchEngineHits creates LuceneResources with a document number from a MultiSearcher (in getResource(int)). When LuceneHelper later retrieves the term vector, that document number can be (and is) greater than the total number of documents, so the index is accessed at an offset past the end of the file.

For our purposes, we have temporarily modified DefaultLuceneSearchEngineHits.getResource(int) to get the sub-document number for the resource:
...
int docId = hits.id;
Searcher searcher = internalSearch.getSearcher();
if (searcher instanceof MultiSearcher) {
    // Translate the MultiSearcher-wide document number into the number within its sub-searcher.
    docId = ((MultiSearcher) searcher).subDoc(docId);
}
...

I'm not sure what other ramifications this may have or how 'correct' the fix is but it appears to solve the issue for us.
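
To make the document-number mismatch concrete, here is a minimal, self-contained sketch of how a combined (MultiSearcher-style) document number could map back to a number within one sub-index. It is an illustration only, not Compass or Lucene source: the class DocIdMapping, its methods, and the sub-index sizes are all hypothetical, and it assumes sub-indexes are numbered consecutively by their starting offsets.

```java
// Hypothetical illustration of global-to-local document numbering.
// Not Compass or Lucene code; all names here are made up for the example.
class DocIdMapping {

    // starts[i] is the first global document number of sub-index i;
    // the final entry holds the total number of documents.
    public static int[] buildStarts(int[] subIndexSizes) {
        int[] starts = new int[subIndexSizes.length + 1];
        for (int i = 0; i < subIndexSizes.length; i++) {
            starts[i + 1] = starts[i] + subIndexSizes[i];
        }
        return starts;
    }

    // Finds which sub-index contains the given global document number.
    public static int subIndex(int[] starts, int globalDoc) {
        int total = starts[starts.length - 1];
        if (globalDoc < 0 || globalDoc >= total) {
            throw new IllegalArgumentException(
                "doc " + globalDoc + " out of range [0, " + total + ")");
        }
        // Scan from the end so that empty sub-indexes are skipped.
        for (int i = starts.length - 2; i >= 0; i--) {
            if (starts[i] <= globalDoc) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable: starts[0] is always 0");
    }

    // Converts a global document number into one local to its sub-index.
    public static int subDoc(int[] starts, int globalDoc) {
        return globalDoc - starts[subIndex(starts, globalDoc)];
    }

    public static void main(String[] args) {
        // Two sub-indexes holding 5 and 3 documents (assumed sizes).
        int[] starts = buildStarts(new int[] {5, 3});
        int globalDoc = 6;
        // Global doc 6 does not exist in the second sub-index (only 0..2 do),
        // so it must be translated before reading from that sub-index.
        System.out.println("sub-index " + subIndex(starts, globalDoc)
            + ", local doc " + subDoc(starts, globalDoc));
    }
}
```

In this sketch, using the global number 6 directly against the second sub-index (which holds only 3 documents) would read past its end, which is the kind of failure described above; the translated local number 1 is valid.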

In attempting to create a test case in the Compass project's test framework, I discovered that this issue seems to be related to caching. I initially added a test case to org.compass.core.test.termfreqvector.simple1.TermFreqVectorMultiSubIndexTests but was unable to duplicate the issue. It turns out that I was unable to duplicate the issue because a NullFirstLevelCache was being used.

To duplicate the issue as a Compass test I:
1. overrode AbstractTestCase.buildConf() because it programmatically sets the first-level cache to NullFirstLevelCache
2. modified org.compass.core.test.compass.cfg.xml to set the first-level cache to DefaultFirstLevelCache instead of NullFirstLevelCache.

Attached are my modified TermFreqVectorMultiSubIndexTests.java, with the added testTermFreqVectorsMultiSubIndex() test method, and the compass.cfg.xml that I used.



Comments
Shay Banon added a comment - 12/Dec/08 02:38 PM
This was actually fixed in 2.1, can you give it a go? What I changed there is that I load the resource with no cache before getting the term frequencies. The main reason for the reload is that you can't really know which one loaded the resource (what reader, and what multi searcher, and which sub indexes this multi searcher was initialized with).

Ben Boggess added a comment - 17/Dec/08 03:27 PM
That definitely makes sense and sounds like it should fix this issue, but it does not appear to do so. I swapped out my M4 jar for the 2.1.0 jar and updated the lucene-core.jar I was using to match 2.1.0, and I get the same result (EOF).
Stack trace:
Exception in thread "main" org.compass.core.engine.SearchEngineException: Failed to fetch term info for resource [{a2} stored/uncompressed,indexed,omitNorms<$/uid:a2#1#>,[stored/uncompressed,indexed,omitNorms<alias:a2>],[stored/uncompressed,indexed<id:1>],[stored/uncompressed,indexed,tokenized,termVector<value1:test1>],[stored/uncompressed,indexed,tokenized,termVector<value2:test2>]]; nested exception is java.io.IOException: read past EOF
java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:68)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:91)
at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:345)
at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:1059)
at org.apache.lucene.index.MultiReader.getTermFreqVectors(MultiReader.java:169)
at org.compass.core.lucene.util.LuceneHelper.getTermFreqVectors(LuceneHelper.java:192)
...

Shay Banon added a comment - 18/Dec/08 10:26 AM
Sorry, I missed the getTermFreqVectors and only did it for getTermFreqVector in 2.1. I already fixed it a few weeks ago for both 2.1.1 and 2.2.0 M1. I will release 2.1.1 next week. You can try the 2.1 nightly build to test that it solves the problem.

Ben Boggess added a comment - 07/Jan/09 10:43 AM
Sorry for the late reply but I was on holiday. I downloaded a 2.1.1 build and it does indeed resolve the issue. Do you know when this release will be available?

Thanks!


Shay Banon added a comment - 08/Jan/09 11:46 AM
I will release it this weekend.