Key: CMP-346
Type: New Feature
Status: Open
Priority: Major
Assignee: Shay Banon
Reporter: Uri
Votes: 3
Watchers: 5

Compass

Add HitCollector support

Created: 02/Jan/07 07:46 PM   Updated: 11/Mar/10 10:00 AM
Component/s: Compass::Core
Affects Version/s: 1.1 M3
Fix Version/s: None

File Attachments: 1. hitcollector-2.patch.txt (2 kB)
2. hitcollector.patch.txt (5 kB)



Description
It would be very nice to have hit collection support: either Lucene's HitCollector support (maybe via the LuceneHelper class), or a CompassHitCollector as an abstraction above it.

Comments
Shay Banon added a comment - 05/Jan/07 03:01 PM
Currently, you can get an IndexSearcher using LuceneHelper and then use it to execute HitCollector logic. A nicer abstraction on top of it would be good, but I don't think that it will happen for this release. The reason is that Lucene's BitSet is not the best solution out there, and I would like to improve on that in the next release. Solr has done really nice work in that area, and I would like to incorporate some of it into Compass.

Stefan Fussenegger added a comment - 03/Jun/09 03:38 AM
As mentioned on the forums (http://forum.compass-project.org/thread.jspa?threadID=216146), I'd volunteer to bring HitCollector support to Compass. However, first and most importantly, I need a green light from Shay to get this into an upcoming release.

Furthermore, I'd love to hear your requirements and what you've implemented using HitCollectors. Some code would be cool too. Maybe it's also an option to provide this stuff out of the box? At a later step, I'd really love to get faceted search support from Compass. HitCollector support is a first step in this direction.

What do others think?


Shay Banon added a comment - 08/Jun/09 12:07 PM
It would be great to get HitCollectors support into Compass, maybe as a first stage in a more internal manner which is Lucene specific, and then, at a later stage, more aligned with how Compass does things.

Facet search is great, but I want to try and build some support for that first (similar to what Solr did with Doc Sets).


Stefan Fussenegger added a comment - 09/Jun/09 02:14 AM
void collect(HitCollector hitCollector);
added to CompassQuery. It's the raw interface from Lucene as you've suggested.

Shay Banon added a comment - 13/Jun/09 10:59 AM
I would suggest against adding this to CompassQuery. What I was hoping is for something that uses the concept of LuceneHelper. Something like: LuceneHelper#collect(CompassQuery query, HitCollector hitCollector). What do you think?

Stefan Fussenegger added a comment - 15/Jun/09 02:59 AM
Well, my preference is still to add it to CompassQuery, mainly because, from a not-that-heavy user perspective, it's easier to discover there: most newbies will look at the methods provided by CompassQuery and won't even know about LuceneHelper. I think your main concern is adding Lucene-only classes to CompassQuery, right? But it's an interface with a single method only. It shouldn't be too difficult to get it done the Compass way.

What I am dreaming of is a nice collection of ready-to-use HitCollectors as part of Compass in order to get some common problems solved easily out of the box. I'm talking about stuff like this:

hitCollector = new UniquePropertyValueCollector("lastname");
compassQuery.collect(hitCollector);
return hitCollector.getValues();
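
A minimal sketch of what such a collector could look like, for illustration only. It assumes a stand-in HitCollector with the same collect(int doc, float score) shape as Lucene's, and a hypothetical per-document value array in place of a real index lookup; the constructor signature differs from the snippet above because nothing here talks to an actual index.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class UniqueValueSketch {

    // Stand-in with the same shape as Lucene's HitCollector (not the real class).
    abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    // Hypothetical collector that gathers the distinct values of one property.
    // valuesByDoc is an assumption: one value per document id, loaded up front.
    static class UniquePropertyValueCollector extends HitCollector {
        private final String[] valuesByDoc;
        private final Set<String> values = new LinkedHashSet<String>();

        UniquePropertyValueCollector(String[] valuesByDoc) {
            this.valuesByDoc = valuesByDoc;
        }

        @Override
        public void collect(int doc, float score) {
            String value = valuesByDoc[doc];
            if (value != null) {
                values.add(value); // the set silently drops duplicates
            }
        }

        Set<String> getValues() {
            return values;
        }
    }

    public static void main(String[] args) {
        String[] lastnames = {"Smith", "Jones", "Smith", null, "Miller"};
        UniquePropertyValueCollector collector = new UniquePropertyValueCollector(lastnames);
        // Simulate a searcher reporting matching document ids 0, 2 and 4.
        collector.collect(0, 1.0f);
        collector.collect(2, 0.8f);
        collector.collect(4, 0.5f);
        System.out.println(collector.getValues());
    }
}
```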

As I've seen, you've already applied my patch, that's great! However, if you'd like to see that code as part of LuceneHelper or a more Compass-like implementation, I'd be happy to contribute this too.


Shay Banon added a comment - 17/Jun/09 12:15 PM
Hi,

I am sorry, I applied the patch by mistake, I reverted it back (mostly). I agree regarding the built in collections, count is an example of one, and we can expose more as methods on the query itself (I am a big fan of rich domain model).

I have a problem with exposing HitCollectors in such a manner, since currently I am thinking of re-architecting the way search is performed (which will pave the way for simple facet based support). Once this is done, then exposing a more Compass level API will be simpler.

The problem with hit collectors, as they are, is that people might create ones that will kill the index, for example, loading the document in each collect operation instead of going to the field cache or doing other tricks. This is why there needs to be a much better abstraction for that, which can be built once the proper foundation is in place.
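
To illustrate the concern, here is a rough sketch of the cheaper pattern: collect() only records document ids, and values are read afterwards from a per-document array loaded once. The HitCollector class is a stand-in with Lucene's collect(int, float) shape, and the cachedValues array is a hypothetical substitute for a field cache; none of this is Compass or Lucene code.

```java
import java.util.BitSet;

public class CheapCollectSketch {

    // Stand-in with the same shape as Lucene's HitCollector (not the real class).
    abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    // Records matching document ids only; nothing is loaded per hit.
    static class DocIdCollector extends HitCollector {
        private final BitSet docs = new BitSet();

        @Override
        public void collect(int doc, float score) {
            docs.set(doc);
        }

        BitSet getDocs() {
            return docs;
        }
    }

    // Hypothetical field-cache stand-in: cachedValues[doc] holds one value per
    // document id. Only documents that matched are visited.
    static int sumCached(BitSet docs, int[] cachedValues) {
        int sum = 0;
        for (int doc = docs.nextSetBit(0); doc >= 0; doc = docs.nextSetBit(doc + 1)) {
            sum += cachedValues[doc];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] prices = {10, 20, 30, 40};
        DocIdCollector collector = new DocIdCollector();
        // Simulate a searcher reporting matching document ids 1 and 3.
        collector.collect(1, 1.0f);
        collector.collect(3, 0.7f);
        System.out.println(sumCached(collector.getDocs(), prices));
    }
}
```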

Cheers,
Shay


Stefan Fussenegger added a comment - 17/Jun/09 12:21 PM
Okay, so I'll move the HitCollector code into LuceneUtil - and look into my own (admittedly quick and dirty) HitCollector implementation, as I am currently "killing the index"

Shay Banon added a comment - 17/Jun/09 12:39 PM
Ouch.

Stefan Fussenegger added a comment - 25/Jun/09 06:30 AM
Okay, I just moved the code to LuceneHelper ... well, almost. I'm struggling with one problem: I went for your suggestion and used the method signature you suggested (LuceneHelper#collect(CompassQuery query, HitCollector hitCollector)). However, I can't figure out how to get from CompassQuery to LuceneSearchEngineInternalSearch (in order to get a Searcher).

Shay Banon added a comment - 28/Jun/09 09:29 AM
I have just added a getSession method to InternalCompassQuery, so you can get the session from the query, and then use it to get the internal search object. I think that the collect method should also return the internal search object so it can be used later on with what the collector found.

Stefan Fussenegger added a comment - 29/Jun/09 04:43 AM
added to LuceneHelper:

public static LuceneSearchEngineInternalSearch collect(CompassQuery query, HitCollector hitCollector)


Shay Banon added a comment - 29/Jun/09 09:58 AM
Committed the patch, thanks!

The only thing left now, just to complete this I think, is to be able to get a Resource and an actual unmarshalled object from a Document in a LuceneHelper.


Stefan Fussenegger added a comment - 29/Jun/09 10:04 AM
If you tell me where to start, I could help with this too. I suspect though, that this needs some clever refactoring to avoid duplication of existing code - and I doubt that I'm able to do this with the big picture in mind.

Lyle Hanson added a comment - 16/Nov/09 05:55 PM
I'm trying to use LuceneSearchEngineInternalSearch.getSearcher().search(Query, HitCollector) to be able to specify the number of hits to return (TopDocCollector) and a timeout value (TimeLimitedCollector). However, I'm at a loss as to how I can deal with the resulting Document objects (obtained through searcher.doc). I assume this is what's been discussed as far as getting an unmarshalled object returned (with CompassHits I could access my domain objects).

Until this gets implemented, am I out of luck with this approach? Or is there something else I can try to limit the number of hits and elapsed time from a higher level?
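
For readers unfamiliar with the timeout idea, the general mechanism of a time-limiting wrapper collector can be sketched roughly as follows. This is not Lucene's TimeLimitedCollector: HitCollector is a stand-in with the same collect(int, float) shape, and TimeLimitedWrapper, TimeExceededException, Clock and CountingCollector are hypothetical names made up for the example.

```java
public class TimeLimitSketch {

    // Stand-in with the same shape as Lucene's HitCollector (not the real class).
    abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    // Hypothetical exception; unchecked so it can escape collect().
    static class TimeExceededException extends RuntimeException {
        TimeExceededException(String message) {
            super(message);
        }
    }

    // Hypothetical clock abstraction so the time source can be swapped out.
    interface Clock {
        long millis();
    }

    // Delegates every hit to another collector until the time budget is spent.
    static class TimeLimitedWrapper extends HitCollector {
        private final HitCollector delegate;
        private final Clock clock;
        private final long deadline;

        TimeLimitedWrapper(HitCollector delegate, long timeAllowedMillis, Clock clock) {
            this.delegate = delegate;
            this.clock = clock;
            this.deadline = clock.millis() + timeAllowedMillis;
        }

        @Override
        public void collect(int doc, float score) {
            if (clock.millis() > deadline) {
                throw new TimeExceededException("time budget exceeded at doc " + doc);
            }
            delegate.collect(doc, score);
        }
    }

    // Counts hits; stands in for whichever collector does the real work.
    static class CountingCollector extends HitCollector {
        int count;

        @Override
        public void collect(int doc, float score) {
            count++;
        }
    }

    public static void main(String[] args) {
        CountingCollector counter = new CountingCollector();
        HitCollector limited = new TimeLimitedWrapper(counter, 50, new Clock() {
            public long millis() {
                return System.currentTimeMillis();
            }
        });
        try {
            for (int doc = 0; doc < 1000; doc++) {
                limited.collect(doc, 1.0f);
            }
        } catch (TimeExceededException e) {
            System.out.println("stopped early: " + e.getMessage());
        }
        System.out.println("collected " + counter.count + " hits");
    }
}
```

The caller catches the exception and works with whatever the wrapped collector gathered before the deadline.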


Lyle Hanson added a comment - 13/Jan/10 05:11 PM
To follow up for others who might be looking for a similar solution, I ended up searching with a TimeLimitedHitCollector (though that requires another HitCollector passed in; I kludged it by passing a TopFieldDocCollector - I need to specify a sort order but don't need the arbitrary numHits restriction. Is there an unbounded version?). From the resulting TopDocs, I get my unmarshalled domain object by doing something like this:

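// Wrap the raw Lucene Document in a Compass resource, then unmarshal it.
LuceneResource resource = new LuceneResource(searcher.doc(docId), docId, searchEngineFactory);
DomainObject domainObject = (DomainObject) internalCompassSession.getByResource(resource);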