Shay Banon added a comment - 05/Jan/07 03:01 PM
Currently, you can get an IndexSearcher using LuceneHelper, on which you can then execute HitCollector logic. A nicer abstraction on top of it would be good, but I don't think it will happen for this release. The reason is that Lucene's BitSet-based approach is not the best solution out there, and I would like to improve on it in the next release. Solr has done really nice work in that area, and I would like to incorporate some of it into Compass.
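For readers new to HitCollector: in Lucene 2.x the searcher calls the collector's collect(int doc, float score) method once for every matching document, and the collector decides what to keep. The sketch below models that contract in plain Java so it runs without Lucene. SimpleHitCollector, ToySearcher and CountingCollector are hypothetical names, and the toy searcher simply treats any positive score as a match. Against a real IndexSearcher the equivalent call would be searcher.search(query, hitCollector).

```java
// Hypothetical stand-in for Lucene's HitCollector callback: the searcher calls
// collect() once per matching document with its internal doc id and score.
interface SimpleHitCollector {
    void collect(int doc, float score);
}

// Hypothetical toy "searcher": each document is represented only by the score
// the query assigned to it; a score <= 0 means "no match".
class ToySearcher {
    private final float[] scores;

    ToySearcher(float[] scores) {
        this.scores = scores.clone();
    }

    // Drives the collector, the way Searcher.search(Query, HitCollector) does.
    void search(SimpleHitCollector collector) {
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0f) {
                collector.collect(doc, scores[doc]);
            }
        }
    }
}

// Counts matches and remembers the best-scoring doc without building a ranked
// list: the kind of cheap aggregate logic collectors are suited for.
class CountingCollector implements SimpleHitCollector {
    private int count;
    private int bestDoc = -1;
    private float bestScore = Float.NEGATIVE_INFINITY;

    @Override
    public void collect(int doc, float score) {
        count++;
        if (score > bestScore) {
            bestScore = score;
            bestDoc = doc;
        }
    }

    int getCount() {
        return count;
    }

    int getBestDoc() {
        return bestDoc;
    }
}
```

Because nothing is ranked or loaded per hit, a collector like this stays cheap even when many documents match.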
As mentioned on the forums (http://forum.compass-project.org/thread.jspa?threadID=216146
Furthermore, I'd love to hear your requirements and what you've implemented using HitCollectors. Some code would be cool too. Maybe it's also an option to get this stuff out of the box? At a later stage, I'd really love to get faceted search support from Compass, and HitCollector support is a first step in this direction. What do others think? It would be great to get HitCollector support into Compass, maybe first in a more internal, Lucene-specific manner, and later in a way that is more aligned with how Compass does things.
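One way to get faceted counts on top of a collector, in the spirit of Solr's doc sets, is to collect the matching doc ids into a bit set and intersect it with a precomputed bit set per facet value. This is a minimal sketch using java.util.BitSet. DocSetCollector and FacetCounter are hypothetical names, the per-value bit sets are assumed to be built elsewhere, and Solr's actual DocSet classes are more elaborate than this.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical collector that records the ids of all matching documents
// in a bit set (a "doc set").
class DocSetCollector {
    private final BitSet docs = new BitSet();

    // Same shape as a HitCollector callback; the score is not needed here.
    void collect(int doc, float score) {
        docs.set(doc);
    }

    BitSet getDocSet() {
        return docs;
    }
}

// Hypothetical facet counter: intersects the result doc set with one
// precomputed doc set per facet value (e.g. all docs with category=books).
final class FacetCounter {
    private FacetCounter() {}

    static Map<String, Integer> count(BitSet results, Map<String, BitSet> valueDocSets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : valueDocSets.entrySet()) {
            // Clone so the cached per-value set is not modified.
            BitSet intersection = (BitSet) e.getValue().clone();
            intersection.and(results); // docs matching the query AND having this value
            counts.put(e.getKey(), intersection.cardinality());
        }
        return counts;
    }
}
```

The per-value sets only depend on the index, not on the query, so they can be cached and reused across searches; each facet count is then one bitwise AND plus a cardinality call.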
Facet search is great, but I want to build the underlying support for it first (similar to what Solr did with Doc Sets).
void collect(HitCollector hitCollector);
added to CompassQuery. It's the raw interface from Lucene, as you've suggested.
I would suggest against adding this to CompassQuery. What I was hoping for is something that uses the concept of LuceneHelper, something like LuceneHelper#collect(CompassQuery query, HitCollector hitCollector). What do you think?
Well, my preference is still to add it to CompassQuery, mainly because, from the perspective of a not-so-heavy user, it's easier to discover there: most newbies will look at the methods provided by CompassQuery and won't even know about LuceneHelper. I think your main concern is adding Lucene-only classes to CompassQuery, right? But it's an interface with only a single method, so it shouldn't be too difficult to do it the Compass way.
What I am dreaming of is a nice collection of ready-to-use HitCollectors as part of Compass, so that common problems can be solved easily out of the box. I'm talking about stuff like this:
hitCollector = new UniquePropertyValueCollector("lastname");
I see you've already applied my patch, that's great! However, if you'd like to see that code as part of LuceneHelper or as a more Compass-like implementation, I'd be happy to contribute that too.
Hi,
Sorry, I applied the patch by mistake, and I have (mostly) reverted it. I agree regarding the built-in collectors: count is one example, and we can expose more as methods on the query itself (I am a big fan of rich domain models). I have a problem with exposing HitCollectors in this manner, since I am currently thinking about re-architecting the way search is performed (which will pave the way for simple facet-based support). Once that is done, exposing a more Compass-level API will be simpler. The problem with hit collectors as they are is that people might write ones that kill the index, for example by loading the document in each collect call instead of going to the field cache or using other tricks. This is why there needs to be a much better abstraction, which can be built once the proper foundation is in place.
Cheers,
Okay, so I'll move the HitCollector code into LuceneUtil and look into my own (admittedly quick and dirty) HitCollector implementation, as I am currently "killing the index".
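The UniquePropertyValueCollector idea and the field-cache advice fit together: instead of loading each matching document in collect(), the collector can read the property from an array indexed by doc id that was built once per index reader (in Lucene 2.x, FieldCache.DEFAULT.getStrings(reader, "lastname") returns such a String[]). This sketch assumes that array is passed to the constructor, which the one-argument constructor shown in the thread does not do, so the signature here is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a UniquePropertyValueCollector: gathers the distinct
// values of one property across all matching documents. Values are read from
// a field-cache-style array indexed by doc id, so no document is loaded
// inside collect().
class UniquePropertyValueCollector {
    private final String property;
    private final String[] valuesByDoc; // valuesByDoc[doc] is the value, or null if absent
    private final Set<String> uniqueValues = new TreeSet<>();

    UniquePropertyValueCollector(String property, String[] valuesByDoc) {
        this.property = property;
        this.valuesByDoc = valuesByDoc;
    }

    // Same shape as a HitCollector callback; the score is ignored.
    void collect(int doc, float score) {
        String value = valuesByDoc[doc];
        if (value != null) {
            uniqueValues.add(value);
        }
    }

    String getProperty() {
        return property;
    }

    // Returned in sorted order because a TreeSet is used.
    Set<String> getUniqueValues() {
        return uniqueValues;
    }
}
```

The per-hit cost is one array lookup and one set insertion, which is the difference between this and a collector that fetches a stored document for every match.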
Okay, I just moved the code to LuceneHelper... well, almost. I'm struggling with one problem: I followed your suggestion and used the method signature you proposed (LuceneHelper#collect(CompassQuery query, HitCollector hitCollector)). However, I can't figure out how to get from a CompassQuery to the LuceneSearchEngineInternalSearch (in order to get a Searcher).
I have just added a getSession method to InternalCompassQuery, so you can get the session from the query and then use it to get the internal search object. I think the collect method should also return the internal search object, so it can be used later to work with whatever the collector found.
Added to LuceneHelper:
public static LuceneSearchEngineInternalSearch collect(CompassQuery query, HitCollector hitCollector)
Committed the patch, thanks!
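The shape of this helper can be sketched with minimal stand-in types: the query exposes its session (the getSession accessor described above), the session yields the internal search object, the collector runs against it, and that same object is returned so the caller can load the documents whose ids the collector kept. All type names here (HitCallback, InternalSearch, SearchSession, SearchQuery, CollectHelper, InMemorySearch) are hypothetical stand-ins, not Compass classes, and the in-memory search just does substring matching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's HitCollector callback.
interface HitCallback {
    void collect(int doc, float score);
}

// Hypothetical stand-in for LuceneSearchEngineInternalSearch: it can run a
// query and, afterwards, load stored documents by id.
interface InternalSearch {
    void search(String query, HitCallback callback);

    String doc(int docId);
}

// Hypothetical stand-ins for the session and the query.
interface SearchSession {
    InternalSearch getInternalSearch();
}

interface SearchQuery {
    SearchSession getSession();

    String getQueryString();
}

final class CollectHelper {
    private CollectHelper() {}

    // Mirrors the shape of LuceneHelper#collect(query, collector): run the
    // collector, then hand back the internal search so doc ids can be resolved.
    static InternalSearch collect(SearchQuery query, HitCallback callback) {
        InternalSearch search = query.getSession().getInternalSearch();
        search.search(query.getQueryString(), callback);
        return search;
    }
}

// Toy in-memory InternalSearch: a document "matches" if it contains the query text.
class InMemorySearch implements InternalSearch {
    private final List<String> docs;

    InMemorySearch(List<String> docs) {
        this.docs = new ArrayList<>(docs);
    }

    @Override
    public void search(String query, HitCallback callback) {
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).contains(query)) {
                callback.collect(i, 1.0f);
            }
        }
    }

    @Override
    public String doc(int docId) {
        return docs.get(docId);
    }
}
```

Returning the search object instead of documents keeps the collector cheap: it only records ids, and documents are loaded afterwards, only for the hits the caller actually needs.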
The only thing left to complete this, I think, is the ability to get a Resource and an actual unmarshalled object from a Document in LuceneHelper. If you tell me where to start, I could help with this too. I suspect, though, that this needs some clever refactoring to avoid duplicating existing code, and I doubt I can do that with the big picture in mind.
I'm trying to use LuceneSearchEngineInternalSearch.getSearcher().search(Query, HitCollector) to be able to specify the number of hits to return (TopDocCollector) and a timeout value (TimeLimitedCollector). However, I'm at a loss as to how I can deal with the resulting Document objects (obtained through searcher.doc
Until this gets implemented, am I out of luck with this approach? Or is there something else I can try to limit the number of hits and the elapsed time from a higher level?
To follow up for others who might be looking for a similar solution: I ended up searching with a TimeLimitedCollector. That requires another HitCollector to be passed in, so I kludged it by passing a TopFieldDocCollector; I need to specify a sort order but don't need the arbitrary numHits restriction (is there an unbounded version?). From the resulting TopDocs, I get my unmarshalled domain object by doing something like this:
LuceneResource resource = new LuceneResource(searcher.doc(docId), docId, searchEngineFactory);
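The combination described above, a time limit wrapped around a bounded, sorted collector, can be illustrated without Lucene. TopNCollector keeps the numHits best hits in a min-heap, and TimeLimitedWrapper aborts collection with an exception once a time budget is spent. All names here are hypothetical. As far as I know, Lucene's TimeLimitedCollector checks a shared timer thread instead of reading the clock on every call, so this shows the idea rather than Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for Lucene's HitCollector callback.
interface ScoredHitCollector {
    void collect(int doc, float score);
}

final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Keeps only the numHits best-scoring hits, the idea behind a top-docs collector.
class TopNCollector implements ScoredHitCollector {
    private final int numHits;
    // Min-heap by score: the weakest hit kept so far sits at the head.
    private final PriorityQueue<Hit> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    TopNCollector(int numHits) {
        if (numHits <= 0) {
            throw new IllegalArgumentException("numHits must be positive");
        }
        this.numHits = numHits;
    }

    @Override
    public void collect(int doc, float score) {
        if (heap.size() < numHits) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // drop the weakest kept hit
            heap.add(new Hit(doc, score));
        }
    }

    // Returns the kept hits, best first.
    List<Hit> topHits() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        return hits;
    }
}

class TimeExceededException extends RuntimeException {
    TimeExceededException(String message) {
        super(message);
    }
}

// Wraps another collector and aborts once the time budget is exceeded;
// hits collected before the abort stay in the wrapped collector.
class TimeLimitedWrapper implements ScoredHitCollector {
    private final ScoredHitCollector delegate;
    private final long deadlineNanos;

    TimeLimitedWrapper(ScoredHitCollector delegate, long budgetMillis) {
        this.delegate = delegate;
        this.deadlineNanos = System.nanoTime() + budgetMillis * 1_000_000L;
    }

    @Override
    public void collect(int doc, float score) {
        // Subtraction-based comparison is safe against nanoTime overflow.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new TimeExceededException("time budget exceeded at doc " + doc);
        }
        delegate.collect(doc, score);
    }
}
```

On the unbounded question: the heap only exists to cap memory at numHits entries. An unbounded variant would append every hit to a list and sort it once after collection, at the cost of memory proportional to the number of matches.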