
Key: CMP-887
Type: Bug
Status: Open
Priority: Major
Assignee: Shay Banon
Reporter: Stefan Fussenegger
Votes: 0
Watchers: 0
Compass

Analyzer ignored for prefix queries

Created: 28/Jul/09 08:57 AM   Updated: 20/Aug/09 11:58 AM
Component/s: Compass::Core
Affects Version/s: 2.1.4
Fix Version/s: None

File Attachments: 1. cmp887-trunk.patch (4 kB)
2. cmp887.patch (4 kB)



Description
For instance, when using ISOLatin1AccentFilter, "café" is converted to "cafe". This works perfectly fine for regular queries. However, when used with a prefix query (e.g. "café*"), the analyzer is not applied. The query then looks for the term "café", which does not exist in the index (only "cafe" does), and returns an empty result set.

After stepping through the debugger for some time, I'd say that the analyzer should be applied inside CompassQueryParser#getPrefixQuery(..), just as it is in CompassQueryParser#getInternalFieldQuery(..).
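The mismatch described above can be reproduced without Compass or Lucene. The following is only a minimal sketch: `fold` is a hypothetical stand-in for an accent-folding filter (built on `java.text.Normalizer`, not on ISOLatin1AccentFilter), and the "index" is a plain list of folded terms.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PrefixFoldingDemo {
    // Hypothetical stand-in for an accent-folding token filter:
    // decompose characters (NFD), then drop the combining marks.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Terms are always folded at index time.
    static List<String> index(List<String> words) {
        List<String> terms = new ArrayList<>();
        for (String word : words) {
            terms.add(fold(word));
        }
        return terms;
    }

    // Plain prefix match against the indexed terms.
    static List<String> prefixSearch(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café", "caf\u00e9t\u00e9ria" is "cafétéria"
        List<String> indexed = index(Arrays.asList("caf\u00e9", "caf\u00e9t\u00e9ria"));

        // Raw prefix, as in the reported bug: no indexed term starts with "café".
        System.out.println(prefixSearch(indexed, "caf\u00e9"));

        // Folded prefix: matches the folded terms in the index.
        System.out.println(prefixSearch(indexed, fold("caf\u00e9")));
    }
}
```

The first search comes back empty because the index only contains folded terms, while the second finds both entries once the prefix has gone through the same folding step.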



Comments
Shay Banon added a comment - 31/Jul/09 03:03 AM
You can't really do that, for two reasons: first, the text can be partial, so applying an analyzer to it will most likely not make sense; second, Lucene does not support it ...

Stefan Fussenegger added a comment - 31/Jul/09 03:22 AM
Hi Shay, thanks for your reply. I agree that some filters might not make sense (e.g. a stopword filter: "the" should not be filtered out when actually searching for "the*"). However, others make perfect sense and are absolutely necessary to get results (e.g. the ISOLatin1AccentFilter mentioned above).

Any ideas on how to get things working? I'm currently thinking about annotating the filter providers specified in compass.engine.analyzer.default.filters and building a token filter chain for prefix queries (with a subset of the token filters used by default). Is there any chance this would get approved to go into core?


Shay Banon added a comment - 31/Jul/09 03:55 AM
Are you referring to creating specific analyzers that get applied to prefix queries? You can do that; it might work in some cases...

Stefan Fussenegger added a comment - 31/Jul/09 06:06 AM
I'd be happy if it at least worked for the ISOLatin1AccentFilter. However, hardcoding that would be a hack and would make me feel kinda dirty.

I'm thinking about enhancing the implementations of LuceneAnalyzerTokenFilterProvider to either

a) mark them (using either an annotation or a marker interface) as PrefixQueryTokenFilterProvider and decide from that which providers to call when building the filter chain, or
b) pass an additional argument to createTokenFilter to indicate the context (i.e. what's going to be escaped, e.g. a prefix query) and let each provider decide whether it should wrap the current TokenStream or not

What I'm not yet sure about:
How to get my hands on such a special filtered TokenStream inside getPrefixQuery()
How to create a TokenStream with a single Term (and the question if a filtered TokenStream for a single Term is kinda over the top - other ways?)
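Option (a) could look roughly like the sketch below. It is not Compass code: `FilterProvider` is a simplified, hypothetical stand-in for LuceneAnalyzerTokenFilterProvider that works on a single term instead of a TokenStream, and both provider classes are made up for illustration. Only the marker name PrefixQueryTokenFilterProvider is taken from the proposal above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixChainSketch {
    // Hypothetical, simplified provider: filters one term, returns null to drop it.
    interface FilterProvider {
        String filter(String term);
    }

    // Marker interface from option (a): the provider is safe for partial terms.
    interface PrefixQueryTokenFilterProvider {
    }

    // Made-up accent filter, marked as usable for prefix queries.
    static class AccentFilterProvider implements FilterProvider, PrefixQueryTokenFilterProvider {
        public String filter(String term) {
            return term.replace('\u00e9', 'e'); // only handles 'é' in this sketch
        }
    }

    // Made-up stopword filter, deliberately not marked.
    static class StopwordFilterProvider implements FilterProvider {
        public String filter(String term) {
            return "the".equals(term) ? null : term;
        }
    }

    // Builds the prefix chain as the marked subset of the default chain.
    static List<FilterProvider> prefixChain(List<FilterProvider> defaults) {
        List<FilterProvider> chain = new ArrayList<>();
        for (FilterProvider provider : defaults) {
            if (provider instanceof PrefixQueryTokenFilterProvider) {
                chain.add(provider);
            }
        }
        return chain;
    }

    // Runs a term through a chain; null means the term was filtered out.
    static String run(List<FilterProvider> chain, String term) {
        String current = term;
        for (FilterProvider provider : chain) {
            if (current == null) {
                return null;
            }
            current = provider.filter(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<FilterProvider> defaults =
                Arrays.asList(new AccentFilterProvider(), new StopwordFilterProvider());
        List<FilterProvider> prefix = prefixChain(defaults);

        System.out.println(run(defaults, "the"));      // dropped by the stopword filter
        System.out.println(run(prefix, "the"));        // kept: stopword filter is not in the prefix chain
        System.out.println(run(prefix, "caf\u00e9"));  // accent filter still applies
    }
}
```

With a marker interface the decision stays in the provider's type and the chain builder needs no per-filter knowledge; option (b) would instead move that decision into each provider's createTokenFilter call.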


Stefan Fussenegger added a comment - 20/Aug/09 05:11 AM
I've implemented something very simple - was way easier than I expected. Why add more code if all that's needed is another analyzer?

The attached patch allows for a special prefix analyzer, e.g.

<prop key="compass.engine.analyzer.default.filters">latin1Filter,stopwordFilter</prop>
<prop key="compass.engine.analyzer.prefix.filters">latin1Filter</prop>

("prefix" is defined as a constant in LuceneEnvironment.Analyzer.PREFIX_GROUP, analogous to DEFAULT_GROUP and SEARCH_GROUP)

CompassQueryParser.getPrefixQuery(..) now checks whether a prefix analyzer is configured. If so, the prefix term ('bar' for the query 'foo bar*') is analyzed. If the analyzer filters out the term, or if no prefix analyzer is configured, it falls back to the current default (the raw, lower-cased input is used as the term).
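The fallback rule described above can be sketched in isolation as follows. This is not the code from the patch: `prefixTerm` is a hypothetical helper, and the analyzer is modelled as a plain function that returns an empty Optional when it filters the term out.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

class PrefixTermSketch {
    // Picks the term to use for a prefix query.
    // prefixAnalyzer may be null (no prefix analyzer configured); it returns
    // Optional.empty() when the analyzer filters the term out entirely.
    static String prefixTerm(String rawTerm, Function<String, Optional<String>> prefixAnalyzer) {
        // Current default behaviour: raw, lower-cased input.
        String fallback = rawTerm.toLowerCase(Locale.ROOT);
        if (prefixAnalyzer == null) {
            return fallback;
        }
        return prefixAnalyzer.apply(rawTerm).orElse(fallback);
    }

    public static void main(String[] args) {
        // Hypothetical analyzers, for illustration only.
        Function<String, Optional<String>> accentFolding =
                term -> Optional.of(term.toLowerCase(Locale.ROOT).replace('\u00e9', 'e'));
        Function<String, Optional<String>> dropsEverything = term -> Optional.empty();

        System.out.println(prefixTerm("Caf\u00e9", accentFolding));   // analyzed term
        System.out.println(prefixTerm("Caf\u00e9", dropsEverything)); // falls back to lower-cased input
        System.out.println(prefixTerm("Bar", null));                  // no analyzer: lower-cased input
    }
}
```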


Stefan Fussenegger added a comment - 20/Aug/09 05:18 AM
btw, the patch is for the 2_2 branch version

Shay Banon added a comment - 20/Aug/09 11:10 AM
Is there a chance that you can update the patch for trunk? I will release 2.3 soon anyhow.

Stefan Fussenegger added a comment - 20/Aug/09 11:58 AM
The patch applies to trunk just fine. However, I've created a patch from the patched trunk version. It should be identical, but still, here it is.