
Key: CMP-698
Type: New Feature
Status: Open
Priority: Major
Assignee: Shay Banon
Reporter: Louis Emmett
Votes: 1
Watchers: 1


Support Distributed Indexing

Created: 26/Aug/08 06:18 PM   Updated: 19/Oct/08 10:35 PM
Component/s: Compass::Gps, Compass::Needle
Affects Version/s: 1.2.2, 2.0.2
Fix Version/s: None

Description
I recognise this is obviously a difficult one to solve for the general case, but if we could have some more framework support for distributed index building it would be very useful, as with non-trivial index sizes the runtime can be substantial.
Moving to a Multi-JVM environment allows us to throw more hardware (CPU/Memory) at the problem.

Currently the parallel GPS devices, such as the Hibernate device, use an internal thread executor service to process the index build in parallel (typically at the sub-index level).
I've managed to distribute this by overriding that executor and using Coherence to distribute the work, which works but is a bit of a hack.

What would be ideal is a baseline JMS implementation that pushes the index tasks onto a work queue, plus hook points for the GPS devices to pull work off that queue: essentially a special type of Executor.
AFAIR, Hibernate Search has something along these lines.
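A minimal sketch of the "special type of Executor" idea, in Java since Compass is a Java library. It assumes nothing about the Compass GPS API: the class name QueueBackedExecutor is hypothetical, and an in-memory BlockingQueue stands in for the JMS queue. In a real multi-JVM setup the tasks would have to be serializable and the consumers would be JMS listeners running in other JVMs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: an Executor that pushes each task onto a work queue instead of
// running it on the caller's thread. The in-memory BlockingQueue is a stand-in for a
// JMS queue; the worker threads stand in for consumers that would live in other JVMs.
class QueueBackedExecutor implements Executor {
    private final BlockingQueue<Runnable> workQueue = new LinkedBlockingQueue<>();
    private final List<Thread> workers = new ArrayList<>();

    QueueBackedExecutor(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(this::pullAndRun, "index-worker-" + i);
            worker.setDaemon(true);
            workers.add(worker);
            worker.start();
        }
    }

    // Producer side: a GPS device would submit one task per sub-index here.
    @Override
    public void execute(Runnable task) {
        workQueue.add(task);
    }

    // Consumer side: each worker repeatedly pulls a task off the queue and runs it.
    private void pullAndRun() {
        while (true) {
            Runnable task;
            try {
                task = workQueue.take();
            } catch (InterruptedException e) {
                return; // shutdown() interrupted this worker
            }
            try {
                task.run();
            } catch (RuntimeException e) {
                // A failed task must not kill the worker; report it and keep consuming.
                System.err.println("index task failed: " + e);
            }
        }
    }

    void shutdown() {
        workers.forEach(Thread::interrupt);
    }

    public static void main(String[] args) throws InterruptedException {
        QueueBackedExecutor executor = new QueueBackedExecutor(3);
        String[] subIndexes = {"users", "orders", "products"};
        CountDownLatch done = new CountDownLatch(subIndexes.length);
        for (String subIndex : subIndexes) {
            executor.execute(() -> {
                System.out.println("indexed sub-index " + subIndex);
                done.countDown();
            });
        }
        done.await(); // wait until every queued sub-index task has run
        executor.shutdown();
    }
}
```

Because the device only ever sees an Executor, swapping the local thread pool for a queue-backed one would not require changing how the device splits the work into sub-index tasks.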

Comments
Shay Banon added a comment - 19/Oct/08 10:33 PM
Hi Louis,

Yes, that would be very interesting to implement. Another interesting integration would be to create another type of transaction "isolation" where all changes are batched into a job that is then written to a queue, where a worker picks it up and applies the changes. This means that changes happening in real time won't have to wait until the data is indexed.
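The job-based "isolation" flow can be sketched roughly as follows, assuming a plain in-memory map as the index and hypothetical names (JobQueueIndexer, Transaction, Change). It is not Compass's transaction API; it only shows that a transaction buffers its changes, commit() enqueues them as a single job without waiting, and a worker applies queued jobs in order later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of "job" isolation: a transaction only records its changes, commit()
// enqueues them as one job, and a separate worker applies jobs to the index later.
// The Map below is a stand-in for the real search index.
class JobQueueIndexer {
    // One buffered change; a null document means "delete this id".
    record Change(String id, String document) {}

    private final BlockingQueue<List<Change>> jobs = new LinkedBlockingQueue<>();
    private final Map<String, String> index = new ConcurrentHashMap<>();

    class Transaction {
        private final List<Change> changes = new ArrayList<>();

        void save(String id, String document) { changes.add(new Change(id, document)); }

        void delete(String id) { changes.add(new Change(id, null)); }

        // Returns immediately: the caller does not wait for the data to be indexed.
        void commit() { jobs.add(List.copyOf(changes)); }
    }

    Transaction begin() { return new Transaction(); }

    // Worker step: apply the oldest pending job in order; returns false if none is waiting.
    boolean applyNextJob() {
        List<Change> job = jobs.poll();
        if (job == null) return false;
        for (Change c : job) {
            if (c.document() == null) index.remove(c.id());
            else index.put(c.id(), c.document());
        }
        return true;
    }

    Map<String, String> snapshot() { return Map.copyOf(index); }

    public static void main(String[] args) {
        JobQueueIndexer indexer = new JobQueueIndexer();
        Transaction tx = indexer.begin();
        tx.save("1", "first document");
        tx.save("2", "second document");
        tx.delete("1");
        tx.commit();
        System.out.println(indexer.snapshot()); // prints {} (commit only queued the job)
        indexer.applyNextJob();
        System.out.println(indexer.snapshot()); // prints {2=second document}
    }
}
```

In a distributed version the queue would be a JMS destination and applyNextJob would run inside a consumer; applying each job's changes in order keeps a delete that follows a save within the same job from being reordered.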

Is there a chance that you can share the current hack you have? I would love to extend both the GPS indexing and the real time indexing to support this notion.

Shay Banon added a comment - 19/Oct/08 10:35 PM
p.s. One way to speed up indexing is to configure GPS so that its indexing properties work against the file system. The GPS mechanism will automatically copy the result to the actual index once indexing is finished.
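A rough sketch of the general pattern behind this tip: build the index in a local file-system directory, then copy the finished files into the real index location in one pass. The names StagedIndexBuild and IndexBuilder are hypothetical, and this is not how Compass GPS is configured; it only illustrates the "index on the file system, then copy" idea, which presumably pays off when the actual index store is slower to write to than local disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the "index on the file system, then copy" pattern: the expensive
// build writes to a local temp directory, and only the finished result is copied into the
// actual index location. This is not Compass configuration, just the underlying idea.
class StagedIndexBuild {
    @FunctionalInterface
    interface IndexBuilder {
        void buildInto(Path directory) throws IOException;
    }

    static void buildThenCopy(Path target, IndexBuilder builder) throws IOException {
        Path staging = Files.createTempDirectory("staged-index-");
        try {
            builder.buildInto(staging);   // all the heavy writes go to the local file system
            copyTree(staging, target);    // one bulk copy once indexing is finished
        } finally {
            deleteTree(staging);
        }
    }

    private static void copyTree(Path from, Path to) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(from)) {
            sources = walk.collect(Collectors.toList()); // directories come before their contents
        }
        for (Path source : sources) {
            Path dest = to.resolve(from.relativize(source).toString());
            if (Files.isDirectory(source)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(source, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    private static void deleteTree(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }
}
```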