I recognise this is obviously a difficult one to solve for the general case, but more framework support for distributed index building would be very useful: with non-trivial index sizes, the index build can take a substantial amount of time.
Moving to a multi-JVM environment lets us throw more hardware (CPU and memory) at the problem.
Currently, the parallel GPS devices (such as the Hibernate device) use an internal thread executor service to parallel-process the index build, typically at the sub-index level.
I've managed to distribute this by overriding that executor and using Coherence to distribute the work. It works, but it is a bit of a hack.
Ideally there would be a baseline JMS implementation that pushes the index tasks onto a work queue, plus hook points that let the GPS devices pull work off that queue; basically a special type of Executor.
AFAIR, Hibernate Search has something along these lines.
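A minimal sketch of the queue-backed Executor idea, assuming an in-process `LinkedBlockingQueue` as a stand-in for the JMS queue. The class name `WorkQueueExecutor` is hypothetical, not an existing framework API. A device hands tasks to `execute` instead of running them locally, and worker threads pull them off the queue:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch: an Executor whose tasks go onto a shared work queue
 * and are pulled off by worker threads. The LinkedBlockingQueue stands in
 * for a JMS queue; in a multi-JVM setup the workers would run elsewhere.
 */
public class WorkQueueExecutor implements Executor {
    // Sentinel telling a worker to stop pulling work.
    private static final Runnable POISON = () -> { };

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final List<Thread> workers = new ArrayList<>();

    public WorkQueueExecutor(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            Thread t = new Thread(this::workLoop, "index-worker-" + i);
            t.start();
            workers.add(t);
        }
    }

    // Producer side: push the index task onto the queue instead of running it locally.
    @Override
    public void execute(Runnable task) {
        queue.add(task);
    }

    // Consumer side: each worker blocks until a task is available, then runs it.
    private void workLoop() {
        try {
            while (true) {
                Runnable task = queue.take();
                if (task == POISON) {
                    return;
                }
                try {
                    task.run();
                } catch (RuntimeException e) {
                    // Keep the worker alive if a single task fails.
                    System.err.println("index task failed: " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Queues one sentinel per worker behind the pending tasks, then waits for all workers to exit.
    public void shutdown() throws InterruptedException {
        for (int i = 0; i < workers.size(); i++) {
            queue.add(POISON);
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

In a real multi-JVM setup the queue would be a JMS destination and the workers separate consumers, so the tasks would need to be serializable descriptions of the work (for example, a sub-index name) rather than arbitrary `Runnable`s.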
Yes, that would be very interesting to implement. Another interesting integration would be to create a new type of transaction "isolation" in which all changes are bundled into a Job and written to a queue, where a worker picks it up and applies the changes. This means that changes happening in real time won't have to wait until the data is indexed.
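The job-based "isolation" could look roughly like the following sketch. `AsyncIndexing`, `IndexJob`, and `Change` are hypothetical names. A `ConcurrentHashMap` stands in for the index, a `LinkedBlockingQueue` stands in for the job queue, and records need Java 16+. `commit` only enqueues the job; a worker applies it later:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncIndexing {
    // One buffered change: a document to (re)index, or a delete when text is null.
    record Change(String id, String text) { }

    // Hypothetical "job": all changes made in one transaction, applied together.
    record IndexJob(List<Change> changes) { }

    // Stand-in for the real index; only the worker writes to it.
    final Map<String, String> index = new ConcurrentHashMap<>();
    // Stand-in for the job queue (e.g. a JMS queue in a real deployment).
    final BlockingQueue<IndexJob> jobs = new LinkedBlockingQueue<>();

    // Transaction side: buffers changes instead of indexing them immediately.
    class Transaction {
        private final List<Change> pending = new ArrayList<>();

        void save(String id, String text) { pending.add(new Change(id, text)); }

        void delete(String id) { pending.add(new Change(id, null)); }

        // Commit only enqueues the job, so the caller does not wait for indexing.
        void commit() {
            jobs.add(new IndexJob(List.copyOf(pending)));
            pending.clear();
        }
    }

    Transaction begin() { return new Transaction(); }

    // Worker side: takes one job and applies its changes in order.
    void applyNext() throws InterruptedException {
        IndexJob job = jobs.take();
        for (Change c : job.changes()) {
            if (c.text() == null) {
                index.remove(c.id());
            } else {
                index.put(c.id(), c.text());
            }
        }
    }
}
```

A background worker would call `applyNext` in a loop; until it does, searches see the previous state of the index, which is the trade-off this kind of isolation makes.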
Is there a chance you could share the hack you currently have? I would love to extend both the GPS indexing and the real-time indexing to support this notion.