|
[
Permlink
| « Hide
]
Ramon Garcia Fernandez added a comment - 29/Oct/07 06:44 AM
Patch to fix described issues.
Applied the patch. Changed some things within it in order to be as much as possible backward compatible (it still works with Query as well).
Also, did not understand the prex/item logic when scrolling though the data. I kept it how it was before for now. If you can explain why you did it like that, I would be happy to hear. Anyhow, thanks for the patch!. I did not know about the outer join difference between Criteria and Query, strange ... The logic of comparison with previous is to avoid indexing repeated objects. This is necessary to support OUTER JOIN queries. When Hibernate performs such a query (for getting records of a table that has an 1:N relation with another table) it returns several times the same object. This would create an index with repeated entries. To do the correct indexing, the object must be indexed when it is complete, that is, when all the 1:N references have been fetched. When we see that the current record is different from the previous one, that means that all the 1:N relations of the previous row have been got, and then the previous object is indexed (actually, scheduled for indexing by storing it).
The behaviour of Hibernate with OUTER JOIN is documented in http://www.hibernate.org/117.html#A12 And I am guessing .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) does not work with scroll, right?
Although setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) works, it is inefficient, because it loads everything and then removes duplicates in memory, thus requiring to have all the rows in memory at the same time.
That is what I thought, since it is not possible to do that with LinkedHashSet and scroll. Ok, I have added the mentioned logic (which will only work with the mentioned order by). Last question, why do you have the RowBuffer class, is there some additional caching that need to be done that I am missing? I have done it without it. Also, can you please check that the changes applied works for you?
The purpose of the RowBuffer class is to enhance the performance when one is not using OUTER JOIN, but lazy loading and batch processing (see batch processing in http://www.hibernate.org/hib_docs/v3/reference/en/html/batch.html
SELECT * from related_table WHERE foreign_key in (row1, row2, row3, row4); If one enables lazy loading, hibernate will load the related rows when they are accessed, that is, when the row is indexed. For this to work, Hibernate must have several objects with related rows pending to be loaded. The RowBuffer class delays the loading of the related rows of one object (caused by the indexing) until other objects have been already loaded. The previous way of working: load object at any time, Hibernate has only one row in its memory, and, when access to its related row is requested, cannot fetch related rows of other objects, because it knows about none. I thought so. But, isn't it enough to remove the evict method and rely on Hibernate first level cache to keep that. Why do we need to build our own mechanism as well?
There are two different issues.
If one does not evict, Hibernate keeps everything in memory, and one eventually gets an OutOfMemoryException. This is neccessary to make object identity reliable (that is object1 == object2 if both come from the same database row); if the one calls evictAll() it is under one's responsability. This behavior is somewhat odd. Note that Hibernate is designed for transaction applications, and not for batch processing like indexing. It does not work too bad, but is somewhat odd. There is another issue. With the loop {load ; index; } the batch processing cannot do its job. When the related rows of some row are being loaded, there is no other row that has pending related rows. All the rows previously loaded have been completely loaded because they have been already indexed. What is necessary is something like load 1; in this way, when one indexes row 1, Hibernate notices, I have here rows 2,3, 4, 5 pending to be loaded, and I can load them, and issues a select statement to load all of them at once: SELECT fields FROM related WHERE main_id IN (id1, id2, id3, id4, id5) I think I understand partially what you are saying. First, basically, the code (without the call to evict, but with the call to clear) (this is not how it is now in svn, basically, the evict call needs to be removed), does not evict on each row. So basically, Hibernate can act as the row buffer by using its first level cache.
What you are suggesting, if I understand correctly, is that by keeping things in memory, and then indexing a "fetchCount" objects that we keep in memory, will cause less sql queries initiated against the database by Hibernate? This I do not fully understand. The call to index 1 will cause object 1 to be indexed, and if it has a relationship to other objects, they will be loaded or have already been preloaded, regardless of the fact that I loaded 2. Right? Also, with your change using prev and item, we actually only index the last similar object. Object equality is kept by Hibernate first level cache anyhow. Note, I am not suggesting that what you are saying is wrong, I just want to understand precisely what you are suggesting before committing this (I have not directly worked with Hibernate for over 2 years now). In order to Hibernate batch processing work for related objects, Hibernate must have in its first cache level objects that have been loaded, but whose related objects have not been loaded. Hibernate batch processing happens when in addition to the current object, there are other candidate objects in the same table that are pending to be loaded. In this case, "pending" is a related object whose referring object is loaded.
The call to index 1 will cause object 1 to be indexed, and if it has a relationship to other objects, the will be loaded. But if object 2 is loaded, then, when loading the related objects of obj 1, the related objects of obj2 will be also loaded. load 1 load 1 If Hibernate does that, then it makes perfect sense. Can you post a link here that points to this feature in Hibernate?
The behaviour was deduced by a mixture of documentation (the linked chapter on batch processing http://www.hibernate.org/hib_docs/v3/reference/en/html/batch.html
I will setup a simple and reproducible test case. This is neccessary anyway, to test the patches as applied. An example would be great. It is very strange for me that Hibernate would do that since it might load data for things already loaded in its first level cache which might never be used. For example, if I only call on index1 and thats it.
If it does that that, then it would be great to understand how it decides how much to load, since people would probably want to control it. You are right that batch fetching is counterintuitive. For this reason it is disabled by default. For cases where it improves the performance, like indexing, the user can enable it.
The documentation about batch fetching (sorry, I had forgotten about it) is in http://www.hibernate.org/hib_docs/v3/reference/en/html/performance.html#performance-fetching-batch The decission of how much to load is given by the parameter batch-size of an object. The default can be set in the hibernate property hibernate.default_batch_fetch_size I knew about the collection part, but did not know that Hibernate is smart enough to do it on class level as well. Pretty cool. Now, the row buffer makes perfect sense, I will add it.
Ok, patch applied (with some changes to support backward compatibility). Can you please check it the latest SNAPSHOT and see if it works?
OK, I will try a small set of test cases.
Sorry for the delay.
I have just performed a small set of tests (a couple of tables book, author) and it seems to works fine with all combinations of join type, lazyness and batch-size. |
||||||||||||||||||||||||||||||||||||||||||