Batching in Hibernate


Most people who use batch probably find out about it by trying to perform a large operation and finding out the hard way why batching is needed. They run out of memory. Once this is resolved they assume that batching is working properly. The problem is that even if you are flushing your first level cache, you may not be batching your SQL statements.

Hibernate flushes by default for the following reasons:

  • Before some queries
  • When commit() is executed
  • When session.flush() is executed

The thing to note here is that until the session is flushed, every persistent object is placed into the first level cache (your JVM's memory). So if you are iterating over a million objects you will have at least a million objects in memory.

To avoid this problem you need to call the flush() and then clear() method on the session at regular intervals. Hibernate documentation recommends that you flush every n records where n is equal to the hibernate.jdbc.batch_size parameter. A Hibernate Batch example shows a trivial batch process.

public void doBatch() {
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    Cart cart = new Cart(...);
    customer.setCart(cart) // note we are adding the cart to the customer, so this object
     // needs to be persisted as well;
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
There are two reasons for batching your hibernate database interactions.  The first is to maintain a reasonable first level cache size so that you do not run out memory.  The second is that you want to batch the inserts and updates so that they are executed efficently by the database.  The example above will accomplish the first goal but not the second. 
The problem is Hibernate looks at each SQL statement and checks to see if it is the same statement as the previously executed statement.  If they are and if it hasn't reached the <code>batch_size</code> it will batch those two statements together using JDBC2 batch.  However, if your statements look like the example above, hibernate will see alternating insert statements and will flush an individual insert statement for each record processed.  So 1 million newcustomers would equal a total of 2 million insert statements in this case.  This is extremely bad for performance. 
The solution is very simple.  Just add the following two lines to your hibernate configuration.
<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.order_updates">true</prop>