
Wednesday, June 10, 2009

Optimizing Hibernate Usage

To get straight to the point: using Hibernate is not the same as using it effectively. You can set up Hibernate and put it into action in your application very quickly and enjoy its ORM features. However, as a project grows and expands over time, you may find that the application becomes slow and puts a heavy load on the DBMS. Most people (mostly seniors and architects :D) then blame Hibernate for being inefficient and try to prove that the ORM concept itself is wrong and that Hibernate merely saves some JDBC code at a great cost in performance. This article aims to help you optimize your usage of Hibernate to minimize the performance penalty. By the way, we assume you already have some experience with basic Hibernate usage, so if you are an absolute beginner, this is not for you.

Before we proceed, there are two rules we need to stress:
1- Using Hibernate does not completely isolate the developer from the database; the relational model is still involved.
2- Hibernate is not some sort of magic. It cannot predict what you want; it will do what you configured it to do.

===== Basic Principles =====
99% of the problems faced with Hibernate are due to misconfiguration or misunderstanding, the two most common causes of trouble with any framework that hides details behind the scenes. To start using Hibernate effectively, we need to talk about the following terms:

1- Session.
2- Object state.
3- Transaction.
4- Locking.
5- Caching.
6- Connection Pooling.

Session: A Session in Hibernate is the place where objects and their states are stored and managed in memory. It can be considered a cache; in other words, it is the heart of the communication between your business model and the relational model represented by the DBMS. When you retrieve an object from the DB, it is stored in the session. When you then modify that object, the session synchronizes the database with the in-memory state when it is flushed (for example, when the transaction commits); session.save(object) is only needed for new objects. A session lives until it is closed. In the common session-per-request pattern, when a web user requests a page that needs to contact the DB, the thread handling the HTTP request opens a session, uses it while the request is being processed, and closes it once the request is done.
When you retrieve an object by its identifier (for example with session.get()), Hibernate first checks whether the object is already in the session; if not, it fetches it from the DB and keeps it in the session for further requests.
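To make the lookup order concrete, here is a minimal Java sketch of a session-scoped identity map. It is a toy model, not Hibernate's implementation: `User`, `FakeDatabase`, and `ToySession` are hypothetical names, and the database is simulated by a hit counter. It shows that repeated lookups by id within one session return the same instance without another DB hit, while a new session starts with an empty cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this illustration.
class User {
    final long id;
    final String name;
    User(long id, String name) { this.id = id; this.name = name; }
}

// Hypothetical stand-in for the database: counts how many times it is queried.
class FakeDatabase {
    int hits = 0;

    User loadUser(long id) {
        hits++;
        return new User(id, "user-" + id); // a new Java object on every load
    }
}

// Toy model of the first-level cache: an identity map owned by one session.
// It illustrates the lookup order only; it is not Hibernate's implementation.
class ToySession {
    private final Map<Long, User> identityMap = new HashMap<>();
    private final FakeDatabase db;
    private boolean open = true;

    ToySession(FakeDatabase db) { this.db = db; }

    User get(long id) {
        if (!open) throw new IllegalStateException("Session is closed");
        User cached = identityMap.get(id);   // 1. look in the session first
        if (cached != null) return cached;
        User loaded = db.loadUser(id);       // 2. miss: go to the database
        identityMap.put(id, loaded);         // 3. keep it for later requests in this session
        return loaded;
    }

    void close() {
        identityMap.clear();                 // the cache goes away with the session
        open = false;
    }
}
```

In real Hibernate code the corresponding call would be `session.get(User.class, id)` on an open `Session`.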

Object State: The word "state" here means how Hibernate sees an object from the session's perspective. An object is in one of 3 states:

1- Transient: The object has not been saved to the DB yet and is not associated with any session.
2- Persistent: The object has a representation in the DB and is associated with an open session, which manages it.
3- Detached : The object has an entry in the DB, but the session that saved or retrieved it has been closed, so the object now has no associated session to manage it.

A developer needs to keep object states in mind, because mishandling them can lead to many problems. For example, a user requests a web page that loads an object from the DB, and the user is going to update that object. However, the user takes a long time to perform the update, and when the page is finally submitted, the business logic simply tries to update the object. The problem is that the Hibernate session that retrieved the object in the first place has been closed, so the object is now detached, and its changes cannot be saved until it is reattached to an open Hibernate session (for example, by passing it to session.update() or session.merge() on a new session). There are many solutions to this problem, which we might look at later.
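The three states and the reattachment step can be sketched as a small Java model. Everything here (`Customer`, `StateTrackingSession`, the `EntityState` enum) is hypothetical and only mimics the rules described above: a saved object is persistent while its session is open, becomes detached when that session closes, and can only be reattached through a new, open session. The `update` method loosely mirrors the role of Hibernate's `session.update()` for detached objects; it is not Hibernate's API.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

enum EntityState { TRANSIENT, PERSISTENT, DETACHED }

// Hypothetical entity: id stays null until the object is saved.
class Customer {
    Long id;
    String name;
    Customer(String name) { this.name = name; }
}

// Toy session that only tracks which objects it manages; not Hibernate's API.
class StateTrackingSession {
    private final Set<Customer> managed =
            Collections.newSetFromMap(new IdentityHashMap<Customer, Boolean>());
    private boolean open = true;

    // Transient -> persistent: assign an id and start managing the object.
    void save(Customer c, long newId) {
        ensureOpen();
        c.id = newId;
        managed.add(c);
    }

    // Detached -> persistent: loosely mirrors reattaching with session.update().
    void update(Customer c) {
        ensureOpen();
        if (c.id == null) throw new IllegalArgumentException("transient object: save it first");
        managed.add(c);
    }

    // Closing the session detaches everything it managed.
    void close() {
        managed.clear();
        open = false;
    }

    EntityState stateOf(Customer c) {
        if (c.id == null) return EntityState.TRANSIENT;
        return (open && managed.contains(c)) ? EntityState.PERSISTENT : EntityState.DETACHED;
    }

    private void ensureOpen() {
        if (!open) throw new IllegalStateException("Session is closed");
    }
}
```

Trying to update through the closed session fails, while the same object can be reattached through a fresh session.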

Transaction: In Hibernate, a Transaction (started with session.beginTransaction()) is an abstraction over the underlying JDBC or JTA transaction, so it maps to a real, atomic database transaction. Whenever you need to write anything to the database, put your code between the beginTransaction() and commit() calls, and roll the transaction back if something fails.
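The begin/commit/rollback discipline is usually wrapped in a try/catch/finally template. The sketch below uses hypothetical minimal interfaces (`TxSession`, `TxHandle`) whose method names mirror Hibernate's `Session.beginTransaction()`, `Transaction.commit()`, and `Transaction.rollback()`, so it runs without Hibernate on the classpath; with Hibernate, the same shape would apply to the real `Session` and `Transaction` objects. `RecordingSession` is a fake that only logs the calls it receives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical minimal stand-ins for Hibernate's Transaction and Session.
interface TxHandle {
    void commit();
    void rollback();
}

interface TxSession extends AutoCloseable {
    TxHandle beginTransaction();

    @Override
    void close(); // narrowed: no checked exception
}

final class TransactionTemplate {
    private TransactionTemplate() {}

    // Commit on success, roll back on any RuntimeException, always close the session.
    static <T> T inTransaction(TxSession session, Supplier<T> work) {
        TxHandle tx = null;
        try {
            tx = session.beginTransaction();
            T result = work.get();
            tx.commit();
            return result;
        } catch (RuntimeException e) {
            if (tx != null) {
                tx.rollback();
            }
            throw e;
        } finally {
            session.close();
        }
    }
}

// Fake session that records the calls it receives, for demonstration only.
class RecordingSession implements TxSession {
    final List<String> log = new ArrayList<>();

    @Override
    public TxHandle beginTransaction() {
        log.add("begin");
        return new TxHandle() {
            @Override public void commit() { log.add("commit"); }
            @Override public void rollback() { log.add("rollback"); }
        };
    }

    @Override
    public void close() {
        log.add("close");
    }
}
```

A successful unit of work logs begin, commit, close; a failing one logs begin, rollback, close and rethrows the exception.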

Locking: When a DB record is accessed by many threads, we need to manage how these threads deal with it. A typical situation: a thread reads a certain value, and before it writes its update, another thread has already written a different value, so the information being processed by thread one is now invalid. Locking is the mechanism that decides what the DB does when a record that one transaction is using is accessed by another. One solution to the problem above is to tell the DBMS not to permit any updates to a record while a thread has read it with the intention of updating it (pessimistic locking); another is to let updates proceed but detect conflicting changes, typically through a version column (optimistic locking). There are different locking mechanisms for different situations. Hibernate manages locking at the DB level, which means that it never locks objects in memory (in other words, in the session). So keep in mind that Hibernate can only offer a given locking mode if the underlying DBMS supports it.
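Optimistic locking can be sketched as a compare-the-version check. This toy Java model (the `VersionedTable`, `VersionedRow`, and `StaleRowException` names are hypothetical) accepts an update only if the row's version is still the one the caller read earlier, which is the idea behind an `UPDATE ... WHERE id = ? AND version = ?` statement. In Hibernate itself this is configured by mapping a version property, and a conflicting update typically surfaces as a `StaleObjectStateException`; the code below does not use Hibernate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical row with a version column, as used by optimistic locking.
class VersionedRow {
    final String value;
    final int version;
    VersionedRow(String value, int version) { this.value = value; this.version = version; }
}

// Thrown when someone else changed the row after we read it.
class StaleRowException extends RuntimeException {
    StaleRowException(String message) { super(message); }
}

// Toy table: an update succeeds only if the version read earlier is still current,
// similar in spirit to: UPDATE t SET value = ?, version = version + 1 WHERE id = ? AND version = ?
class VersionedTable {
    private final Map<Long, VersionedRow> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new VersionedRow(value, 0));
    }

    synchronized VersionedRow read(long id) {
        return rows.get(id);
    }

    synchronized void update(long id, String newValue, int versionReadEarlier) {
        VersionedRow current = rows.get(id);
        if (current == null || current.version != versionReadEarlier) {
            throw new StaleRowException("row " + id + " was modified by another transaction");
        }
        rows.put(id, new VersionedRow(newValue, current.version + 1));
    }
}
```

Two readers see version 0; the first writer wins and bumps the version to 1, and the second writer's update is rejected instead of silently overwriting the first.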

Caching: Caching is a way to avoid extra database hits when retrieving information. A cache keeps data in memory for fast retrieval, and it is also responsible for keeping that data valid and synchronized with the DB. Caching in Hibernate takes place at 2 levels:

1- First-level cache: This is the session object itself. A session is normally used by a single thread for a single unit of work, so this cache is per session: it is not shared between sessions, and its contents are lost when the session is closed. This cache is always on; there is no way to disable it in the Hibernate session.

2- Second-level cache: This can be provided by any pluggable cache provider, e.g. EhCache. The second-level cache belongs to the whole application (more precisely, to the SessionFactory); it is not tied to a specific session or thread, and it is used for as long as the application is running. If you use a second-level cache and retrieve an object, Hibernate first checks the first-level cache (the session); if the object is not there, it checks the second-level cache; if it is not there either, Hibernate loads a fresh copy from the DB and stores it in the caches for further use.
Using second-level caching requires caution: in some cases, objects in the second-level cache may be out of sync with the DB or with the first-level cache (for example, when another application writes to the same tables). Some types of applications, such as financial applications, cannot use second-level caching at all.
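The lookup order across the two cache levels can be modelled in a few lines of Java. This is a hypothetical sketch, not Hibernate's cache code: `SharedCache` plays the role of an application-wide second-level cache, `CachingSession` holds a per-session first-level map, and `CountingDb` counts simulated database hits. One simplification worth noting: this toy stores the same value in both levels, whereas Hibernate's second-level cache keeps entity data in a disassembled form and each session builds its own objects from it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Application-wide cache shared by all sessions (role of the second-level cache).
class SharedCache {
    final Map<Long, String> entries = new ConcurrentHashMap<>();
}

// Hypothetical database stand-in that counts how often it is queried.
class CountingDb {
    int hits = 0;

    String load(long id) {
        hits++;
        return "product-" + id;
    }
}

// Per-session view: first-level map, then shared cache, then database.
class CachingSession {
    private final Map<Long, String> firstLevel = new HashMap<>();
    private final SharedCache secondLevel;
    private final CountingDb db;

    CachingSession(SharedCache secondLevel, CountingDb db) {
        this.secondLevel = secondLevel;
        this.db = db;
    }

    String get(long id) {
        String value = firstLevel.get(id);
        if (value != null) return value;          // first-level hit
        value = secondLevel.entries.get(id);
        if (value == null) {                      // second-level miss: go to the DB
            value = db.load(id);
            secondLevel.entries.put(id, value);
        }
        firstLevel.put(id, value);
        return value;
    }
}
```

A second session asking for the same id is served by the shared cache, so the database counter does not move.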

Connection Pooling: As the name implies, a connection pool holds a number of open connections to the DB. Pooling improves performance because the application can talk to the DB quickly and reliably without the overhead of opening and closing a connection every time; a pool can also cap the number of open connections so that the DB is not overloaded. Hibernate ships with support for the C3P0 connection pool, but you can use any other pooling mechanism instead.
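The core idea of a pool (open a fixed number of connections once, hand them out, and take them back) fits in a short Java sketch built on `ArrayBlockingQueue`. `TinyPool` and `FakeConnection` are hypothetical stand-ins; a real pool such as C3P0 also validates, replaces, and times out connections, none of which is modelled here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a real JDBC connection.
class FakeConnection {
    final int id;
    FakeConnection(int id) { this.id = id; }
}

// Fixed-size pool: connections are created once and reused.
class TinyPool {
    private final BlockingQueue<FakeConnection> idle;

    TinyPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new FakeConnection(i)); // opened up front, never per request
        }
    }

    // Waits up to timeoutMillis for a free connection; returns null if none became free.
    FakeConnection borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Returns a connection to the pool so another caller can reuse it.
    void release(FakeConnection connection) {
        idle.offer(connection);
    }

    int idleCount() {
        return idle.size();
    }
}
```

Because the pool never grows beyond its size, it also caps how many connections the application can hold open against the DB at once.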

===== Approaches to Optimizing Hibernate =====

We will now try to define a roadmap for starting the optimization process. Of course, we cannot describe every problem and technique; however, we try to help you take your first steps towards optimization.

Some advice:
1- Never start optimizing an application that uses Hibernate from day zero; you will never be able to tune it effectively. Wait until you have a running application, and then optimize it.
2- Using a load-testing tool together with a profiler helps a lot in finding bottlenecks and badly written code.
3- Tuning requires a lot of teamwork and time. Hibernate is not always the cause of bad behavior; the problem may come from the DB, Java itself, the network, or the server.
4- Review and fix the object model design of your application, as it directly affects your relational model. Many problems arise from bad object models; the better your object model, the more trouble you save yourself in the optimization process.
5- Configure Hibernate to print the generated SQL queries to standard output (the hibernate.show_sql property); this will help you a lot in tracing problems.
6- The ultimate solution to your problems is not here. Every problem has its own solution, and you have to explore the alternatives to decide what to use; nothing is totally right or totally wrong.
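For advice 5, the relevant switches are the `hibernate.show_sql` and `hibernate.format_sql` configuration properties. The helper below only assembles them into a `java.util.Properties` object; the class name `SqlLoggingSettings` is hypothetical, and how you hand the properties to Hibernate (hibernate.properties, hibernate.cfg.xml, or programmatically through `Configuration`) depends on your setup.

```java
import java.util.Properties;

// Hypothetical helper that collects the SQL-logging settings in one place.
final class SqlLoggingSettings {
    private SqlLoggingSettings() {}

    static Properties create() {
        Properties props = new Properties();
        props.setProperty("hibernate.show_sql", "true");   // echo every generated SQL statement to stdout
        props.setProperty("hibernate.format_sql", "true"); // pretty-print the statements for readability
        return props;
    }
}
```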

===== Let's Go =====
1- It all starts with mapping:
As a Hibernate developer, you must have a good knowledge of the mapping (".hbm") files, even if you use a generation tool. Mapping is critical: a wrong or badly mapped object may result in unwanted, heavy behavior. Make sure you define the relations between objects correctly; the tags you write in the mappings define how Hibernate builds the queries that retrieve and save your objects.

Hint 1: When mapping a one-to-one relation, it is very effective to use the constrained="true" attribute if you are sure that a matching record always exists for the foreign key association. This way, Hibernate does not need to look up the associated record first just to check whether it exists; since it already knows the record exists, it can go directly to it (or defer loading it with a lazy proxy).

Hint 2: Revisit the fetching strategy for your objects. The "fetch" attribute tells Hibernate how to retrieve a certain association, and it has several options (for example select, join, and, for collections, subselect). You may need to tell Hibernate to use a join for one association and a sub-select for another; it depends on the situation.
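Why the fetch strategy matters is easiest to see by counting statements. The Java sketch below is a simulation, not Hibernate output: `SqlLog`, `OrderLoader`, and the orders/items tables are hypothetical, and the SQL strings only show the shape of what each strategy issues. With the select strategy, loading N orders and touching their items costs 1 + N statements (the classic "N+1 selects" problem); join needs one statement, and subselect needs two regardless of N.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical statement log, standing in for the SQL that show_sql would print.
class SqlLog {
    final List<String> statements = new ArrayList<>();

    void run(String sql) {
        statements.add(sql);
    }
}

enum FetchStrategy { SELECT, JOIN, SUBSELECT }

class OrderLoader {
    // Simulates loading orders plus their items and records the statements issued.
    static void loadOrdersWithItems(List<Long> orderIds, FetchStrategy strategy, SqlLog log) {
        switch (strategy) {
            case JOIN:
                // One statement with an outer join brings back orders and items together.
                log.run("select o.*, i.* from orders o left join items i on i.order_id = o.id");
                break;
            case SUBSELECT:
                // One query for the orders, then one query for all of their items at once.
                log.run("select * from orders");
                log.run("select * from items where order_id in (select id from orders)");
                break;
            case SELECT:
            default:
                // One query for the orders, then one more per order when its items are needed.
                log.run("select * from orders");
                for (Long id : orderIds) {
                    log.run("select * from items where order_id = " + id);
                }
        }
    }
}
```

With three orders, the counts come out as 4, 1, and 2 statements respectively; the gap for select grows linearly with the number of orders.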

2- Queries:
Your queries may be the source of the problem: a query may override the association fetching strategy defined in the mapping file, and then you wonder why things go wrong! SQL, HQL, and the Criteria API are all valid options for retrieving objects. HQL and Criteria queries are both translated into SQL by Hibernate, so the choice between them is mostly a matter of readability and of how dynamic the query needs to be rather than of performance. Native SQL is needed in some cases where HQL lacks a feature of a specific DBMS.

Query caching may be effective when the application does not write much to the DB, because a cached query result becomes invalid as soon as any insert, update, or delete takes place on a table the query reads. You can enable query caching by setting
hibernate.cache.use_query_cache true
and calling setCacheable(true) on the Query object before executing the query.
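The invalidation rule described above can be modelled with a logical clock: a cached result is reused only while none of the tables it reads has been written since the result was cached. `ToyQueryCache` is a hypothetical sketch of that rule, not Hibernate's query cache; among other simplifications, it keys results by query text alone, whereas Hibernate's cache key also includes the parameter values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Toy model of query-cache invalidation based on per-table write times.
class ToyQueryCache {
    private static final class Entry {
        final List<String> rows;
        final Set<String> tables;
        final long cachedAt;

        Entry(List<String> rows, Set<String> tables, long cachedAt) {
            this.rows = rows;
            this.tables = tables;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, Entry> results = new HashMap<>();
    private final Map<String, Long> lastWriteAt = new HashMap<>();
    private long clock = 0; // logical clock, advanced on every cache fill and every write

    List<String> query(String hql, Set<String> tables, Supplier<List<String>> runOnDatabase) {
        Entry entry = results.get(hql);
        if (entry != null && !isStale(entry)) {
            return entry.rows;                       // cache hit: no DB round trip
        }
        List<String> rows = runOnDatabase.get();     // miss or stale: hit the DB
        results.put(hql, new Entry(rows, tables, ++clock));
        return rows;
    }

    // Call on every insert, update, or delete against a table.
    void recordWrite(String table) {
        lastWriteAt.put(table, ++clock);
    }

    private boolean isStale(Entry entry) {
        for (String table : entry.tables) {
            if (lastWriteAt.getOrDefault(table, 0L) > entry.cachedAt) {
                return true;
            }
        }
        return false;
    }
}
```

A write to an unrelated table leaves the cached result alone, while any write to a table the query reads forces the next call back to the database, which is why write-heavy tables gain little from query caching.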

Hint 1: Suppose a user logs in to your website. You will hit the database to get the user profile object in order to compare the password supplied by the user. A default Hibernate get operation may retrieve a whole object graph that includes references to other objects, and as a result an ugly SQL join statement may be produced; in the end, we have a heavy object in memory just to read a simple string value. In situations like this, you need a retrieval approach that fetches only the information you need and builds the returned object from it. Look into how Hibernate can build objects from selected properties only.
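A projection for the login case might look like the sketch below. The HQL string assumes a hypothetical mapped `User` entity with `id`, `login`, and `password` properties; a multi-column HQL projection returns each row as an `Object[]`, which the hypothetical `LoginInfo.fromRow` turns into a small read-only object. The Hibernate call itself is shown only as a comment, so the class compiles and runs without Hibernate.

```java
// Hypothetical projection: only the columns the login check needs, not the whole User graph.
final class LoginInfo {
    // Hypothetical HQL; assumes a mapped entity "User" with properties id, login and password.
    static final String LOGIN_HQL =
            "select u.id, u.password from User u where u.login = :login";

    final Long userId;
    final String password;

    private LoginInfo(Long userId, String password) {
        this.userId = userId;
        this.password = password;
    }

    // A multi-column HQL projection yields each row as an Object[] in select-list order.
    static LoginInfo fromRow(Object[] row) {
        if (row == null) {
            return null; // no user with that login
        }
        return new LoginInfo((Long) row[0], (String) row[1]);
    }

    boolean passwordMatches(String supplied) {
        return password != null && password.equals(supplied);
    }

    // With Hibernate on the classpath, the row would come from something like:
    //   Object[] row = (Object[]) session.createQuery(LOGIN_HQL)
    //           .setParameter("login", login)
    //           .uniqueResult();
}
```

Only two columns travel from the DB, and no associated objects are loaded just to check a password.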

3- Locking:
OK, you ran a load-testing tool and boom... a locking problem occurs: your application has messed up the records, so you decide to move to pessimistic locking, but oops, now the application runs into deadlocks. This kind of problem mostly comes from the DBMS side. As mentioned before, Hibernate uses the locking mechanisms provided by the DBMS and never locks objects in memory, so a problem like this may need a DBA to configure the DBMS properly so that it handles your application's locking situations smoothly. On the Hibernate side, you need to revisit the queries that handle locking scenarios in your application and pay attention when writing them, since careless locking queries can make things worse. (See how to use locking with Hibernate.)

Hint 1: A problem can occur if you lock an object for update and then never finish the Hibernate transaction that acquired the lock. You can solve this by making sure that, under all circumstances, the transaction either commits or rolls back, so the lock is released. Another option is to configure the lock or transaction timeouts of the DBMS itself (the exact settings depend on the DBMS), so that a lock does not block other transactions indefinitely, whether or not the transaction holding it does anything further.
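Both ideas in this hint (always release the lock, and don't wait forever for one) can be illustrated with `java.util.concurrent.locks.ReentrantLock` standing in for a database row lock. `RowLock` is a hypothetical toy: the `finally` block plays the role of always committing or rolling back, and the timed `tryLock` plays the role of a lock-wait timeout configured in the DBMS. Real database locks are held by transactions, not Java threads, so treat this only as an analogy.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Toy analogy for a row lock held by a transaction (not real DB locking).
class RowLock {
    private final ReentrantLock lock = new ReentrantLock();

    // Runs work while holding the lock. Returns false if the lock could not be
    // acquired within timeoutMillis, instead of waiting forever.
    boolean tryUpdate(Runnable work, long timeoutMillis) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false; // gave up waiting, like a lock-wait timeout
        }
        try {
            work.run();
            return true;
        } finally {
            lock.unlock(); // always released, like always committing or rolling back
        }
    }

    boolean isLocked() {
        return lock.isLocked();
    }
}
```

Even when the work throws, the lock is released on the way out, so the next caller is not left waiting.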

4- Connection Pooling:
As mentioned before, connection pooling is a very good practice when used properly. Besides configuring a pool manager, you need to review your code to make sure that no thread is holding a connection open somewhere; this situation is mostly caused by a missing or misplaced connection close call. Optimizing the use of a connection pool may need the advice of the DBA.
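Java's try-with-resources statement is a simple guard against the missing-close problem. In the hypothetical sketch below, `CountingPool` tracks only capacity (one `Semaphore` permit per open connection) and its `Lease` gives the permit back in `close()`. Leases opened with try-with-resources are always returned; leases that are never closed eventually exhaust the pool, which is the symptom described above.

```java
import java.util.concurrent.Semaphore;

// Toy pool that tracks capacity only: one Semaphore permit per open connection.
class CountingPool {
    private final Semaphore permits;

    CountingPool(int size) {
        permits = new Semaphore(size);
    }

    // Hands out a lease, or fails fast if every connection is already in use.
    Lease acquire() {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("pool exhausted: is a connection never closed?");
        }
        return new Lease();
    }

    int available() {
        return permits.availablePermits();
    }

    // Stands in for a pooled connection; close() returns it to the pool.
    final class Lease implements AutoCloseable {
        private boolean closed = false;

        @Override
        public void close() {
            if (!closed) { // closing twice must not add an extra permit
                closed = true;
                permits.release();
            }
        }
    }
}
```

Typical usage is `try (CountingPool.Lease lease = pool.acquire()) { ... }`, which returns the connection even if the block throws.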
