Wednesday, January 05, 2011

Memory leaks: OutOfMemoryError: PermGen space

I have used JProfiler, Eclipse TPTP, a combination of jconsole and other command line tools in the past to resolve memory leaks problems. We are using JDK 6 and so Java Visual VM (jvisualvm command) can be used to troubleshoot what is going on.

I recall using JBoss as part of one of my assignments and concluding that a total refactoring of the code was needed after finding out memory leaks were most likely there to stay.

So I have seen people deciding to restart the servers every so often and even minimize the amount of deployments made to production just to avoid the inevitable error:
OutOfMemoryError: PermGen space

Regardless the fact that an application server itself might be buggy (and you can just Google about memory leaks for any Java application Server to see that actually all of them have suffered from this in the past) or even the JVM can be buggy there is a lot you can do before just going ahead and blaming others.

I decided to post the results of a task that involved 10 hours of work plus 12 hours of automated tests to get rid of memory leaks in an application deployed in tomcat. Basically after some deployments the server had to be restarted.

Below is a screenshot showing the initial situation. As you see PermGen consumption goes up with each deployment (steps in graphic) and it never goes down (Garbage Collection has no effect === memory leak)


Here are some of the errors I was getting from tomcat logs after each deployment:
Jan 3, 2011 5:24:07 PM org.apache.catalina.loader.WebappClassLoader clearReferencesJdbc
SEVERE: The web application [/] registered the JBDC driver [com.mysql.jdbc.Driver] but failed to unregister it when the web application was stopped. To prevent a memory leak, the JDBC Driver has been forcibly unregistered.
Jan 3, 2011 5:24:07 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/] appears to have started a thread named [Thread-43] but has failed to stop it. This is very likely to create a memory leak.
Jan 3, 2011 5:24:07 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
Jan 3, 2011 5:24:07 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [/] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@44d2963b]) and a value of type [org.springframework.security.core.context.SecurityContextImpl] (value [org.springframework.security.core.context.SecurityContextImpl@ffffffff: Null authentication]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Jan 3, 2011 5:24:07 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [/] created a ThreadLocal with key of type [net.sf.json.AbstractJSON.CycleSet] (value [net.sf.json.AbstractJSON$CycleSet@5d851ec9]) and a value of type [java.lang.ref.SoftReference] (value [java.lang.ref.SoftReference@4ea84e16]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Jan 3, 2011 5:24:07 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [/] created a ThreadLocal with key of type [org.apache.commons.lang.builder.HashCodeBuilder$1] (value [org.apache.commons.lang.builder.HashCodeBuilder$1@b9eaeb2]) and a value of type [java.util.HashSet] (value [[]]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
INFO: Illegal access: this web application instance has been stopped already.  Could not load org.apache.axis2.context.ConfigurationContext$1.  The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact.
java.lang.IllegalStateException
 at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1531)
 at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491)
 at org.apache.axis2.context.ConfigurationContext.cleanupTemp(ConfigurationContext.java:759)
 at org.apache.axis2.context.ConfigurationContext.terminate(ConfigurationContext.java:747)
 at org.apache.axis2.client.ServiceClient.cleanup(ServiceClient.java:816)
 at org.apache.axis2.client.ServiceClient.finalize(ServiceClient.java:795)
 at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
 at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
 at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
 at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)
Exception in thread "MultiThreadedHttpConnectionManager cleanup" java.lang.NullPointerException
 at org.apache.commons.logging.impl.SLF4JLocationAwareLog.isDebugEnabled(SLF4JLocationAwareLog.java:59)
 at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.handleReference(MultiThreadedHttpConnectionManager.java:1105)
 at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1124)

 



Below are the results from jvisualvm thread dump
2011-01-04 14:45:40
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.1-b03-307 mixed mode):

"Thread-12" daemon prio=5 tid=102d68000 nid=0x1263b2000 in Object.wait() [1263b1000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <10d131898> (a org.enhydra.jdbc.pool.PoolKeeper)
 at org.enhydra.jdbc.pool.PoolKeeper.run(PoolKeeper.java:55)
 - locked <10d131898> (a org.enhydra.jdbc.pool.PoolKeeper)
 at java.lang.Thread.run(Thread.java:680)

   Locked ownable synchronizers:
 - None

 "Thread-15" daemon prio=5 tid=125539000 nid=0x129983000 runnable [129982000]
    java.lang.Thread.State: RUNNABLE
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.read(SocketInputStream.java:129)
  at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
  at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
  at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
  - locked <10bd35918> (a java.io.BufferedInputStream)
  at com.sun.jndi.ldap.Connection.run(Connection.java:808)
  at java.lang.Thread.run(Thread.java:680)

    Locked ownable synchronizers:
  - None
  
"JotmClock" daemon prio=5 tid=10190e000 nid=0x126db5000 waiting on condition [126db4000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at org.objectweb.jotm.TimerManager.clock(TimerManager.java:171)
 at org.objectweb.jotm.Clock.run(TimerManager.java:65)

   Locked ownable synchronizers:
 - None

"JotmBatch" daemon prio=5 tid=101e9d000 nid=0x126cb2000 in Object.wait() [126cb1000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <10948f3c0> (a java.util.Vector)
 at java.lang.Object.wait(Object.java:485)
 at org.objectweb.jotm.TimerManager.batch(TimerManager.java:223)
 - locked <10948f3c0> (a java.util.Vector)
 at org.objectweb.jotm.Batch.run(TimerManager.java:87)

   Locked ownable synchronizers:
 - None

Clearly we have a combination of problems here.

  1. The mysql error is related to a DBCP memory leak bug which went away after switching to the newest jar version 1.4
  2. [Thread-15] for example could not be stopped because of a memory leak bug in apache commons lang which went away after using version 2.5
  3. JotmClock, JotmBatch and [Thread-12] are related to a combination of the DBCP leaking error
  4. Axis2 was upgraded to version 1.5.4 from version 1.4.1 which had a memory leak
  5. json-lib was upgraded to 2.4 because of memory leak in previous version
  6. The rest of the problems are related to an application bug. The spring context was not properly closed (webApplicationContext.close() must be called). Below is the code showing the correction:
    public class ApplicationServletContextListener implements
            ServletContextListener {
    ...
    public void contextInitialized(ServletContextEvent event) {
           ServletContext servletContext = event.getServletContext();
           webApplicationContext = (ConfigurableApplicationContext) WebApplicationContextUtils
                   .getWebApplicationContext(servletContext);
    ...
    public void contextDestroyed(ServletContextEvent event) {
            webApplicationContext.close();
    }
    ...
    

After I finished all the changes I scripted an atomatic redeployment to occur every minute and I left it running the whole night.

Below is a screenshot showing the final situation. As you see PermGen consumption goes up with each deployment (steps in graphic) but it goes down again as a result of Garbage Collection === no memory leak.

6 comments:

mjb said...

I don't see how "[Thread-15] for example could not be stopped because of a memory leak bug in apache commons lang which went away after using version 2.5" could be true. Where is commons lang involved in this stack trace?

Nestor Urquiza said...

I should have referred perhaps to specific URLs describing the specific problems I found like https://issues.apache.org/jira/browse/LANG-586

But the bottom line of this post is not to enumerate all possible issues that can result in a container memory leaking but for the contrary prepare the developers with the tools they need to find out memory leaks.

To get it straight memory leaks will always be there because of bugs and most of the time they will be out of the container responsibility even though of course the leaking might come from the container itself as I have found out several times in JBoss in the past (for example).

Tech Bits said...

Does spring not already do that? http://static.springsource.org/spring/docs/2.5.x/api/org/springframework/web/context/ContextLoaderListener.html#contextDestroyed%28javax.servlet.ServletContextEvent%29

Nestor Urquiza said...

@tech bits I am not following you. How can sprint prevents a memory leak coming from your code or from a library yoi use? Indeed I do use spring and you should use the hooks for context destroy as you would with plane sevlets but that is not automatic. You need to take care of freeing resources and objects yourself.

This post is about memory leaks that you will always need to deal with. As far as I know there is no such a thing like automated memory leak fixer for server containers. In fact I have seen more than once: Just restart your server after deployment to avoid a perm space out of memory, for example.

Tech Bits said...

It looked to me like spring was already listening for the servlet event thus making the additional listener also calling webApplicationContext.close(); extraneous but I might be missing something.

Nestor Urquiza said...

Yup, spring cleans up or at least attempt to unload all spring managed objects however if one of the objects is using a buggy code spring can do nothing. The developer is responsible to identify the memory leak in that case.

Followers