Wednesday, August 3, 2011

Debugging high CPU consumption Java problem using thread dumps

I was trying to locate a problematic stack trace of a Java application which caused the CPU to go 100%. The best friend when come to debugging un-ordered CPU utilization of Java program in thread dumps. Here how you do this.

1. Locate the thread which eats up the CPU.

ps -C java -L -o pcpu,cpu,nice,state,cputime,pid,tid | less

Note that this assume the only Java process that runs on the system is the process that you want to debug.

This will out put something like below in Fedora Linux.


%CPU CPU NI S TIME PID TID
0.1 - 0 S 00:00:00 8016 8016
2.1 - 0 S 00:00:00 8016 8048
1.1 - 0 S 00:00:00 8016 8049
1.1 - 0 S 00:00:00 8016 8050
1.0 - 0 S 00:00:00 8016 8051
0.0 - 0 S 00:00:00 8016 8052
0.0 - 0 S 00:00:00 8016 8053
0.0 - 0 S 00:00:00 8016 8054
14.1 - 0 S 00:00:06 8016 8055
11.3 - 0 S 00:00:04 8016 8056
0.0 - 0 S 00:00:00 8016 8057
0.0 - 0 S 00:00:00 8016 8085
0.0 - 0 S 00:00:00 8016 8086
0.0 - 0 S 00:00:00 8016 8087
2.3 - 0 S 00:00:00 8016 8088
25.4 - 0 S 00:00:09 8016 8090
11.2 - 0 S 00:00:04 8016 8091


With the information above you can conclude that the thread number(TID) of the problematic thread is 8090.


Update:
You can sort the threads in a process according to their usage using following.

ps -mp <pid> -o THREAD


USER %CPU PRI SCNT WCHAN USER SYSTEM
rajika 3.2 - - - - -
rajika 0.0 19 - futex_ - -
rajika 0.2 19 - futex_ - -
rajika 0.0 19 - futex_ - -
rajika 0.0 19 - futex_ - -
rajika 0.1 19 - futex_ - -
rajika 0.0 19 - futex_ - -
rajika 0.0 19 - futex_ - -
rajika 0.0 19 - futex_ - -
rajika 0.0 19 - futex_ - -
rajika 0.0 19 - futex_ - -
rajika 0.0 19 - sk_wai - -
rajika 0.3 19 - futex_ - -
rajika 0.4 19 - futex_ - -


2. Get a thread dump of the application

ps ax | grep java
kill -3 java-process-number


3. Convert the value in #1(8090) into it's hexadecimal value and locate the problematic stack trace using that value in the thread dump. The value that you need to look into is the nid value in the stack.


"[ThreadPool Manager] - Idle Thread" daemon prio=10 tid=0x00007f516c066800 nid=0x1a8f in Object.wait() [0x00007f51d85b9000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007f51b44b4610> (a org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor)
at java.lang.Object.wait(Object.java:485)
at org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor.run(Executor.java:106)
- locked <0x00007f51b44b4610> (a org.eclipse.equinox.internal.util.impl.tpt.threadpool.Executor)




4. Once you know where to look in the source (#3) you can look into the source of the high CPU utilization problems

1 comments: