Warming up Bytecode in a JVM

First, it is important to understand the classic compilation process that we all studied in Computer Science 101. The process is:

  • Code as text (Code)
  • Linting and code replacements (Pre Compile)
  • Compilation into object files (Compile)
  • Finally, linking and optimisation (Link), which produces machine instructions the OS can hand directly to the CPU; hence the results are called executables.

In a C/C++ type environment this is the whole story: the end result is a file of instructions that the operating system can run for you. In Java this is not quite the same, and the difference is important.
In Java we have this flow:

  • Code as text
  • Linting and code replacements (Pre Compile)
  • Creation of class files containing bytecode (Pre Compile / Pre Link)
  • Class loading:
      • Bootstrap class loading
      • Extension class loading (the platform class loader since Java 9)
      • Application class loading
  • Run the bytecode through the Java interpreter, starting at main, which works out on the fly what instructions to give the CPU so that your program can run (this is on-the-fly execution via the interpreter).
  • The interpreter also profiles the code: it counts how many times each piece of code is executed, and once a piece has been executed X times the interpreter invokes the internal C1 compiler on the fly to produce a reusable set of compiled instructions, so that section no longer needs to be reinterpreted and the compiled code is used instead. Note that interpreted code is slower than compiled code.
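The class-loading hierarchy above is easy to observe from plain Java. This is a minimal sketch (the class name `ClassLoaderDemo` is my own invention): core classes report a `null` loader because the bootstrap loader is implemented natively, while your own classes come from the application (system) class loader.

```java
// Illustrates the bootstrap -> platform/extension -> application hierarchy.
// Note: on Java 9+ the "extension" loader was renamed the "platform" loader.
public class ClassLoaderDemo {
    public static void main(String[] args) {
        // Core JDK classes are loaded by the bootstrap loader, reported as null
        System.out.println("String loader: " + String.class.getClassLoader());
        // Our own class is loaded by the application (system) class loader
        System.out.println("App loader:    " + ClassLoaderDemo.class.getClassLoader());
        // The parent chain walks back towards the bootstrap loader
        System.out.println("Parent:        " + ClassLoaderDemo.class.getClassLoader().getParent());
    }
}
```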

This is all about tiered compilation, and you can see how many iterations a JVM needs to warm up. The flag we care about is Tier3InvocationThreshold: the interpreter must see a method 200 times before actually compiling it rather than interpreting it.


java -XX:+PrintFlagsFinal -version | grep Tier3
intx Tier3BackEdgeThreshold = 60000 {product}
intx Tier3BackedgeNotifyFreqLog = 13 {product}
intx Tier3CompileThreshold = 2000 {product}
intx Tier3DelayOff = 2 {product}
intx Tier3DelayOn = 5 {product}
intx Tier3InvocationThreshold = 200 {product}
intx Tier3InvokeNotifyFreqLog = 10 {product}
intx Tier3LoadFeedback = 5 {product}
intx Tier3MinInvocationThreshold = 100 {product}




So you can see the process is slightly different: in Java we determine what to send to the CPU at the very last second (JIT, just-in-time compilation), compiling code only when we need to execute it, rather than building all the executable instructions upfront as in C/C++.
Warming up a JVM can therefore lead to performance benefits, and different JVM vendors will have different performance characteristics.
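To watch warm-up happen, here is a minimal sketch (the class and method names are my own invention): a small method invoked far more than the 200-invocation Tier3InvocationThreshold. Running it with `java -XX:+PrintCompilation HotLoop` should show the JIT compiling `square` once it becomes hot.

```java
// Minimal warm-up sketch: call a small method far more often than
// Tier3InvocationThreshold (200) so tiered compilation kicks in.
// Run with: java -XX:+PrintCompilation HotLoop   (to see compilation events)
public class HotLoop {
    static long square(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        // 10,000 invocations comfortably exceeds the tier thresholds
        for (int i = 0; i < 10_000; i++) {
            sum += square(i);
        }
        System.out.println("sum = " + sum);
    }
}
```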


Instruction Set

Now that we have this information, we need to look at what we mean by instructions to a CPU in the execution step.
Developers who have been around a while, like myself, may well remember assembly language, which we used to program computers directly. Assembly is essentially a set of instructions to a processor (we call it a CPU, but it is really just transistors in silicon) to do something. When programming at this level you need to be very specific; for example, to execute:


int i=0; i=i+1;



I would follow these painful steps:

move i from its current memory location into register AX
manipulate the bits in register AX to add 1
return the result in AX over the bus back to memory

So you can see this language is very specific and works directly with registers, the small storage slots inside the CPU itself (distinct from the cache lines that sit between the CPU and main memory, which most processors also have). The language evolves as processors evolve; today we tend to call these languages instruction sets, and only the creators of compilers and JVMs really care about them. Note that the instruction set differs per processor family: for example the x86 instruction set used by Intel and AMD, and the ARM instruction set (a RISC design) that Apple's M1 chips are based on.

Ultimately the goal is to produce a set of instructions for the CPU to execute to do some work. The more instructions a CPU needs to execute for a given task, the slower it will be. As a motivating example, let's look at some simple code:



if ( i==3) { j=i; }



Now let's look at instruction-set pseudocode for something like this:

1: move i from the heap into the local cache
2: if i != 3 then abandon and move to the next task
3: copy the value at local cache location i to local cache location j
4: return j over the local bus and back to heap memory location j

Now let's look at an obvious optimisation (this is how most processors do it today):

1: move i into the local cache
2: compare i to 3 and, if equal, move 3 to j
3: return j over the local bus and back to heap memory location j

Now here is the key: step 2 above is really a compare-and-set instruction, and Java exposes exactly this as compareAndSet (look at AtomicInteger). So, to be performant, we can call down to the CPU instruction set directly and manipulate values at the CPU level, which in Java is about as close to the metal as you are likely to get. Tricks like this let us squeeze more performance from a processor; this is what being low latency means.

Note that in Java we can't just say "execute these CPU instructions here" in code, but we can leave an elephant-sized hint to the interpreter/compiler about our intentions.
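The AtomicInteger hint mentioned above looks like this in practice (the class name `CasDemo` is my own invention). The JIT compiles `compareAndSet` down to the CPU's compare-and-swap style instruction (for example LOCK CMPXCHG on x86) as an intrinsic, so this is the closest Java gets to issuing that instruction directly:

```java
import java.util.concurrent.atomic.AtomicInteger;

// compareAndSet(expected, newValue) atomically replaces the value only
// if it currently equals the expected value, returning success/failure.
public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger i = new AtomicInteger(3);
        // Succeeds: value is 3, so it is atomically replaced with 7
        boolean first = i.compareAndSet(3, 7);
        // Fails: value is now 7, not 3, so nothing changes
        boolean second = i.compareAndSet(3, 9);
        System.out.println(first + " " + second + " " + i.get()); // true false 7
    }
}
```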
