Kotlin Reverse Engineering Tutorial 1 - Study of Boolean Datatype Size
12 Oct 2023
Intro
In this blog post we will look at the disassembled JVM bytecode of a compiled Kotlin class to understand how the Boolean data type is treated internally.
Kotlin source code file (BooleanSize.kt):
package com.shubhamaher.hellokotlin

fun main() {
    var booleanTrue: Boolean = true
}
Generated bytecode disassembly in IntelliJ IDEA
1. Open your Kotlin source file (BooleanSize.kt in this example).
2. Go to "Tools" --> "Kotlin" --> "Show Kotlin Bytecode".
Note: Make sure you have the “Kotlin to Java Decompiler” plugin installed.
To confirm this:
A. Go to "Help" --> "Find Action", type "Plugins" and open the Plugins Marketplace.
B. Switch to the "Installed" tab and confirm that the decompiler plugin appears in the list of installed plugins.
Once you show the bytecode, it should look like this:
// ================com/shubhamaher/hellokotlin/BooleanSizeKt.class =================
// class version 50.0 (50)
// access flags 0x31
public final class com/shubhamaher/hellokotlin/BooleanSizeKt {

  // access flags 0x19
  public final static main()V
   L0
    LINENUMBER 4 L0
    ICONST_1
    ISTORE 0
   L1
    LINENUMBER 5 L1
    RETURN
   L2
    LOCALVARIABLE booleanTrue Z L1 L2 0
    MAXSTACK = 1
    MAXLOCALS = 1

  // access flags 0x1009
  public static synthetic main([Ljava/lang/String;)V
    INVOKESTATIC com/shubhamaher/hellokotlin/BooleanSizeKt.main ()V
    RETURN
    MAXSTACK = 0
    MAXLOCALS = 1

  @Lkotlin/Metadata;(mv={1, 1, 15}, bv={1, 0, 3}, k=2, d1={"\u0000\u0008\n\u0000\n\u0002\u0010\u0002\n\u0000\u001a\u0006\u0010\u0000\u001a\u00020\u0001\u00a8\u0006\u0002"}, d2={"main", "", "hellokotlin"})
  // compiled from: BooleanSize.kt
}
// ================META-INF/hellokotlin.kotlin_module =================
,
com.shubhamaher.hellokotlin
BooleanSizeKt
Disassembly Explanation
In the bytecode disassembly above, the part that matters is the instruction sequence inside “public final static main()V”: “ICONST_1” and “ISTORE 0”, followed by “RETURN”.
Note that everything after the L1 block (the L2 label, the LOCALVARIABLE entry, MAXSTACK and MAXLOCALS) is debug and frame-size information printed as part of the disassembly, not executable instructions.
So inside the JVM, “ICONST_1” pushes an “int” constant with value “1” (true) onto the operand stack, and “ISTORE 0” then pops it and stores it in the local variables array at index 0.
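The two instructions can be traced with a toy model of a frame. This is only an illustrative sketch and not how a real JVM is implemented; the names ToyFrame, iconst1, istore and runBooleanTrue are made up for this example.

```kotlin
// Toy model of a JVM frame: an operand stack plus a local variables array.
// Hypothetical names; a real JVM is far more involved than this.
class ToyFrame(maxLocals: Int) {
    val operandStack = ArrayDeque<Int>()   // holds int-sized values
    val locals = IntArray(maxLocals)       // local variables array

    // ICONST_1: push the int constant 1 onto the operand stack
    fun iconst1() {
        operandStack.addLast(1)
    }

    // ISTORE n: pop the top int and store it in local variable n
    fun istore(index: Int) {
        locals[index] = operandStack.removeLast()
    }
}

// Replays the body of main()V from the listing: ICONST_1, ISTORE 0
fun runBooleanTrue(): ToyFrame {
    val frame = ToyFrame(maxLocals = 1)    // MAXLOCALS = 1 in the listing
    frame.iconst1()
    frame.istore(0)
    return frame
}

fun main() {
    val frame = runBooleanTrue()
    println(frame.locals[0])               // slot 0 now holds the int 1
    println(frame.operandStack.isEmpty())  // the stack is empty again
}
```

In this model, "booleanTrue" is nothing more than slot 0 holding the int 1, which matches what the disassembly shows.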
Observation
This tells us that the JVM “int” data type, with the values 0 and 1, is used internally to represent the boolean values false and true, respectively.
For more information on the JVM instruction set and how it works (local variables array, operand stack, method references, constant pool, etc.), the JVM Specification is helpful, especially the Frames section.
JVM Spec documentation
In “The boolean Type” section of the JVM Spec above, this is documented clearly:
There are no Java Virtual Machine instructions solely dedicated to operations on boolean values.
Instead, expressions in the Java programming language that operate on boolean values are compiled to use values of the Java Virtual Machine int data type.
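As a rough analogy for what the quoted passage means (hand-written code, not compiler output), the sketch below encodes Boolean values as the ints 0 and 1, combines them with an int operation, and checks that the result agrees with the Boolean operation. The helper names encode and decode are made up for this example.

```kotlin
// Hypothetical helpers: map a Boolean to the int encoding and back.
fun encode(b: Boolean): Int = if (b) 1 else 0
fun decode(i: Int): Boolean = i != 0

fun main() {
    for (a in listOf(false, true)) {
        for (b in listOf(false, true)) {
            // Bitwise AND on the 0/1 ints, then decode back to a Boolean
            val viaInt = decode(encode(a) and encode(b))
            println("$a and $b = ${a and b}, via int: $viaInt")
        }
    }
}
```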
Fun fact
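Therefore, the instructions generated for the Kotlin source above using the Boolean data type (BooleanSize.kt) are the same as the instructions generated for the Kotlin snippet below using the Int data type. Only the debug information differs: the LOCALVARIABLE entry carries the variable name and the type descriptor I instead of Z.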
package com.shubhamaher.hellokotlin

fun main() {
    var integer: Int = 1
}
In summary, the instructions generated for
var integer: Int = 1
are the same as the instructions generated for
var booleanTrue: Boolean = true
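To check this yourself, one option is to put both declarations in a single file and compare the bytecode of the two functions with the "Show Kotlin Bytecode" action described earlier. The function names storeBoolean and storeInt are made up for this sketch, and the instruction comments state what the listing in this post leads one to expect, not verified output for every compiler version.

```kotlin
// Hypothetical file for a side-by-side comparison of the two declarations.

fun storeBoolean() {
    var booleanTrue: Boolean = true   // per the listing in this post: ICONST_1, ISTORE 0
}

fun storeInt() {
    var integer: Int = 1              // expected to use the same two instructions
}

fun main() {
    storeBoolean()
    storeInt()
}
```

The compiler will warn that both variables are unused; that is fine for this experiment.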
Verdict
This behaviour of treating a Boolean as an int has been the same for a very long time; I covered it more than a decade ago in the Java source-specific reversing blog post. In fact, this Kotlin-specific post revisits the same study of boolean size in the JVM, because Kotlin source targeting the JVM compiles to Java bytecode.
The reason a boolean is not stored as a single bit is the basic fact that the smallest addressable unit of data that a processor can generally load, store or operate on is a byte (8 bits), not a bit. The JVM goes one step further: per the JVM Specification, most operations on boolean, byte, char and short values are performed by instructions that operate on int, which is why the local variable here is written with an int instruction.
Hence a boolean cannot practically be treated in the optimum way as a single bit, even though in theory it represents at most two values (false/true), which a single bit (0/1) is enough to distinguish.
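To make the point concrete, here is a minimal sketch of what storing booleans as single bits looks like when done by hand: several flags are packed into one Int with masks and shifts, because an individual bit cannot be addressed directly. The function names setFlag and getFlag are made up for this example.

```kotlin
// Hypothetical helpers that pack boolean flags into the bits of one Int.

// Returns a copy of flags with the given bit set or cleared.
fun setFlag(flags: Int, bit: Int, value: Boolean): Int =
    if (value) flags or (1 shl bit) else flags and (1 shl bit).inv()

// Reads a single bit back as a Boolean.
fun getFlag(flags: Int, bit: Int): Boolean = ((flags shr bit) and 1) == 1

fun main() {
    var flags = 0
    flags = setFlag(flags, 0, true)   // set bit 0
    flags = setFlag(flags, 3, true)   // set bit 3
    println(flags)                    // 9 (binary 1001)
    println(getFlag(flags, 3))        // true
    println(getFlag(flags, 1))        // false
}
```

Every read or write costs an extra shift and mask, which is the practical price of squeezing a boolean into one bit.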