Kotlin Reverse Engineering Tutorial 1 - Study of Boolean Datatype Size

Intro

In this blog post, we will look at the disassembled JVM bytecode of a compiled Kotlin class to understand how the Boolean data type is treated internally.

Kotlin source code file (BooleanSize.kt):

package com.shubhamaher.hellokotlin

fun main() {
    var booleanTrue : Boolean = true
}

Generating the bytecode disassembly in IntelliJ IDEA

1. Open your Kotlin source file (BooleanSize.kt in this example).
2. Go to "Tools" --> "Kotlin" --> "Show Kotlin Bytecode".

Note: Make sure you have the “Kotlin to Java Decompiler” plugin installed.

To confirm this:

A. Go to "Help" --> "Find Action", type "Plugins" and open the Plugins Marketplace.

B. Switch to the "Installed" tab and confirm that the decompiler plugin mentioned above appears in the list of installed plugins.

Once the bytecode is shown, it should look like this:

// ================com/shubhamaher/hellokotlin/BooleanSizeKt.class =================
// class version 50.0 (50)
// access flags 0x31
public final class com/shubhamaher/hellokotlin/BooleanSizeKt {


  // access flags 0x19
  public final static main()V
   L0
    LINENUMBER 4 L0
    ICONST_1
    ISTORE 0
   L1
    LINENUMBER 5 L1
    RETURN
   L2
    LOCALVARIABLE booleanTrue Z L1 L2 0
    MAXSTACK = 1
    MAXLOCALS = 1

  // access flags 0x1009
  public static synthetic main([Ljava/lang/String;)V
    INVOKESTATIC com/shubhamaher/hellokotlin/BooleanSizeKt.main ()V
    RETURN
    MAXSTACK = 0
    MAXLOCALS = 1

  @Lkotlin/Metadata;(mv={1, 1, 15}, bv={1, 0, 3}, k=2, d1={"\u0000\u0008\n\u0000\n\u0002\u0010\u0002\n\u0000\u001a\u0006\u0010\u0000\u001a\u00020\u0001\u00a8\u0006\u0002"}, d2={"main", "", "hellokotlin"})
  // compiled from: BooleanSize.kt
}


// ================META-INF/hellokotlin.kotlin_module =================
,
com.shubhamaher.hellokotlin
BooleanSizeKt

Disassembly Explanation

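In the above JVM bytecode disassembly (printed in ASM's textual format), the part that matters is the instruction sequence of “public final static main()V”, i.e. “ICONST_1” and “ISTORE 0”, followed by “RETURN”.

Note that, within this method, everything after the end of the L1 section is metadata rather than executable code: “LOCALVARIABLE booleanTrue Z L1 L2 0” is a debug entry recording the variable's name, its declared type descriptor (Z means boolean) and its slot 0, while MAXSTACK and MAXLOCALS give the maximum operand stack depth and the number of local variable slots the method needs. The second, synthetic “main([Ljava/lang/String;)V” method is only a wrapper that lets the JVM launch the no-argument main().

So, internally, “ICONST_1” pushes the “int” constant “1” (true) onto the operand stack, and “ISTORE 0” pops it and stores it in the local variable array at index 0.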
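To make these two instructions concrete, here is a small Kotlin sketch of a single frame with an operand stack and a local variable array. ToyFrame and its methods are hypothetical names used only for illustration; a real JVM frame is far more involved, and this model understands nothing but ICONST_1 and ISTORE.

```kotlin
// Hypothetical toy model of one JVM frame; it only supports ICONST_1 and ISTORE.
class ToyFrame(maxLocals: Int) {
    // Operand stack: in this model every entry is an int.
    val operandStack = ArrayDeque<Int>()

    // Local variable array with one int per slot.
    val locals = IntArray(maxLocals)

    // ICONST_1: push the int constant 1 onto the operand stack.
    fun iconst1() {
        operandStack.addLast(1)
    }

    // ISTORE index: pop an int from the operand stack and store it in the given slot.
    fun istore(index: Int) {
        locals[index] = operandStack.removeLast()
    }
}

fun main() {
    // The disassembly reports MAXLOCALS = 1, so the frame gets a single slot.
    val frame = ToyFrame(maxLocals = 1)
    frame.iconst1() // stack: [1]
    frame.istore(0) // stack: [], locals[0] = 1
    println("locals[0] = ${frame.locals[0]}, operand stack empty = ${frame.operandStack.isEmpty()}")
}
```

Running it prints "locals[0] = 1, operand stack empty = true", which mirrors what the two instructions do for `booleanTrue`.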

Observation

This tells us that the JVM “int” data type, with the values 0 and 1, is used internally to represent the boolean values false and true, respectively.
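The convention can be written down as a pair of conversion helpers. This is a minimal sketch: toJvmInt and toJvmBoolean are hypothetical extension functions, and the decoder deliberately rejects anything other than the two canonical values 0 and 1.

```kotlin
// Hypothetical helper: encode a Boolean the way the JVM represents it (false -> 0, true -> 1).
fun Boolean.toJvmInt(): Int = if (this) 1 else 0

// Hypothetical helper: decode 0/1 back to a Boolean; other ints are rejected in this sketch.
fun Int.toJvmBoolean(): Boolean {
    require(this == 0 || this == 1) { "expected 0 or 1, got $this" }
    return this == 1
}

fun main() {
    println("true -> ${true.toJvmInt()}, false -> ${false.toJvmInt()}")
    println("1 -> ${1.toJvmBoolean()}, 0 -> ${0.toJvmBoolean()}")
}
```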

For more information on the JVM instruction set and how it works (local variable array, operand stack, method references, constant pool, etc.), the JVM Specification is a good resource, especially its Frames section.

JVM Spec documentation

The section “The boolean Type” of the JVM Specification documents this clearly:

There are no Java Virtual Machine instructions solely dedicated to operations on boolean values. 

Instead, expressions in the Java programming language that operate on boolean values are compiled to use values of the Java Virtual Machine int data type.
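As a rough illustration of this quote, boolean logic can be carried out entirely on ints holding 0 or 1. The functions below are hypothetical and only model the JVM's IAND, IOR and IXOR instructions; they are not a claim about the exact instructions kotlinc emits. The short-circuiting && and || operators are typically compiled to conditional jumps (such as IFEQ) instead, but those jumps also test int values.

```kotlin
// Hypothetical model: boolean logic performed purely on ints holding 0 (false) or 1 (true).
fun intAnd(a: Int, b: Int): Int = a and b // models the JVM's IAND
fun intOr(a: Int, b: Int): Int = a or b   // models IOR
fun intNot(a: Int): Int = a xor 1         // models IXOR with the constant 1

fun main() {
    val t = 1 // true
    val f = 0 // false
    println("and=${intAnd(t, f)} or=${intOr(t, f)} not(t)=${intNot(t)} not(f)=${intNot(f)}")
}
```

This prints "and=0 or=1 not(t)=0 not(f)=1", i.e. the usual truth table expressed with int values only.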

Fun fact

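Therefore, the bytecode instructions generated for the above Kotlin source using the Boolean data type (BooleanSize.kt) will be the same as those generated for the Kotlin snippet below using the Int data type (only the debug LOCALVARIABLE entry differs, recording integer with type descriptor I instead of booleanTrue with Z):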

package com.shubhamaher.hellokotlin

fun main() {
    var integer : Int = 1
}

In summary, the method instructions for

var integer : Int = 1

are the same as those for

var booleanTrue : Boolean = true
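The declared type is not lost entirely: it survives in the debug LOCALVARIABLE entry as a type descriptor, Z for boolean and I for int. As a hedged illustration, the sketch below derives those descriptors at runtime; descriptorOf is a hypothetical helper built on the JDK's java.lang.invoke.MethodType.

```kotlin
import java.lang.invoke.MethodType

// Hypothetical helper: returns the JVM field descriptor of a type, e.g. "Z" or "I".
// It builds the method type "(type)void", whose descriptor looks like "(Z)V",
// and cuts the parameter descriptor out from between the parentheses.
fun descriptorOf(type: Class<*>): String {
    val methodDescriptor = MethodType.methodType(Void.TYPE, type).toMethodDescriptorString()
    return methodDescriptor.substring(1, methodDescriptor.indexOf(')'))
}

fun main() {
    // javaPrimitiveType maps Kotlin's Boolean and Int to the JVM primitives boolean and int.
    println("Boolean -> ${descriptorOf(Boolean::class.javaPrimitiveType!!)}")
    println("Int     -> ${descriptorOf(Int::class.javaPrimitiveType!!)}")
}
```

The descriptors differ, yet both local variables are written with the same ISTORE instruction.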

Verdict

This behaviour of treating a boolean as an int has not changed in a very long time; I covered it more than a decade ago in my Java source-specific reversing blog post. In fact, this Kotlin-specific post revisits that same study of boolean size in the JVM, because Kotlin source targeting the JVM is compiled to Java bytecode.

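The reason a boolean is not treated as a single bit is the basic fact that, at the processor level, the smallest atomic unit of addressable data that can be stored or operated on is generally a byte (8 bits), not a bit. The JVM goes one step further: boolean, byte, char and short values are all handled by instructions of the int computational type, so a boolean local ends up occupying the same kind of local variable slot as an int.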

Hence, in practice, a boolean cannot be handled efficiently as a single bit, even though in theory it has at most two values (false/true) that a single bit (0/1) could represent: reaching one bit would mean loading its containing byte and then shifting and masking it.
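To see what storing a boolean as a single bit would actually cost, here is a minimal sketch of a bit-packed boolean array. PackedBooleans is a hypothetical class, not a JDK or Kotlin API. Since the smallest unit it can address is a byte, every access has to find the containing byte and then shift and mask, and every write becomes a read-modify-write that must preserve the other seven bits.

```kotlin
// Hypothetical bit-packed boolean array: one bit per value, stored in a ByteArray.
class PackedBooleans(val size: Int) {
    // Round up so that, for example, 10 booleans need 2 bytes.
    private val bytes = ByteArray((size + 7) / 8)

    operator fun get(index: Int): Boolean {
        require(index in 0 until size)
        // Locate the containing byte, then shift the wanted bit down and mask it.
        val b = bytes[index / 8].toInt()
        return ((b shr (index % 8)) and 1) == 1
    }

    operator fun set(index: Int, value: Boolean) {
        require(index in 0 until size)
        val mask = 1 shl (index % 8)
        val old = bytes[index / 8].toInt()
        // Read-modify-write: only the target bit changes, the other 7 bits are kept.
        val updated = if (value) old or mask else old and mask.inv()
        bytes[index / 8] = updated.toByte()
    }

    fun storageBytes(): Int = bytes.size
}

fun main() {
    val flags = PackedBooleans(10)
    flags[3] = true
    flags[9] = true
    println("flags[3]=${flags[3]} flags[4]=${flags[4]} flags[9]=${flags[9]}")
    println("10 booleans stored in ${flags.storageBytes()} bytes")
}
```

The memory saving is real, but each get and set now costs several extra shift and mask operations. The JDK's java.util.BitSet applies the same packing idea using long words.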