The Java .class file format
- Originally designed to support only the Java source language
  - designed in a different era: 1993-1995
  - evolved (by addition) somewhat over the intervening years
- serves multiple use cases:
  - container for executable application (most relevant to this class)
  - archive of compiled libraries (closed source era)
  - distribution format (e.g. early Web had Java applets)
- upgrades and additions
  - version 5 (class file major version 49):
    - additional metadata to support Java source language generics
    - annotations: additional metadata attached to classes, fields, and methods
  - version 6 (major version 50): stackmaps: help speed up bytecode verification
  - version 7 (major version 51): invokedynamic: add more support for dynamic languages
  - most Java source language changes were accomplished with little or no classfile changes
    - inner and anonymous classes
- many languages now compile *into* JVM classfiles
  - Kotlin - a Java-like language by JetBrains
  - Groovy - dynamically-typed scripting language
  - Scala - advanced functional/object-oriented language
  - Clojure - Lisp dialect
  - Virgil - instructor's language project, first bootstrap
- big-endian: fixed-sized integers for most quantities
  - primarily because of Sparc CPU architecture
- a single .class file contains only one class
  - nested and anonymous classes get their own files (e.g. Outer$Inner.class)
- contains source-level names (though mangled)
- semantically high-level (methods, objects), but binary
  - names are significant to the runtime semantics
  - used as a distribution format for closed-source code
- stack machine for operands
- jumps, conditional branches, two kinds of switches for control flow
- Detailed description: https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.7.16.1

- Overall structure of a Java class file
  - magic number (0xCAFEBABE)
  - version numbers (minor_version, then major_version)
  - constant pool: contains strings, numbers, types, names
  - access flags: ACC_PUBLIC ACC_FINAL ACC_SUPER ACC_INTERFACE ACC_ABSTRACT ACC_SYNTHETIC ACC_ANNOTATION ACC_ENUM
  - class name: reference (via a CONSTANT_Class entry) to a string
  - super class name: reference (via a CONSTANT_Class entry) to a string
  - interfaces implemented: array of references to strings
  - fields declared: name, type, and annotations
  - methods declared
  - additional attributes: room for extensibility

- Types in JVM class files
  - types are described by *mangled* strings in class files
  - primitives:
      B = byte    C = char    D = double   F = float
      I = int     J = long    S = short    Z = boolean
  - arrays: "[" Type                     (e.g. [I is int[])
  - classes/interfaces: "L" ClassName ";"   (e.g. Ljava/lang/String;)
  - method descriptors: "(" Type* ")" (Type | "V")
      (e.g. int f(String s, long[] a) has descriptor (Ljava/lang/String;[J)I)
  - signatures
    - encode Java source-level (generic) types
    - not used for execution by the VM[1], but instead for source compilation
      - allows programming against a closed-source library that has generics
    - signatures can be mapped to descriptors via *erasure*
    - [1] except for reflection

- Constant pool entries
  - each entry consists of an identifying tag byte, followed by kind-specific data

    CONSTANT_Class                7   name_index: u16
    CONSTANT_Fieldref             9   class_index: u16, name_and_type: u16
    CONSTANT_Methodref           10   class_index: u16, name_and_type: u16
    CONSTANT_InterfaceMethodref  11   class_index: u16, name_and_type: u16
    CONSTANT_String               8   string_index: u16   // must refer to a CONSTANT_Utf8 entry
    CONSTANT_Integer              3   bytes: u32
    CONSTANT_Float                4   bytes: byte[4]
    CONSTANT_Long                 5   bytes: u64
    CONSTANT_Double               6   bytes: byte[8]
    CONSTANT_NameAndType         12   name: u16, type: u16
    CONSTANT_Utf8                 1   length: u16, bytes: byte[length]   // modified UTF-8
    CONSTANT_MethodHandle        15   reference_kind: u8, reference_index: u16
    CONSTANT_MethodType          16   descriptor_index: u16
    CONSTANT_InvokeDynamic       18   bootstrap_method_index: u16, name_and_type_index: u16

- Field declarations
  - name: string
  - access flags: ACC_PUBLIC ACC_PRIVATE ACC_PROTECTED ACC_STATIC ACC_FINAL ACC_VOLATILE ACC_TRANSIENT ACC_SYNTHETIC ACC_ENUM
  - type: descriptor (field type)
  - attributes: array of attributes, which can include annotations
- Method declarations
  - name: string
  - access flags: ACC_PUBLIC ACC_PRIVATE ACC_PROTECTED ACC_STATIC ACC_FINAL ACC_SYNCHRONIZED ACC_BRIDGE ACC_VARARGS ACC_NATIVE ACC_ABSTRACT ACC_STRICT ACC_SYNTHETIC
  - descriptor: method descriptor (erased parameter and return types)
  - attributes
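To make the overall layout and the constant pool table concrete, here is a minimal Java sketch that reads a class file up to `super_class`. The class `ClassHeaderReader` and its methods are hypothetical names for illustration. It leans on `java.io.DataInputStream`, whose `readInt`/`readUnsignedShort` read big-endian and whose `readUTF` matches the `CONSTANT_Utf8` layout (u2 length, then modified UTF-8 bytes). Two details the table does not show: constant pool indices start at 1, and `CONSTANT_Long`/`CONSTANT_Double` entries occupy two slots. It assumes only the tags listed above, so tags added in later class file versions are rejected.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch: reads a class file up to super_class, resolving class names via the constant pool. */
public class ClassHeaderReader {
    // Class-level access flag bits, in the same order as the list above.
    static final int[] FLAG_BITS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000};
    static final String[] FLAG_NAMES = {"ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
            "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"};

    public int minor, major, accessFlags;
    public String thisClass, superClass;
    Object[] pool;          // decoded Utf8/Integer/Float/Long/Double values; other kinds stay null
    int[] classNameIndex;   // for CONSTANT_Class slots: index of the Utf8 entry holding the name

    public static ClassHeaderReader read(byte[] bytes) throws IOException {
        // DataInputStream reads multi-byte values big-endian, matching the class file format.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        ClassHeaderReader r = new ClassHeaderReader();
        if (in.readInt() != 0xCAFEBABE) throw new IOException("bad magic number");
        r.minor = in.readUnsignedShort();     // minor_version comes before major_version
        r.major = in.readUnsignedShort();
        int count = in.readUnsignedShort();   // constant_pool_count: valid indices are 1..count-1
        r.pool = new Object[count];
        r.classNameIndex = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:  r.pool[i] = in.readUTF(); break;             // Utf8: u2 length + modified UTF-8
                case 3:  r.pool[i] = in.readInt(); break;             // Integer
                case 4:  r.pool[i] = in.readFloat(); break;           // Float
                case 5:  r.pool[i] = in.readLong(); i++; break;       // Long: occupies two slots
                case 6:  r.pool[i] = in.readDouble(); i++; break;     // Double: occupies two slots
                case 7:  r.classNameIndex[i] = in.readUnsignedShort(); break; // Class -> Utf8 name
                case 8: case 16: in.readUnsignedShort(); break;       // String, MethodType: one u2
                case 9: case 10: case 11: case 12: case 18:           // two u2 indices
                    in.readUnsignedShort(); in.readUnsignedShort(); break;
                case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                default: throw new IOException("unsupported constant pool tag " + tag + " at #" + i);
            }
        }
        r.accessFlags = in.readUnsignedShort();
        r.thisClass = r.className(in.readUnsignedShort());
        int superIndex = in.readUnsignedShort();
        r.superClass = superIndex == 0 ? null : r.className(superIndex); // 0 only for java/lang/Object
        return r;
    }

    /** Resolves a CONSTANT_Class index to its name in internal form, e.g. java/lang/String. */
    String className(int classIndex) {
        return (String) pool[classNameIndex[classIndex]];
    }

    static String describeClassFlags(int flags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < FLAG_BITS.length; i++) {
            if ((flags & FLAG_BITS[i]) != 0) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(FLAG_NAMES[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) { System.out.println("usage: java ClassHeaderReader SomeClass.class"); return; }
        ClassHeaderReader r = read(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("version " + r.major + "." + r.minor + "  flags: " + describeClassFlags(r.accessFlags));
        System.out.println("class " + r.thisClass + " extends " + r.superClass);
    }
}
```

Class names are resolved only after the whole pool has been read, because a CONSTANT_Class entry may point forward to a Utf8 entry that appears later in the pool. Running it on one of the demo classes (e.g. `java ClassHeaderReader A.class`) is one way to check it against `javap -v` output.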
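The descriptor grammar in the types section is small enough to decode with a recursive-descent parser. The following is a minimal sketch (the class name `Descriptors` and its output format are hypothetical) that maps field and method descriptors back to Java-style type names. It handles erased descriptors only, not generic signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: recursive-descent decoder for field and method descriptors (not generic signatures). */
public class Descriptors {
    private final String d;
    private int pos;

    private Descriptors(String d) { this.d = d; }

    /** FieldType := B | C | D | F | I | J | S | Z | "[" FieldType | "L" ClassName ";" */
    private String fieldType() {
        char c = d.charAt(pos++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case '[': return fieldType() + "[]";          // array of the element type that follows
            case 'L': {
                int end = d.indexOf(';', pos);
                if (end < 0) throw new IllegalArgumentException("unterminated class name in " + d);
                String internal = d.substring(pos, end);  // internal form, e.g. java/lang/String
                pos = end + 1;
                return internal.replace('/', '.');
            }
            default: throw new IllegalArgumentException("bad descriptor character '" + c + "' in " + d);
        }
    }

    private void expectEnd() {
        if (pos != d.length()) throw new IllegalArgumentException("trailing characters in " + d);
    }

    public static String field(String desc) {
        Descriptors p = new Descriptors(desc);
        String t = p.fieldType();
        p.expectEnd();
        return t;
    }

    /** MethodDescriptor := "(" FieldType* ")" (FieldType | "V"); rendered as "ret (param, ...)". */
    public static String method(String desc) {
        Descriptors p = new Descriptors(desc);
        if (desc.isEmpty() || desc.charAt(p.pos++) != '(') throw new IllegalArgumentException("expected '(' in " + desc);
        List<String> params = new ArrayList<>();
        while (p.pos < desc.length() && desc.charAt(p.pos) != ')') params.add(p.fieldType());
        p.pos++;                                          // skip ')'
        String ret;
        if (p.pos < desc.length() && desc.charAt(p.pos) == 'V') { ret = "void"; p.pos++; }
        else ret = p.fieldType();
        p.expectEnd();
        return ret + " (" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        System.out.println(field("[[I"));                        // int[][]
        System.out.println(method("(I[Ljava/lang/String;J)V"));  // void (int, java.lang.String[], long)
    }
}
```

The recursion on `[` mirrors the grammar directly: an array descriptor is just `[` followed by another complete field type, so multi-dimensional arrays need no special case.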
- Attributes (attached to classes, fields, methods, and Code attributes)
    ConstantValue                         - initial value of a static final field initialized with a compile-time constant (such constants can be used as switch case labels)
    Code                                  - contains actual bytecode implementation of a method
    StackMapTable                         - stack maps used to speed up verification
    Exceptions                            - for "throws" clauses in Java source declarations
    InnerClasses                          - lists inner classes for source programming
    EnclosingMethod                       - for local and anonymous inner classes
    Synthetic                             - means it was generated by the compiler and has no counterpart in the source
    Signature                             - (generic) source signature
    SourceFile                            - for exception stack traces
    SourceDebugExtension                  - extra debugging information (e.g. for non-Java source languages)
    LineNumberTable                       - for exception stack traces
    LocalVariableTable                    - for debugging, names of local variables
    LocalVariableTypeTable                - for debugging, source types of local variables
    Deprecated                            - marker for classes, fields, and methods whose use is discouraged (they may be removed from an API later)
    RuntimeVisibleAnnotations             - annotations which can be reflected on
    RuntimeInvisibleAnnotations           - annotations that cannot be reflected on
    RuntimeVisibleParameterAnnotations    - same, for method parameters
    RuntimeInvisibleParameterAnnotations  - same, for method parameters
    AnnotationDefault                     - default value of an annotation type element
    BootstrapMethods                      - part of invokedynamic

    Code_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 max_stack;
        u2 max_locals;
        u4 code_length;
        u1 code[code_length];
        u2 exception_table_length;
        {   u2 start_pc;
            u2 end_pc;
            u2 handler_pc;
            u2 catch_type;
        } exception_table[exception_table_length];
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

    Exceptions_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 number_of_exceptions;
        u2 exception_index_table[number_of_exceptions];
    }

    InnerClasses_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 number_of_classes;
        {   u2 inner_class_info_index;
            u2 outer_class_info_index;
            u2 inner_name_index;
            u2 inner_class_access_flags;
        } classes[number_of_classes];
    }

- Only method bodies can contain bytecode instructions. Groups:
  - misc
      wide (prefix) nop
  - constants
      aconst_null iconst_<i> dconst_<d> fconst_<f> lconst_<l> bipush sipush ldc ldc_w ldc2_w
  - local variables
      aload aload_<n> astore astore_<n>
      dload dload_<n> dstore dstore_<n>
      fload fload_<n> fstore fstore_<n>
      iload iload_<n> istore istore_<n>
      lload lload_<n> lstore lstore_<n>
      iinc
  - stack manipulation
      dup dup_x1 dup_x2 dup2 dup2_x1 dup2_x2 swap pop pop2
  - control flow
      if_acmp<cond> if_icmp<cond> if<cond> ifnonnull ifnull goto goto_w lookupswitch tableswitch
  - calls / return
      areturn dreturn freturn ireturn lreturn return
      invokedynamic invokeinterface invokespecial invokestatic invokevirtual
  - class/object operations
      athrow getfield getstatic instanceof checkcast monitorenter monitorexit new putfield putstatic
  - array operations
      aaload aastore anewarray multianewarray newarray arraylength
      baload bastore caload castore daload dastore faload fastore
      iaload iastore laload lastore saload sastore
  - integer (int and long) operations
      iadd iand idiv imul ineg ior irem ishl ishr isub iushr ixor
      ladd land lcmp ldiv lmul lneg lor lrem lshl lshr lsub lushr lxor
  - floating-point operations
      dadd dcmpg dcmpl ddiv dmul dneg drem dsub
      fadd fcmpg fcmpl fdiv fmul fneg frem fsub
  - conversions
      d2f d2i d2l f2d f2i f2l i2b i2c i2d i2f i2l i2s l2d l2f l2i

-- DEMO --
- Compiling/executing Java
- Making jar files
- A.java Main.java Constants.java Outer.java Anon.java Generics.java
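The Code_attribute layout above can be decoded field by field in the same big-endian style as the rest of the class file. This is a minimal sketch with a hypothetical class name `CodeAttribute`; it assumes the byte array holds exactly one Code attribute starting at `attribute_name_index`, and it skips nested attributes such as LineNumberTable or StackMapTable instead of decoding them.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Sketch: decodes one Code_attribute laid out as in the struct above; nested attributes are skipped. */
public class CodeAttribute {
    public int nameIndex, maxStack, maxLocals;
    public byte[] code;
    public int[][] exceptionTable;   // rows of {start_pc, end_pc, handler_pc, catch_type}
    public int nestedAttributeCount;

    /** Assumes bytes holds exactly one attribute, starting at attribute_name_index. */
    public static CodeAttribute read(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes)); // big-endian reads
        CodeAttribute a = new CodeAttribute();
        a.nameIndex = in.readUnsignedShort();            // should name the Utf8 entry "Code"
        long length = in.readInt() & 0xFFFFFFFFL;        // u4 attribute_length: bytes after these 6
        if (length != bytes.length - 6L) throw new IOException("attribute_length mismatch");
        a.maxStack = in.readUnsignedShort();
        a.maxLocals = in.readUnsignedShort();
        int codeLength = in.readInt();                   // u4, though the spec caps it below 65536
        if (codeLength <= 0 || codeLength >= 65536) throw new IOException("bad code_length " + codeLength);
        a.code = new byte[codeLength];
        in.readFully(a.code);
        int handlers = in.readUnsignedShort();
        a.exceptionTable = new int[handlers][4];
        for (int i = 0; i < handlers; i++)
            for (int j = 0; j < 4; j++)
                a.exceptionTable[i][j] = in.readUnsignedShort(); // catch_type 0 catches everything (finally)
        a.nestedAttributeCount = in.readUnsignedShort();
        for (int i = 0; i < a.nestedAttributeCount; i++) {   // e.g. LineNumberTable, StackMapTable
            in.readUnsignedShort();                          // attribute_name_index
            int len = in.readInt();                          // attribute_length
            in.readFully(new byte[len]);                     // skip the body without decoding it
        }
        return a;
    }
}
```

The `max_stack` and `max_locals` values it extracts are the same numbers `javap -v` prints as `stack=` and `locals=` for each method, which makes a convenient cross-check during the demo.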
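Because operands live on a stack rather than in registers, a toy interpreter for a few of the int opcodes listed above fits on a page. The sketch below is illustrative only: the class `MiniInterp` and the hand-assembled `sum` method body are hypothetical, and real javac output for the same source may differ in details. It shows two encoding details: branch offsets are signed big-endian 16-bit values relative to the address of the branch instruction itself, and `iinc` carries an unsigned local index followed by a signed byte constant. It does no verification, so malformed bytecode simply throws.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch: interpreter for a handful of int-only JVM opcodes (no verification, no other types). */
public class MiniInterp {
    // Opcode values from the JVM specification's instruction set chapter.
    static final int ICONST_0 = 0x03, ICONST_5 = 0x08, BIPUSH = 0x10;
    static final int ILOAD = 0x15, ILOAD_0 = 0x1a, ISTORE = 0x36, ISTORE_0 = 0x3b;
    static final int IADD = 0x60, ISUB = 0x64, IMUL = 0x68, IINC = 0x84;
    static final int IF_ICMPGT = 0xa3, GOTO = 0xa7, IRETURN = 0xac;

    static int run(byte[] code, int maxLocals, int... args) {
        int[] locals = new int[maxLocals];              // arguments occupy the first local slots
        System.arraycopy(args, 0, locals, 0, args.length);
        Deque<Integer> stack = new ArrayDeque<>();      // the operand stack
        int pc = 0;
        while (true) {
            int op = code[pc] & 0xFF;                   // opcodes are unsigned bytes
            if (op >= ICONST_0 && op <= ICONST_5) { stack.push(op - ICONST_0); pc += 1; continue; }
            if (op >= ILOAD_0 && op <= ILOAD_0 + 3) { stack.push(locals[op - ILOAD_0]); pc += 1; continue; }
            if (op >= ISTORE_0 && op <= ISTORE_0 + 3) { locals[op - ISTORE_0] = stack.pop(); pc += 1; continue; }
            switch (op) {
                case BIPUSH: stack.push((int) code[pc + 1]); pc += 2; break;   // signed byte operand
                case ILOAD:  stack.push(locals[code[pc + 1] & 0xFF]); pc += 2; break;
                case ISTORE: locals[code[pc + 1] & 0xFF] = stack.pop(); pc += 2; break;
                case IADD: { int b = stack.pop(), a = stack.pop(); stack.push(a + b); pc += 1; break; }
                case ISUB: { int b = stack.pop(), a = stack.pop(); stack.push(a - b); pc += 1; break; }
                case IMUL: { int b = stack.pop(), a = stack.pop(); stack.push(a * b); pc += 1; break; }
                case IINC: locals[code[pc + 1] & 0xFF] += code[pc + 2]; pc += 3; break; // u1 index, s1 const
                case IF_ICMPGT: { int b = stack.pop(), a = stack.pop();
                                  pc = (a > b) ? pc + branchOffset(code, pc) : pc + 3; break; }
                case GOTO: pc += branchOffset(code, pc); break;
                case IRETURN: return stack.pop();
                default: throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(op) + " at pc " + pc);
            }
        }
    }

    /** Signed 16-bit big-endian offset after the opcode, relative to the opcode's own address. */
    static int branchOffset(byte[] code, int pc) {
        return (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
    }

    static byte[] bytes(int... vals) {
        byte[] out = new byte[vals.length];
        for (int i = 0; i < vals.length; i++) out[i] = (byte) vals[i];
        return out;
    }

    // Hand-assembled: static int sum(int n) { int s = 0; for (int i = 1; i <= n; i++) s += i; return s; }
    static final byte[] SUM = bytes(
        0x03,             //  0: iconst_0
        0x3c,             //  1: istore_1        s = 0
        0x04,             //  2: iconst_1
        0x3d,             //  3: istore_2        i = 1
        0x1c,             //  4: iload_2
        0x1a,             //  5: iload_0
        0xa3, 0x00, 0x0d, //  6: if_icmpgt +13   to 19 when i > n
        0x1b,             //  9: iload_1
        0x1c,             // 10: iload_2
        0x60,             // 11: iadd
        0x3c,             // 12: istore_1        s = s + i
        0x84, 0x02, 0x01, // 13: iinc 2, 1       i++
        0xa7, 0xff, 0xf4, // 16: goto -12        back to 4
        0x1b,             // 19: iload_1
        0xac);            // 20: ireturn

    public static void main(String[] args) {
        System.out.println(run(SUM, 3, 10)); // prints 55
    }
}
```

Note how the loop condition `i <= n` is compiled as its negation (`if_icmpgt` jumps out of the loop), and how every instruction works only on the top of the operand stack plus the local variable array.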