From 156e3d3c980562685a77dec375334578213f1ec9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20G=C3=BCdemann?= Date: Tue, 21 Aug 2018 16:43:14 +0200 Subject: [PATCH 1/2] Document class file content for JBMC / java_bytecode readme --- jbmc/src/java_bytecode/README.md | 83 +++++++++++++++++++++++++++++++- 1 file changed, 82 insertions(+), 1 deletion(-) diff --git a/jbmc/src/java_bytecode/README.md b/jbmc/src/java_bytecode/README.md index b590319830..c6ac67eb5c 100644 --- a/jbmc/src/java_bytecode/README.md +++ b/jbmc/src/java_bytecode/README.md @@ -25,7 +25,88 @@ To be documented. \subsection java-class-section How a java program / class is represented in a .class -To be documented. +Every Java class is compiled into a .class file. Inner classes, anonymous +classes or classes for tableswitches are also compiled into their own .class +files. + +There exists an [official +specification](https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html) + +Each class files contains information about its version, the constant pool, +information about the contained class, its super-class, its implemented +interfaces, its fields, methods and finally additional attributes, such as +information about lambda functions used in the methods of the class or inner +classes the it contains. + +The content of a .class file can be inspected via the `javap` tool which is part +of the JDK installation. Useful options are `-c` for code, `-p` to display +protected / private methods `-l` for line numbers and local variable +information, as well as `-verbose` which prints a lot of additional information. + +In general, all variable length entries in a class file are represented by first +having an integer `n` that specifies the number of entries and then an array of +`n` such entries in the file. All variable length attribute also contain +information about their length in bytes. + +The integer format used in class files are unsigned integers of size 8/16/32 +bit, which are named as `u1`/`u2`/`u4`. + +\subsection Access flags + +The JVM specification defines different access flags, e.g., `final`, `static`, +`protected`, `private` etc. where different ones are applicable to the class +itself, its fields or methods. All access flags are represented as bits, the set +of bits that are defined for one entity is represented as disjunction of those +values. Each of these values is defined as a constant with a name prefixed with +`ACC_` in JBMC, e.g., as `#define ACC_PUBLIC 0x0001` or `#define ACC_ENUM +0x4000`. + +\subsection Constant Pool + +The constant pool contains all strings and referred values that are used in the +.class. This includes the names of the class itself and its super-class, as well +as the names and signatures of all fields and methods. All strings in the +constant pool are in UTF-16 format. + +\subsection Fields + +Each member variable of a class has a field entry with a corresponding field +information structure. This contains the name of the field, its raw JVM type +(called the descriptor) and an optional signature. + +A signature is an extension to the Java raw types and contains information about +the generic type of an object if applicable. + +The name of the field, the descriptor and the signature are all represented as +indices into the constant pool of the class file. + +\subsection Methods + +Methods are represented in a similar way as fields. Each method has an +associated name, descriptor and optional signature entry in the constant pool +table. + +An implemented method also has several attributes. One is the `Code` attribute +that stores the actual bytecode instructions. There is also an optional +`LocalVariableTable` which contains the names of local variables and +parameters. In addition to this there is also an optional +`LocalVariableTypeTable` that contains information about generic local variables +and parameters. Finally the exception table is defined as entries in the +`Exceptions` attribute. + +Note: most information about generic types is optional and exists mainly as +debugger information. This is because the Java compiler ensures that typing is +correct and creates code accordingly. The same holds true for information on +local variables. It is therefore advisable to compile Java projects with the +`-g` option that adds debugging information in all cases. + +\subsection Attributes + +The last section contains additional attributes, e.g., `SourceFile` which +specified from which source file the .class was compiled, `BootstrapMethods` +which is used for lambda functions or `InnerClasses` which refers to inner +classes of the class in question. + \section java-bytecode-runtime-exceptions Adding runtime exceptions (java_bytecode_instrument) From 4bfbb4835463e0051348b7f996a33603de2de76b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20G=C3=BCdemann?= Date: Sun, 26 Aug 2018 12:03:03 +0200 Subject: [PATCH 2/2] Correct section entries --- jbmc/src/java_bytecode/README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/jbmc/src/java_bytecode/README.md b/jbmc/src/java_bytecode/README.md index c6ac67eb5c..5b13ce8c7a 100644 --- a/jbmc/src/java_bytecode/README.md +++ b/jbmc/src/java_bytecode/README.md @@ -51,7 +51,7 @@ information about their length in bytes. The integer format used in class files are unsigned integers of size 8/16/32 bit, which are named as `u1`/`u2`/`u4`. -\subsection Access flags +\subsection java-class-access-flags Access Flags The JVM specification defines different access flags, e.g., `final`, `static`, `protected`, `private` etc. where different ones are applicable to the class @@ -61,14 +61,14 @@ values. Each of these values is defined as a constant with a name prefixed with `ACC_` in JBMC, e.g., as `#define ACC_PUBLIC 0x0001` or `#define ACC_ENUM 0x4000`. -\subsection Constant Pool +\subsection java-class-constant-pool Constant Pool The constant pool contains all strings and referred values that are used in the .class. This includes the names of the class itself and its super-class, as well as the names and signatures of all fields and methods. All strings in the constant pool are in UTF-16 format. -\subsection Fields +\subsection java-class-fields Fields Each member variable of a class has a field entry with a corresponding field information structure. This contains the name of the field, its raw JVM type @@ -80,7 +80,7 @@ the generic type of an object if applicable. The name of the field, the descriptor and the signature are all represented as indices into the constant pool of the class file. -\subsection Methods +\subsection java-class-methods Methods Methods are represented in a similar way as fields. Each method has an associated name, descriptor and optional signature entry in the constant pool @@ -100,7 +100,7 @@ correct and creates code accordingly. The same holds true for information on local variables. It is therefore advisable to compile Java projects with the `-g` option that adds debugging information in all cases. -\subsection Attributes +\subsection java-class-attributes Attributes The last section contains additional attributes, e.g., `SourceFile` which specified from which source file the .class was compiled, `BootstrapMethods`