<html> <head><title>jmark</title></head> <body> <center> <h1> jmark</h1> <h3>Reviewed by: Qing Ju</h3> </center> <h2>Abstract</h2> "jmark" is a Java-based tool that discourages program theft by embedding a digital watermark into Java programs. Embedding the program developer's copyright notice as a watermark into Java class files makes it possible to establish the legal ownership of each class file. The developers' goals were to preserve execution efficiency and to survive several kinds of attacks that attempt to erase the watermark. <h2>Introduction</h2> Java programs distributed through the Internet are vulnerable to program theft, because they can easily be decomposed into reusable class files and even decompiled into source code by program users (see Figure 1). "jmark" aims to make it easy to identify illegal Java programs that contain unlicensed class files. It provides a practical way of encoding a digital watermark into Java class files and decoding it from them. The embedding method used in "jmark" is indiscernible to program users, yet it enables the developer to identify an illegal program that contains stolen class files. <img src="figure1.jpg" border=2> "jmark" was developed by Akito Monden, Masatake Yamato and Haruaki Tamada at the Nara Institute of Science and Technology. It is an open source tool that anyone can download and use. It was originally developed in 2002, and the latest version, 1.4.4, was released in October 2003. <h2>Installation</h2> "jmark" is available for download at <a href="http://se.aist-nara.ac.jp/jmark/">http://se.aist-nara.ac.jp/jmark/</a> as a tar archive that includes the source code (jmark-1.4.4.tar.gz). The archive contains two tools: "jmark", which injects a watermark into a Java class file, and "jdecode", which decodes a watermark from a Java class file. Users first add a dummy method to their Java source code and compile it, then use "jmark" to inject their watermark into the dummy method. Examples of dummy methods are in the 'dummy' sub-directory of the archive. 
Binaries for the Win32 console environment are in the 'win32bin' sub-directory. The programs run on some older versions of UNIX, MS-Windows, MS-DOS, etc., but they do not work on current versions of Fedora or Windows. HOW TO COMPILE: <xmp>./configure</xmp> <xmp>make</xmp> <h2>Usage</h2> Users first add a dummy method to their Java source code manually and compile it, then use "jmark" to inject their own watermark into the dummy method (the to-be-watermarked method; see Figure 2). </img><img src="figure2.jpg" border=2></img> Suppose users have a file "TEST0.java" to be watermarked. <a href="TEST0.java/">"TEST0.java"</a> contains 12 methods, and method No. 10 is the dummy method (it is never actually invoked). Users first compile "TEST0.java" to "TEST0.class". Below, they encode "(C) USER NAME 2008" (the watermark phrase) into the dummy method. Users also have to specify a key phrase, "MyKey", which determines the bit assignment rule: a mapping between the character sequence (the watermark phrase) and the bit sequence (in the Java bytecode). Only a person who knows the key phrase (in this case "MyKey") can decode the watermark correctly. 
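The review does not describe how jmark derives the bit assignment rule from the key phrase. As a rough illustration only, the following Java sketch assumes a hypothetical scheme in which the key phrase seeds a pseudo-random shuffle of a 64-character alphabet, so that every character maps to a key-dependent 6-bit code. The alphabet, the 6-bit code width and the class name `KeyedBitAssignment` are assumptions, not jmark's actual format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical key-dependent bit assignment; NOT jmark's actual rule.
public class KeyedBitAssignment {
    // Assumed 64-symbol alphabet, so each character maps to a 6-bit code.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ()[]{}<>.,:;!?'\"-+*/=_@#$%&";

    private final List<Character> order = new ArrayList<>();

    public KeyedBitAssignment(String key) {
        for (char c : ALPHABET.toCharArray()) {
            order.add(c);
        }
        // The key phrase seeds the shuffle, so each key yields its own mapping.
        Collections.shuffle(order, new Random(key.hashCode()));
    }

    /** Maps each character to its 6-bit code and returns a '0'/'1' string. */
    public String encode(String text) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            int code = order.indexOf(c);
            if (code < 0) {
                throw new IllegalArgumentException("unsupported character: " + c);
            }
            String b = Integer.toBinaryString(code);
            bits.append("000000".substring(b.length())).append(b); // left-pad to 6 bits
        }
        return bits.toString();
    }

    /** Inverse of encode(): reads 6 bits at a time and looks up the character. */
    public String decode(String bits) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i + 6 <= bits.length(); i += 6) {
            text.append(order.get(Integer.parseInt(bits.substring(i, i + 6), 2)));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String phrase = "(C) USER NAME 2008";
        String bits = new KeyedBitAssignment("MyKey").encode(phrase);
        System.out.println(new KeyedBitAssignment("MyKey").decode(bits));    // prints (C) USER NAME 2008
        System.out.println(new KeyedBitAssignment("OtherKey").decode(bits)); // unrelated characters
    }
}
```

Decoding the same bits with a different key yields unrelated characters, which is the property the key phrase is meant to provide. Seeding `java.util.Random` with `String.hashCode()` only keeps the sketch short; it is not a secure way to derive a secret mapping.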
<xmp>%jmark TEST0.class 10 "(C) USER NAME 2008" -k"MyKey" <--- Watermark encoding</xmp> <xmp>#classfile: TEST0.class</xmp> <xmp>#method: 10</xmp> <xmp>#watermark: "(C) USER NAME 2008"</xmp> <xmp>#key: "MyKey"</xmp> <xmp>%jdecode TEST0.class -k"MyKey" <--- Watermark decoding</xmp> <xmp>#classfile: TEST0.class</xmp> <xmp>#key: "MyKey"</xmp> <xmp>#begin{watermark}</xmp> <xmp>1 "X(6B5"</xmp> <xmp>2 "))BQBQB88IQB88IQJCJE7EQ QJ"</xmp> <xmp>3 "X(2"</xmp> <xmp>4 "J1RIAR"</xmp> <xmp>5 ""</xmp> <xmp>6 ""</xmp> <xmp>7 "X(P6HXVN"</xmp> <xmp>8 " (H6BX8,"</xmp> <xmp>9 "MB28"</xmp> <xmp>10 "(C) USER NAME 2008" <--- Watermark decoded from method No. 10</xmp> <xmp>11 "X(P6HXCE"</xmp> <xmp>12 "XR7RIRXRQIBI"</xmp> <xmp>#end{watermark}</xmp> Instead of specifying the to-be-watermarked (dummy) method by its index number, users can also specify it by name. <xmp>%jmark TEST0.class check_std "(C) USER NAME 2008" -k"MyKey" <--- "check_std" is the name of the to-be-watermarked method.</xmp> ALGORITHM OPTIONS: <ul> <li>-a0: default watermarking algorithm</li> <li>-a1: redo the encoding of a character when overwriting operands</li> <li>-a2: do not use "replacing opcodes" (use only "overwriting numerical operands")</li> </ul> Algorithm a2 is more resistant to obfuscation attacks than a0 and a1. <h2>Internals</h2> ENCODING PROCEDURE The watermark encoding procedure of jmark consists of the following three phases: (Phase 1) Dummy method injection In the first phase of watermarking, a dummy method, which will never be executed, is appended to a class in the target Java source program. This dummy method provides the space for the watermark codeword, so it must be large enough to hold the watermark. Next, an invocation of the dummy method is added to the source program. The invocation statement could look like this: <xmp>if (Condition)</xmp> <xmp>Dummy_Method(); </xmp> "Condition" is an expression that never evaluates to true, so the dummy method is never actually invoked. 
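A guarded call of this kind might look like the following Java sketch. The class `WatermarkTarget`, the method `checkStd` and the particular condition are hypothetical choices, not taken from jmark's 'dummy' examples. The condition relies on the fact that the product of two consecutive integers is always even (this still holds when the 32-bit multiplication overflows, because wrap-around does not change the lowest bit), so it can never be true, yet the reason is not obvious at a glance. The body of `checkStd` uses small constants and int arithmetic, which javac typically compiles to instructions such as bipush, iinc and iadd.

```java
public class WatermarkTarget {
    // Counts how often the dummy method runs; it should always stay 0.
    static int dummyCalls = 0;

    // Opaque predicate: x * (x + 1) is always even, even under int overflow,
    // so this method always returns false.
    static boolean neverTrue(int x) {
        return ((x * (x + 1)) & 1) == 1;
    }

    // Hypothetical dummy method: never executed, but its bytecode contains
    // constants and int arithmetic that a watermarking tool could overwrite.
    static int checkStd(int a, int b) {
        dummyCalls++;
        int acc = a + b;   // int addition (iadd)
        acc += 25;         // local += small constant (typically iinc)
        int scale = 42;    // small constant (typically bipush)
        acc = acc * scale - b;
        return acc;
    }

    // Ordinary code that carries the guarded call.
    public static int process(int x) {
        if (neverTrue(x)) {
            checkStd(x, x); // never reached
        }
        return 2 * x + 1;
    }

    public static void main(String[] args) {
        for (int x = -1000; x <= 1000; x++) {
            process(x);
        }
        System.out.println("dummy calls: " + dummyCalls); // prints "dummy calls: 0"
    }
}
```

In a real program the guard would sit inside ordinary, frequently executed code, and the dummy method would be written to resemble the methods around it.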
If the condition is complex enough, it is difficult for program users to become aware of the dummy method. Moreover, since large programs already contain many rarely executed methods, it is not easy for program users to locate the dummy method. However, since program thieves might decompile and read the class file, the developer should carefully write a dummy method that does not look like a dummy. (Phase 2) Compilation In the second phase, the Java source program containing the injected dummy method is compiled with a Java compiler. (Phase 3) Watermark injection In the third phase, the watermark is injected into the bytecode of the dummy method. Because the Java virtual machine checks the syntactic correctness and type consistency of each class file with its bytecode verifier (see Figure 3), the modified bytecode must still pass this check. <img src="figure3.jpg" border=3></img> To preserve syntactic correctness and type consistency, the algorithm uses the following two approaches: (i) Overwriting numerical operands One simple way to preserve syntactic correctness is to restrict where overwriting may take place. The numerical operand of an opcode that pushes a constant onto the operand stack, or of an opcode that increments a local variable by a constant, can be overwritten without breaking syntactic correctness or type consistency. For example, the operand xx of 'bipush xx' and the increment operand xx of 'iinc' can be overwritten with any single-byte value. Figure 4 shows an example of numerical operands that can be overwritten. <img src="figure4.jpg" border=3></img> (ii) Replacing opcodes To increase the space available for watermark injection, the algorithm replaces some opcodes, such as iadd, ifnull and iflt, with other opcodes of the same kind. For example, the opcode iadd can be replaced with any of isub, imul, idiv, irem, iand, ior and ixor. Because these eight opcodes are mutually replaceable, each occurrence can encode 3 bits of information. For example, users may assign 000<sub>2</sub> to iadd, 001<sub>2</sub> to isub, 010<sub>2</sub> to imul, ..., and 111<sub>2</sub> to ixor. 
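The two approaches can be illustrated on a simplified instruction stream. The following Java sketch is not jmark's implementation: it operates on a hand-written byte array containing only a few opcodes (a real tool would have to locate the dummy method's Code attribute inside the class file), and it turns the watermark into bits with plain 8-bit ASCII rather than a key-dependent bit assignment. It overwrites the operand of bipush and the increment operand of iinc with 8 bits each, and replaces each opcode from the iadd group with the opcode whose position in the list iadd, isub, imul, idiv, irem, iand, ior, ixor equals the next 3 bits. The `extract` method repeats the same walk to read the bits back, in the spirit of the decoding procedure.

```java
// Simplified sketch of operand overwriting and opcode replacement on raw bytecode (not jmark's code).
public class OpcodeWatermark {
    // Opcode values from the JVM specification (only the subset this sketch understands).
    static final int ILOAD_0 = 0x1a, ILOAD_1 = 0x1b, ILOAD_2 = 0x1c, ISTORE_2 = 0x3d;
    static final int BIPUSH = 0x10, IINC = 0x84, IRETURN = 0xac;
    // iadd, isub, imul, idiv, irem, iand, ior, ixor; the index is the encoded 3-bit value.
    static final int[] INT_OPS = {0x60, 0x64, 0x68, 0x6c, 0x70, 0x7e, 0x80, 0x82};

    static int groupIndex(int opcode) {
        for (int i = 0; i < INT_OPS.length; i++) if (INT_OPS[i] == opcode) return i;
        return -1;
    }

    // Instruction length in bytes, needed to step from one instruction to the next.
    static int length(int opcode) {
        if (opcode == BIPUSH) return 2;
        if (opcode == IINC) return 3;
        if (groupIndex(opcode) >= 0) return 1;
        switch (opcode) {
            case ILOAD_0: case ILOAD_1: case ILOAD_2: case ISTORE_2: case IRETURN:
                return 1;
            default:
                throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(opcode));
        }
    }

    // Reads n bits starting at pos as an unsigned number, padding with 0 past the end.
    static int take(String bits, int pos, int n) {
        int v = 0;
        for (int i = 0; i < n; i++) {
            v <<= 1;
            if (pos + i < bits.length() && bits.charAt(pos + i) == '1') v |= 1;
        }
        return v;
    }

    static String toBits(int value, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = n - 1; i >= 0; i--) sb.append((value >> i) & 1);
        return sb.toString();
    }

    // 8 bits per character (plain ASCII here, not a key-dependent assignment).
    static String textToBits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) sb.append(toBits(c, 8));
        return sb.toString();
    }

    /** Returns a copy of code with bits written into operands and replaceable opcodes. */
    static byte[] embed(byte[] code, String bits) {
        byte[] out = code.clone();
        int pos = 0;
        for (int pc = 0; pc < out.length && pos < bits.length(); ) {
            int op = out[pc] & 0xff;
            int len = length(op);
            if (op == BIPUSH) {               // (i) overwrite the pushed constant
                out[pc + 1] = (byte) take(bits, pos, 8); pos += 8;
            } else if (op == IINC) {          // (i) overwrite the increment, keep the local index
                out[pc + 2] = (byte) take(bits, pos, 8); pos += 8;
            } else if (groupIndex(op) >= 0) { // (ii) replace the opcode to encode 3 bits
                out[pc] = (byte) INT_OPS[take(bits, pos, 3)]; pos += 3;
            }
            pc += len;
        }
        if (pos < bits.length()) throw new IllegalArgumentException("dummy method too small");
        return out;
    }

    /** Walks the instructions in the same order and reads the embedded bits back. */
    static String extract(byte[] code) {
        StringBuilder bits = new StringBuilder();
        for (int pc = 0; pc < code.length; ) {
            int op = code[pc] & 0xff;
            if (op == BIPUSH) bits.append(toBits(code[pc + 1] & 0xff, 8));
            else if (op == IINC) bits.append(toBits(code[pc + 2] & 0xff, 8));
            else if (groupIndex(op) >= 0) bits.append(toBits(groupIndex(op), 3));
            pc += length(op);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // Hand-written bytecode for: static int f(int a, int b) { int c = a + b; c += 25; return c * 42 - b; }
        byte[] code = {
            ILOAD_0, ILOAD_1, 0x60, ISTORE_2,  // c = a + b  (iadd)
            (byte) IINC, 2, 25,                // c += 25    (iinc 2, 25)
            ILOAD_2, BIPUSH, 42, 0x68,         // c * 42     (bipush 42, imul)
            ILOAD_1, 0x64, (byte) IRETURN      // ... - b    (isub, ireturn)
        };
        String bits = textToBits("ABC");     // 24 bits; this method offers 25
        byte[] marked = embed(code, bits);
        System.out.println(extract(marked).startsWith(bits)); // prints true
    }
}
```

Because every replacement opcode in the group pops two ints and pushes one int, the verifier still accepts the modified method. idiv and irem could throw an ArithmeticException on a zero divisor at run time, but the dummy method is never executed.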
Wherever one of these opcodes appears in the dummy method, it is replaced with one of the eight opcodes above according to the bits to be encoded (see Figure 5). <img src="figure5.jpg" border=3></img> The same kind of bit assignment and opcode replacement can also be applied to other opcodes (see Figure 6). To encode a sentence such as "ABC", the sentence is first translated into a bit sequence, and the bit sequence is then encoded into the program (see Figure 7). <img src="figure6.jpg" border=3></img> <img src="figure7.jpg" border=3></img> DECODING PROCEDURE The decoding algorithm is very simple: it performs exactly the reverse of the watermark injection procedure, starting from the top of every method (see Figure 8). Because this decoding procedure can be automated, even if only part of a program was stolen and built into another program, the watermark can easily be decoded wherever it exists in the program. <img src="figure8.jpg" border=3></img> <h2>Evaluation</h2> 1. Experiments The developers tried two kinds of attacks: (1) the obfuscator attack and (2) the decompile-recompile attack (Figure 9). For the obfuscator attack, they applied SourceGuard version 2.0, a widely used and strong obfuscator, to each class file. For the decompile-recompile attack, they applied Mocha, the first and most widely known decompiler, to each class file to obtain its source code, which was then recompiled. <img src="figure9.jpg" border=2></img> After the obfuscator attack, all watermarks were decoded correctly. This is because obfuscators at that time (2002) generally renamed symbols such as variable and method names, which does not affect the operands or opcodes in a method. Today's obfuscators, however, can also transform instructions and literal numbers, so jmark probably can no longer survive an obfuscator attack. For the decompile-recompile attack, the developers randomly chose 10 source files from the sample Java applets in JDK 1.2. 
They injected 23 dummy methods into these files. After the decompile-recompile attack, 3 of the 23 watermarks had been erased (see Figure 9). The result shows that the decompile-recompile attack does not always succeed, and even when it did succeed, more than half of the watermarks (5 out of 8) were not erased. The authors therefore suggested that users can protect their class files from the decompile-recompile attack by injecting more than two watermarks into each class file. Since decompilers perform poorly on large class files, jmark works better for large class files under the decompile-recompile attack. <img src="figure10.jpg" border=2></img> 2. Improvement Introducing an arbitrary dummy method into the program is risky, since the dummy method may look very different from the rest of the code. An obvious improvement (suggested by Myles et al.) is to embed the mark in a copy of a function that already exists in the program instead. This should greatly increase stealth. Even with this improvement, however, the resulting code is still likely to be unstealthy. Changing literal numbers produces unusual values (other than -1, 0 and 1), which attackers might notice. In addition, not all arithmetic instructions occur with the same frequency (additions, for example, are much more common than divisions). So if attackers see many more divisions than usual, they can guess where the watermark is hidden. <h2>References</h2> <ul> <li> <a href="http://se.aist-nara.ac.jp/jmark/">http://se.aist-nara.ac.jp/jmark/</a> <li> Akito Monden, Hajimu Iida, Ken-ichi Matsumoto, Katsuro Inoue and Koji Torii, "A practical method for watermarking Java programs," The 24th Computer Software and Applications Conference (COMPSAC 2000), pp. 191-197, Taipei, Taiwan, Oct. 25-27, 2000. 
<li> Akito Monden, Hajimu Iida, Ken-ichi Matsumoto, Katsuro Inoue and Koji Torii, "Watermarking Java programs," International Symposium on Future Software Technology '99 (ISFST'99), pp. 119-124, Nanjing, China, Oct. 1999. <li> Christian Collberg, Ginger Myles and Jasvir Nagra, "Surreptitious Software," draft version, Aug. 2006. <li> Ginger Myles, Christian Collberg, Zachary Heidepriem and Armand Navabi, "The evaluation of two software watermarking algorithms," Software: Practice and Experience, 35(10), pp. 923-938, 2005. </ul> </body> </html>