Update: The vulnerability has been partially fixed; however, the patch unintentionally removes desired functionality as stated here.
Not too long ago, I was casually perusing my way through GitHub and found the Threadtear Java bytecode deobfuscator. For those of you who don’t know, I have a strong interest for Java bytecode-related projects — especially when deobfuscation and obfuscation are involved.
There’s a warning on the README.md of the repository which specifically informs the user it is possible to successfully execute arbitrary code through the deobfuscator for malicious purposes. So of course, I decided to take up the challenge and create a proof of concept of an ACE exploit in Threadtear.
The intent of this post is to show two main security issues to keep in mind whenever attempting to deobfuscate anything compiled to Java bytecode.
You should never dynamically load (or even load for that matter) classes into the JVM and invoke them with reflection without checking for malicious code first.
Preventing arbitrary code execution is basically impossible which essentially defeats the entire purpose of a deobfuscator.
Doing so exposes the deobfuscator instance to the program being deobfuscated which can modify its behaviour so the deobfuscator processes and transforms the program incorrectly.
You should never, ever rely on the SecurityManager functionality of the JVM as a fullproof sandbox.
Recon
The main class of the deobfuscator makes a call to System.setSecurityManager(new VMSecurityManager()) so we obviously want to take a look at what VMSecurityManager does.
Taking a look at the source of VMSecurityManager shows that it pretty much prevents us from doing most of the fun stuff (making URL connections, IO operations, etc.) but the most relevant part is the checkPermission(Permission) method.
@OverridepublicvoidcheckPermission(Permissionperm){if(perm.getName().equals("setSecurityManager")){// check if invoked in main classif(!Thread.currentThread().getStackTrace()[4].getClassName().equals(Threadtear.class.getName())){thrownewSecurityException(MSG);}}}
So here, the only actually prevented action is just overwriting the Security Manager. So to get a full, unrestricted ACE all we have to do is figure out a way to get around this particular restriction and kill the Security Manager.
Killing the SecurityManager
First, we need to get rid of the Security Manager so that we can do pretty much whatever we want. Unfortunately, the JVM deliberately prevents us from using Java’s Reflection API to do this for us. However, we can get around this inconvenience using Java’s Unsafe API.
The Unsafe API allows the programmer to perform memory-unsafe operations. Java by default, is a memory-safe programming language which prevents many people from making mistakes that you typically see of a C or C++ programmer (dangling pointers, forgetting to free allocated memory, calling free() twice on memory allocated to the heap, etc.).
Now, we simply need to locate the base address of where the static fields are located (java.lang.System stores the Security Manager in a static field).
// Note that this will not work on Java 9+ due to the removal of this methodObjectsystemBase=unsafe.staticFieldBase(System.class);
Finally, we just search for the instance of the SecurityManager starting from systemBase!
We have successfully killed the Security Manager and can now execute pretty much whatever we want!
So instead of relying on the Security Manager to do the dirty sandboxing for you, what should you do instead? The JVM is full of nooks and crannies which can be taken advantage of for the purpose of malicious code execution. The easiest way to prevent arbitrary code executions while still working with potentially malicious code is to utilize a whitelist which allows only certain actions to happen which have been deemed safe. One of the easiest ways to achieve said whitelist, is through emulation of the bytecode.
One theorem that I have learned in algebra and always forget the statement(s) of is the famous Nakayama’s Lemma. A quick Google search shows that I am probably not the only the person who has this issue. In any case, there is a particularly nice case of Nakayama’s Lemma that I use to reconstruct the more general version via quiver representation theory that I thought would be cool to write about.
When someone says “Nakayama’s Lemma,” there are at least ten different statements that I am somewhat aware of (I can never remember them anyway!) that come to mind. I will give the one that I have seen stated the most (not just in the land of representation theory). For this entire section, we will let $R$ be a (not necessarily commutative) unital ring.
[definition=%counter% (Jacobson radical)] The Jacobson radical $\rad R$ of $R$ is the intersection of all the maximal right ideals of $R$. In the special case where $R = 0$, we define $\rad R = 0$. [/definition]
As it turns out, $\rad R$ is a two-sided ideal. Not only that, but the left and right-Jacobson radicals coincide (hence, why we just say the Jacobson radical).
[theorem=%counter%] The left and right Jacobson radicals coincide. In particular, the Jacobson radical is a two-sided ideal of $R$. [/theorem]
Now we state Nakayama’s Lemma:
[theorem=%counter% (Nakayama’s Lemma)] Let $J \subseteq R$ be a right ideal contained in $\rad R$ and $M_R$ be a finitely-generated right $R$-module. If $MJ = M$, then $M = 0$. [/theorem]
Quivers Representations and Path Algebras
While I have no idea what the geometric intuition for Nakayama’s Lemma is, I am much more familiar with how the statement can be understood from the representation theory of finite-dimensional algebras via quivers (direct graphs with arcs of multiplicity allowed).
[definition=%counter% (Quivers)] A quiver $Q$ is an ordered quadruple $(Q_0, Q_1, s, t)$ where
$Q_0$ is the set of vertices.
$Q_1$ is the set of arrows (arcs).
$s:Q_1 \to Q_0$ is the “source” map that sends any arrow to its tail.
$t:Q_1 \to Q_0$ is the “target” map that sends any arrow to its head. [/definition]
[definition=%counter% (Path algebra)] Let $k$ be an algebraically closed field. Then $kQ$ is a $k$-algebra, called the path algebra of $Q$, with basis labeled by the (oriented) paths in $Q$ and multiplication given by path concatenation. [/definition]
In the case of acyclic quivers, it turns out that it is extremely easy to describe the Jacobson radical.
[theorem=%counter% (Jacobson radical is spanned by arrows)] Let $Q$ be an acyclic quiver. Then $\rad kQ$ is the two-sided ideal generated by all the arrows. [/theorem]
In the more general case where we have a quiver with relations (i.e. a path algebra modded by an admissible ideal), this statement is also true.
Now we turn to the other side of the picture: quiver representations.
[definition=%counter% (Representations of quivers)] A representation $V$ of the quiver $Q$ is an assignment of a vector space $V(i)$ to each vertex $i \in Q_0$ and a $k$-linear map $V(\alpha):V(s\alpha) \to V(t\alpha)$ to every arrow $\alpha:i \to j$ in the quiver $Q$.
We say that $f:V \to W$ is a morphism of the representations $V$ and $W$ of $Q$ provided that $f$ assigns a $k$-linear map to each vertex of $Q$ so that the diagram
commutes for every arrow $\alpha$ of $Q$.
Under this construction, we obtain the abelian $k$-category $\rep Q$ where the objects are the representations above and the morphisms are the morphisms defined above. [/definition]
A fundamental theorem of the representations of quivers is that representations of quivers naturally correspond to f.g. modules of the path algebra. Accordingly, we can construct all of the f.g. modules of various algebras by literally drawing pictures and contravariantly associating vector spaces and linear transformations between them to the associated quiver.
[theorem=%counter%] The category $\rep Q$ is equivalent to $\rmod{kQ}$ where $\rmod{kQ}$ is the category of finitely generated right $kQ$-modules. [/theorem]
The functor $\mathcal{G}:\rep Q \to \rmod kQ$ that takes representations to modules over the path algebra is the interesting one (the converse functor tells us these are the only ones we really care about). In particular, we are interested in how it maps objects. Here’s the construction.
Given a representation $V$ of $Q$, we define $\mathcal{G}(V) = \bigoplus_{i\in Q_0} V(i)$. Then we define a $kQ$-action on $\mathcal{G}(V)$ as follows: Let $p \in kQ$ be a path and $v \in V(i)$. Then
[vp = \begin{cases} V(p)(v) & \text{ if } sp = i, 0 & \text{ otherwise} \end{cases}]
where $V(p)$ is the composition of the $V(\alpha_j)$’s where $p = \alpha_1 \cdots \alpha_r$ is the arrows making up the path $p$. Furthermore, $vp$ sits in the $ti$-component of $\mathcal{G}(V)$.
Nakayama’s Lemma for Path Algebras
Now we bring back Nakayama’s Lemma and apply it to a path algebra. Under everything we mentioned above, we can construct a simple example from which Nakayama’s Lemma is actually completely trivial.
That is, the $\CC Q$-action on $M$ just applies the corresponding maps in $V$ and then pushes the result along the arrows (which is expected due to the construction of the naturality of the equivalence). The fact that the action keeps “pushing” things along the arrows means that any ideal $J$ sitting inside $\rad \CC Q = (\alpha, \beta, \alpha\beta)$ necessarily cannot have the property that $MJ = M$ as $Q$ is acyclic. So Nakayama’s Lemma is actually a completely trivial statement here. This idea generalizes as well — the lack of cycles prevents us from “landing back where we started” (i.e. $MJ = M$).
So whenever I need to reconstruct the statement of Nakayama’s Lemma, I just think about examples like this one and note that $M \neq 0$, of course, implies that any ideal $J$ sitting inside $\rad \CC Q$ is not gonna result in $MJ = M$ for acyclic $Q$. So the general statement then is fairly immediate.
When I first learned about signed measures, I found the idea pretty confusing. I was given the definition verbatim out of Folland’s book, but never actually understood why we would care about such a thing in the first place. What’s an example problem that we would want to solve via signed measures? Sure, Folland says that we can classify the signed measures via Radon-Nikodym, but why should we care about that at all? I spent a bit of time pondering this recently and I came up with an answer that I think is somewhat satisfactory to me.
Obviously, there are no shortages of applications of integrals. What actually sticks out to me so much in integration, in particular, is the Dirac delta function, a mysterious notational convenience to say the following:
[\int_{-\infty}^\infty f(x) \delta(x - a) \dd{x} = f(a).]
The Dirac delta is ubiquitous throughout applications in that it is precisely the right tool to model point masses and instantaneous changes in a function and how that would affect integration. That being said, the above is notation. A natural mathematical question is to ask how it connects back to our usual intuition of integrals? What does it mean to differentiate these things (or, for that matter any kind of limiting operation)? Classically, we know there is no connection apriori. Additionally, we come across expressions like
[A\delta(x - a) + Bf(x)]
where $A$ and $B$ are constants, in the wild (for example, we can think of these as random variables — one that is discrete and the other absolutely continuous) and a natural question is how to interpret such a thing in a formal setting. Under the much more general measure theoretic framework, we can make the relationship precise. This is a useful perspective because then we unlock the much more powerful tools of measure theory if we, say, want to talk about limiting-based operations on these kinds of sums.
Signed Measures
The naive way of thinking of measures is that measures give us the length/area/volume of a set interpreted in an appropriate sense. We are taught in calculus that the appropriate tool for these kinds of problems is the Riemann integral. Shortly after that, we are told that signed area is a thing (which has a variety of applications, negative rates of change or densities, for example). Thus, a natural generalization of a measure is to consider a more general “measure” of the form
[\nu(E) = \int_E f(x) \dd{x}]
where $E \subseteq \RR$. In general, $f$ need not be nonnegative so we call this more “general measure” a signed measure. But these are not the only kind of signed measures! For example, we know that there are signed measures of the form
[\nu(E) = \int_E (\delta(x - a) + f(x)) \dd{x}]
in the wild as well! Following physicist intuition, the latter integral is obviously just
[\nu(E) = \int_E\delta(x - a) \dd{x} + \int_E f(x) \dd{x}.]
The first “integral” is pretty imprecise — we know it’s just a notational convenience at the naive level. But this makes it abundantly clear that this more general case, if we formalize it, includes strange-looking expressions like the above. By the Radon-Nikodym theorem, it turns out that these two examples are actually the only ones we actually care about (in some sense).
From here on out, let us agree to use the definition of a signed measure as given:
[definition=%counter% (signed measures)] A signed measure on the measurable space $(X, \mathcal{M})$ is a function
[\nu : \mathcal{M} \to [-\infty, \infty]]
such that:
$\nu(\emptyset) = 0$;
$\nu$ assumes at most one of $-\infty$ or $+\infty$;
if $(E_n)\subseteq \mathcal{M}$ is disjoint, then we have the following where the series converges absolutely provided the LHS is finite:
Furthermore, let us agree that it is much more desirable to work with absolutely convergent series (due to the Riemann rearrangement theorem) so that is a reasonable request to make to avoid pathologies.
Structure Theorem: Radon-Nikodym
Once we have defined some kind of object (in this case, signed measures), a natural question is the structure of said objects. Motivated by the examples above, we construct a measure that is intended to replicate the effect of the Dirac delta.
[definition=%counter% (Dirac measure)] Let $X$ be a nonempty set and fix some $a \in X$. Then we define the Dirac measure at $a$ to be the mapping
[\begin{align} \mu : 2^X &\to [0, \infty] E &\mapsto \begin{cases} 1 & \text{ if } a \in E, 0 & \text{ otherwise}. \end{cases} \end{align}]
With this definition in mind, the following definition is natural comparing Lebesgue measure $m$ to the Dirac measure at $0$.
[definition=%counter% (mutually singular)] We say that the signed measure $\mu$ is singular with respect to $\nu$ if there are $E, F \in \mathcal{M}$ such that
$E \cap F = \emptyset$;
$E \cup F = X$;
$E$ is $\nu$-null and $F$ is $\mu$-null.
We denote this relationship as $\mu \perp \nu$ [/definition]
[example=%counter%] Let $\mu$ be the Dirac measure at $0$ on $\RR$ and $X = \RR$. Then we define the signed measure
[\nu(S) = \int_{S} \sin(x) \dd{m}(x)]
where $m$ is Lebesgue measure and $S$ is Lebesgue measurable. Setting $E = 0$ and $F = \RR\setminus 0$, it follows that $\mu$ is singular with respect to $\nu$. [/example]
It turns out that, in general, if we start off with $\sigma$-finite signed measure on $(\RR, \mathcal{L})$ (where $\mathcal{L}$ is the collection of Lebesgue-measurable sets), then there exists a unique decomposition
If we drop the Delta-like part from the expression for a second and focus on the so-called “calculus-like measure”, I really just mean that
[\nu(E) = \int_{E} f(x) \dd{x}]
for some suitable choice of $f$ that is reasonably computable via calculus. In that sense, note that this means that $\nu(E) = 0$ whenever $E$ is Lebesgue-null. This immediately motivates the following definition.
[definition=%counter% (absolutely continuous measures)] Let $\nu$ be a signed measure and $\mu$ a measure on the measurable space $(X, \mathcal{M})$. We say that $\nu$ is absolutely continuous with respect to $\mu$ provided that
[\mu(E) = 0 \quad\implies\quad \nu(E) = 0]
for any $E \in \mathcal{M}$. We denote this relationship by $\nu \ll \mu$. [/definition]
[remark=%counter%] No. I also have no idea why we use “$\ll$” as the notation for absolutely continuous. [/remark]
Before we state a general version of the Radon-Nikodym theorem that is given by Folland, we will instead give a special case for $\RR$ (since it can more easily be related to calculus that way) as follows.
[theorem=%counter% (Radon-Nikodym: special case, [Folland, Theorem 3.8])] Let $\nu$ be a $\sigma$-finite signed measure on $(\RR, \mathcal{L})$. Then there exists unique $\sigma$-finite signed measures $\lambda$ and $\rho$ on $(\RR, \mathcal{L})$ such that
$\lambda \perp m$;
$\rho \ll m$
$\nu = \lambda + \rho$.
Furthermore, there is a function $f:\RR \to \RR$, with positive or negative part integrable, such that
[\rho(E) = \int_{E} f \dd{m}]
for every $E \in \mathcal{L}$ and $f$ is Lebesgue a.e.-unique. [/theorem]
To make this statement more general, we replace Lebesgue measure $m$ with any $\sigma$-finite measure. This gives us the much more abstract (and impressive!) following statement.
[theorem=%counter% (Radon-Nikodym, [Folland, Theorem 3.8])] Let $(X, \mathcal{M})$ be a measurable space. If $\nu$ is a $\sigma$-finite signed measure and $\mu$ a $\sigma$-finite positive measure on $(X, \mathcal{M})$, then there exist unique $\sigma$-finite signed measures $\lambda$ and $\rho$ on $(X, \mathcal{M})$ such that
$\lambda \perp \mu$;
$\rho \ll \mu$
$\nu = \lambda + \rho$.
Furthermore, there is a function $f:X \to \RR$, with positive or negative part integrable such that
[\rho(E) = \int_{E} f \dd{\mu}]
for every $E \in \mathcal{M}$ and $f$ is $\mu$-a.e.-unique. [/theorem]
[example=%counter%] To keep things really concrete, let:
$(X, \mathcal{M}) = (\RR, \mathcal{L})$ and $\mu = m$ be Lebesgue measure;
$\lambda$ be Dirac measure at $0$ weighted by some function $g$; and
$\rho(E) \coloneqq \int_E f\dd{m}$ for all $E \in \mathcal{L}$, where $f(x) = \sin(x)$.
Then this basically more or less amounts to the statement:
[\nu([a, b]) = g(0) + \int_a^b \sin(x) \dd{x}]
for some $a < 0 < b$. Back in physicist land, we might say this is just the integral
[\int_a^b (\delta(x)g(x) + \sin(x)) \dd{x}.]
[/example]
There is also a natural extension of this theorem to complex measures (of interest due to Fourier theory).
References
[Folland] Real Analysis: Modern Techniques and Their Applications by Folland.
Not too long ago, I took a quick look at some of the activity that has been going on in the Java bytecode obfuscation/deobfuscation communities. So far, I have noticed that most of the same ideas since I have went inactive have remained the same:
Flow obfuscation:
opaque predicates;
reordering blocks through goto;
weird try-catch block flow;
callstack-sensitive keys used for branching;
CFG flattening;
complicate existing jumps;
etc.
Constant obfuscation:
encrypt strings via context-sensitive keys;
split numerical constants into a ton of arithmetic;
abuse constantdynamic;
etc.
Reference obfuscation:
abuse the Reflection API;
abuse invokedynamics;
proxying of method and field invocations;
changing all parameter types to java.lang.Object;
etc.
Exploits:
HTML-injection of vulnerable tools;
fake directories;
tool-specific crashers;
etc.
And the usual other stuff like class encryption and math obfuscation and whatnot.
Now by no means am I saying that the work done in these areas isn’t impressive. In fact, some of the recent work done in obfuscation and deobfuscation of Java bytecode is pretty impressive to me, especially considering how annoying (for both parties) it is to have to work with JVM verification. What did surprise me, however, I noticed that not a lot more effort has gone into novelizing ideas in obfuscation of constants (or maybe there has but I just didn’t look hard enough…). In any case, my goal in this post is to give some (meme) ideas for obfuscation of constants in hopes that people push that area a little bit more.
Current Methods of Constant Obfuscation (to my knowledge)
At this point, string encryption has realistically gotten about as advanced as we’re gonna have asides from people coming up with transformers that dynamically generate encryption/decryption routines. This is actually not that difficult to do, however, and there is plenty of example tooling that exists for this exact purpose. What has not really advanced, at least from my brief look, is obfuscation of numerical constants.
Generally speaking, the current approach of numerical constant obfuscation is to take the constant and replace it with a gigantic mess of arithmetic and bitwise operations that evaluate to the constant. Of course, there are reasons why people do not push this much further:
For the most part, arithmetic and especially bitwise are decently fast but there is visible slowdown prior to JIT-optimization (which may or may not happen) if the code is constant-heavy. More sophisticated approaches will result in worse slowdown.
Coming up with novel obfuscation techniques is actually difficult.
Constant obfuscation does not fair well against emulation and dynamic reverse-engineering attempts. This is also another reason why it’s not really worth the effort coming up with novel obfuscation techniques for constant obfuscation.
Proposed Idea
All three of the points I gave above are valid reasons not to invest (i.e. waste) time on constant obfuscation. Nonetheless, I want to propose something that, when applying sparingly and appropriately, may avoid just enough overhead to be useable. My proposition is to incorporate results from the mathematical field of numerical analysis to approximate solutions to problems where the approximate solutions are the constants of the problem.
Many problems in mathematics are really difficult and many numerical approximations are not the most obvious things to come up with. This makes static analysis more involved than usual because finding the constants by hand essentially involves knowing how to solve the problem (either symbolically or numerically). Furthermore, identifying numerical algorithms by their code is not necessarily an easy thing to do. The advantage of this is that one no longer really has to spend time coming up with a novel obfuscation idea but rather just looking at something like a partial differential equation and ways to numerically approximate its solution and boom, there’s a new obfuscation technique.
Example 1
Suppose we want to obfuscate the constants $3$, $200$, and $700$. A simple problem we might do is make them the eigenvalues of a matrix. One way one might do this is by starting with the matrix
Let us call this matrix $D$. We will then conjugate it by a nonsingular matrix that is not nearly singular (for numerical stability). For instance, I decided to make my nonsingular matrix $V$ be
By definition, the eigenvalues of this matrix are the constants we are looking for. The question, then is how do we find the eigenvalues of $A$. One thing to note is that $A$ is nonsymmetric, so we might consider using a method as described in Demmel’s Applied Numerical Linear Algebra chapter 4. The routines there are explained pretty well and programming them into Java should not be difficult. The intent is also not likely to be obvious to those not trained in numerical linear algebra.
Example 2
The Poisson equation is the elliptic PDE
[\begin{aligned} L(u) = f \end{aligned}]
where $L$ is the Laplacian, and $u$ and $f$ are real or complex-valued on an appropriate manifold. For simplicity, we will stick with $u$ and $f$ being real-valued and setting the domain to something simple like $[-1, 1]^2$ as a subset of $\RR^2$. Then the PDE becomes
along with some appropriate boundary condition. Let’s say the homogeneous Dirichlet condition. That is,
[\begin{aligned} \left. u \right|_{\partial[-1, 1]} \equiv 0. \end{aligned}]
In order to utilize this for obfuscation of constants, we somehow need to incorporate the constants into a solution $u(x, y)$ of the PDE above with our specified boundary condition. Clever choices of $u(x, y)$, therefore producing $f(x, y)$ produces a mathematical problem that is tedious to solve by hand but not too bad by numerics. According to Iserles in A First Course in the Numerical Analysis of Differential Equations, sections 8.2 and 15.1, we can discretize the Laplacian and utilize the Fast Fourier Transform along with Hockney’s method to produce a fast solver of the PDE above. As none of these things are trivial, especially in undocumented code (as would be the case with obfuscated code), static analysis becomes much less obvious.
Example 3
In numerical analysis, significant attention is given to techniques of approximating definite integrals. That is, given continuous $f:[a, b] \to \RR$, can you find a way to approximate
How quickly does the error of your method converge to $0$? These are questions asked in the study of quadrature. This is hardly a trivial question! One way of going about this is to approximate by sampling $f$ at points and weighting accordingly:
Obviously, then, one must figure out reasonable $w_n$ to use. Typically, people come up with these by coming up with specific kinds of choices of $f(x)$ for which the approximation should be exact (look up Newton-Cotes, for instance) or the error in approximation should be as close to $0$ as possible (equivalently, minimizing the error). One way to go about this is to minimize the square of the error using Newton’s method (or simplex method combined with Newton’s) on the derivative of the squared error function. As reading this in undocumented code is far from trivial, the intent is not obvious if implemented in obfuscation. Thus, what one can do is turn constants into simple definite integrals to compute via quadrature whose rule is found by the method above (ideally, the rule is computed from within the obfuscated program itself at runtime).
Wrapping Up!
The examples given above are simple examples (though, hardly simple to implement) of using ideas of approximation as means of constant obfuscation. Personally, I believe that approaching obfuscation in this way also gives a red-herring effect in the sense that it pads the program with code that is possibly important-looking when in reality it is an algorithm from numerical analysis designed to reliably approximate a solution to a mathematical problem. The obvious tradeoff is that appropximation errors can happen which I leave as an exercise to work around. One will also run into performance issues but that is moreso a matter of using obfuscation where appropriate. Other places to look for similar inspiration are:
Generating functions (see Genertingfunctionology by Wilf)
Numerical solutions of IVPs (see A First Course in the Numerical Analysis of Differential Equations by Iserles)
Condition numbers of matrices (see Applied Numerical Linear Algebra by Demmel)
If you participated in redpwnCTF 2021, you might know that I authored the javaisez3 reverse-engineering challenge. So… here is my writeup. I attempted to write this writeup in a way that is friendly to those who do not have a lot of experience with the Java Virtual Machine (JVM), so hopefully you will find this educational and helpful should you ever run into future Java bytecode reverse-engineering scenarios.
It will also help if you do a quick reading on how the JVM works. The JVM is a stack machine with a stack-per-method model (different methods have separate stacks). Each operand on the stack consists of 4 byte words and values can be stored and loaded from local variables unique to each method. Something important to keep in mind is that the JVM does not pop arguments off the stack in the reverse order for method invocations (which you expect in something like x64 assembly), so the following set of instructions is equivalent to xyz.itzsomebody.MyClass.myMethod(0, "hello world").
3rd round of your local Java rev! Note: This requires Java 11 and above to run.
…Not particularly helpful. We are provided a .jar file (the JVM binary format) so let’s begin taking a look at that. When we run the .jar file via java -jar javaisez3.jar, we get a message saying that I am a Hu Tao simp (which, yes, was the case back then…).
This is a direct quoting of Hu Tao from the game Genshin Impact, but we’re not here to discuss Chinese Teyvat anime girls… we are here to solve the challenge for points:tm:. Since there is nothing here of revelance, let’s attempt to analyze the program.
2. Attempting to extract compiled code
Something that should be realized here is that .jar files are actually PKZIP archives. This means that under ordinary conditions, we can normally extract the compiled form of Java known as “classfiles” from the .jar via something like 7zip or Linux’s unzip utility. Indeed, when running zipinfo on javaisez3.jar, we get
Something that should be striking you as extremely odd here is that the .class/ “directories” are unusually large. This observation is important and I will bring it up later. For now, let’s try extracting it:
$ unzip javaisez3.jar
Archive: javaisez3.jar
creating: net/redpwn/ctf/tetsujou.class/
inflating: net/redpwn/ctf/tetsujou.class/index bad CRC 2144df1c (should be ea80ebd2)
inflating: net/redpwn/ctf/tetsujou.class/name bad CRC 921fbb53 (should be 53c069a2)
inflating: net/redpwn/ctf/tetsujou.class/data bad CRC 632c8f2f (should be f504c297)
creating: net/redpwn/ctf/suo.class/
inflating: net/redpwn/ctf/suo.class/index bad CRC 99f8b879 (should be 93333e0f)
inflating: net/redpwn/ctf/suo.class/name bad CRC f3b3d38f (should be 6ede3719)
inflating: net/redpwn/ctf/suo.class/data bad CRC 632c8f2f (should be 1fef1206)
creating: net/redpwn/ctf/JavaIsEZ3.class/
inflating: net/redpwn/ctf/JavaIsEZ3.class/index bad CRC 8b4d1797 (should be c6b38137)
inflating: net/redpwn/ctf/JavaIsEZ3.class/name bad CRC 45ecf7f9 (should be 2468ef7d)
inflating: net/redpwn/ctf/JavaIsEZ3.class/data bad CRC 632c8f2f (should be 4fe7b231)
inflating: META-INF/MANIFEST.MF bad CRC 8fee4f7c (should be e49c6f24)
So… despite every entry in the ZIP file having an incorrect CRC value, the program was still able to run. What you should be getting from this is that the JVM does not verify entry CRCs when loading a Java program. This is a simple yet sometimes effective way of preventing newbies from extracting classfiles out of a .jar file. At this point, there should be at least two attack methods that should come to your mind:
Realize that since the program runs, it must be stored in memory somewhere; therefore, dumping the contents from memory will get around all this nonsense or
Attempt to figure out how the JVM loads .jar files via java -jar or
Use something like Recaf which is able to bypass all of the anti-extraction measures.
While the first option is definitely viable (and faster) and the third option is by far the most efficient, we will take the second option so I look cooler and because not everyone knew about Recaf during the duration of the CTF.
3. ClassLoaders and stuff
Almost everything in Java relies around classes. Just like at the source level, those classes also remain at the compiled level in the form of .class files (and are consequently known as classfiles), so it follow that there is some part of the JVM that must load said classfiles at runtime. In our particular case, we want to know specifically which part of the JVM loads a .jar’s classes. We can quickly do so by making a quick Java applet:
It seems like that jdk.internal.loader.ClassLoaders$AppClassLoader is what we should be looking at. So let’s go open up some OpenJDK code! For the record, subclasses are typically compiled as their own class identified by parent_class_name_and_path + $ + sub_class_name, so we are looking specifically for the class called ClassLoaders.java and then we will search for AppClassLoader within.
/**
* The application class loader that is a {@code BuiltinClassLoader} with
* customizations to be compatible with long standing behavior.
*/privatestaticclassAppClassLoaderextendsBuiltinClassLoader{static{if(!ClassLoader.registerAsParallelCapable())thrownewInternalError();}finalURLClassPathucp;AppClassLoader(PlatformClassLoaderparent,URLClassPathucp){super("app",parent,ucp);this.ucp=ucp;}...
There are lots of interesting things here. In particular, we could take a look at BuildinClassLoader, URLClassPath, PlatformClassLoader, and at various other parts of Java in the parts of AppClassLoader that I cut out of the snippet; however, I do not wish to drone on so I will immediately turn our attention to URLClassPath which is relevant to what we are trying to do. If you do find the time; however, you may wish to take a look at these various classes as they will enhance your understanding of the huge spaghetti design monster JVM.
I am not going to post specific snippets here (because I am lazy), but if you peek at URLClassPath.java, you will notice that .jar files are loaded via java.util.jar.JarFile. This is extremely convenient because now we have a way to extract said files. Here is a (very bad) code snippet I whipped up just now to aid us in extracting the classfiles.
publicclassJavaisez3Extractor{publicstaticvoidmain(String[]args){Fileinput=newFile("input.jar");Fileoutput=newFile("output.jar");try{JarFilejar=newJarFile(input);JarOutputStreamjos=newJarOutputStream(newFileOutputStream(output));Enumeration<?extendsJarEntry>entries=jar.entries();while(entries.hasMoreElements()){JarEntryentry=entries.nextElement();try{JarEntrynewEntry=newJarEntry(entry.getName());jos.putNextEntry(newEntry);jos.write(toByteArray(jar.getInputStream(entry)));}catch(Throwablet){System.out.println("Error while writing "+entry.getName());t.printStackTrace();}}jos.close();}catch(Throwablet){t.printStackTrace();}}publicstaticbyte[]toByteArray(finalInputStreamstream)throwsIOException{try(stream;varout=newByteArrayOutputStream()){varbuf=newbyte[0x1000];intbytesRead;while((bytesRead=stream.read(buf,0,buf.length))!=-1){out.write(buf,0,bytesRead);}out.flush();returnout.toByteArray();}}}
As you can clearly see, this obviously does not actually extract the entries to disk (it creates a “repaired” .jar and copies everything over instead). This is because I am a pepega and happened to find this on my laptop (in other words, I lied about whipping this up on the spot) and I did not want to spend effort on making an actual extractor. Anyways, now we can finally unzip the .jar without errors:
This is, of course, another Hu Tao quote, but our aim is for flag, not anime girl. Recall in section (1), I pointed out that the .class/ “directories” are unusually large. Well, that is because those “directories” are not directories at all. The JVM allows you to suffix classfiles with a forward slash when writing them as ZIP file entries and the .jar program will still run as if nothing happened. The result prevents most extraction attempts out of the box because most ZIP archivers will treat the entry as a directory rather than file. If you want to know why this works, you will need to dig around in java.util.zip.ZipFile or java.util.jar.JarFile as well as some of the other ClassLoader-related classes we touched on earlier. But for now, just assume it is dark magic (and stupid, obviously) that such a feature of the JVM exists.
So… uh… yeah…
Updated “"”extractor”””:
publicclassJavaisez3Extractor{publicstaticvoidmain(String[]args){Fileinput=newFile("input.jar");Fileoutput=newFile("output.jar");try{JarFilejar=newJarFile(input);JarOutputStreamjos=newJarOutputStream(newFileOutputStream(output));Enumeration<?extendsJarEntry>entries=jar.entries();while(entries.hasMoreElements()){JarEntryentry=entries.nextElement();try{if(entry.getName().contains(".class/")&&!entry.getName().endsWith(".class/")){// Skip fake entriescontinue;}JarEntrynewEntry=newJarEntry(entry.getName().replace(".class/",".class"));jos.putNextEntry(newEntry);jos.write(toByteArray(jar.getInputStream(entry)));}catch(Throwablet){System.out.println("Error while writing "+entry.getName());t.printStackTrace();}}jos.close();}catch(Throwablet){t.printStackTrace();}}publicstaticbyte[]toByteArray(finalInputStreamstream)throwsIOException{try(stream;varout=newByteArrayOutputStream()){varbuf=newbyte[0x1000];intbytesRead;while((bytesRead=stream.read(buf,0,buf.length))!=-1){out.write(buf,0,bytesRead);}out.flush();returnout.toByteArray();}}}
Much better. Now we can finally start some actual reverse-engineering instead of dealing with ZIP stupidity.
5. Defeating string encryption
From META-INF/MANIFEST.MF, we can see that net.redpwn.ctf.JavaIsEZ3 is the entry point. Let’s decompile the main method in Fernflower and see what we get:
Dunno about you, but this is pretty illegible to me. Plus, more than half of this isn’t legal Java. However, something that we can make out very consistently in this method (other than the invokedynamic stuff which I’ll get to later) is the static method invocation tetsujou.saisaki(). And due to the way it appears in the decompiled result, we can assume it must be some sort of string encryption. Let’s take a look at net.redpwn.ctf.tetsujou to check if this is the case.
After cleaning up net.redpwn.ctf.tetsujou a bit, we are left with
First, let’s deal with the tetsujou() method. The tetsujou() makes use of something called the Java Reflection API which allows Java to programs to introspect stuff at runtime. This is probably one of the more powerful parts Java that any reverse-engineer should know of as it is one of the few tools that allows for limited dynamic debugging. To spare you the time and effort, I will translate the reflection API stuff into the more readable and semantically equivalent direct invocation form.
privatestaticinttetsujou(StringclassName)throwsException{// Gets .jar containing this the net.redpwn.ctf.tetsujou class (in this case, javaisez3.jar)Filefile=newFile(tetsujou.class.getProtectionDomain().getCodeSource().getLocation().getPath());ZipFilezip=newZipFile(var1);// Opens as ZIPZipEntryentry=zip.getEntry(className.replace('.','/')+".class");// Gets className from javaisez3.jarStringcomment=entry.getComment();// Gets entry commentreturncomment.hashCode();// Returns hashcode}
To see why this is useful, let’s turn our attention back to saisaki(). We note that saisaki() is context-sensitive in the way that it computes four different XOR keys based on what appears on the callstack (or stacktrace, in Java-world). At any given moment, we can expect the callstack to look like the following:
callstack[0]=thread.currentStackTrace()// Last method invoked before capturing the callstack statecallstack[1]=net.redpwn.ctf.tetsujou.saisaki()// Invokes Thread.currentStackTrace()callstack[2]=anothercaller.anothermethod()// Invokes net.redpwn.ctf.tetsujou.saisaki()...// everything else is irrelevant since saisaki() only looks at indexs 0 to 2
Now we have a problem. In addition to computing keys dynamically from the callstack, saisaki() also uses tetsujou() to grab the hashcode of either the caller’s or saisaki()’s ZIP entry comment. This means that we need to extract those comments from javaisez3.jar to continue further. Fortunately, because this program is pretty small, we can just use zipinfo -v javaisez3.jar and get those by hand. If this program was larger, we would have had to have written a deobfuscator to automate the process. Here are the relevant entry comments:
Finally, we can decrypt all of the strings in the program.
publicclassJavaIsEZ3{privatestaticbyte[]araragi;privatestaticint[]hitagi;privatestaticbooleanoshino(char[][]var0){char[]var1=var0[1];Stringvar12=newString(var1);if(var12.b<invokedynamic>(var12,"java.lang.String",634352354493306863L)!=998474623){returnfalse;}else{int[]var2=newint[6];intvar3=0;intvar4;for(var4=0;var4<var1.a<invokedynamic>(var1,"net.redpwn.ctf.JavaIsEZ3",-4280091229029863812L);var4+=4){var2[var3++]=(var1[var4]<<24|var1[var4+1]<<16|var1[var4+2]<<8|var1[var4+3])^118818581;}var4=0;int[]var5=newint[15];intvar6=0;booleanvar7=true;while(true){bytevar8="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4];bytevar9;intvar11;switch(var8){case0:var9="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4+1];--var6;var2[var9]=var5[var6];var4+=2;break;case1:var9="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4+1];var5[var6++]=var2[var9];var4+=2;break;case2:returnvar7;case3:var11="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4+1]<<24|"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4+2]<<16|"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4+3]<<8|"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4+4];var5[var6++]=var11;var4+=5;break;case4:--var6;var11=var5[var6];--var6;intvar10=var5[var6];var7&=var10==var11;++var4;break;case5:var11="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4+1]<<8|"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4+2];var5[var6++]=var11;var4+=3;break;case6:var9="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L)[var4+1];var5[var6++]=var9;var4+=2;}}}}privatestaticbooleansengoku(char[][]var0){char[]var1=var0[2];long[]var2=newlong[15];intvar3=0;intvar4;for(var4=0;var4<var1.a<invokedynamic>(var1,"net.redpwn.ctf.JavaIsEZ3",-4280091229029863812L);var4+=8){var2[var3++]=((long)var1[var4]<<56|(long)var1[var4+1]<<48|(long)var1[var4+2]<<40|(long)var1[var4+3]<<32|(long)var1[var4+4]<<24|(long)var1[var4+5]<<16|(long)var1[var4+6]<<8|(long)var1[var4+7])^216743518893377301L;}Stringvar10002=newString(var1);var2[var3]=(long)var10002.b<invokedynamic>(var10002,"java.lang.String",634352354493306863L);var4=0;long[]var5=newlong[15];intvar6=0;while(true){intvar7="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4];intvar8;intvar9;longvar10;switch(var7){case0:var10=(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1]<<56|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+2]<<48|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+3]<<40|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+4]<<32|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+5]<<24|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+6]<<16|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+7]<<8|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+8];var5[var6++]=var10;var4+=9;break;case1:var10=(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1]<<24|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+2]<<16|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+3]<<8|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+4];var5[var6++]=var10;var4+=5;break;case2:var10=(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1]<<8|(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+2];var5[var6++]=var10;var4+=3;break;case3:var10=(long)"net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1];var5[var6++]=var10;var4+=2;break;case4:Stringvar11="net.redpwn.ctf.JavaIsEZ3";var8=var11.d<invokedynamic>(var11,-5227033679506699442L)[var4+1];var11="net.redpwn.ctf.JavaIsEZ3";var9=var11.d<invokedynamic>(var11,-5227033679506699442L)[var4+2];var2[0]=var2[var8]==var2[var9]?0L:1L;var4+=3;break;case5:var4="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1];break;case6:if(var2[0]==0L){var4="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1];}else{var4+=2;}break;case7:if(var2[0]!=0L){var4="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1];}else{var4+=2;}break;case8:var8="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1];var9="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+2];var2[var8]^=var2[var9];var4+=3;break;case9:var8="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1];var9="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+2];var2[var8]|=var2[var9];var4+=3;case10:case11:case12:case13:case14:case15:default:break;case16:var8="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1];var9="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+2];var2[var8]&=var2[var9];var4+=3;break;case17:var8="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1];--var6;var2[var8]=var5[var6];var4+=2;break;case18:var8="net.redpwn.ctf.JavaIsEZ3".d<invokedynamic>("net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L)[var4+1];var5[var6++]=var2[var8];var4+=2;break;case19:returnvar2[0]==0L;}}}privatestaticvoidkanbaru(char[][]var0){for(intvar1=0;var1<var0.a<invokedynamic>(var0,"net.redpwn.ctf.JavaIsEZ3",-4280091229029863812L)-1;++var1){char[]var2=var0[var1];char[]var3=var0[var1+1];for(intvar4=0;var4<var2.a<invokedynamic>(var2,"net.redpwn.ctf.JavaIsEZ3",-4280091229029863812L);++var4){var3[var4]^=var2[var4];}}}publicstaticinthachikuji(Objectvar0){try{Stringvar4="java.lang.reflect.Array";Classvar1=var4.a<invokedynamic>(var4,"java.lang.Class",-2913566224361156927L);Methodvar2=var1.b<invokedynamic>(var1,"getLength",newClass[]{Object.class},"java.lang.Class",-3289775410440245109L);var2.b<invokedynamic>(var2,true,"java.lang.reflect.Method",-2856230251947868740L);Integervar5=(Integer)var2.b<invokedynamic>(var2,(Object)null,newObject[]{var0},"java.lang.reflect.Method",-5083925746546271785L);returnvar5.b<invokedynamic>(var5,"java.lang.Integer",2388217054567176175L);}catch(Throwablevar3){thrownewRuntimeException("Ayaya");}}publicstaticvoidmain(String[]var0){if(var0.a<invokedynamic>(var0,"net.redpwn.ctf.JavaIsEZ3",-4280091229029863812L)==0){try{"javax.swing.UIManager".a<invokedynamic>("javax.swing.UIManager",8560971300846057061L).a<invokedynamic>("javax.swing.UIManager".a<invokedynamic>("javax.swing.UIManager",8560971300846057061L),"javax.swing.UIManager",1235598990591485937L);null.a<invokedynamic>((Object)null,"Silly-churl, billy-churl, silly-billy hilichurl... Woooh!\n~A certain Wangsheng Funeral Parlor director\n\n(This is not the flag, btw)","javax.swing.JOptionPane",-8331272066798825690L);}catch(Throwablevar5){}}else{if(var0[0].b<invokedynamic>(var0[0],"java.lang.String",-4751795797312301073L)!=48){"java.lang.System".d<invokedynamic>("java.lang.System",474225325441265L).b<invokedynamic>("java.lang.System".d<invokedynamic>("java.lang.System",474225325441265L),"*fanfare* You've been pranked!","java.io.PrintStream",-1351703383126743055L);return;}Stringvar6="WalnutGirlBestGirl_07/15";char[]var1=var6.b<invokedynamic>(var6,"java.lang.String",3921157978488572744L);char[][]var2=newchar[][]{var1,null,null};intvar3=var0[0].b<invokedynamic>(var0[0],"java.lang.String",-4751795797312301073L)/2;for(intvar4=0;var4<2;++var4){intvar10001=var4+1;Stringvar10002=var0[0].b<invokedynamic>(var0[0],var4*var3,(var4+1)*var3,"java.lang.String",2278661231839426149L);var2[var10001]=var10002.b<invokedynamic>(var10002,"java.lang.String",3921157978488572744L);}var2.a<invokedynamic>(var2,"net.redpwn.ctf.JavaIsEZ3",-4036077825718603401L);if(var2.a<invokedynamic>(var2,"net.redpwn.ctf.JavaIsEZ3",-4328141322681971509L)&var2.a<invokedynamic>(var2,"net.redpwn.ctf.JavaIsEZ3",8504114058794503371L)&var0[0].b<invokedynamic>(var0[0],"java.lang.String",634352354493306863L)==1101317042){"java.lang.System".d<invokedynamic>("java.lang.System",474225325441265L).b<invokedynamic>("java.lang.System".d<invokedynamic>("java.lang.System",474225325441265L),"Chute. Now you know my secret","java.io.PrintStream",-1351703383126743055L);}else{"java.lang.System".d<invokedynamic>("java.lang.System",474225325441265L).b<invokedynamic>("java.lang.System".d<invokedynamic>("java.lang.System",474225325441265L),"*fanfare* You've been pranked!","java.io.PrintStream",-1351703383126743055L);}}}static{(newbyte[]{3,88,72,7,83,3,2,70,7,70,3,43,10,46,76,3,42,0,117,5,3,9,5,113,24,3,54,24,10,28,1,0,4,1,1,4,1,2,4,1,3,4,1,4,4,1,5,4,2}).f<invokedynamic>(newbyte[]{3,88,72,7,83,3,2,70,7,70,3,43,10,46,76,3,42,0,117,5,3,9,5,113,24,3,54,24,10,28,1,0,4,1,1,4,1,2,4,1,3,4,1,4,4,1,5,4,2},"net.redpwn.ctf.JavaIsEZ3",-3219759681266250937L);(newint[]{1,102,214,57,24,0,118,112,88,118,107,110,50,46,0,113,67,20,106,112,110,31,33,0,109,102,121,67,57,77,57,109,17,4,17,5,17,6,17,7,5,47,3,1,17,0,19,4,0,4,7,42,4,1,5,7,42,4,2,6,7,42,4,3,7,7,42,19}).f<invokedynamic>(newint[]{1,102,214,57,24,0,118,112,88,118,107,110,50,46,0,113,67,20,106,112,110,31,33,0,109,102,121,67,57,77,57,109,17,4,17,5,17,6,17,7,5,47,3,1,17,0,19,4,0,4,7,42,4,1,5,7,42,4,2,6,7,42,4,3,7,7,42,19},"net.redpwn.ctf.JavaIsEZ3",-5227033679506699442L);}}
6. Defeating invokedynamic
If you’ve been following along up to now, you have most definitely been wondering what <invokedynamic> is and for that I give you this: https://stackoverflow.com/a/6638764.
Now, here’s the rub about invokedynamic instructions: only a few patterns of them decompile nicely. Notable examples of nicely-decompilable ones involve lambdas and the others are for Java 11 string concatenation. Almost everything else does not decompile into valid Java code. Because of this simple fact, many Java obfuscators abuse invokedynamic instructions to make decompiled Java very messy. What you are seeing in my decompiled sample is Fernflower’s attempt to show something somewhat reasonable and for our purposes, it will be sufficient for the most part.
First, we will need to locate the bootstrap method because that tells us how the CallSite for each invokedynamic instruction is resolved. Because Fernflower isn’t nice enough to show us that, we will need to peek at the bytecode which means we need a bytecode editor of some sort. Personally, I recommend you use either Recaf or Krakatau disassembler for working with obfuscated samples. You can also use javap, but you will need to resolve the illegal method/field generics signatures before you can do by whatever means. From here on out, though, I will be using Recaf’s disassembler to aid me in attacking the invokedynamics.
Let’s take the first occurence of an invokedynamic instruction in the main method of the program:
So as we can see, this invokedynamic instruction pops 3 objects (technically 4 due to how longs/doubles work in the JVM) off the stack. It is then bootstrapped by net.redpwn.ctf.suo.oshino(). Let’s take a look at that method.
This is the bytecode equivalent of MutableCallSite var3 = new MutableCallSite((MethodType)var2). As we can see, it initializes a new MutableCallSite and passes in the second method argument casted to MethodType and assigns the value to var3. What’s more interesting is the next couple of lines:
The reason Fernflower kinda died is because of something called a MethodHandle constant (in this case, it is the LDC handle... instruction). The JVM allows you to define ldc instructions such that you can load MethodHandles as constants. There is absolutely no semantic equivalent in Java as the closest you can come in Java significantly changes the callstack. Fortunately, unlike the string decryptor from earlier, the invokedynamic bootstrap resolver isn’t context-sensitive with regards to the callstack state so we can ignore this hindsight. The closest thing you will get is something like
With some referene to the Java documentation, we can see that ldcHandle is subjected to MethodHandle#asCollector which creates an array-collecting MethodHandle that accepts a given number of trailing positional arguments and collects them into an array argument. It seems like ldcHandle, a MethodHandle to the teori() method is pretty important here. So let’s now turn our attention to that.
Just from the first few lines, we can sort of already understand the structure of this method with respect to the invokedynamic instructions we have been seeing without having to fully understand the MethodHandle documentation. Here’s the one I pointed out earlier for reference:
Notice that var10 is assigned the the second-to-last value in the array var4 which happens to be a string. In this particular invokedynamic instruction, we note that this matches up with LDC "net.redpwn.ctf.JavaIsEZ3" which positionally second-to-last in the dynamic invocation. Furthermore, we see that var6 is assigned the last value in the array var4 which happens to be a long integer. This matches up with LDC -4280091229029863812L which is positionally last in the dynamic invocation. Thus, we can now make the reasonable assumption that this invokedynamic instruction must resolve to either invoke some method or set/get some field in net.redpwn.ctf.JavaIsEZ3. Let’s continue on.
Ah, this looks like some pretty important meat. The program first figures out whether the invokedynamic instruction should resolve to a method invocation or a field getter/setter via the value of var5. If you’ve done you’re homework, then you will know that var5 is just the first character of the name of the invokedynamic instruction. So if we use our same example, we know that we’re looking at a method invocation because INVOKEDYNAMIC a (Ljava/lang/Object... indicates that a is the name of the invokedynamic instruction. Here’s the same example in case you got lost:
So following along the execution path, we see that the mizudori() method returns something with a type of java.lang.reflect.Method. This seems of interest so let’s also jump into that.
From this method, we notice that the program searches for a Method in var0 whose string name hashcode is equal to var1. If one cannot be found, then it searches implemented interfaces or the super class for the Method in question via the same matter. This is sort of pepega because if you know anything about how Java computes the hashcode for a string, you will know that it is lossy. We also note that this method calls oikura. Peeking at that, we can see that oikura computes a key based on the hashcode of each of the parameters of the Method in question as well as its return type.
However, this doesn’t get around the fact that Java string hashcodes are lossy. In other words, we cannot just go from hashcode to the original string – we need to do some trial-and-error matching; however, we can be smart about the way we do it. Let’s look at our invokedynamic instruction again:
Based on what we have seen, we know that this instruction resolves to a method in net.redpwn.ctf.JavaIsEZ3 or one of its parent classes. Seeing as that net.redpwn.ctf.JavaIsEZ3 doesn’t extend or implement any classes, we know that we at most only have to check two classes for the desired method: net.redpwn.ctf.JavaIsEZ3 and java.lang.Object (all classes eventually inherit java.lang.Object at some point in the hierarchy with the except of java.lang.Object itself). Furthermore, we know that the signed long -4280091229029863812L is used to hold two hashcodes. After pulling up jshell, we can see that the two hashcodes are -996536396 (method name) and 1063841404 (parameter + return type name hashcode).
Now we look at the method names in net.redpwn.ctf.JavaIsEZ3: oshino, sengoku, kanbaru, hachikuji, and main. Notice that none of these methods have the same name, so we may be able to get away with just checking method names instead checking both method names and parameter/return type names. Quick Java program to do figure out which one is the correct one gives hachikuji.
Now we see that var12 is set to the result of var0.unreflect(var15).asType(var14). Then, the method drops the two arguments which provide the class to search and the long containing the two hashcodes and invokes the result. This means that net.redpwn.ctf.JavaIsEZ3 and -4280091229029863812L do not actually appear in the final (unobfusated) program but are only used to resolve the MethodHandle to the correct method. Thus, we can look at our current decompiled result in the program and replace
Furthermore, we also notice that there are many occurences of someVar.a<invokedynamic>(someVar, "net.redpwn.ctf.JavaIsEZ3", -4280091229029863812L). Because we know that this particular invokedynamic will always resolve to hachikuji, we can replace each of those with hachikuji(someVar).
Before we call it a day on invokedynamics, I need to address a couple more things. First, let’s look at this more interesting example that is, again, from the main method:
if(var0[0].b<invokedynamic>(var0[0],"java.lang.String",-4751795797312301073L)!=48){// Ignored all the invokedynamics below, we're looking only at `b<invokedynamic>(var0[0], "java.lang.String", -4751795797312301073L)`"java.lang.System".d<invokedynamic>("java.lang.System",474225325441265L).b<invokedynamic>("java.lang.System".d<invokedynamic>("java.lang.System",474225325441265L),"*fanfare* You've been pranked!","java.io.PrintStream",-1351703383126743055L);return;}
It should be pretty obvious that something is being done to var0[0], but it’s not completely clear what. Also, unlike our previous example, this invokedynamic instruction has a name of b. So let us revisit our evilgood friend again:
This looks pretty similar to what we were looking at earlier; however, this time the program specifically inserts a parameter type at the beginning of the parameter list. This is confusing, so maybe we should attempt to figure out what exactly is going on. Now, unlike the previous example, java.lang.String has a fair number of overloaded methods (and a lot of methods in general), so it’s actually quite a bit faster to cheese by copy-pasting some of the program’s code (specifically the mizudori and oikura methods) into a new script and having Java locate those methods for you. Here’s how I would do it:
publicstaticvoidmain(String[]args)throwsThrowable{Methodmethod=mizudori(String.class,(int)(-4751795797312301073L>>32),(int)(-4751795797312301073L>>0));System.out.println(method);// Output: public int java.lang.String.length()}
Very clearly, we can then replace var0[0].b<invokedynamic>(var0[0], "java.lang.String", -4751795797312301073L) != 48 with var0[].length() != 48, but this still doesn’t answer our question earlier of why an Object parameter is inserted at index 0. The reason for that was because String#length() is a virtual (non-static) method. To understand why that makes such a difference, let me show you what that looks like in bytecode-land:
This is what var0[].length() looks like in bytecode and what you should be noticing here is that there is an additional object pushed on the stack despite String#length() taking no arguments. This is because invokevirtual instructions implicitly pop an additional word off the stack as the instance upon which the virtual method is invoked. This is also what allows the this keyword in Java to work: the zeroth slot in any virtual method is always the instance upon which you are invoking (var0[0] in this case). If you are familiar with Python, you will notice that Java is internally very similar to Python in this aspect in that Python forces virtual methods to have the self keyword as the first parameter.
Finally, the last thing I would like to address is fields: You can get and set fields via MethodHandle’s. Reversing invokedynamic’d fields is very similar to reversing invokedynamics to methods so I’ll leave that as an exercise to the reader.
Once you get through all of the invokedynamic instructions, you should be left with something like the following:
publicclassJavaIsEZ3{privatestaticbyte[]araragi;privatestaticint[]hitagi;privatestaticbooleanoshino(char[][]var0){char[]var1=var0[1];Stringvar12=newString(var1);if(var12.hashCode()!=998474623){returnfalse;}else{int[]var2=newint[6];intvar3=0;intvar4;for(var4=0;var4<hachikuji(var1);var4+=4){var2[var3++]=(var1[var4]<<24|var1[var4+1]<<16|var1[var4+2]<<8|var1[var4+3])^118818581;}var4=0;int[]var5=newint[15];intvar6=0;booleanvar7=true;while(true){bytevar8=araragi[var4];bytevar9;intvar11;switch(var8){case0:var9=araragi[var4+1];--var6;var2[var9]=var5[var6];var4+=2;break;case1:var9=araragi[var4+1];var5[var6++]=var2[var9];var4+=2;break;case2:returnvar7;case3:var11=araragi[var4+1]<<24|araragi[var4+2]<<16|araragi[var4+3]<<8|araragi[var4+4];var5[var6++]=var11;var4+=5;break;case4:--var6;var11=var5[var6];--var6;intvar10=var5[var6];var7&=var10==var11;++var4;break;case5:var11=araragi[var4+1]<<8|araragi[var4+2];var5[var6++]=var11;var4+=3;break;case6:var9=araragi[var4+1];var5[var6++]=var9;var4+=2;}}}}privatestaticbooleansengoku(char[][]var0){char[]var1=var0[2];long[]var2=newlong[15];intvar3=0;intvar4;for(var4=0;var4<hachikuji(var1);var4+=8){var2[var3++]=((long)var1[var4]<<56|(long)var1[var4+1]<<48|(long)var1[var4+2]<<40|(long)var1[var4+3]<<32|(long)var1[var4+4]<<24|(long)var1[var4+5]<<16|(long)var1[var4+6]<<8|(long)var1[var4+7])^216743518893377301L;}Stringvar10002=newString(var1);var2[var3]=(long)var10002.hashCode();var4=0;long[]var5=newlong[15];intvar6=0;while(true){intvar7=hitagi[var4];intvar8;intvar9;longvar10;switch(var7){case0:var10=(long)hitagi[var4+1]<<56|(long)hitagi[var4+2]<<48|(long)hitagi[var4+3]<<40|(long)hitagi[var4+4]<<32|(long)hitagi[var4+5]<<24|(long)hitagi[var4+6]<<16|(long)hitagi[var4+7]<<8|(long)hitagi[var4+8];var5[var6++]=var10;var4+=9;break;case1:var10=(long)hitagi[var4+1]<<24|(long)hitagi[var4+2]<<16|(long)hitagi[var4+3]<<8|(long)hitagi[var4+4];var5[var6++]=var10;var4+=5;break;case2:var10=(long)hitagi[var4+1]<<8|(long)hitagi[var4+2];var5[var6++]=var10;var4+=3;break;case3:var10=(long)hitagi[var4+1];var5[var6++]=var10;var4+=2;break;case4:var8=hitagi[var4+1];var9=hitagi[var4+2];var2[0]=var2[var8]==var2[var9]?0L:1L;var4+=3;break;case5:var4=hitagi[var4+1];break;case6:if(var2[0]==0L){var4=hitagi[var4+1];}else{var4+=2;}break;case7:if(var2[0]!=0L){var4=hitagi[var4+1];}else{var4+=2;}break;case8:var8=hitagi[var4+1];var9=hitagi[var4+2];var2[var8]^=var2[var9];var4+=3;break;case9:var8=hitagi[var4+1];var9=hitagi[var4+2];var2[var8]|=var2[var9];var4+=3;case10:case11:case12:case13:case14:case15:default:break;case16:var8=hitagi[var4+1];var9=hitagi[var4+2];var2[var8]&=var2[var9];var4+=3;break;case17:var8=hitagi[var4+1];--var6;var2[var8]=var5[var6];var4+=2;break;case18:var8=hitagi[var4+1];var5[var6++]=var2[var8];var4+=2;break;case19:returnvar2[0]==0L;}}}privatestaticvoidkanbaru(char[][]var0){for(intvar1=0;var1<hachikuji(var0)-1;++var1){char[]var2=var0[var1];char[]var3=var0[var1+1];for(intvar4=0;var4<hachikuji(var2);++var4){var3[var4]^=var2[var4];}}}publicstaticinthachikuji(Objectvar0){try{Stringvar4="java.lang.reflect.Array";Classvar1=Class.forName(var4);Methodvar2=var1.getDeclaredMethod("getLength",newClass[]{Object.class});var2.setAccessible(true);Integervar5=(Integer)var2.invoke((Object)null,newObject[]{var0});returnvar5.intValue();}catch(Throwablevar3){thrownewRuntimeException("Ayaya");}}publicstaticvoidmain(String[]var0){if(hachikuji(args)==0){try{UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());JOptionPane.showMessageDialog("Silly-churl, billy-churl, silly-billy hilichurl... Woooh!\n~A certain Wangsheng Funeral Parlor director\n\n(This is not the flag, btw)");}catch(Throwablevar5){}}else{if(var0[0].length()!=48){System.out.println("*fanfare* You've been pranked!");return;}Stringvar6="WalnutGirlBestGirl_07/15";char[]var1=var6.toCharArray();char[][]var2=newchar[][]{var1,null,null};intvar3=var0[0].length()/2;for(intvar4=0;var4<2;++var4){intvar10001=var4+1;Stringvar10002=var0[0].substring(var4*var3,(var4+1)*var3);var2[var10001]=var10002.toCharArray();}kanbaru(var2);if(oshino(var2)&sengoku(var2)&(var0[0].hashCode()==1101317042)){System.out.println("Chute. Now you know my secret");}else{System.out.println("*fanfare* You've been pranked!");}}}static{araragi=newbyte[]{3,88,72,7,83,3,2,70,7,70,3,43,10,46,76,3,42,0,117,5,3,9,5,113,24,3,54,24,10,28,1,0,4,1,1,4,1,2,4,1,3,4,1,4,4,1,5,4,2};hitagi=newint[]{1,102,214,57,24,0,118,112,88,118,107,110,50,46,0,113,67,20,106,112,110,31,33,0,109,102,121,67,57,77,57,109,17,4,17,5,17,6,17,7,5,47,3,1,17,0,19,4,0,4,7,42,4,1,5,7,42,4,2,6,7,42,4,3,7,7,42,19};}}
7. Quick analysis of deobfuscated program (main method)
Now that we pretty much have fully deobfuscated everything, we can get to some actual reversing. Following along the main method, we can see that the program checks if the input string (assuming you passed one in) is 48 characters long. Later in the code, we can see that the input’s hashcode is compared against the value 1101317042. Again, because hashcodes are lossy, we cannot reconstruct the flag as is.
The program proceeds to split the user input in half and shove it into an array of 3 character arrays var2, the first element is "WalnutGirlBestGirl_07/15".toCharArray() and the second and third elements are the two halves of the flag, respectively. We then see that the method kanbaru is called on var2. Looking at the code of kanbaru, we see that it is a chaining block cipher (with no cipher or key, lol) with WalnutGirlBestGirl_07/15 as the IV. Afterwords, oshino and sengoku both perform some kind of checking on var2.
Before moving on, I would like to bring our attention to hachikuji. Note that it is just a reflection equivalent of
In other words, it just returns the length of an arbitrary array. Alright, onto oshino!
8. Reversing oshino’s VM
We notice that the first thing oshino does is convert the second element (var1) of the array of character arrays (i.e. the first half of the user input) back into a string and compares its hashcode to 998474623. Still lossy so no hope there. Then, we run into the following
Seeing as that var1 is the first half of the user’s input (which is of length 48 / 2 = 24), this loop creates six 4-byte integers by OR-ing the individual character bytes together then XORs the result with 118818581 or 0x07150715 in hex. Looking at the rest of the method, we can see that the main meat of the check is a simple stack VM. This allows us to decode the instructions stored in araragi as so:
If any of the comparison checks fail, then var7 is set to false. So, in other words, our CBC’d version of the first half of the user’s input’s bytes must match up against the six 4-byte integers pushed on the VMs stack. Here is a Java equivalent:
sengoku entire method operates in a very similar way to oshino’s does, though, sengoku’s VM has a different architecture. sengokus VM operates instead on 8-byte integers (or longs in Java-land) and is considerably less stack-based than oshino’s VM. sengoku’s VM also allows for flow control opcodes, which leads to a slightly more difficult reversing task. However, with some patience, we get the following:
So var1 is assigned the second half of the user’s input. Furthermore, the user’s ciphered input is shoved into the first 3 registers as 8-byte integers (24 / 8 = 3) via the for loop and var2[3] is assigned the value of the hashcode of the second half of the user’s input. A Java equivalent of the VM code would be
Since user’s input is ciphered via CBC and oshino and sengoku do what are essentially fancy byte-by-byte checking, we can extract the flag from the program by going through the process in reverse. First, we will take the integers from oshino and sengoku that are compared against ciphered user input, and reconstruct the two ciphered halves of the flag. Then, we will undo the chaining block cipher, and we should get our flag.
And indeed, we get our flag as desired: flag{d1d_y0u_kn0w?_chr1s_is_4_Hu_Tao_s1mp!_0715}
Of course, we should always check to make sure our flag actually works, so let’s do that now:
$ java -jar backup.jar "flag{d1d_y0u_kn0w?_chr1s_is_4_Hu_Tao_s1mp\!_0715}"
Chute. Now you know my secret
11. Afterword
There were very, very, few solves on JavaIsEZ3 this year. This tells me that I may have set the technicality level on this challenge much higher than I should have. To any frustrated CTFers, if you felt that was the case, I am sorry that that was the case and am currently rethinking how to present JavaIsEZ4 (if I finish it by next year) in a way that is significantly less technical while being solvable by most CTFers.
There was a lot of manual work in this writeup which would lead any person to asking if there is a way to automate the process. The answer is yes, but not many tools exist to do so. For inspiration, I suggest you look at the java-deobfuscator tools which can be found on GitHub at: https://github.com/java-deobfuscator/. Something that may also help is https://github.com/GraxCode/threadtear. Both utilize the OW2 ASM libraries to parse, edit, and write Java bytecode.
Some competitors asked me what tools should be used for Java reversing (specifically decompilation and disassembling). I generally recommend people to use https://github.com/Col-E/Recaf because
It is actively maintained.
It is designed specifically to be able to handle obfuscated JARs.
I have interacted with the developers so I have a bias.
Some other good tools are Krakatau (made by the legendary Robert Grosse) and java-disassembler (made by Bibl and cts). As for debuggers, none really exist unfortunately.