
JEP draft: Integrity and Strong Encapsulation

source link: https://openjdk.org/jeps/8305968

Summary

The Java Platform assures the integrity of code and data with a variety of features that are on by default. Strong encapsulation is one such feature, but it can be circumvented by some APIs, causing headaches for maintenance and performance. As Java continues to move forward, it is appropriate to restrict all APIs so that they cannot break strong encapsulation, while still accommodating use cases that need to operate beyond encapsulation boundaries.

Goals

  • Allow the Java Platform to robustly maintain invariants – of its own operation as well as that of Java applications – required for maintainability, security, and performance.

  • Clarify the plethora of Java and non-Java APIs that can break strong encapsulation.

  • Differentiate use cases where breaking encapsulation is convenient from use cases where disabling encapsulation is essential.

Non-Goals

  • It is not a goal to guard against extreme situations where users manipulate the file system, operating system, or hardware beneath the Java Virtual Machine.

Motivation

Integrity by default

Over the past few years, the JDK has been inching toward a vision of integrity by default. Integrity is the ability of one part of a program to locally establish an invariant – a condition that always holds – that is then guaranteed to apply globally throughout the program. For example, the field initialization, int[] x = new int[10], establishes an invariant that the array referenced by x will never be read or written past its last element; it is an integrity invariant because the platform will guarantee that it holds everywhere in the program. In contrast, that invariant may or may not hold in a particular C program. Because it is not an integrity invariant of C, the only way to know whether or not that invariant holds is to analyze the entire program.
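As a minimal sketch of this invariant (the class and method names are illustrative and not part of the JEP), a write past the last element is rejected with an exception instead of touching neighbouring memory:

```java
// Illustrative only: shows that the bounds invariant is enforced by the platform.
final class BoundsInvariantDemo {

    // Returns true if the platform rejected the out-of-bounds write.
    static boolean writePastLastElementIsRejected() {
        int[] x = new int[10];          // valid indices are 0..9
        try {
            x[10] = 1;                  // one past the last element
            return false;               // not reached: the write above throws
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }
}
```

No analysis of the rest of the program is needed to know that this check happens; that is what makes it an integrity invariant.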

Here are some other integrity invariants offered by the Java Platform:

  • A program's initial state is always well defined because variables and arrays are initialized before use.

  • A program has no dangling pointers and never suffers from "use after free" because automatic memory management is ever-present.

  • A program can perform only valid operations on data – there is no unsafe pointer casting (e.g. a String cannot be cast to a Socket) – because Java programs are type-safe.
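Two of the invariants listed above can be observed directly. The following sketch uses hypothetical names (PlatformInvariants and its methods) and only standard JDK behaviour:

```java
import java.net.Socket;

// Illustrative only: default initialization and type safety as seen from Java code.
final class PlatformInvariants {

    static int uninitializedCounter;    // never assigned explicitly

    // Fields and array elements start from well-defined default values.
    static boolean defaultsAreWellDefined() {
        int[] numbers = new int[3];
        String[] names = new String[2];
        return uninitializedCounter == 0 && numbers[2] == 0 && names[1] == null;
    }

    // A String cannot be reinterpreted as a Socket; the cast is checked at run time.
    static boolean stringCannotBecomeSocket() {
        Object value = "not a socket";
        try {
            Socket socket = (Socket) value;
            return socket == null;      // not reached: the cast above throws
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```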

Integrity is important for human readers of the code, who need to reason about the program's correctness, as well as for the platform itself, which transforms source code (through various stages of compilation) and executes it and must be certain that the transformations it applies preserve the program's meaning.

  • Integrity is sensitive: it is a property of the overall system. If a single library that is a direct or transitive dependency of a program is somehow able to violate an invariant, that invariant has no integrity and cannot be trusted.

  • Integrity is silent: Integrity invariants are typically safety properties that prevent "bad things" from happening. As such, you only notice how much you depend on integrity when it breaks.

  • Integrity is only sensible as the default. Adding invariants after the fact is harder than removing them when absolutely needed.

Encapsulation as the Foundation of Integrity

The integrity invariants listed above are enforced by the Java Virtual Machine; they ensure that Java, unlike C, does not have undefined behavior. Java developers also want to locally establish their own integrity invariants, specific to their application or library. To do this, developers use access control modifiers – public, private, protected, and the default "package" access – to hide code and data declared in one part of the program from other parts, thereby providing encapsulation for their code and data. For example, here is a class that establishes an invariant: its state (the field x) has the correct parity (is always even, never odd):

public final class Even {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}

By declaring x as private and having all the public methods preserve the parity of x, the developer has used encapsulation to establish the invariant that every Even object in the program has even state. Encapsulation, therefore, offers an invariant (no one else can touch x) that allows establishing other invariants (x is always even).

Encapsulation is a cornerstone of programming in the large because it allows a program to be constructed from independently-developed components that interact only through their public APIs, so that each of them can be reasoned about in isolation. It is this ability that allows both individual Java programs and the entire Java ecosystem to scale as collections of independent, interoperating components.

From Encapsulation To Strong Encapsulation

Unfortunately, the parity invariant above does not have the integrity that the developer might hope for. This is because any code on the class path could employ deep reflection to override access control (the private modifier on x) and assign an odd value to x directly. Deep reflection has existed since JDK 1.2, when the method java.lang.reflect.AccessibleObject.setAccessible was introduced.

Given the possibility of some code calling this method, it would take a global analysis of the codebase to ensure that Even's parity is, indeed, an invariant. We may assume that no code in our application would intentionally break that invariant, but it could be broken unintentionally. For example, some other programmer in the organization could decide to serialize and deserialize instances of Even to and from JSON using a library. When deserializing JSON input, the library would bypass Even's public API and use deep reflection to set the value of x. If the JSON input contains an odd number, the invariant will be broken.
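The scenario above can be sketched as follows. The sketch assumes that Even sits on the class path (that is, in the unnamed module), where deep reflection is permitted; ParityBreaker is a hypothetical stand-in for what a deserialization library might do internally.

```java
import java.lang.reflect.Field;

// The class from the text, repeated (without the public modifier) so the sketch is self-contained.
final class Even {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}

// Hypothetical helper: bypasses Even's public API the way a library might.
final class ParityBreaker {

    // Writes the given value straight into the private field x and returns what Even now reports.
    static int overwriteState(Even target, int rawValue) {
        try {
            Field field = Even.class.getDeclaredField("x");
            field.setAccessible(true);          // deep reflection: suppress access checks
            field.setInt(target, rawValue);     // no parity check is performed here
            return target.value();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling ParityBreaker.overwriteState(new Even(), 3) leaves the object holding an odd value even though no method of Even can produce one.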

As a result of deep reflection – and other mechanisms that disregard or bypass encapsulation, to be discussed later – the meaning of Java code is provisional. A method or field is private, unless other code really wants to access it; a final field is assigned once, unless other code wants to assign it again later; the meaning of a method is defined by a block of code, unless other code decides to redefine the method later (the last case involves not deep reflection, but rather an agent, which is a class with access to a special API that allows it to change other Java code). This provisionality isn't hypothetical: Some libraries change the meaning of code outside them in arbitrary ways; we will later examine why they do this. Neither a person reading the code nor the platform itself – as it compiles and runs it – can fully trust that the code does what it says or that its meaning does not change over time as the program runs.
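To make the point about final fields concrete, here is a minimal sketch with hypothetical classes Limit and FinalFieldRewriter. It relies on the documented behaviour of java.lang.reflect.Field, which permits writing a non-static final field of an ordinary class once setAccessible(true) has succeeded; it assumes both classes are on the class path, and future JDK releases may restrict this further.

```java
import java.lang.reflect.Field;

// Hypothetical class whose field looks immutable to a reader of the code.
final class Limit {
    private final int max;
    Limit(int max) { this.max = max; }
    int max() { return max; }
}

// Hypothetical helper showing that "assigned once" is only provisional.
final class FinalFieldRewriter {

    // Reassigns the final field and returns the value the object now reports.
    static int reassign(Limit target, int newMax) {
        try {
            Field field = Limit.class.getDeclaredField("max");
            field.setAccessible(true);      // also lifts the finality check for instance fields
            field.setInt(target, newMax);
            return target.max();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```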

To allow developers to use encapsulation to truly establish integrity invariants, JDK 9 introduced modules. A module is a set of packages, some of which are designed to be used outside the module (they are exported), while others are designed to be used only inside the module (they are unexported). Everything in an unexported package is strongly encapsulated – deep reflection cannot break in. Similarly, the non-public elements of exported packages are also strongly encapsulated. Since x is a private field, strong encapsulation allows the parity invariant that is established locally in the Even class to be trusted globally.
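The distinction between exported and unexported packages can be observed through the java.lang.Module API. The sketch below queries the JDK's own java.base module; the module declaration in the leading comment is a hypothetical example (the module and package names are assumptions), included only to show the syntax.

```java
// A hypothetical module declaration (module-info.java) might look like this:
//
//     module com.example.parity {
//         exports com.example.parity.api;   // designed for use outside the module
//         // com.example.parity.internal is not exported: strongly encapsulated
//     }
//
// The class below inspects a real module, java.base, at run time.
final class ModuleBoundaries {

    // True if java.base exports the package to all modules.
    static boolean isExportedByJavaBase(String packageName) {
        return Object.class.getModule().isExported(packageName);
    }

    // True if java.base opens the package for deep reflection to all modules.
    static boolean isOpenedByJavaBase(String packageName) {
        return Object.class.getModule().isOpen(packageName);
    }
}
```

On a standard JDK, java.lang is exported but not open, while jdk.internal.misc is neither exported unconditionally nor open.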

Strong encapsulation gives integrity to encapsulation – it guarantees no one outside the class can assign x – and in so doing it gives integrity to the invariant that x is always even. Strong encapsulation offers a solid foundation to build on. Without it, code is a castle in the sand.

Other than making it easier to establish business-logic invariants important for a program's correctness, strong encapsulation is beneficial for three general reasons:

  • Maintainability: Strong encapsulation protects the integrity of code as it evolves. When evolving Java code, developers assume that private implementation details, encapsulated from clients, can be safely changed. For example, every developer assumes that changing the signature of a private method, or removing a private field, does not impact the class's clients.

  • Security: Strong encapsulation is essential for constructing any kind of robust security, whether in the application, a library, or the JDK. Suppose that a class restricts a sensitive operation as follows:

    if (isAuthorized())
        doSensitiveOperation();

    The restriction is robust only if we can guarantee that doSensitiveOperation is only ever invoked after a successful isAuthorized check. This invariant is established by the enclosing class declaring doSensitiveOperation as private and preceding all calls to it with an isAuthorized check. However, with deep reflection, doSensitiveOperation could be invoked from anywhere without an isAuthorized check, nullifying the intended restriction; even worse, an agent could modify the code of the isAuthorized method to always return true. Without strong encapsulation, a global analysis of the codebase, including the application's direct and transitive dependencies, is required to guarantee that security invariants hold in every circumstance. The circumvention of security invariants need not be intentional; a vulnerability in a library that breaks encapsulation, or in any other library that uses that library, jeopardizes any security invariant anywhere in the application.

  • Performance: In the Java runtime, certain optimizations assume that conditions that hold at the time the optimization is made hold forever. For example, the JVM can perform powerful optimizations when it knows that the value of a field will never change – not only constant-folding but also shifting the time of the initialization of final fields. This can only be done if the "finality" of final fields cannot be overridden by any mechanism (trusting the finality of final fields is a complicated subject, but strong encapsulation makes it easier). Additionally, further optimization of methods could be performed when all of their call sites are known. This could be guaranteed for strongly encapsulated methods, as they can only be called from inside their module. A tool like jlink could remove unused strongly-encapsulated methods at link time to reduce image size and class loading time. The guarantee that code may not change over time even opens the door to ahead-of-time compilation (AOT).
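The security concern described above can be sketched as follows. Vault and AuthorizationBypass are hypothetical classes, assumed to be on the class path where deep reflection is permitted; the method names isAuthorized and doSensitiveOperation follow the text.

```java
import java.lang.reflect.Method;

// Hypothetical class that guards a sensitive operation with an authorization check.
final class Vault {
    private int sensitiveCalls = 0;

    private boolean isAuthorized() { return false; }        // nobody is authorized in this sketch

    private void doSensitiveOperation() { sensitiveCalls++; }

    // The only intended entry point: the check always precedes the operation.
    public void request() {
        if (isAuthorized())
            doSensitiveOperation();
    }

    public int sensitiveCalls() { return sensitiveCalls; }
}

// Hypothetical caller that skips the check by invoking the private method reflectively.
final class AuthorizationBypass {
    static void callWithoutCheck(Vault vault) {
        try {
            Method method = Vault.class.getDeclaredMethod("doSensitiveOperation");
            method.setAccessible(true);     // deep reflection ignores the private modifier
            method.invoke(vault);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Reading Vault alone suggests that the operation can never run; that conclusion only holds if no code anywhere is able to perform the reflective call.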

The Impact of Strong Encapsulation

Because Java had, since JDK 1.2, allowed encapsulation to be broken via deep reflection, a number of libraries came to depend on the ability to break it. Even though most Java developers assumed encapsulation was working silently to protect their invariants and thereby the correctness of their code, a small number viewed encapsulation as an inconvenience to be worked around. They viewed the lack of integrity-by-default as a feature, not a bug. The reasons for breaking encapsulation were varied:

  • A client may desire functionality that isn't exposed through an API. For example, a client of the Even class may want to implement a method, Even add(Even a, Even b), that returns a new Even object whose value is the sum of a's and b's. Finding it hard to implement add by calling public methods of the Even class, the programmer opts to employ deep reflection to set the x field of the resulting Even instance directly. Many libraries use deep reflection to access JDK classes whose APIs were not intended for general use, such as the classes sun.misc.BASE64Encoder and sun.misc.BASE64Decoder, the package sun.security.x509, the packages under com.sun.net.ssl, and the packages under com.sun.image.codec.jpeg. Developers encroached on the encapsulation of JDK internals because it was convenient; the resulting loss of integrity was less concerning than, for example, the addition of a dependency on Apache Commons.

  • Internal access may be needed to work around a bug before it is fixed.

  • Internal functionality could offer better performance. For example, a client of the Even class might want to increment the counter by 100; finding fifty calls to incrementByTwo too slow, they use deep reflection to update the x field directly. As another example, libraries use sun.misc.Unsafe to compare-and-set a field atomically and quickly.

JDK 9 accommodated these libraries by only enforcing strong encapsulation at compile time; meanwhile, at run time, deep reflection was permitted, with "illegal reflective access" warnings to encourage maintainers to prepare libraries for strong encapsulation. Official replacements for the internal JDK classes above were added to the JDK, massively reducing the need to break encapsulation on modern JDKs. The VarHandle API and the Foreign Function & Memory API have made uses of sun.misc.Unsafe obsolete. Legacy bugs have been fixed so it is exceptionally rare to need to break encapsulation to work around them. Library developers wishing to target both new and old JDKs can easily do so using a Multi-Release JAR.
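As an illustration of one such replacement, the atomic compare-and-set that libraries used to perform through sun.misc.Unsafe can be written with the standard VarHandle API. This is a minimal sketch; the class AtomicCell is hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class that updates its own private field atomically, without sun.misc.Unsafe.
final class AtomicCell {
    private volatile int value;

    // The lookup is created inside AtomicCell, so it may access AtomicCell's private field.
    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(AtomicCell.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Atomically sets the value to 'updated' if it currently equals 'expected'.
    boolean compareAndSet(int expected, int updated) {
        return VALUE.compareAndSet(this, expected, updated);
    }

    int get() { return value; }
}
```

The VarHandle is obtained from a lookup owned by the class itself, so no encapsulation boundary is crossed.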

In 2021, JDK 16 began enforcing strong encapsulation at run time, turning the warnings into errors. Applications that encounter access errors due to encapsulation-breaking libraries must update them to versions that don't access JDK internals.
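From the point of view of code on the class path, run-time enforcement can be probed with AccessibleObject.trySetAccessible, which returns false instead of throwing when access is denied. The sketch below is hedged: it assumes a recent JDK launched without any --add-opens options, and it relies on String having a private field named value, which is an implementation detail of current JDKs rather than a documented API.

```java
import java.lang.reflect.Field;

// Illustrative probe: can class-path code open a private field of java.lang.String?
final class JdkInternalsProbe {

    static boolean canOpenStringInternals() {
        try {
            // "value" is an internal field name and may change between JDK releases.
            Field field = String.class.getDeclaredField("value");
            return field.trySetAccessible();    // false when strong encapsulation denies access
        } catch (NoSuchFieldException e) {
            return false;                       // the internal field was renamed or removed
        }
    }
}
```

Calling field.setAccessible(true) in the same situation throws java.lang.reflect.InaccessibleObjectException, which is the access error the text refers to.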

Disabling Strong Encapsulation

As a practical matter, some libraries haven't been updated to run on JDK 16 and above, but it's necessary to run them on JDK 16 and above anyway. The circumstances for breaking encapsulation, unfortunately, persist.

In addition, there are some tools and libraries whose functionality fundamentally operates beyond encapsulation boundaries. Here are a few examples:

  • White-box testing and related techniques, such as mocking, require direct access to encapsulated code and may even require changing its internal logic. This use case is only relevant during development, not production.

  • Frameworks may offer functionality, such as dependency injection, that requires operating beyond the encapsulation boundaries of their client classes.

  • Application Performance Monitoring (APM) requires instrumenting the internals of arbitrary code to emit tracing events by utilizing a JVM TI agent or a Java agent.

To balance the need for integrity with both the circumstantial, convenience uses of JDK internals and the essential uses, Java gives the user – the application's owner (typically its author, maintainer, or deployer) – the final say on which strong encapsulation boundaries are in place and which should be ignored. This freedom is offered under the guiding principle that the ability of one component to encroach on the boundaries of another must be explicitly granted by the application. Libraries cannot choose to obtain encapsulation-busting "superpowers" without the knowledge and consent of the application's owner.

Integrity by default, therefore, means that integrity may be broken – but only with the user's consent.

This consent can be granted as follows:

  • For temporary uses of convenience, the application can employ the --add-opens or --add-exports command-line options to allow code in one module to disable strong encapsulation and access strongly encapsulated classes and members in another module. This should only be done as a last resort. If an application's startup script contains, for example:

    --add-opens java.base/java.lang=ALL-UNNAMED
    --add-opens java.base/java.util=ALL-UNNAMED

    then it is a red flag indicating that libraries on the class path have not been kept up-to-date and are not portable to modern JDKs.

  • For white-box testing of code in user modules, build tools and testing frameworks should automatically emit --add-exports, --add-opens, and --patch-module for the module under test, as appropriate (for example, patching the module under test with the contents of the test module allows the testing of package access methods).

  • Frameworks should not rely on --add-opens but rather have their client classes grant them encapsulation-breaking privileges. This could be done either declaratively in the client module with opens pkgName to acmeFramework (the framework may then transfer that permission with Module.addOpens), or programmatically with an appropriate MethodHandles.Lookup, a capability object that captures the client class's own access permissions. For example, a client class could grant such privilege in a static initializer as follows: static { AcmeFramework.grantAccess(MethodHandles.lookup()); }.

  • APM tools should require the application to deploy their agents with the -javaagent or -agentlib option. This explicitly grants the agent permission to instrument and modify classes. Mocking libraries that employ an agent to change classes' behavior should do the same.
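The programmatic grant described for frameworks can be sketched as follows. AcmeFramework and grantAccess are the names used in the text; everything else (the inject method, the registry of lookups, and the client classes) is an assumption made for illustration and not an API of any real framework.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical framework that only touches classes which handed it a Lookup.
final class AcmeFramework {
    private static final Map<Class<?>, MethodHandles.Lookup> GRANTS = new ConcurrentHashMap<>();

    // Called by a client class to delegate its own access rights.
    static void grantAccess(MethodHandles.Lookup lookup) {
        GRANTS.put(lookup.lookupClass(), lookup);
    }

    // Sets a private field of 'target', but only if target's class granted access.
    static void inject(Object target, String fieldName, Object value) {
        Class<?> type = target.getClass();
        MethodHandles.Lookup lookup = GRANTS.get(type);
        if (lookup == null) {
            throw new IllegalStateException(type.getName() + " has not granted access");
        }
        try {
            Class<?> fieldType = type.getDeclaredField(fieldName).getType();
            VarHandle handle = lookup.findVarHandle(type, fieldName, fieldType);
            handle.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical client that explicitly consents to injection.
final class GreetingService {
    static { AcmeFramework.grantAccess(MethodHandles.lookup()); }
    private String greeting;
    String greeting() { return greeting; }
}

// Hypothetical client that never granted access; the framework cannot reach its field.
final class UngrantedService {
    private String greeting;
}
```

The design choice here is that the capability flows from the client to the framework: the framework never calls setAccessible, so it gains no access beyond what each client hands over.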

Integrity requires that libraries must not encroach on other components without the application's consent; otherwise, the boundaries on the map – and so the attack surface area of the application, its maintenance risk, and the optimizations that can be performed – would be unknowable. When only the application is permitted to explicitly grant "superpower" privileges, the application's authors are able to better judge what risks affect them and to better control the attack surface area of the application. The command line serves as an auditable map of the codebase and its internal encapsulation boundaries that the application draws as it wishes.

Disabling strong encapsulation imposes risks:

  • A library, however well-meaning, that is granted the privilege to break the strong encapsulation of the JDK's modules is able to make use of internal JDK classes that are not subject to backward compatibility, making it non-portable. Such a library may break without warning on any release of the JDK (including patch releases) – as it may use, say, a private method whose signature has changed – and so poses a maintenance risk for an application that uses it.

  • Some performance optimizations may be hard or impossible to do when the application's author chooses to ignore a module's boundaries.

  • Strong encapsulation provides security bulkheads that restrict a vulnerability in one component from affecting others. Granting access permissions can make the application vulnerable as discussed above. If library A is granted the permission to perform deep reflection on the package where doSensitiveOperation happens to reside and library B employs library A, a vulnerability in B may allow a remote attacker to direct A to call doSensitiveOperation without the access check.

  • These risks accrue when the list of --add-opens isn't properly documented and maintained. Indeed, command-line options may be perpetuated by habit even when they're no longer needed (an application could upgrade a library that used to require a particular --add-opens but no longer does, and the option is not removed).

Overall, the burden of responsibility imposed on application maintainers who find themselves having to maintain encapsulation-disabling permissions is nowhere near as high as the cost that lacking integrity by default places on the platform and the ecosystem. A palpable demonstration of that cost was the difficulty many applications experienced when migrating from JDK 8 to later versions, which was predominantly caused by non-portable libraries.

The experience of the past few years has shown that the ecosystem is able to adapt to strong encapsulation – at least of the JDK itself. Most Java code, which resides in applications, has never had much need to directly access JDK internals; high-level libraries and frameworks have similarly rarely reached into the innards of the JDK. Code that breaks encapsulation is usually found in low-level libraries that would normally be transitive dependencies of applications, and many libraries that had previously depended on JDK internals have stopped doing so. The impact on the ecosystem has mostly been that applications were required to upgrade their dependencies. Simultaneously, the burden placed on applications to grant libraries "superpower" privileges has put pressure on libraries to reduce their reliance on deep reflection and similar capabilities.

Beyond Deep Reflection

Integrity by default has not yet been achieved because strong encapsulation is not yet universal in the Java Platform. Some APIs allow any library to surreptitiously claim integrity-violating superpowers for itself, without the application's explicit consent, and use these superpowers to break encapsulation. Any library can:

  • Use sun.misc.Unsafe to access and modify private fields.

  • Load a native library that employs JNI to call private methods and set private fields. (The JNI API is not subject to access checks.)

  • Load an agent that changes code in a running application, using an API intended for tools only.

It is worth mentioning that sun.misc.Unsafe is able to break not only strong encapsulation but even Java's most foundational integrity mechanisms, mentioned earlier. For example, a library using Unsafe can access arrays without bounds checking, and can access an object that has been deallocated by the garbage collector; accordingly, a program utilizing Unsafe may have undefined behavior. Much the same applies to programs that make use of native code via JNI or the "Linker" component of the Foreign Function & Memory API, although that undefined behavior is caused not by Java code but by native code.

These APIs mean that Java does not yet provide integrity by default. Invariants can be trusted neither by people nor by the platform itself. In particular, security can only be achieved with a difficult, often infeasible, global analysis of the application and its dependencies, as a vulnerability in any direct or transitive dependency could potentially be exploited and turned into a gadget that circumvents any authorization check in the application. Additionally, application authors are unable to know whether one of their dependencies relies on internal implementation details of the JDK, making the application unable to easily upgrade a JDK version.

To attain our goal of integrity by default, we will gradually restrict these APIs and close all loopholes in a series of upcoming JEPs, ensuring that no library can assume superpowers without the application's consent. Libraries that rely on these APIs should spend the time remaining until they are restricted to prepare their users for any necessary changes.

Why Now?

An obvious question: Why has the Java Platform been progressing toward integrity by default over the past few years, putting obstacles in the path of some clever, occasionally-useful tricks, when applications managed fine without strong encapsulation for two decades?

The answer is that Java must adapt to changing circumstances and requirements:

  • The platform is able to enforce primitive integrity invariants – such as the invariant that all arrays are initialized – because those invariants are maintained in native code deep inside the JVM and are therefore unaffected by encapsulation-breaking capabilities of Java code. However, more and more of the Java runtime is being written (or rewritten) in Java. For example, legacy I/O and the implementation of reflection have been rewritten in Java, while the virtual thread scheduler is written in Java and the monitors used by synchronized code are expected to be rewritten in Java in the future. (These two are important to maintain the Java Memory Model.) In future, the JVM's JIT compiler may be written in Java; breaking its encapsulation could violate any and all invariants made by the platform. In a nutshell, even the integrity of the platform's basic operations is increasingly reliant on strong encapsulation.

  • We had to make the JDK more maintainable and remove obsolete packages to be able to add new features without drowning in maintenance. Not only had the JDK itself become a Big Ball of Mud, but entire layers of libraries that reached into the JDK's innards threw themselves into the same sticky mess. That resulted in a serious evolution problem as the ecosystem ossified around a specific JDK version, which manifested in the difficulty migrating from JDK 8 to later versions of the JDK. Continuing to evolve the JDK, let alone at a faster pace, would have created such difficulties with every release, forever. The choice was between inflicting migration pain just once more by encapsulating the JDK's internals and stopping the evolution of Java.

  • Java's primary security threats have shifted from untrusted code running in the client to remote attacks on servers, which made the Security Manager an ill-suited solution. But we need a mechanism to allow the construction of robust security in layers above the JDK. (Because it is essential for security, the Security Manager did offer strong encapsulation, though not by default, and configuring it correctly in practice was difficult.)

  • There is a growing demand for performance optimization of startup time and image size that are important for deploying Java applications in some emerging environments. Such optimizations require that code does not change its meaning from build time to runtime.

In short: The evolution of the JDK caused serious migration issues, there was no practical mechanism that enabled robust security in the current landscape, and new requirements could not be met.

Despite the convenience that lack of integrity has offered to "superpowered" libraries, the situation is untenable. Strong encapsulation is the linchpin of the solutions. The effort to add strong encapsulation to Java began in the 2010s, but its importance is becoming clearer with every passing year, so the effort continues.

Conclusion

  • Integrity is a solid foundation for the Java Platform and its vast ecosystem. It is a prerequisite for maintainability, robust security, and a number of optimizations that are in growing demand. Integrity by default has not yet been achieved due to loopholes that allow a library to break strong encapsulation without the application's explicit consent.

  • Integrity can be the default. The last few years have proven that the vast majority of code does not require breaking encapsulation. In special circumstances, it is useful to selectively disable encapsulation, and the Java Platform allows it, but only with the user's consent so that risks can be considered.

  • Integrity must be the default. We have seen the effect of it not being the default when, prior to strong encapsulation, libraries reaching for JDK internals ossified the ecosystem around a particular JDK version, making upgrades difficult.

