Skip to content

[Bug] Potential ClassLoader Leak: ThreadLocal.withInitial lambda in Binary.java pins ClassLoader causing Metaspace OOM #3398

@QiuYucheng2003

Description

@QiuYucheng2003

Describe the bug, including details regarding any error messages, version, and platform.

(Note: This vulnerability was identified via Static Program Analysis targeting ThreadLocal misuses in long-lived thread pool environments.)

Description:
There is a severe ClassLoader leak risk (ClassLoader Pinning / Metaspace OOM) in org.apache.parquet.io.api.Binary$FromCharSequenceBinary.

Around line 264, a ThreadLocal is initialized using a method reference (Lambda):

private static final ThreadLocal<CharsetEncoder> ENCODER =
    ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder);


The Root Cause:

1. The lambda StandardCharsets.UTF_8::newEncoder generates a dynamic Supplier implementation at runtime. This generated class is loaded by the application's current ClassLoader (e.g., Spark/Flink UserClassLoader or a Web container's WebappClassLoader).

2. ThreadLocal.withInitial() returns a SuppliedThreadLocal, which holds a strong reference to this lambda instance internally via its supplier field.

3. Binary is a low-level core API without any explicit lifecycle hooks, meaning ENCODER.remove() is never called.

4. In big data computing engines or web containers, worker threads (Executors/NIO threads) are pooled and long-lived. When a user job is cancelled or an application is hot-redeployed, the worker threads survive. The retained SuppliedThreadLocal keeps the lambda instance alive, which in turn permanently pins the entire application ClassLoader, preventing garbage collection.

Impact:
Multiple job submissions, cancellations, or hot-redeployments in the same JVM will cause linear Metaspace inflation, inevitably leading to a java.lang.OutOfMemoryError: Metaspace.

Expected behavior:
Low-level data structures like Binary should not hold static references that can unintentionally pin dynamic classloaders. The encoding logic should be stateless to ensure safe class unloading.

Suggested Fix:
Remove the ThreadLocal cache entirely. The performance gain of caching CharsetEncoder here is heavily outweighed by the ClassLoader leak risk. A stateless conversion is much safer:
private static ByteBuffer encodeUTF8(CharSequence value) {
    return ByteBuffer.wrap(value.toString().getBytes(StandardCharsets.UTF_8));
}

### Component(s)

_No response_

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions