Describe the bug, including details regarding any error messages, version, and platform.
(Note: This vulnerability was identified via Static Program Analysis targeting ThreadLocal misuses in long-lived thread pool environments.)
Description:
There is a severe ClassLoader leak risk (ClassLoader Pinning / Metaspace OOM) in org.apache.parquet.io.api.Binary$FromCharSequenceBinary.
Around line 264, a ThreadLocal is initialized using a method reference:

    private static final ThreadLocal<CharsetEncoder> ENCODER =
        ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder);

The Root Cause:
1. The method reference StandardCharsets.UTF_8::newEncoder is backed by a Supplier implementation generated at runtime. This generated class is defined in the ClassLoader that loaded Binary, which in these environments is typically the application's ClassLoader (e.g., Spark/Flink UserClassLoader or a Web container's WebappClassLoader).
2. ThreadLocal.withInitial() returns a SuppliedThreadLocal, which holds a strong reference to this lambda instance internally via its supplier field.
3. Binary is a low-level core API without any explicit lifecycle hooks, meaning ENCODER.remove() is never called.
4. In big data computing engines or web containers, worker threads (Executors/NIO threads) are pooled and long-lived. When a user job is cancelled or an application is hot-redeployed, the worker threads survive. The retained SuppliedThreadLocal keeps the lambda instance alive, which in turn permanently pins the entire application ClassLoader, preventing garbage collection.
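
To illustrate points 3 and 4, here is a minimal sketch of how a value created by ThreadLocal.withInitial() stays attached to a pooled worker thread between tasks when remove() is never called. The class name ThreadLocalRetentionDemo and its method are hypothetical and not part of Parquet. The sketch only shows per-thread retention; it does not reproduce the ClassLoader pinning itself.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, not part of Parquet.
public class ThreadLocalRetentionDemo {
    // Same initialization pattern as the ENCODER field quoted in this report.
    private static final ThreadLocal<CharsetEncoder> ENCODER =
        ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder);

    /**
     * Runs two independent tasks on a single pooled worker thread and
     * reports whether both tasks observed the same encoder instance.
     */
    public static boolean encoderSurvivesBetweenTasks() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<CharsetEncoder> task = ENCODER::get;
            CharsetEncoder first = pool.submit(task).get();
            // No ENCODER.remove() happens between the two tasks.
            CharsetEncoder second = pool.submit(task).get();
            return first == second;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(encoderSurvivesBetweenTasks());
    }
}
```

Because a single-thread executor reuses one worker, the second task should see the encoder created by the first, so whatever the per-thread entry references lives as long as the worker thread does.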
Impact:
Repeated job submissions, cancellations, or hot-redeployments in the same JVM can cause Metaspace usage to grow with each cycle, eventually leading to a java.lang.OutOfMemoryError: Metaspace.
Expected behavior:
Low-level data structures like Binary should not hold static references that can unintentionally pin dynamic classloaders. The encoding logic should be stateless to ensure safe class unloading.
Suggested Fix:
Remove the ThreadLocal cache entirely. The performance gain of caching CharsetEncoder here is heavily outweighed by the ClassLoader leak risk. A stateless conversion is much safer:

    private static ByteBuffer encodeUTF8(CharSequence value) {
        return ByteBuffer.wrap(value.toString().getBytes(StandardCharsets.UTF_8));
    }

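
As a quick sanity check of the suggested fix, the sketch below compares the stateless conversion against a freshly created CharsetEncoder on well-formed input. The class StatelessUtf8Demo and the helpers encodeWithFreshEncoder and toArray are hypothetical names used only for illustration; they are not Parquet APIs.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Parquet.
public class StatelessUtf8Demo {
    // The stateless conversion proposed in this report.
    public static ByteBuffer encodeUTF8(CharSequence value) {
        return ByteBuffer.wrap(value.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Comparison path: a new CharsetEncoder per call, with no ThreadLocal.
    public static ByteBuffer encodeWithFreshEncoder(CharSequence value) {
        try {
            return StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(value));
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Copies the remaining bytes of a buffer without moving its position.
    public static byte[] toArray(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        String sample = "h\u00e9llo"; // contains one two-byte UTF-8 character
        byte[] stateless = toArray(encodeUTF8(sample));
        byte[] fresh = toArray(encodeWithFreshEncoder(sample));
        System.out.println(Arrays.equals(stateless, fresh));
    }
}
```

One design point to weigh: String.getBytes(Charset) substitutes a replacement byte for malformed input such as an unpaired surrogate, whereas CharsetEncoder.encode reports it with a CharacterCodingException by default. If callers rely on the encoder path rejecting malformed input, the stateless version would change that behavior.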
### Component(s)
_No response_