The Mechanism Keeping Your Java Memory Safe: An Introduction To Garbage Collection
What you should know about garbage collection in the JVM and how it works.
⭐️ Introduction
I have worked with Java for years and recently decided to dive deeper into JVM internals to better understand how my language of choice operates. A significant part of this exploration is garbage collection (GC), one of the mechanisms that make Java a memory-safe language. Every Java thread has its own stack for local data, while objects are generally stored on the heap. A stack frame is popped as soon as its method returns, but heap data may live much longer. Since a United States government office recommended the use of memory-safe languages, there has even been discussion about how C++ could become more memory safe. Historically, developers managed memory allocation and deallocation themselves, which posed a significant risk to application safety.
👨💻 Basics of JVM Memory Management
The heap memory managed by the JVM is structured into segments to enhance application performance:
Young Generation - This is where new objects are initially allocated. It includes:
Eden Space: Most new objects start here.
Survivor Spaces: Comprising two parts, S0 and S1, objects that survive GC cycles in Eden are moved to one of these spaces.
Old Generation - Objects that have survived multiple GC cycles in the Young Generation are transferred here. This space is for objects that are likely to be needed for a longer duration.
Permanent Generation (PermGen) or Metaspace -
PermGen existed in older Java versions and was replaced by Metaspace in Java 8. It stored metadata required by the JVM, such as class structures and method data, and had a fixed maximum size.
Metaspace, its successor, lives in native memory rather than in the Java heap and can grow dynamically by default.
These areas enable the JVM to manage memory efficiently, quickly collecting short-lived objects and minimizing performance costs for long-lived ones. The transition from PermGen to Metaspace was a critical step in modernizing the JVM to accommodate growing memory needs.
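To see these areas in a running JVM, the standard `java.lang.management` API can list the memory pools. The following is a minimal sketch; the class name `MemoryPoolReport` is made up for this example, and the pool names you get (for instance "G1 Eden Space" or "Metaspace") depend on the JVM and the selected garbage collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for this article, not part of the JDK.
class MemoryPoolReport {

    /** Returns one line per memory pool that the running JVM exposes. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long usedKiB = (usage == null) ? -1 : usage.getUsed() / 1024;
            String kind = (pool.getType() == MemoryType.HEAP) ? "heap" : "non-heap";
            lines.add(String.format("%-32s %-9s used=%d KiB", pool.getName(), kind, usedKiB));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

On a HotSpot JVM, the Eden, Survivor, and Old Generation pools should show up as heap pools, while Metaspace is reported as non-heap.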
💡 Understanding Garbage Collection Mechanisms
Mark and Sweep
Perhaps you are already familiar with the Mark-and-Sweep algorithm, a variant of which is also used by Go's garbage collector. To reclaim memory that is no longer used by objects, the algorithm works in two phases:
Mark: Starting from the so-called GC roots, which are directly accessible by the program, we traverse all references from there and mark each reachable object.
Sweep: The garbage collector then moves through the heap and sweeps all objects that were not marked, considering them garbage because the running application can no longer reach them.
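The two phases can be sketched with a toy model. This is only a simulation on ordinary Java collections, not how the JVM implements its collectors; `ToyMarkSweep`, `Node`, and the demo heap are names invented for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy simulation of mark-and-sweep; all names here are hypothetical.
class ToyMarkSweep {

    /** A simulated heap object that may reference other objects. */
    static final class Node {
        final String name;
        final List<Node> references = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Mark phase: collect every object reachable from the roots. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {              // first visit only, so cycles terminate
                pending.addAll(current.references);
            }
        }
        return marked;
    }

    /** Sweep phase: keep marked objects, drop everything else. */
    static List<Node> sweep(List<Node> heap, Set<Node> marked) {
        List<Node> survivors = new ArrayList<>();
        for (Node node : heap) {
            if (marked.contains(node)) {
                survivors.add(node);
            }
        }
        return survivors;
    }

    /** Builds a small heap and returns the names that survive one collection. */
    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.references.add(b);   // a -> b
        c.references.add(d);   // c -> d
        d.references.add(c);   // d -> c: a cycle that no root can reach
        List<Node> heap = List.of(a, b, c, d);
        List<Node> roots = List.of(a);
        List<String> names = new ArrayList<>();
        for (Node survivor : sweep(heap, mark(roots))) {
            names.add(survivor.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b]
    }
}
```

Note that `c` and `d` reference each other but are still collected. Reachability from a root is what counts, not whether something points at the object.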
Generational Garbage Collection
Generational garbage collection builds on the "weak generational hypothesis", which states that in languages with automatic memory management most objects die young. It exploits this in two ways:
Young Objects: Collecting the Young Generation frequently is efficient because the vast majority of objects there are short-lived, so space can be cleared quickly.
Generations: Objects that survive long enough in the Young Generation are eventually moved to the Old Generation. This separation helps because the Old Generation fills up much more slowly and needs to be collected far less often.
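The aging and promotion idea can be illustrated with a small simulation. It is a sketch under simplifying assumptions: reachability is a plain boolean flag, the threshold of 2 is an arbitrary value chosen for the example, and `ToyGenerations` is a hypothetical class that has nothing to do with the real JVM data structures.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of young/old generations; all names and values are hypothetical.
class ToyGenerations {

    static final int TENURING_THRESHOLD = 2; // arbitrary value for this sketch

    static final class Obj {
        final String name;
        boolean reachable = true;
        int age = 0;                          // number of young collections survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj obj = new Obj(name);
        young.add(obj);                       // new objects always start in the young generation
        return obj;
    }

    /** Collects only the young generation; the old generation is not scanned at all. */
    void minorCollection() {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj obj = it.next();
            if (!obj.reachable) {
                it.remove();                  // died young: reclaimed
            } else if (++obj.age >= TENURING_THRESHOLD) {
                it.remove();
                old.add(obj);                 // survived long enough: promoted
            }
        }
    }

    static String demo() {
        ToyGenerations heap = new ToyGenerations();
        heap.allocate("cache");               // stays reachable the whole time
        heap.allocate("temp1").reachable = false;
        heap.minorCollection();               // temp1 reclaimed, cache now has age 1
        heap.allocate("temp2").reachable = false;
        heap.minorCollection();               // temp2 reclaimed, cache promoted at age 2
        return "young=" + heap.young.size() + " old=" + heap.old.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints young=0 old=1
    }
}
```

The short-lived objects never leave the young generation, while the long-lived one is moved out of the way after two collections and is no longer examined by later minor collections.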
📌 Garbage Collection Triggers
Several events can trigger the garbage collection in the JVM:
Eden Space becomes full: Commonly, when no more room is available in the Eden Space in the Young Generation, a new GC cycle is initiated to clear space and either move objects to the survivor space or delete them.
System.gc(): Part of the standard library, the System.gc() call suggests a garbage collection cycle, though it is not guaranteed that the JVM will comply.
Allocation failures: If there is insufficient memory in the heap to allocate a new object, this may trigger a new attempt to free up some memory with another cycle.
Old Generation becomes full: Less frequently, the Old Generation fills up as well. This triggers a collection that includes the Old Generation and typically takes longer than a Young Generation collection.
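These triggers can be observed from inside an application with the standard `GarbageCollectorMXBean` API. The sketch below allocates a lot of short-lived arrays and compares collection counts before and after. The class name `GcActivityProbe` and the allocation sizes are assumptions made for this example, and the actual numbers depend on heap size, JVM flags, and the collector in use, so no particular output is promised.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe class for this article, not part of the JDK.
class GcActivityProbe {

    /** Sums the collection counts of all collectors; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Allocates many short-lived arrays so that the young generation fills up repeatedly. */
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] shortLived = new byte[64 * 1024]; // becomes garbage after this iteration
            shortLived[0] = (byte) i;
            checksum += shortLived[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName());
        }
        long before = totalCollections();
        churn(20_000);   // roughly 1.2 GiB of allocations in total
        System.gc();     // only a request; the JVM is free to ignore it
        long after = totalCollections();
        System.out.println("collections before=" + before + " after=" + after);
    }
}
```

With a small heap (for example started with `-Xmx64m`), the allocation loop alone should already cause several young collections before the explicit `System.gc()` request is reached.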
⚡ Advanced Concepts
GC Roots and Accessibility: As mentioned in the Mark-and-Sweep section, GC roots are the starting points for garbage collection. Typical GC roots are:
Local variables
Active threads
Static variables
JNI references: These are used to access Java objects from native code. The GC must know whether an object is still reachable from there too.
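A simple Java application has the following GC roots: local variables in the main method that hold references, the main thread, and static reference fields of the main class:
public class Main {
    private static final Object TEST = new Object(); // static field: GC root
    public static void main(String[] args) {
        Object a = new Object(); // local variable: GC root while main runs
    }
}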
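Reachability from a root can be made visible with a `WeakReference`, which does not keep its target alive on its own. The following is a sketch; `ReachabilityDemo` is a made-up class name. Only the first printed line is guaranteed to be `true`. Whether the second object is already gone depends on whether the JVM actually runs a collection after the `System.gc()` request.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class for this article.
class ReachabilityDemo {

    // A static field is a GC root: whatever it references stays reachable.
    static Object rooted = new Object();

    /** Weak reference to an object that is also held by a static field. */
    static WeakReference<Object> weakToRooted() {
        return new WeakReference<>(rooted);
    }

    /** Weak reference to an object that has no strong reference after this method returns. */
    static WeakReference<Object> weakToUnrooted() {
        Object temp = new Object();
        return new WeakReference<>(temp);
    }

    public static void main(String[] args) {
        WeakReference<Object> kept = weakToRooted();
        WeakReference<Object> dropped = weakToUnrooted();
        System.gc(); // a request only; the JVM may ignore it
        System.out.println("rooted object still present:   " + (kept.get() != null));
        System.out.println("unrooted object still present: " + (dropped.get() != null));
    }
}
```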
Ordinary Object Pointers (OOP): In HotSpot, an OOP ("ordinary object pointer") is a managed reference to an object in heap memory. Because an application holds a very large number of references, their size matters. On 64-bit systems, some JVM implementations such as HotSpot use compressed pointers that store a reference in 32 bits, which reduces the amount of memory needed to manage object references.
Stop-The-World (STW): An STW event pauses the execution of the application. All application threads are suspended so that garbage collection work can be performed safely. This can hurt performance, especially in low-latency applications. To mitigate the impact, developers choose garbage collectors that work in parallel, incrementally, or concurrently to reduce the frequency and duration of STW events. Collectors have evolved from serial to parallel designs, and nowadays we have the G1 and the Z Garbage Collector (ZGC). G1 partitions the heap into regions to keep STW pauses short, while ZGC does most of its work concurrently using colored pointers. ZGC originally aimed to pause threads for at most 10 ms, and newer JDK releases target pauses below one millisecond.
Common Misconception: Do you think memory leaks are possible in the JVM? They are, because the GC can only remove objects that are no longer reachable. If unused objects are still reachable, they remain in memory. When such leaks accumulate, for example in big applications with complex data structures or incorrectly used callbacks and listeners, we can run into problems.
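A listener that is registered but never removed is a typical case. The sketch below uses a hypothetical `EventBus` and `Screen` class invented for this example. In a real application the bus would usually live in a static field or another long-lived object, which is what keeps every registered listener reachable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes illustrating a listener leak; none of them is a real library API.
class ListenerLeakSketch {

    /** A long-lived event source that holds strong references to its listeners. */
    static final class EventBus {
        private final List<Runnable> listeners = new ArrayList<>();
        void register(Runnable listener) { listeners.add(listener); }
        void unregister(Runnable listener) { listeners.remove(listener); }
        int listenerCount() { return listeners.size(); }
    }

    /** A short-lived component that carries a sizable buffer. */
    static final class Screen implements Runnable {
        private final byte[] buffer = new byte[64 * 1024];
        @Override public void run() { buffer[0]++; } // reacts to an event
    }

    /** Leaky variant: screens are registered and then forgotten. */
    static int openAndForget(EventBus bus, int screens) {
        for (int i = 0; i < screens; i++) {
            bus.register(new Screen());   // the bus still references each screen
        }
        return bus.listenerCount();       // all screens remain reachable
    }

    /** Fixed variant: every screen is unregistered when it is no longer needed. */
    static int openAndClose(EventBus bus, int screens) {
        for (int i = 0; i < screens; i++) {
            Screen screen = new Screen();
            bus.register(screen);
            bus.unregister(screen);       // drops the last strong reference held by the bus
        }
        return bus.listenerCount();
    }

    public static void main(String[] args) {
        System.out.println(openAndForget(new EventBus(), 5)); // prints 5
        System.out.println(openAndClose(new EventBus(), 5));  // prints 0
    }
}
```

In the leaky variant the GC cannot reclaim any `Screen` or its buffer as long as the bus itself is reachable, even though the application no longer uses them.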
The following code snippet can produce a leak because the reader is never closed, so the resources it holds are not released promptly:
try {
    InputStreamReader reader = new InputStreamReader(System.in);
    reader.read();
} catch (IOException e) {
    e.printStackTrace();
}
🛠️ Practical Tips and Tools
Monitoring Tools: Here are some tools you can use to check JVM memory usage
VisualVM: VisualVM integrates several command-line JDK tools and offers profiling capabilities.
JConsole: An Oracle tool for monitoring the JVM that complies with the JMX specification.
-Xlog:gc: This unified logging option (available since Java 9) logs messages tagged with gc at the info level to the standard output.
Best Practices: Generally, you should follow good coding practices to write memory-efficient code, such as using primitive data types and avoiding unnecessary objects. Close and shut down resources appropriately; the try-with-resources statement handles AutoCloseable objects like the InputStreamReader above for us so that the memory is cleaned up properly.
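A version of the reader example with try-with-resources could look like the sketch below. To keep it self-contained, it reads from a `ByteArrayInputStream` instead of `System.in`, which is an assumption of this example. `TryWithResourcesSketch` and `TrackedResource` are hypothetical names, and the latter exists only to make the automatic `close()` call observable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class for this article.
class TryWithResourcesSketch {

    /** Reads the first character; the reader is closed automatically in every case. */
    static int readFirstChar(byte[] input) {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(input), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } // reader.close() has already run when we get here, even if read() threw
    }

    /** Minimal AutoCloseable that records whether close() was called. */
    static final class TrackedResource implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    /** Shows that leaving the try block closes the resource. */
    static boolean closesOnExit() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            // work with r here
        }
        return resource.closed;
    }

    public static void main(String[] args) {
        System.out.println((char) readFirstChar("hi".getBytes(StandardCharsets.UTF_8))); // prints h
        System.out.println(closesOnExit());                                              // prints true
    }
}
```

One caveat when applying this to the original snippet: closing a reader that wraps `System.in` also closes `System.in` itself, so that particular stream is usually left open for the lifetime of the application.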
🏁 Conclusion
Understanding the basics of the garbage collection process is crucial. While you generally do not need to worry about GC in your daily development activities, knowledge about GC and the JVM is key for fine-tuning low-latency applications or troubleshooting errors. Experiment with different JVM flags to see the effects of various garbage collectors. Additionally, logging GC events can be a low-cost and valuable tool for troubleshooting.
⏱️ TLDR;
JVM Heap Segmentation: The Java Virtual Machine (JVM) manages heap memory in distinct segments to optimize performance.
Young Generation: For new objects, split into Eden and two Survivor spaces (S0 and S1).
Old Generation: For objects that survive multiple garbage collection (GC) cycles, intended for long-term storage.
Metaspace (replaces PermGen since Java 8): Holds class metadata in native memory and grows dynamically.
Garbage Collection Mechanisms:
Mark and Sweep: Identifies reachable objects (mark) and clears the unreachable ones (sweep).
Generational GC: Based on the hypothesis that most objects die young, improving efficiency by segregating object lifecycles.
GC Triggers:
Full Eden Space, explicit System.gc() calls, allocation failures, and a full Old Generation can initiate GC.
Advanced Concepts:
GC Roots: Starting points for GC, including local variables, active threads, and static variables.
Ordinary Object Pointers (OOP): References to heap objects; some JVMs use compressed pointers to reduce their memory footprint.
Stop-The-World (STW) Events: Pauses all threads for GC, potentially impacting performance. Modern JVMs use parallel or incremental collectors to minimize these events.
Practical Tips:
Utilize tools like VisualVM and JConsole for JVM monitoring.
Adopt best practices like using primitive data types and the try-with-resources statement to manage resources efficiently and avoid memory leaks.