Breaking the Limits: How Folia Made a 1,000-Player Minecraft Server a Reality

Mon Jun 26 2023


Check out the impressive results of the large-scale Folia test that took place on June 18th, 2023. Learn more about our findings and technical challenges in this post.


Introduction

Folia emerges as a promising fork of Paper, boasting an innovative implementation of regionalized multithreading on the server. Traditional Minecraft servers have always faced limitations when it came to player capacity, often struggling to support more than a few hundred players at a time. This is because Minecraft servers primarily rely on a single thread to handle all game logic and player interactions. 
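
To illustrate the difference, here is a minimal, hypothetical sketch (not from this test) of how a plugin schedules work under Folia, assuming the region scheduler API exposed by recent Paper/Folia builds: instead of queuing everything onto one main thread, work that touches a location is handed to the thread that owns the region containing it.

import org.bukkit.Bukkit;
import org.bukkit.Location;
import org.bukkit.Material;
import org.bukkit.plugin.Plugin;

// Hypothetical sketch for illustration only.
public final class RegionSchedulingSketch {
    // On a traditional Paper/Spigot server, all game logic runs on one main thread:
    //   Bukkit.getScheduler().runTask(plugin, task);
    // On Folia, work touching a block must instead run on the thread that
    // currently owns the region containing it.
    static void lightBeacon(Plugin plugin, Location where) {
        Bukkit.getRegionScheduler().execute(plugin, where, () -> {
            // Runs on the thread ticking the region around `where`.
            where.getBlock().setType(Material.GLOWSTONE);
        });
    }
}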

Spottedleaf, Michael, and I conducted this test to evaluate Folia's performance and stability under various conditions. We would like to thank Tubbo for streaming the event.

Our Test

We wanted to conduct a test with Folia and see how it performs on “regular” hardware and configurations. The previous public test ran on absurdly powerful hardware, which would not be realistic for many use cases. However, it's important to note that this test only provides a glimpse into the potential of Folia and its regionalized multithreading capabilities.

The purpose of this test was to gather as much data as possible, while testing different game configurations and seeing how they performed. 

Configuration

Hardware

Neofetch on our test machine.

Our test was conducted on Hetzner’s AX102 with the following configuration:

  • CPU: AMD Ryzen 9 7950X3D
  • RAM: 128GB DDR5
  • Storage: 2 x 1.92 TB NVMe SSD in RAID 1
  • Networking: 10Gbps NIC and uplink

Software

  • Distribution: Debian Bookworm (12)
  • Kernel: 6.1.0-9-amd64
  • Java: 21-testing
$ uname -a
Linux test-fsn1-game01 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux

$ java -version
openjdk version "21-testing" 2023-09-19
OpenJDK Runtime Environment (build 21-testing-builds.shipilev.net-openjdk-jdk-shenandoah-b110-20230615)
OpenJDK 64-Bit Server VM (build 21-testing-builds.shipilev.net-openjdk-jdk-shenandoah-b110-20230615, mixed mode, sharing)

Minecraft

Our Minecraft server was running Minecraft 1.20.1 on Folia build 09d8e7d (oops; as it later turned out, this was built from the wrong branch). The server ran with a 100 GiB heap and used Shenandoah as the garbage collector. Furthermore, Spottedleaf and Michael decided that we should try the generational Shenandoah mode in OpenJDK 21.

Our conversation about Java 21.

Paper Configuration

config/paper-global.yml:

chunk-loading-basic:
  # Per-player caps on chunk generation, loading, and sending (chunks per second)
  player-max-chunk-generate-rate: 40.0
  player-max-chunk-load-rate: 40.0
  player-max-chunk-send-rate: 40.0
chunk-system:
  # Chunk system I/O and worker thread counts (see the thread allocations below)
  io-threads: 2
  worker-threads: 1
misc:
  region-file-cache-size: 512
proxies:
  proxy-protocol: true
thread-regions:
  # Folia-specific: number of region ticking threads
  threads: 6

config/paper-world-defaults.yml:

environment:
  treasure-maps:
    enabled: false

Spigot Configuration

settings:
  netty-threads: 6

Bukkit Configuration

spawn-limits:
  monsters: 9
  animals: 7
  water-animals: 4
  water-ambient: 7
  water-underground-creature: 3
  axolotls: 3
  ambient: 4
ticks-per:
  monster-spawns: 30
  water-spawns: 30
  water-ambient-spawns: 30
  water-underground-creature-spawns: 30
  axolotl-spawns: 30
  ambient-spawns: 30

Minecraft Configuration

allow-nether=false
hide-online-players=true
max-players=1001
network-compression-threshold=-1
spawn-protection=0
simulation-distance=5
view-distance=8

JVM Flags

-Xms100G
-Xmx100G
-XX:+AlwaysPreTouch
-XX:+UnlockDiagnosticVMOptions
-XX:+UnlockExperimentalVMOptions
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseLargePages
-XX:LargePageSizeInBytes=2M
-XX:+UseShenandoahGC
-XX:ShenandoahGCMode=generational
-XX:-ShenandoahPacing
-XX:+ParallelRefProcEnabled
-XX:ShenandoahGCHeuristics=adaptive
-XX:ShenandoahInitFreeThreshold=55
-XX:ShenandoahGarbageThreshold=30
-XX:ShenandoahMinFreeThreshold=20
-XX:ShenandoahAllocSpikeFactor=10
-XX:ParallelGCThreads=10
-XX:ConcGCThreads=3
-Xlog:gc*:logs/gc.log:time,uptime:filecount=15,filesize=1M
-Dchunky.maxWorkingCount=600

JMX flags were stripped.

Initial Thread Allocations

  • GC: 3 concurrent
  • Chunk System IO: 2
  • Chunk System Worker: 1
  • Netty: 6
  • Region Threads: 6

Total: 18

Tools

  • UnifiedMetrics: Plugin used to export Minecraft server metrics.
  • Chunky: Plugin used to pre-generate chunks.
  • node_exporter: Used to export machine metrics.
  • VictoriaMetrics: Used to scrape and store metrics data (Prometheus compatible).
  • Grafana: Observability platform used to visualize and monitor metrics.
  • VisualVM: Tool used to monitor JMX metrics.
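
As an aside, the GC activity that VisualVM shows largely comes from the JVM's standard management beans. The snippet below is a generic illustration (not part of our tooling) of reading those same collector beans in-process:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public final class GcBeansSketch {
    public static void main(String[] args) {
        // Prints every garbage collector MXBean the running JVM exposes --
        // the same data VisualVM and JMX exporters read remotely.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}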

Methodology

The server was prepared with a pre-generated world of 100k x 100k blocks. Our custom plugin distributed new players to the least-occupied of a set of predefined spawn points, shown below, to avoid concentrating a large number of players in one area. Furthermore, Folia benefits from having multiple active regions due to its regionalized multithreading implementation, allowing for better utilization of CPU resources and improved performance. A simplified sketch of the assignment logic follows the figure.

Spawn points plotted on a plane.
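
The event plugin itself is not public; the following is a rough, hypothetical sketch of the least-occupied assignment described above, assuming a predefined list of spawn Locations (names and structure are illustrative only):

import org.bukkit.Location;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual event plugin: tracks how many players have
// been sent to each predefined spawn point and always picks the least-occupied one.
final class SpawnBalancer {
    private final Map<Location, Integer> assigned = new HashMap<>();

    SpawnBalancer(List<Location> spawnPoints) {
        spawnPoints.forEach(point -> assigned.put(point, 0));
    }

    // Called when a player joins; returns the spawn point with the fewest players so far.
    synchronized Location nextSpawn() {
        Location best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Location, Integer> entry : assigned.entrySet()) {
            if (entry.getValue() < bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        assigned.put(best, bestCount + 1);
        return best;
    }
}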

The test was presented as an event and was streamed by Tubbo. Players were spread into 49 different teams across the map. Each team consisted of around 20 players.

Results

The event started around 16:00 UTC and we were able to gather 1,000 players on the server. 

1,000 players shown in Grafana.

Shortly after we unfroze the players, we experienced some lag that lasted for about a minute. We suspect this was caused by the sudden burst of player movement overwhelming the Netty threads. Outbound traffic peaked at almost 2 Gbps at that point.

Network throughput graph on Grafana.

The server ran fine for a while, until it didn't. Our 6 region ticking threads were completely utilized. A normal Paper server ticks on a single thread and can probably handle 100 players with optimized settings; by that logic, we should have had at least 10 region threads for things to run smoothly. However, Folia has more overhead than a normal Paper server due to scheduling, which should be kept in mind.

Output of /tps.

Despite the lag, the server ran surprisingly well on a consumer-grade CPU and a commonly available hardware configuration. Sustaining 1,000 players at a playable TPS might have been possible with better thread allocations, but we don't know for sure since we never got the chance to test it. CPU usage hovered around 10-14 logical cores out of 32, which means we could potentially have allocated more region threads and IO threads. However, we only had 16 physical cores available, so pushing usage past 16 logical cores (onto SMT siblings) may have decreased performance, depending on how the workload is scheduled.

JVM CPU metrics in Grafana.

After a while, the server crashed. Preliminary analysis suggests this was caused by a bug in the custom patch we had implemented; the fix was supposed to be deployed before the test, but we had built Folia from the wrong branch. While the server was down, we also took the opportunity to increase the thread counts:

  • Netty: 6 → 10
  • Region Threads: 6 → 12

Soon after, the server was up and running with our patch applied. We had ~630 concurrent players and it performed well at a constant 20 TPS.

Smooth 20 TPS with 600 players.

Output from /tps.

For fun, we enabled chat for a short duration and everyone got kicked with the following message:

Oops.

This has not been investigated yet, but it is most likely related to incorrect handling of chat signing. It is unknown whether this is related to Folia.

We were using generational Shenandoah GC in Java 21. During the period when 1,000 players were online, heap allocation peaked at ~7.9 GB/s and GC throughput hovered around 2-3 GB/s when averaged over a minute.

Heap allocation graph.

GC throughput graph.

During the entire test, our GC pauses were mostly fine. The median GC pause duration was ~3 ms. 

GC pauses graph.

Conclusion

  • Folia ran surprisingly well on our hardware. We could've reached higher player counts if the threads had been allocated more appropriately.
  • We still don't have enough data to properly determine the optimal thread allocations.

What Next?

  • Performing another test on the same hardware with different thread allocations would be useful to determine the "optimal" settings.
  • Running more tests on similar hardware or other common configurations.


Thanks To

  • Spottedleaf for developing and maintaining Folia, and the organization behind it, PaperMC.
  • Michael for assisting with the test.
  • Innit, Inc. for providing the hardware and resources.
  • Tubbo for engaging the audience and streaming the test.
  • Players who participated in the test!

Supporting Folia & PaperMC

Interested in supporting the development of Folia and PaperMC software? See sponsors.

Updated Sun Jun 25 2023

Written by Cubxity

Full-stack developer

