Untold Engine Progress Update – New Editor and VisionOS Support!

These past couple of months have been amazing for the Untold Engine, from getting its first contributor and sponsorship to adding VisionOS support.

Let me tell you all about it.

Engine & Editor

You may recall that I had both the core and the editor integrated tightly in the engine. It worked nicely, but the coupling was going to give us headaches in the future.

Thanks to the efforts of our first contributor, miogds, the core of the engine and the editor are now decoupled.

So, this is the new architecture of the engine:

  • Core: Handles the runtime — rendering, physics, ECS, and all engine systems.
  • Editor: A dedicated app for scene creation, entity manipulation, and asset management.

Untold Engine - Core

This separation makes development cleaner, more modular, and sets the stage for headless or custom integration workflows.

Additionally, the core engine will continue in its original repository UntoldEngine, while the editor now lives in a new, dedicated repo UntoldEditor.

Untold Engine Editor

Unit Tests & Workflows

I've also been working on making the Untold Engine repository more professional.
This includes adding unit tests, GitHub Actions workflows, and automatic formatting and linting.

My hope is that these improvements will make contributing to the project much easier and more reliable.

Website & Documentation

Another area of progress has been the new website and documentation.
The documentation site covers how to install the engine, explore the APIs, and contribute to development.

You can check it out here: Untold Engine

Each engine release will include its own version of the docs for consistent developer onboarding.

VisionOS Support

Lastly, the engine now compiles and runs on the VisionOS simulator — the first step toward supporting Apple’s Vision Pro platform.

However, this is still early support — the engine has not yet been tested on an actual Vision Pro device.
We’ve already received an issue report related to Vision Pro hardware, so if you happen to have one and would like to help debug, you’re more than welcome to contribute!

Thanks for reading.

Debugging a Flickering Issue Caused by Asynchronous Culling

After implementing frustum culling in the Untold Engine, performance improved, but right away I noticed flickering. It didn’t happen every frame, but it was noticeable whenever most of the models were in view.

So, I opened up Instruments to profile the issue. I noticed warnings that the engine was holding on to the drawable too long. I tried restructuring things to hold on to the drawable for as short a time as possible, but nothing helped.

According to Instruments, the engine was not CPU-bound or GPU-bound. There was no clear indication of the root cause of the flickering.


Digging Deeper

At that point, I decided to record a short video of the issue. I slowed it down and went frame by frame. What I saw wasn’t the usual kind of flickering—it was different.

  • Frame 1: a certain set of models was visible.
  • Frame 2: a completely different set was visible.
  • Frame 3: some disappeared, others suddenly appeared.

Models were popping in and out, almost as if something was out of sync.

This was a huge hint: it looked like a data race.


The Culprit

Looking at the code confirmed it.

In the frustum culling command buffer completion handler, I was updating the visibleEntityId array. This array held all the entities that passed the culling test.

The problem was that this completion handler runs asynchronously, whenever the GPU finishes the culling pass, while the CPU was already using that same array during the rendering passes (shadow and geometry).

In other words, the CPU was iterating over visibleEntityId at the same time the GPU might be modifying it.

Classic data race.


The Fix: Triple Buffering

The solution was to add a triple-buffered visible entity list.

During culling, the GPU writes results into buffer n+1.

During rendering, the CPU continues to read from buffer n.

When the frame finishes and the render command buffer’s completion handler triggers, I update the index so the CPU reads from the freshly written buffer n+1 on the next frame.

This guarantees that the CPU never reads data being modified by the GPU. The renderer always sees a stable snapshot of the visible entities.
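
To make the buffer rotation concrete, here is a minimal sketch in plain Swift. It is not the engine's actual code: the class name, method names, and the use of an NSLock to guard the index are all assumptions made for illustration, and the Metal completion handlers are represented by ordinary method calls.

```swift
import Foundation

/// Hypothetical triple-buffered store for the visible entity list.
/// Names and locking strategy are assumptions, not the engine's API.
final class VisibleEntityBuffers {
    private var buffers: [[UInt32]] = [[], [], []]
    private var readIndex = 0      // buffer n: the one the renderer reads
    private let lock = NSLock()    // guards readIndex and the buffer slots

    /// Would be called from the culling completion handler:
    /// results go into buffer n+1, never into the buffer being read.
    func writeCullingResults(_ ids: [UInt32]) {
        lock.lock(); defer { lock.unlock() }
        buffers[(readIndex + 1) % buffers.count] = ids
    }

    /// Would be called by the shadow and geometry passes:
    /// returns a stable copy of buffer n.
    func snapshotForRendering() -> [UInt32] {
        lock.lock(); defer { lock.unlock() }
        return buffers[readIndex]
    }

    /// Would be called from the render command buffer's completion handler:
    /// buffer n+1 becomes the new buffer n for the next frame.
    func advance() {
        lock.lock(); defer { lock.unlock() }
        readIndex = (readIndex + 1) % buffers.count
    }
}
```

In this sketch, results written during a frame only become visible to the renderer after advance() is called, so a render pass never sees a half-updated list.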


The Result

With triple buffering in place, the flickering disappeared instantly. Models no longer popped in and out between frames.

This bug was a good reminder: sometimes what looks like a rendering artifact isn’t a math error at all, but a synchronization issue between CPU and GPU.


Lesson Learned

Whenever the GPU produces results asynchronously, the CPU should never iterate over those results directly. Always work with a snapshot. Triple buffering (or even double buffering) is a small architectural change that guarantees stability and avoids subtle bugs that can masquerade as rendering issues.

This experience reinforced for me how crucial synchronization and data ownership are when building GPU-driven systems—sometimes the hardest-looking bugs aren’t about shaders or math, but about who’s allowed to touch the data, and when.

Deferred Entity Destruction in ECS: A Mark-and-Sweep Approach

I found a bug in the Untold Engine in the weirdest way possible. After merging several branches into my develop branch, I decided to run a Swift formatter on the engine. Three files were changed. I ran the unit tests, they all passed, and then I figured I’d do a final performance check before pushing the branch to my repo.

So, I launched the engine, loaded a scene, and then deleted the scene.

The moment I did that, the console log started flooding with messages like:

  • Entity is missing or does not exist.
  • Does not have a Render Component.

This was the first time I had ever seen the engine behave like this when removing all entities from a scene. My first reaction was: the formatter broke something.

But the formatter’s changes were only cosmetic. There was no reason for this kind of bug.

At that point I was lost. So, I asked ChatGPT for some guidance, and it mentioned something interesting: maybe the formatter’s modifications had affected timing. That hint got me thinking.

After tinkering a bit, I realized the truth: this bug was always there. The formatter just exposed it earlier.

The Real Problem

My engine’s editor runs asynchronously from the engine’s core functions. When I clicked the button to remove all entities, the editor tried to clear the scene immediately — even if those entities were still being processed by a kernel or the render graph.

In other words, the engine was destroying entities while they were still in use. That’s why systems started complaining about missing entities and missing components.

The Solution: A Mini Garbage Collector

What I needed was a safe way to destroy entities. The fix was to implement a simple “garbage collector” for my ECS, with two phases:

  • Mark Phase – Instead of destroying entities right away, I mark them as pendingDestroy.
  • Sweep Phase – Once I know the command buffer has completed, I set a flag. In the next update() call, that flag triggers the sweep, where I finally destroy all entities that were marked.

This way, entity destruction only happens at a safe point in the loop, when nothing else is iterating over them.
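
The two phases can be sketched roughly as follows. This is a simplified model, not the engine's implementation: EntityRegistry, its method names, and the lock around the flag are hypothetical, and the command buffer completion is simulated by a plain method call.

```swift
import Foundation

typealias EntityID = UInt32

/// Hypothetical entity registry with deferred (mark-and-sweep) destruction.
final class EntityRegistry {
    private(set) var alive: Set<EntityID> = []
    private var pendingDestroy: Set<EntityID> = []
    private var sweepRequested = false
    private var nextID: EntityID = 0
    private let lock = NSLock()  // the completion handler may run on another thread

    func createEntity() -> EntityID {
        let id = nextID
        nextID += 1
        alive.insert(id)
        return id
    }

    /// Mark phase: remember the entity, but do not destroy it yet.
    func destroyEntity(_ id: EntityID) {
        if alive.contains(id) { pendingDestroy.insert(id) }
    }

    /// Would be called from the command buffer's completion handler.
    func commandBufferDidComplete() {
        lock.lock(); defer { lock.unlock() }
        sweepRequested = true
    }

    /// Sweep phase: runs at a safe point in the update loop.
    func update() {
        lock.lock()
        let shouldSweep = sweepRequested
        sweepRequested = false
        lock.unlock()

        guard shouldSweep else { return }
        alive.subtract(pendingDestroy)
        pendingDestroy.removeAll()
    }
}
```

Until the flag is set, update() leaves marked entities alone, so systems that are still iterating keep seeing valid entities.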

Conclusion

What looked like a weird formatter bug turned out to be a timing bug in my engine. Immediate destruction was unsafe — the real fix was to defer destruction until the right time.

By adding a simple mark-and-sweep system, I now have a mini garbage collector for entities. It keeps the engine stable, avoids “entity does not exist” spam, and gives me confidence that clearing a scene won’t blow everything up mid-frame.

Thanks for reading.

From 26.7 ms to 16.7 ms: How a Simple Optimization Boosted Performance

In my previous article, I talked about my attempts to improve the performance of the Untold Engine. Even after adding GPU frustum culling to reduce the CPU workload, the engine was still CPU-bound — stuck at around 26.7 ms per frame.

Profiling with Xcode Instruments pointed the finger at Metal’s encoder preparation, which appeared to take ~15 ms. Based on that, my next move seemed obvious: switch to bindless rendering.

What does that mean? Instead of rebinding textures and material properties for every draw call, I would move everything into a single argument buffer. Each draw would reference materials by index. In theory, this should drastically cut CPU overhead and pair nicely with GPU-driven culling.

But reality didn’t match theory. After spending days moving to a bindless model, I ran the engine with 500 models — and the performance needle didn’t budge. In fact, things got worse: encoder prep time increased from ~15 ms to ~17 ms.

You can imagine my disappointment. But I kept digging. And then I found the real bottleneck. Instruments showed the CPU was spending almost 9.5 ms just preparing data for GPU frustum culling.

So the encoder wasn’t the problem after all. As I dug into the code, I discovered the true culprit: a single function that queries all entities with specific component IDs.

Here’s what was happening:

👉 My component mask was stored as an array of 64 booleans. Every time I checked an entity, the code looped through all 64 slots, read from two arrays, and branched on each one. With 500 entities, that meant tens of thousands of tiny checks every single frame. No wonder the CPU was choking.

The fix? Replace the boolean array with a single 64-bit integer and use a bitwise AND. That collapses the entire check into just two instructions. Here’s the new function:
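
A minimal sketch of what such a bitmask check could look like, with the older boolean-array check alongside for contrast. All names here (ComponentMask, queryEntities, hasComponentsSlow) are hypothetical and not the engine's actual API:

```swift
/// Old approach, as described above: one Bool per component slot,
/// checked with a loop and a branch per slot.
func hasComponentsSlow(entity: [Bool], required: [Bool]) -> Bool {
    for slot in 0..<64 where required[slot] && !entity[slot] {
        return false
    }
    return true
}

/// New approach: one bit per component slot, packed into a UInt64.
struct ComponentMask {
    var bits: UInt64 = 0

    mutating func set(_ componentID: Int) {
        precondition((0..<64).contains(componentID), "component ID out of range")
        bits |= UInt64(1) << UInt64(componentID)
    }

    /// True when every required bit is also set in this mask:
    /// a single AND followed by a single comparison.
    func contains(_ required: ComponentMask) -> Bool {
        (bits & required.bits) == required.bits
    }
}

/// Returns the indices of all entities whose mask includes the required components.
func queryEntities(masks: [ComponentMask], required: ComponentMask) -> [Int] {
    masks.indices.filter { masks[$0].contains(required) }
}
```

The per-entity cost no longer depends on how many component slots exist, which is where the savings come from when the query runs over hundreds of entities every frame.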

That one change dropped the CPU frame time from 26.7 ms down to 16.7 ms. The GPU frame time sits at 9.3 ms.

In other words, the engine now runs at a solid 60 fps.

I’m happy with the results: the engine is no longer CPU-bound or GPU-bound.

But I’m not done yet. The next step is implementing occlusion culling — and I’m excited to see how far I can push performance.

Thanks for reading.

Profiling My CPU-Bound Game Engine: 50% Faster Encoder Setup

After adding several cool features to the Untold Engine (did I mention I added a console log?), it was time to shift gears and focus on performance.

At that point, rendering around 214,000 vertices (500 models), the engine was only hitting 29.51 FPS. That’s rough for real-time rendering. Clearly, something needed fixing.

Current State of the Engine: FPS 29.51

Profiling the Problem

I fired up Xcode’s GPU tools and the results were clear: the engine was CPU-bound.

  • CPU Frame Time: ~33.9 ms
  • GPU Frame Time: ~8.1 ms

So while the GPU was waiting around, the CPU was overloaded preparing work.
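
As a rough sanity check on those numbers, the frame rate can be estimated from the slower of the two frame times. The helpers below are a back-of-the-envelope sketch, not part of the engine, and they assume CPU and GPU work overlap fully so that the slower side sets the pace:

```swift
/// Estimated frames per second, assuming CPU and GPU work overlap fully
/// and the slower of the two limits the frame rate (an assumption).
func estimatedFPS(cpuMs: Double, gpuMs: Double) -> Double {
    1000.0 / max(cpuMs, gpuMs)
}

/// The engine is CPU-bound when the CPU takes longer per frame than the GPU.
func isCPUBound(cpuMs: Double, gpuMs: Double) -> Bool {
    cpuMs > gpuMs
}
```

With a CPU frame time of 33.9 ms and a GPU frame time of 8.1 ms, this estimate comes out at about 29.5 FPS, which lines up with the measured 29.51 FPS.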

Untold Engine is CPU-Bound

Looking deeper with Instruments, I found the major culprit: Metal Encoder Setup Time. The CPU was spending ~31 ms every frame just encoding commands for the GPU.

Metal Encoder Preparation

Why So Slow?

The bottleneck came from the Shadow and Geometry passes. Each frame, the CPU had to prepare encoders and push all material data for every model—base color, roughness, metallic textures, etc. With hundreds of models, this ballooned into a huge overhead.

First Fix: GPU Frustum Culling

The engine didn’t have any form of culling, so I decided to implement Frustum Culling. To avoid piling more work on the CPU, I pushed this logic onto the GPU.

The approach:

  • Construct the camera frustum.
  • Compute each entity’s world-space AABB.
  • Send bounding boxes to the GPU.
  • GPU checks if each AABB is inside the frustum.
  • If visible, the entity ID is written into an array via an atomic add.

The key here is that once the GPU returned the list of visible entities, the CPU only needed to encode draw calls for those entities—cutting encoder overhead. It’s a brute-force implementation, but it worked.
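
The per-entity test can be sketched on the CPU in plain Swift. This is only a stand-in for the GPU kernel, under several assumptions: the type and function names are hypothetical, planes are stored with inward-facing normals (a point is inside when dot(normal, p) + d >= 0), the test is the conservative kind that keeps any box not fully outside a plane, and a sequential append stands in for the atomic add.

```swift
struct Vec3 { var x, y, z: Float }

/// Plane with an inward-facing normal: inside when dot(normal, p) + d >= 0.
struct Plane { var normal: Vec3; var d: Float }

/// Axis-aligned bounding box in world space.
struct AABB { var min: Vec3; var max: Vec3 }

/// Conservative test: the box is culled only if it lies fully outside some plane.
func isVisible(_ box: AABB, planes: [Plane]) -> Bool {
    for p in planes {
        // Pick the box corner that lies farthest along the plane normal.
        let px = p.normal.x >= 0 ? box.max.x : box.min.x
        let py = p.normal.y >= 0 ? box.max.y : box.min.y
        let pz = p.normal.z >= 0 ? box.max.z : box.min.z
        let distance = p.normal.x * px + p.normal.y * py + p.normal.z * pz + p.d
        if distance < 0 { return false }  // even the farthest corner is outside
    }
    return true
}

/// Collects the indices of visible boxes. On the GPU, each thread would test
/// one box and reserve its output slot with an atomic add instead.
func cull(boxes: [AABB], planes: [Plane]) -> [UInt32] {
    var visible: [UInt32] = []
    for (index, box) in boxes.enumerated() where isVisible(box, planes: planes) {
        visible.append(UInt32(index))
    }
    return visible
}
```

Because GPU threads finish in no particular order, a list built with an atomic add has no guaranteed ordering, unlike the sequential loop here.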

The Results

From the same view location:

  • FPS: 29 → 37
  • CPU Frame Time: 33.9 ms → 26.7 ms
  • Metal Encoder Setup Time: 31.0 ms → 14.6 ms
  • GPU Frame Time: 8.1 ms (unchanged)

Improved FPS

Engine is still CPU-bound, but improved

Summary

Here’s the before-and-after snapshot:

  • FPS: 29 to 37 (+27% improvement)
  • CPU Frame Time: 33.9 ms to 26.7 ms (Encoder bottleneck reduced)
  • GPU Frame Time: 8.1 ms to 8.1 ms (Unchanged)
  • Metal Encoder Setup Time: 31.0 ms to 14.6 ms (Biggest gain)

Metal Encoder Duration decreased

Where Things Stand

The engine is still CPU-bound, but it’s in a noticeably better state than it was a week ago. By filtering out invisible objects early, I reduced the CPU’s workload and freed up encoder time. It’s not at 60 FPS yet—but the path forward is clearer.

What’s Next

Frustum culling was just the first step. To keep pushing toward 60 FPS, here is the next optimization I plan to explore:

  • Metal Bindless Rendering – Instead of rebinding textures and material properties for every draw, I’ll move to a bindless model. All materials will live in a single argument buffer, and each draw will reference them with a simple index. This should drastically cut down CPU encoder overhead and pair nicely with GPU-driven culling.

That’s where the engine stands today: better than last week, not yet where it needs to be. But the direction is clear, and each step forward is one step closer to real-time rendering.

Thanks for reading.