I Tried Rendering This Archviz Scene in Vision Pro… Then This Happened

I tried rendering a full archviz bedroom scene on the Apple Vision Pro using the Untold Engine… and while the results looked immersive at first, I started noticing some strange XR rendering artifacts.

In this video, I test the scene live inside Vision Pro, explore issues like shimmering walls and unstable surfaces, and talk about some of the rendering challenges that start appearing when pushing real-time archviz scenes into XR.

This is one of the reasons I enjoy building the Untold Engine in public — not just showing what works, but also documenting the rendering problems and technical walls that show up along the way.

If you’ve worked on XR rendering before, or recognize what might be causing these artifacts, I’d genuinely love to hear your thoughts.

The Untold Engine is a Swift + Metal renderer focused on XR rendering, large scene streaming, and real-time graphics for Apple platforms.

Check out the Untold Engine: https://github.com/untoldengine/UntoldEngine

Can the Untold Engine Stream a Virtual City on Vision Pro?

This 3D cartoon city contains a large amount of geometry and assets — and today, I wanted to see if the Untold Engine could actually stream it into the Apple Vision Pro in real time.

The challenge is that XR devices like the Vision Pro are very memory constrained when rendering large-scale scenes. Loading everything at once can easily overwhelm the renderer or crash the application entirely.

So instead of loading the full city into memory, the Untold Engine dynamically streams assets in and out around the camera as the user moves through the environment.

In this video, I:

  • Import the city into the Untold Editor
  • Convert the scene into the Untold streaming format
  • Build and deploy the project to the Apple Vision Pro
  • Test the streaming system outdoors in a real park

During testing, I also discovered an important limitation with large-scale tile streaming that I’ll be improving in future updates.

The Untold Engine is a Swift + Metal renderer focused on real-time rendering and XR for Apple platforms.

🔗 Untold Engine: https://github.com/untoldengine/UntoldEngine

🔗 Documentation: Untold Engine Docs

I Rendered an F1 Car in Real Life (Vision Pro) | Untold Engine

I rendered an F1 car in real time inside Apple Vision Pro — powered by my custom Swift + Metal renderer, the Untold Engine.

In this video, I walk through how to load and render complex 3D models directly in your environment using the Untold Engine.

If you're building XR / spatial computing apps (Vision Pro) or want to learn real-time rendering with Swift + Metal, this is for you.

🔗 Try the Untold Engine: https://github.com/untoldengine/UntoldEngine

What It Took to Finally Stream Assets Remotely in the Untold Engine

One of the features I wanted to add to the Untold Engine for a long time was remote asset streaming.

Streaming an F1 Car using the Untold Engine + Vision Pro

The challenge wasn’t the networking part. The issue was that the engine assumed everything was loaded, available, and resident in memory at all times. That model doesn’t work once you introduce streaming.

So instead of jumping straight into it, I focused on building the systems that would make it possible.

This is a breakdown of the systems I had to implement before remote streaming finally worked — including on Vision Pro.


Batching System

The first thing I had to fix was draw calls.

Large scenes (especially architectural ones) quickly became CPU-bound because every mesh resulted in a draw call. The GPU was fine, but the CPU couldn’t keep up.

Batching helped reduce the number of draw calls by grouping meshes together.

But this introduced a constraint: once meshes are batched, you lose flexibility. You can’t easily move or unload individual pieces anymore.

This forced me to think more about how meshes are grouped, not just rendered.
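
Here's a rough sketch of the idea, assuming a very simplified mesh/material setup (StaticMesh, Batch, and buildBatches are illustrative names, not the engine's actual API):

```swift
import Metal

// Illustrative types; the engine's real mesh representation differs.
struct StaticMesh {
    let materialID: Int
    let vertexBuffer: MTLBuffer
    let vertexCount: Int
}

struct Batch {
    let materialID: Int
    var meshes: [StaticMesh] = []
}

// Group meshes that share a material so each group can be merged into
// shared buffers (or drawn instanced) and issued as a single draw call.
func buildBatches(from meshes: [StaticMesh]) -> [Batch] {
    var batchesByMaterial: [Int: Batch] = [:]
    for mesh in meshes {
        batchesByMaterial[mesh.materialID, default: Batch(materialID: mesh.materialID)]
            .meshes.append(mesh)
    }
    return Array(batchesByMaterial.values)
}
```

Once the meshes inside a batch are merged into shared buffers, they can only be moved or unloaded as a group, which is exactly the flexibility trade-off mentioned above.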


LOD System

After batching, the next issue was rendering too much detail.

Even if draw calls were under control, I was still pushing too many vertices and fragments.

The LOD system allowed the engine to swap meshes based on distance. That helped performance, but more importantly, it introduced the idea that not everything needs to be at full quality all the time.

This was the first step toward selective rendering.
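
As a minimal sketch of what distance-based selection can look like (the LODGroup type and its thresholds are illustrative, not the engine's API):

```swift
import simd

// Pick a level of detail based on distance from the camera.
struct LODGroup {
    let worldPosition: SIMD3<Float>
    let switchDistances: [Float]   // sorted ascending; one entry per LOD boundary

    // Returns 0 for the highest-detail mesh, larger values for coarser ones.
    func selectLOD(cameraPosition: SIMD3<Float>) -> Int {
        let distance = simd_distance(worldPosition, cameraPosition)
        for (level, threshold) in switchDistances.enumerated() where distance < threshold {
            return level
        }
        return switchDistances.count   // beyond the last threshold: coarsest LOD
    }
}
```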


Geometry Streaming

Up to this point, everything was still loaded at startup.

That doesn’t scale.

The Geometry Streaming system allowed the engine to load and unload meshes dynamically. This changed several assumptions:

  • Meshes might not exist when requested
  • Systems need to handle missing data
  • Rendering depends on availability

This is where the engine stopped being static.
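
A rough sketch of that shift in assumptions, with placeholder types and synchronization omitted, just to show the shape of it:

```swift
// Placeholder for a GPU-resident mesh; not the engine's real handle type.
struct MeshHandle { let id: Int }

enum MeshState {
    case loading
    case resident(MeshHandle)
}

final class GeometryStreamer {
    private var states: [String: MeshState] = [:]

    // Scene code requests meshes; a request kicks off an asynchronous load.
    func request(_ assetName: String, startLoad: (String) -> Void) {
        guard states[assetName] == nil else { return }
        states[assetName] = .loading
        startLoad(assetName)           // background load; calls finishLoad when done
    }

    // Called once the background load completes.
    func finishLoad(_ assetName: String, handle: MeshHandle) {
        states[assetName] = .resident(handle)
    }

    // The renderer asks for a handle every frame and simply skips meshes
    // that are not resident yet.
    func residentHandle(for assetName: String) -> MeshHandle? {
        if case .resident(let handle)? = states[assetName] { return handle }
        return nil
    }

    // Unloading: forget the mesh so the next request reloads it.
    func unload(_ assetName: String) {
        states[assetName] = nil
    }
}
```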


Mesh Resource Manager

Once streaming was introduced, I needed a system to manage it.

The Mesh Resource Manager became responsible for tracking loaded meshes, handling GPU buffers, and making sure the same asset isn’t loaded multiple times.

Without this, things get messy fast. You end up duplicating data or unloading things that are still in use.

This system made ownership clear.
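
A hedged sketch of what reference-counted ownership can look like (MeshHandle and the manager's interface are placeholders, not the engine's real API):

```swift
// Placeholder handle type; the real engine tracks GPU buffers here.
struct MeshHandle { let id: Int }

// The same asset is only loaded once, and it only becomes eligible for
// unloading when its reference count drops to zero.
final class MeshResourceManager {
    private struct Entry {
        var handle: MeshHandle
        var refCount: Int
    }
    private var entries: [String: Entry] = [:]

    func acquire(_ assetName: String, loader: (String) -> MeshHandle) -> MeshHandle {
        if var entry = entries[assetName] {
            entry.refCount += 1
            entries[assetName] = entry
            return entry.handle          // already resident: no duplicate load
        }
        let handle = loader(assetName)   // allocate GPU buffers exactly once
        entries[assetName] = Entry(handle: handle, refCount: 1)
        return handle
    }

    func release(_ assetName: String, unloader: (MeshHandle) -> Void) {
        guard var entry = entries[assetName] else { return }
        entry.refCount -= 1
        if entry.refCount <= 0 {
            unloader(entry.handle)       // safe to free: nothing references it
            entries[assetName] = nil
        } else {
            entries[assetName] = entry
        }
    }
}
```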


Streaming a Robot mesh using the Untold Engine + Vision Pro

Memory Budget Manager

Streaming only works if you enforce limits.

The Memory Budget Manager sets a fixed budget and ensures the engine stays within it. When memory usage gets too high, assets need to be evicted.

This introduced a new kind of problem: deciding what to remove.

The engine now has to constantly answer:

What is safe to unload right now?

This is especially important for Vision Pro, where memory constraints are much tighter.
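
Here's a simplified sketch of a fixed budget with least-recently-used eviction; the budget value, tracking granularity, and eviction rule are illustrative only:

```swift
final class MemoryBudgetManager {
    private let budgetBytes: Int
    private var usedBytes = 0
    private var lastUsedFrame: [String: Int] = [:]   // asset -> frame index
    private var sizes: [String: Int] = [:]

    init(budgetBytes: Int) { self.budgetBytes = budgetBytes }

    // Register an asset when it is streamed in.
    func track(_ asset: String, sizeBytes: Int, frame: Int) {
        sizes[asset] = sizeBytes
        usedBytes += sizeBytes
        lastUsedFrame[asset] = frame
    }

    // Called whenever an asset is actually used in a frame.
    func touch(_ asset: String, frame: Int) {
        lastUsedFrame[asset] = frame
    }

    // Answers "what is safe to unload right now?" by picking the assets
    // that have gone unused the longest, until we are back under budget.
    func assetsToEvict() -> [String] {
        guard usedBytes > budgetBytes else { return [] }
        var evicted: [String] = []
        var remaining = usedBytes
        for (asset, _) in lastUsedFrame.sorted(by: { $0.value < $1.value }) {
            guard remaining > budgetBytes else { break }
            remaining -= sizes[asset] ?? 0
            evicted.append(asset)
        }
        return evicted
    }
}
```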


Tile Streaming

Even with streaming in place, I ran into another issue.

Some meshes were just too big.

For example, a single mesh could represent a large portion of a building. That makes it difficult to stream efficiently, because you either load the whole thing or nothing.

The solution was to break the scene into tiles.

I used a Blender pipeline to partition scenes (eventually using a quadtree). Each tile represents a localized part of the world.

Now the engine can:

  • Load only what’s near the camera
  • Avoid loading interiors when outside
  • Stream data in smaller chunks

This made a big difference for large scenes.
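
A minimal sketch of the distance test, assuming each tile knows its world-space center (tile names, radius, and hysteresis values are illustrative):

```swift
import simd

struct Tile {
    let name: String               // e.g. "tile_12_07.untold" (hypothetical naming)
    let center: SIMD3<Float>
}

// Request only the tiles whose centers fall within the streaming radius.
func tilesToLoad(tiles: [Tile],
                 cameraPosition: SIMD3<Float>,
                 streamingRadius: Float) -> [Tile] {
    tiles.filter { simd_distance($0.center, cameraPosition) < streamingRadius }
}

// Unload with a margin so tiles do not thrash at the boundary.
func tilesToUnload(tiles: [Tile],
                   cameraPosition: SIMD3<Float>,
                   streamingRadius: Float,
                   hysteresis: Float = 1.25) -> [Tile] {
    tiles.filter { simd_distance($0.center, cameraPosition) > streamingRadius * hysteresis }
}
```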


Native Asset Format (.untold)

At this point, most of the systems were in place, but performance still wasn’t where it needed to be.

The main issue was parsing.

Using USDZ at runtime introduced overhead:

  • CPU parsing cost
  • Memory spikes
  • Indirect data layouts

So I introduced a native format: .untold

This format is built for runtime:

  • Data is preprocessed
  • GPU upload is direct
  • Layout is streaming-friendly

USDZ is still useful as an input format, but it’s not ideal for real-time streaming.
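
To illustrate what "built for runtime" means, here is a purely hypothetical layout: a fixed-size header followed by GPU-ready vertex and index bytes, so loading becomes a copy into MTLBuffers instead of a parse. The real .untold format is more involved; this is only the idea:

```swift
import Foundation
import Metal

// Hypothetical header; the actual .untold layout differs.
struct UntoldMeshHeader {
    var magic: UInt32            // file identifier
    var vertexCount: UInt32
    var indexCount: UInt32
    var vertexStride: UInt32     // bytes per vertex, matching the shader layout
}

func uploadMesh(from data: Data, device: MTLDevice) -> (MTLBuffer, MTLBuffer)? {
    let headerSize = MemoryLayout<UntoldMeshHeader>.stride
    guard data.count >= headerSize else { return nil }

    let header = data.withUnsafeBytes { $0.load(as: UntoldMeshHeader.self) }
    let vertexBytes = Int(header.vertexCount) * Int(header.vertexStride)
    let indexBytes = Int(header.indexCount) * MemoryLayout<UInt32>.stride
    guard data.count >= headerSize + vertexBytes + indexBytes else { return nil }

    // Direct upload: the payload is already in GPU-ready layout.
    return data.withUnsafeBytes { raw -> (MTLBuffer, MTLBuffer)? in
        let base = raw.baseAddress!
        guard
            let vb = device.makeBuffer(bytes: base + headerSize,
                                       length: vertexBytes, options: []),
            let ib = device.makeBuffer(bytes: base + headerSize + vertexBytes,
                                       length: indexBytes, options: [])
        else { return nil }
        return (vb, ib)
    }
}
```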


Remote Streaming

Once all of the above systems were working, remote streaming became much simpler.

The engine already knew how to:

  • Load assets on demand
  • Stay within memory limits
  • Stream tiles based on camera position

At that point, the only change was the source of the data.

Instead of reading from disk, the engine now fetches assets over the network.
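
A minimal sketch of that data-source swap using URLSession (the base URL and error handling are illustrative):

```swift
import Foundation

final class RemoteAssetSource {
    let baseURL: URL   // e.g. URL(string: "https://example.com/assets/")! (placeholder)

    init(baseURL: URL) { self.baseURL = baseURL }

    // Fetches one asset (an .untold tile, for example) over the network.
    // Everything downstream of this call works exactly as it did with disk loads.
    func fetchAsset(named name: String) async throws -> Data {
        let url = baseURL.appendingPathComponent(name)
        let (data, response) = try await URLSession.shared.data(from: url)
        guard let http = response as? HTTPURLResponse, http.statusCode == 200 else {
            throw URLError(.badServerResponse)
        }
        return data
    }
}
```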

And it works — including on Vision Pro.

Below is a short clip of the Streaming System in action.


Compression (LZ4 + ASTC)

After getting remote streaming working, another issue showed up quickly: memory usage, especially from textures.

Some scenes would crash on Vision Pro due to high texture memory consumption. Even if geometry was under control, textures alone could push the system over the limit.

To address this, I integrated two forms of compression into the pipeline.

For asset streaming, I added LZ4 compression. This helps reduce the size of data being transferred and improves load times when streaming assets from a remote source. Since LZ4 is fast to decompress, it fits well into a real-time pipeline.
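
As a small example, decompression with Apple's Compression framework can be as simple as the sketch below, assuming the asset was compressed with that framework's LZ4 encoding (payloads produced by other LZ4 tools would need the lower-level compression_decode_buffer API instead):

```swift
import Foundation

// Decompress an LZ4-compressed asset payload received from the streaming source.
func decompressLZ4(_ compressed: Data) throws -> Data {
    let decompressed = try (compressed as NSData).decompressed(using: .lz4)
    return decompressed as Data
}
```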

For textures, I integrated ASTC compression.

ASTC significantly reduces GPU memory usage while maintaining good visual quality. This made a noticeable difference on Vision Pro, where memory constraints are tighter and large uncompressed textures can quickly become a problem.
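
For reference, here is a rough sketch of uploading offline-compressed ASTC 4x4 data into a Metal texture; the block math is the important part, the rest is illustrative:

```swift
import Metal

// Assumes the texture was compressed offline to ASTC 4x4 and that
// width/height are the original pixel dimensions.
func makeASTCTexture(device: MTLDevice,
                     blockData: Data,
                     width: Int,
                     height: Int) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .astc_4x4_sRGB,
        width: width,
        height: height,
        mipmapped: false)
    descriptor.usage = .shaderRead
    guard let texture = device.makeTexture(descriptor: descriptor) else { return nil }

    // ASTC 4x4 stores 16 bytes per 4x4 block, so bytesPerRow is measured
    // in rows of blocks, not rows of pixels.
    let blocksWide = (width + 3) / 4
    let bytesPerRow = blocksWide * 16
    blockData.withUnsafeBytes { raw in
        texture.replace(region: MTLRegionMake2D(0, 0, width, height),
                        mipmapLevel: 0,
                        withBytes: raw.baseAddress!,
                        bytesPerRow: bytesPerRow)
    }
    return texture
}
```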

With ASTC in place:

  • Texture memory footprint is much lower
  • Scenes that previously crashed can now load
  • Streaming becomes more stable overall

At this point, compression is no longer optional. It’s part of making the system work reliably on constrained devices.


Final Thoughts

What started as a goal to stream assets remotely ended up requiring changes across the entire engine.

The biggest shift was this:

Before:

  • Everything is loaded
  • Everything is available

After:

  • Only what’s needed is loaded
  • Everything else is optional

Streaming isn’t something you add at the end. The engine has to be designed around it.

Thanks again to everyone supporting the Untold Engine on GitHub. This wouldn’t have been possible without that support.

Lessons Learned: When the Drawable Leaks Into Your Render Pipeline

This week, while rendering scenes in Vision Pro using the Untold Engine, I noticed that scenes were coming out with incorrect colors. Initially, I assumed it was a color space issue, but something told me the problem went deeper than that.

After analyzing my render graph and verifying the color targets I was using in the lighting pass and tone mapping pass, I realized that I had made a crucial mistake in the engine.

See, my lighting pass was doing all calculations in linear space, which is correct. However, the internal render targets were being created using the drawable's pixel format. Doing so meant that every platform could change the precision, dynamic range, and even encoding behavior of my internal buffers.

In other words, my lighting results were being stored in whatever format the drawable dictated. That is wrong. The renderer should own its internal formats, not the presentation layer.

Because the drawable format differs per platform (for example, .bgra8Unorm_srgb on Vision Pro), my internal render targets were sometimes:

  • 8-bit
  • sRGB-encoded
  • Not HDR-capable

Even though my lighting calculations were done in linear space, the storage format altered how those results were preserved and interpreted.

So yes — the math was linear, but the buffers holding the results were not consistent across platforms.

That is where the mismatch came from.

To fix this, I explicitly set the color target used in the lighting pass to rgba16Float. By doing this, I ensured:

  • Stable precision
  • HDR-capable storage
  • Linear behavior
  • Platform-independent results

Now, my lighting calculations are identical regardless of the platform, because the internal render targets are explicitly defined by the engine — not by the drawable.
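
In code, the fix is essentially that the lighting target is created from an explicit descriptor rather than from the drawable (a simplified sketch, not the engine's exact setup):

```swift
import Metal

// The lighting pass renders into an engine-owned rgba16Float target,
// independent of whatever format the drawable uses on a given platform.
func makeLightingTarget(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba16Float,      // linear, HDR-capable, platform-independent
        width: width,
        height: height,
        mipmapped: false)
    descriptor.usage = [.renderTarget, .shaderRead]
    descriptor.storageMode = .private   // GPU-only; never presented directly
    return device.makeTexture(descriptor: descriptor)
}
```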


The Second Issue: Tone Mapping Is Not Output Encoding

The other issue was more subtle and made me realize that I still have a lot more to learn about tone mapping.

My pipeline originally followed this path:

  • Lighting Pass
  • Post Processing
  • Tone Mapping
  • Write to Drawable

The problem with this flow was the assumption that, after tone mapping, the image was ready for the screen.

But that is not true.

Different platforms expect different things:

  • Different pixel formats (RGBA vs BGRA)
  • Different encoding (linear vs sRGB)
  • Different gamuts (sRGB vs Display-P3)
  • Different dynamic range behavior (SDR vs EDR)

My pipeline above implicitly assumed that the tone-mapped result already matched whatever the drawable expected.

But tone mapping does not mean “ready for any screen.”

Tone mapping only compresses HDR → display-referred brightness range. It does not:

  • Encode to sRGB automatically
  • Convert color gamut
  • Match the drawable’s storage format
  • Handle EDR behavior

So when I wrote directly to the drawable after tone mapping, I was essentially letting the platform decide how the final color should be interpreted.

And since platforms differ, my final image differed.


What Was I Missing?

I needed to separate responsibilities more clearly.

I needed a pass that owned the creative look — fully defined and controlled by the engine:

  • Exposure
  • White balance
  • Contrast
  • Tone mapping curve

This defines how the image should look artistically.

And I needed a separate pass that is platform-aware — an Output Transform pass — that defines how the display expects pixels to be formatted:

  • Encode to sRGB or not
  • Convert to P3 or not
  • Clamp or preserve HDR
  • BGRA vs RGBA channel order
  • EDR behavior

In my original pipeline, I had collapsed Look + Output Transform into one step. I wasn’t explicitly controlling the final encoding, so the platform’s defaults influenced the final image.

With the extra passes and modifications I made, the Look pass now defines the artistic look of the image. The Output Transform defines how that look is encoded for a specific display.

Previously, I was conflating the two — which allowed the platform’s drawable format to influence the final result.
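
Here is a simplified sketch of how the two passes differ in their pipeline setup, assuming an MTKView-style presentation target; the shader function names are placeholders. On Vision Pro the final format would come from the compositor layer instead, but the split is the same:

```swift
import Metal
import MetalKit

// The Look pass works entirely in the engine-owned rgba16Float target.
// Only the Output Transform pass knows about the drawable's format.
func makePipelines(device: MTLDevice, library: MTLLibrary, view: MTKView) throws
    -> (look: MTLRenderPipelineState, output: MTLRenderPipelineState) {

    let lookDescriptor = MTLRenderPipelineDescriptor()
    lookDescriptor.vertexFunction = library.makeFunction(name: "fullscreenVertex")
    lookDescriptor.fragmentFunction = library.makeFunction(name: "lookPassFragment")
    lookDescriptor.colorAttachments[0].pixelFormat = .rgba16Float        // engine-owned

    let outputDescriptor = MTLRenderPipelineDescriptor()
    outputDescriptor.vertexFunction = library.makeFunction(name: "fullscreenVertex")
    outputDescriptor.fragmentFunction = library.makeFunction(name: "outputTransformFragment")
    outputDescriptor.colorAttachments[0].pixelFormat = view.colorPixelFormat  // platform-owned

    return (try device.makeRenderPipelineState(descriptor: lookDescriptor),
            try device.makeRenderPipelineState(descriptor: outputDescriptor))
}
```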

Here is the image after the fix.

The scene after the fix

Now, the renderer owns the working color space and internal formats, and the drawable only affects the final presentation step.

Thanks for reading.