Radiance: Architecture of a Professional HDR Processing Suite for Generative AI Pipelines

FXTD Studios Pipeline Team · Radiance v2.1.0 · February 2026
Abstract. Radiance is a 76-node ComfyUI extension implementing production-grade HDR image processing, color science, film emulation, and real-time interactive viewing for generative AI pipelines. This document describes the system architecture: the 32-bit floating-point tensor pipeline, the multi-layer color science stack built on OpenColorIO and colour-science, the WebGL-accelerated interactive viewer with LRU frame caching and ASC CDL export, the CIE L*a*b* grade matching algorithm, the 33³ .cube LUT bake/apply engine, and the motion-aware temporal smoothing system for AI-generated video flicker reduction. Extension points, performance characteristics, and design rationale are discussed.

§1 System Overview

Radiance extends ComfyUI with a complete VFX-grade post-production layer. The overall dataflow through its modules is:

ComfyUI Graph
  │
  ▼
nodes_sampler.py            ← Radiance Sampler Pro (Flux latent → image)
  │  image: Tensor[B, H, W, 3] fp32
  ▼
nodes_grade.py              ← Lift/Gamma/Gain, LAB match, preset, grade_info JSON
  │
  ├──→ nodes_lut.py         ← LUTBake → .cube file (Resolve/Nuke)
  │                           LUTApply ← trilinear interp
  ├──→ nodes_temporal.py    ← EMA flicker reduction (video batches)
  ├──→ nodes_scopes.py      ← FalseColor, Waveform, Vectorscope
  ├──→ nodes_overlay.py     ← BlendComposite, MetadataOverlay
  └──→ nodes_radiance_viewer.py
         │  .rhdr  fp32 raw data
         │  .rpick zlib fp32 pick buffer
         ▼
       radiance_webgl.js    ← WebGL renderer (GLSL, fp16 textures, GPU histogram)
         │
         └──→ radiance_viewer.js ← UI, CDL export, LRU cache, Display-P3

§2 Module Structure

The package is organized into a flat set of nodes_*.py modules (auto-discovered) and sub-packages for shared logic:

◎ nodes_grade.py

  • RadianceGrade (v2.0)
  • RadianceGradeMatch
  • RadianceApplyGradeInfo
  • _apply_grade(), _match_grade_params(), _rgb_to_lab()

◎ nodes_lut.py

  • RadianceLUTBake
  • RadianceLUTApply
  • _generate_cube_lut(), _write_cube_file()
  • _apply_cube_lut_to_image() — trilinear

◎ nodes_temporal.py

  • RadianceTemporalSmooth
  • RadianceFlickerAnalyze
  • EMA loop, motion mask, JSON stats

◎ nodes_scopes.py

  • RadianceWaveform
  • RadianceVectorscope
  • RadianceFalseColor (v1.0)
  • _FC_ZONES — 7-zone palette

◎ nodes_overlay.py

  • RadianceMetadataOverlay
  • RadianceBlendComposite
  • 8 blend modes, MASK support

◎ nodes_radiance_viewer.py

  • RadianceProViewer
  • _save_pick_buffer() — zlib fp32
  • build_cdl_xml() — ASC CDL v1.2
  • RPICK_MAGIC header

◎ color/ sub-package

  • color_utils.py — shared transforms
  • Log curve encode/decode
  • Color space matrices (sRGB, P3, AWG4)

◎ film/ sub-package

  • camera_profiles.py — 30+ sensors
  • Grain algorithms, halation
  • Film stock transfer curves

◎ hdr/ sub-package

  • Tone mapping operators
  • Exposure blend (Mertens)
  • Highlight synthesis

◎ js/ (frontend)

  • radiance_webgl.js — GPU renderer
  • radiance_viewer.js — viewer UI
  • radiance_layout.js — node layouts

§3 Data Pipeline

All inter-node communication uses PyTorch fp32 CPU tensors of shape [B, H, W, C] where B = batch size, H = height, W = width, C = channels (typically 3 for RGB). This matches ComfyUI's standard IMAGE convention.

fp32 Guarantee

Every node casts inputs via .float() immediately and never calls .clamp(0,1) on intermediate data. The VFX audit test suite (89 tests) verifies this with dedicated tests:

test_pipeline_no_clamp_hdr   # values > 1.0 must survive full pipeline
test_pipeline_preserves_float32  # dtype must remain fp32 at output
test_pure_black_no_lift      # 0.0 → 0.0 through all grade nodes
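
These invariants can be expressed as a small reusable harness. The sketch below is illustrative and NumPy-based for self-containment; the shipped suite runs equivalent checks against the real nodes on torch tensors:

```python
import numpy as np

def assert_hdr_safe(fn):
    """Check that a pipeline op preserves fp32 dtype, HDR values > 1.0,
    and pure black (the three guarantees listed above)."""
    img = np.array([[[0.0, 0.5, 4.0]]], dtype=np.float32)  # 4.0 = HDR highlight
    out = fn(img)
    assert out.dtype == np.float32, "dtype must remain fp32"
    assert out.max() > 1.0, "values > 1.0 must survive (no clamp)"
    assert fn(np.zeros_like(img)).min() == 0.0, "pure black must stay black"

# A gain-only grade satisfies all three guarantees:
assert_hdr_safe(lambda img: img * np.float32(1.2))
```

A node that clamps, e.g. `lambda img: np.clip(img, 0, 1)`, fails the harness at the HDR check.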

Pick Buffer Sidecar (.rpick)

When the viewer processes a frame, _save_pick_buffer() downsamples the raw fp32 tensor to ≤256px and saves it as a zlib-compressed binary sidecar alongside the display PNG:

fp32 tensor [B,H,W,3]
resize to ≤256px
RPICK_MAGIC + zlib(numpy.tobytes)
frame_N.rpick

The JavaScript viewer fetches .rpick on hover to read true scene-linear HDR values at the cursor — bypassing the tonemapped 8-bit display PNG and providing accurate EV readout.
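
A minimal sketch of the sidecar round-trip, assuming an illustrative 4-byte magic plus a uint32 [H, W, C] header (the real `RPICK_MAGIC` constant and exact layout live in nodes_radiance_viewer.py):

```python
import zlib
import numpy as np

RPICK_MAGIC = b"RPK1"  # illustrative; not necessarily the real magic bytes

def save_pick_buffer(arr: np.ndarray, path: str) -> None:
    """Write a downsampled fp32 [H, W, 3] array as a zlib-compressed sidecar."""
    h, w, c = arr.shape
    header = RPICK_MAGIC + np.array([h, w, c], dtype=np.uint32).tobytes()
    with open(path, "wb") as f:
        f.write(header + zlib.compress(arr.astype(np.float32).tobytes()))

def load_pick_buffer(path: str) -> np.ndarray:
    """Inverse of save_pick_buffer(): validate magic, decompress, reshape."""
    with open(path, "rb") as f:
        blob = f.read()
    assert blob[:4] == RPICK_MAGIC, "not a pick buffer"
    h, w, c = np.frombuffer(blob[4:16], dtype=np.uint32)
    data = zlib.decompress(blob[16:])
    return np.frombuffer(data, dtype=np.float32).reshape(h, w, c)
```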

§4 Color Science Stack

Radiance implements a layered color pipeline that mirrors broadcast and digital cinema workflows:

Layer 1 — Input Transform (IDT)

  • sRGB linearize
  • ARRI LogC3 decode
  • ARRI LogC4 decode
  • RED Log3G10 decode
  • Panasonic V-Log decode
  • Canon Log3 decode
  • Sony S-Log3 decode

Layer 2 — Working Space

  • ACEScg (AP1)
  • ACES AP0
  • sRGB Linear
  • Rec.2020 Linear
  • DaVinci Wide Gamut
  • ARRI Wide Gamut 4
  • XYZ D65

Layer 3 — Grade (CDL)

  • Lift (per-channel)
  • Gamma (sign-preserving power)
  • Gain (per-channel)
  • Offset (global)
  • Contrast (pivot)
  • Saturation (luma-preserving)
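
The sign-preserving power in the gamma stage can be sketched as follows. This is a minimal NumPy illustration of the Layer-3 idea in ASC CDL terms (slope, offset, power), not the actual `_apply_grade()` implementation, and parameters are simplified to scalars:

```python
import numpy as np

def apply_cdl(img, gain=1.0, lift=0.0, gamma=1.0):
    """Slope/offset/power with a sign-preserving power, so negative
    scene-linear values (common after wide-gamut conversions) keep
    their sign instead of producing NaNs."""
    x = img * gain + lift                            # slope, then offset
    return np.sign(x) * np.abs(x) ** (1.0 / gamma)   # sign-preserving power
```

With identity parameters the function is a no-op, and negative inputs stay negative at any gamma.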

Layer 4 — Look (LUT)

  • 33³ .cube (trilinear)
  • OCIO
  • CDL .cc/.ccc
  • LAB match offset

Layer 5 — Output Transform (ODT)

  • ACES 2.0 RRT+ODT
  • AgX
  • Filmic (Hable)
  • Reinhard
  • sRGB OETF
  • Rec.709 EOTF
  • Display-P3

§5 Viewer Architecture

The Radiance Pro Viewer is a custom ComfyUI widget implemented as a full-screen canvas overlay. It communicates with the Python backend over ComfyUI's WebSocket API message bus:

Python backend — nodes_radiance_viewer.py
  execute() → saves frame_N.png (display), frame_N.rhdr (raw fp32), frame_N.rpick (pick)
            → api.send_sync("radiance_result", { images: [...] })
  │
  │ WebSocket
  ▼
radiance_viewer.js — RadianceViewer class
  ├─ loadCurrentFrame()    ← fetches .rhdr, uploads via loadFloat16TextureCached()
  ├─ _fetchPickBuffer()    ← fetches .rpick for fp32 hover values
  ├─ _renderGPUHistogram() ← delegates to renderer.renderHistogram()
  ├─ exportCDL()           ← encodes grade_info → ASC CDL v1.2 XML, downloads
  └─ importCDL()           ← parses .cdl XML, applies to active grade
  │
  ▼
radiance_webgl.js — RadianceWebGLRenderer
  ├─ loadFloat16TextureCached(frameId, data) ← LRU Map(8)
  ├─ renderHistogram(canvas, logScale)       ← 256-bin GPU pass
  ├─ setLinearFalseColor(v)                  ← pre-OETF false color
  └─ static initDisplayP3(canvas)            ← CSS matchMedia P3 detection

§6 WebGL Renderer Pipeline

The renderer uses WebGL 2.0, in which fp16 (RGBA16F) texture storage is core functionality (a WebGL 1.0 context would need the OES_texture_half_float extension). The GLSL pipeline processes scene-linear data and applies the OETF (display transform) on the GPU:

// GLSL ES 3.00 fragment shader (simplified)
in vec2 v_uv;
out vec4 fragColor;
uniform sampler2D u_hdrTexture;   // fp16 scene-linear
uniform float u_exposure;
uniform int u_oetfMode;           // sRGB / Rec.709 / P3
uniform bool u_linearFalseColor;  // v2.1: evaluate before OETF

vec3 hdr = texture(u_hdrTexture, v_uv).rgb;
hdr *= pow(2.0, u_exposure);

// False color evaluated in LINEAR space
if (u_linearFalseColor) {
    fragColor = vec4(falseColorLookup(hdr), 1.0);
    return;
}

// OETF (configurable: sRGB / Rec.709 / P3)
vec3 display = applyOETF(hdr, u_oetfMode);
fragColor = vec4(display, 1.0);

LRU Frame Cache

The viewer maintains an LRU Map of up to 8 WebGL texture objects keyed by frameId. This eliminates re-uploads during sequence scrubbing — a common bottleneck when working with large EXR sequences at 4K.

// js/radiance_webgl.js
loadFloat16TextureCached(frameId, data, width, height) {
    if (this._lruCache.has(frameId)) {
        // Move to end (most recently used)
        const tex = this._lruCache.get(frameId);
        this._lruCache.delete(frameId);
        this._lruCache.set(frameId, tex);
        return tex;
    }
    // Evict oldest if full
    if (this._lruCache.size >= 8) {
        const oldest = this._lruCache.keys().next().value;
        this.gl.deleteTexture(this._lruCache.get(oldest));
        this._lruCache.delete(oldest);
    }
    const tex = this._uploadHalfFloat(data, width, height);
    this._lruCache.set(frameId, tex);
    return tex;
}

§7 Temporal Processing

Exponential Moving Average (EMA)

RadianceTemporalSmooth applies per-pixel EMA across a temporal batch to reduce high-frequency flicker in AI-generated video. The update rule is:

ema_t = α · frame_t + (1 - α) · ema_{t-1}

where α ∈ (0, 1] controls the blend weight: lower α gives more smoothing, and α = 1 is a passthrough.

Motion-Aware Masking

To preserve sharp moving objects while smoothing static background grain, the motion-aware mode computes a per-pixel motion magnitude and adapts α locally:

motion_mag = |frame_t - ema_{t-1}|.mean(dim=-1)   # H×W scalar
motion_mask = (motion_mag > threshold).float()       # 0 or 1
eff_alpha = α · (1 - motion_mask) + 1.0 · motion_mask
# → α on static pixels, 1.0 on moving pixels (no blend = sharp)
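
Putting the two rules together, a NumPy sketch of the motion-aware EMA loop over a [T, H, W, 3] batch (the node itself operates on torch tensors; the `alpha` and `threshold` defaults here are illustrative):

```python
import numpy as np

def temporal_smooth(frames, alpha=0.3, threshold=0.1):
    """Motion-aware EMA: blend static pixels toward the running average,
    pass moving pixels through unchanged to keep them sharp."""
    ema = frames[0].copy()
    out = [ema.copy()]
    for frame in frames[1:]:
        motion = np.abs(frame - ema).mean(axis=-1, keepdims=True)  # H×W×1 magnitude
        mask = (motion > threshold).astype(frame.dtype)            # 1 where moving
        eff_alpha = alpha * (1.0 - mask) + mask                    # α static, 1.0 moving
        ema = eff_alpha * frame + (1.0 - eff_alpha) * ema
        out.append(ema.copy())
    return np.stack(out)
```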

Flicker Index Metric

RadianceFlickerAnalyze computes the flicker index as the coefficient of variation of per-frame luma means:

flicker_index = std(frame_means) / mean(frame_means)

Values below 0.01 are imperceptible; above 0.05 are visible to the human eye in rapid playback. This metric matches the ITU-R BT.1203 temporal uniformity definition.
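
A sketch of the metric, assuming Rec.709 luma weights for the per-frame means (the node's exact luma coefficients are an assumption here):

```python
import numpy as np

def flicker_index(frames):
    """Coefficient of variation of per-frame luma means.
    frames: [T, H, W, 3] scene-referred batch."""
    weights = np.array([0.2126, 0.7152, 0.0722], dtype=frames.dtype)  # Rec.709 luma
    luma = frames @ weights                 # [T, H, W]
    means = luma.mean(axis=(1, 2))          # one mean per frame
    return float(means.std() / means.mean())
```

A perfectly steady clip scores 0.0; a clip alternating between noticeably different brightness levels scores well above the 0.05 visibility threshold.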

§8 LUT Engine

Baking (RadianceLUTBake)

LUT baking samples the grade function on a 33³ identity lattice and writes the .cube format:

# Build identity grid — .cube ordering: R fastest, B slowest
lin = linspace(0, 1, 33)
r_grid = lin.repeat(33 * 33)
g_grid = lin.repeat_interleave(33).repeat(33)
b_grid = lin.repeat_interleave(33 * 33)
grid = stack([r_grid, g_grid, b_grid], dim=-1)   # (33³, 3)
out = _apply_grade(grid, ...)                      # apply all grade ops
cube = clamp(out, 0, 1)                            # clamp for SDR LUT
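
A minimal writer for the sampled lattice, consistent with the .cube conventions above (R fastest, B slowest). This is a sketch; the actual `_write_cube_file()` may emit additional metadata lines:

```python
import numpy as np

def write_cube_file(path, cube, title="Radiance grade"):
    """Write a flattened (n³, 3) lattice as a Resolve/Nuke-compatible
    .cube text file. `cube` must already be in R-fastest order."""
    n = round(len(cube) ** (1 / 3))
    with open(path, "w") as f:
        f.write(f'TITLE "{title}"\n')
        f.write(f"LUT_3D_SIZE {n}\n")
        f.write("DOMAIN_MIN 0.0 0.0 0.0\n")
        f.write("DOMAIN_MAX 1.0 1.0 1.0\n")
        for r, g, b in cube:
            f.write(f"{r:.6f} {g:.6f} {b:.6f}\n")
```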

Application (RadianceLUTApply)

LUT application uses 8-corner trilinear interpolation:

# .cube index: B*n² + G*n + R
r0, r1 = floor(R*(n-1)), ceil(R*(n-1))
# ... similarly for G, B
out = c000*(1-rf)*(1-gf)*(1-bf) + c100*rf*(1-gf)*(1-bf) +
      c010*(1-rf)*gf*(1-bf)   + c110*rf*gf*(1-bf) +
      c001*(1-rf)*(1-gf)*bf   + c101*rf*(1-gf)*bf +
      c011*(1-rf)*gf*bf       + c111*rf*gf*bf

This is implemented entirely in PyTorch tensor ops, making it GPU-acceleratable without any custom CUDA kernels.
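
A self-contained NumPy version of the same 8-corner gather, assuming the .cube index convention above (index = B·n² + G·n + R); the shipped node performs the equivalent gather in PyTorch:

```python
import numpy as np

def apply_cube_lut(img, lut, n):
    """Trilinear 3D LUT lookup. img: [..., 3] in [0, 1]; lut: (n³, 3)
    flattened with R fastest, B slowest."""
    scaled = np.clip(img, 0.0, 1.0) * (n - 1)
    lo = np.floor(scaled).astype(np.int64)
    hi = np.minimum(lo + 1, n - 1)
    f = scaled - lo                                   # fractional part [..., 3]

    def fetch(r, g, b):                               # gather one corner
        return lut[b * n * n + g * n + r]

    rf, gf, bf = f[..., 0:1], f[..., 1:2], f[..., 2:3]
    r0, g0, b0 = lo[..., 0], lo[..., 1], lo[..., 2]
    r1, g1, b1 = hi[..., 0], hi[..., 1], hi[..., 2]
    return (fetch(r0, g0, b0) * (1 - rf) * (1 - gf) * (1 - bf)
          + fetch(r1, g0, b0) * rf * (1 - gf) * (1 - bf)
          + fetch(r0, g1, b0) * (1 - rf) * gf * (1 - bf)
          + fetch(r1, g1, b0) * rf * gf * (1 - bf)
          + fetch(r0, g0, b1) * (1 - rf) * (1 - gf) * bf
          + fetch(r1, g0, b1) * rf * (1 - gf) * bf
          + fetch(r0, g1, b1) * (1 - rf) * gf * bf
          + fetch(r1, g1, b1) * rf * gf * bf)
```

Applying an identity lattice returns the input image unchanged, which makes a convenient correctness check.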

§9 Grade Matching Algorithm

RadianceGradeMatch transfers the color statistics of a reference image to a source image using CIE L*a*b* color space. This algorithm is equivalent to the Reinhard et al. [1] color transfer method.

Algorithm

1. Convert both images to CIE L*a*b* (D65 white point)
2. Compute per-channel mean μ and std σ for source and target
3. Scale ratio: s = σ_target / σ_source
4. Shift: t = (μ_target - μ_source * s) / 100   (normalized)
5. Map L* channel → uniform gain (luminance) + offset
   Map a*, b* channels → color cast offset in R,G,B space
6. Blend computed params at match_strength ∈ [0,1]

The result is a CDL-compatible gain/offset set stored as JSON in grade_info, enabling the match to be applied to arbitrary other images via ApplyGradeInfo.
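
The mean/std transfer at the core of steps 2–4 can be sketched per-channel as follows. This NumPy sketch shows the statistics step in generic channel space; the real node computes it in CIE L*a*b* and then maps the parameters back to CDL gain/offset as described in step 5:

```python
import numpy as np

def match_stats(source, reference):
    """Reinhard-style per-channel statistics transfer: return (gain, offset)
    such that source * gain + offset has the reference's mean and std."""
    mu_s = source.mean(axis=(0, 1))
    sd_s = source.std(axis=(0, 1)) + 1e-6        # guard against flat channels
    mu_r = reference.mean(axis=(0, 1))
    sd_r = reference.std(axis=(0, 1))
    gain = sd_r / sd_s                           # step 3: scale ratio
    offset = mu_r - mu_s * gain                  # step 4: shift
    return gain, offset
```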

§10 Extension Points

Adding a New Node

  1. Create nodes_myfeature.py with a class, NODE_CLASS_MAPPINGS, and NODE_DISPLAY_NAME_MAPPINGS
  2. That's it. __init__.py discovers it automatically via glob("nodes_*.py")
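
A hypothetical minimal node file illustrating the pattern (class and display names are invented; the field names follow the standard ComfyUI custom-node convention, and Radiance's own nodes add more metadata):

```python
# nodes_myfeature.py — hypothetical example node
class RadianceInvert:
    CATEGORY = "Radiance/Examples"
    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "run"

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"image": ("IMAGE",)}}

    def run(self, image):
        # inputs arrive as fp32 [B, H, W, 3] tensors per §3
        return (1.0 - image,)

NODE_CLASS_MAPPINGS = {"RadianceInvert": RadianceInvert}
NODE_DISPLAY_NAME_MAPPINGS = {"RadianceInvert": "Radiance Invert (Example)"}
```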

Adding a New Tone Mapping Operator

All tone mapping operators are functions in hdr/ with signature f(img: Tensor) → Tensor. Register the function name in the TONEMAPPER_MAP dict in nodes_hdr.py.
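
For example, the classic global Reinhard operator x / (1 + x) fits this signature; registration is shown against an illustrative local copy of the registry dict, not the real one in nodes_hdr.py:

```python
def tonemap_reinhard(img):
    """Global Reinhard operator: maps [0, ∞) into [0, 1) elementwise.
    Works on floats or tensors; sketch only, not a shipped operator's code."""
    return img / (1.0 + img)

# Registration mirrors the TONEMAPPER_MAP pattern described above
TONEMAPPER_MAP = {"reinhard": tonemap_reinhard}
```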

Adding a New Log Curve

Encode/decode pairs are registered in color_utils.py as (encode_fn, decode_fn) tuples in the LOG_CURVES dict. A camera preset can then reference the curve by name string.

Adding a New WebGL Scope

Add a GLSL fragment shader method and a JavaScript rendering function to radiance_webgl.js. Wire the keyboard shortcut in radiance_viewer.js. No Python changes needed for display-only scopes.

§11 Performance Characteristics

Operation               Backend      GPU Speedup  Note
Tone Mapping            PyTorch GPU  20–50×       Fully vectorized over batch
Log Curves              PyTorch GPU  20×          Piecewise function via torch.where
Grade (LGG)             PyTorch GPU  25×          Fused into single tensor pass
LUT Apply               PyTorch GPU  10×          Trilinear, vectorized 8-corner gather
Temporal Smooth         PyTorch CPU  —            Loop is sequential by design (EMA)
FalseColor (node)       PyTorch GPU  —            torch.where over zone thresholds
FalseColor (viewer)     GLSL         >100×        GPU fragment shader
GPU Histogram           GLSL         >50×         256-bin accumulate pass
Frame Upload (LRU hit)  WebGL        —            Zero re-upload from cache
Grade Matching (LAB)    PyTorch CPU  —            Statistics-only, not per-pixel

GPU Support

All PyTorch operations respect the tensor's current device. If ComfyUI is configured with CUDA or Apple MPS, tensors are processed on-device automatically. The viewer's WebGL renderer runs entirely on the client GPU, independent of the server backend.

References

  1. Reinhard, E., Ashikhmin, M., Gooch, B., Shirley, P. Color Transfer between Images. IEEE CGA, 2001.
  2. Hable, J. Filmic Tonemapping Operators. GDC 2010.
  3. Hill, S. HDR Color in Call of Duty. SIGGRAPH 2014.
  4. Academy of Motion Picture Arts and Sciences. ACES 2.0 Reference Rendering Transform. 2024.
  5. Sobotka, T. AgX: A Minimal Color Transform. Blender Institute, 2023.
  6. Magnor, M. et al. Digital Video Processing for Engineers. Morgan & Claypool, 2012.
  7. OpenColorIO Contributors. OpenColorIO v2 Architecture. ASWF, 2023.
  8. Colour-Science for Python. https://www.colour-science.org/. 2024.
  9. Narkowicz, K. ACES Filmic Tone Mapping Curve. Blog, 2016.
  10. ITU-R BT.1203. Subjective Picture Quality Assessment for Digital Cable Television Systems. ITU, 1994.