Experiment E001 Β· logbook
Photorealistic 3DGS from our data: how much, where, and why not
From the work order to the evaluation scorecard: real 3D Gaussian
Splatting training on the test-gta-demo-1 scene, with the discovery of the
navigability envelope and the depth-regularized series. July 1β2, 2026, A10G GPU box.
- Question
- How much photorealism do we get from a 3DGS trained on our data
(gameplay video + poses from the LingBot-MAP reconstruction)? Enough to make it the hero
renderer of
3d-viewer? - Short answer
- Yes along the capture path (PSNR 28.7, solid surfaces, ~90 fps): a clear step up from points. No in free-fly: quality collapses ~20 cm outside the capture corridor (0.9 m), and it is a limit of the capture, not of the optimization β no regularizer moves it.
- Where
- code and docs:
3d-data-reconstruction/3dgs/(commit7a6fbe9) Β· artifacts:/home/ubuntu/3d-data-reconstruction/testruns/3dgs/test-gta-demo-1/on the box (453 MB ply, 34 MB spz, 3 checkpoints, COLMAP dataset) - Stack
- gsplat 1.5.3 (Apache-2.0) Β· torch 2.11+cu128 Β· A10G 24 GB Β· ~30 min and 4.4 GB VRAM per run
Context
From a video, the 3d-data-reconstruction pipeline (LingBot-MAP, feed-forward)
produces per-frame poses, per-pixel depth, and a dense point cloud. 3d-viewer today
renders points β the visual ceiling is "colored dots". 3DGS promises real
surfaces at the same static-hosting cost. The work order (Marco, Jun 30) asked us to
actually train and measure before investing in the integration.
Phase 0 β Tracking down the ingredients Β· Jul 1
3DGS trains against calibrated images: we needed the 300 source frames and the intrinsics. Neither was where we expected.
The frames: the demo directory does not reference them; the original command, fished out
of the bash history, pointed to ProPainter/gta_cave/19-47-44clip_300-600.mp4 (frames 0β299 at
60 fps) β not to the videos in gta_videos/session-2/ that looked like the
obvious candidates. The scene is an underground GTA bunker/workshop, not the "shop" from the
work order.
The intrinsics: LingBot-MAP predicts them (FOV in the pose encoding) but
demo-two.py does not save them. Recovering them became a small result of its own:
pointcloud.npz is exactly
depth.npy reprojected through K, confidence-filtered, concatenated in
frame/raster order. Frame 0's slice starts at offset 0 by construction: pairing
its masked pixels with the cloud's prefix yields exact pointβpixel
correspondences up to the first divergence (detected via depth
consistency). Least-squares over 63k clean pairs β fx=1100.39, fy=1089.54 @1080p
(FOV 82.2Β°Γ52.7Β°), residuals ~1e-5 px, with cx,cy landing on W/2,H/2 exactly as
the pipeline code predicts. Script: 3dgs/recover_intrinsics.py.
Reprojection sanity check (cloud β 4 frames, split-screen overlay): pixel-level alignment across the whole sequence. Only then did we start training.
E1 β Vanilla baseline Β· Jul 1
gsplat simple_trainer default, 30k iterations, frames at 1600Γ900, init from 2M
cloud points, holdout of 1 frame in 8. Deliberate choice:
--no-normalize-world-space, so the gaussians stay in the reconstruction's world
frame and everything the viewer already does (fly-through, framing) carries over unchanged.
v1.5.3 aligned with the pip wheel).
| step | PSNR | SSIM | LPIPS | gaussians |
|---|---|---|---|---|
| 3k (smoke) | 27.14 | 0.834 | 0.353 | 1.76M |
| 30k | 28.70 | 0.852 | 0.253 | 1.92M |
E2 β Depth-regularized, and the point cloud exits the stage Β· Jul 2
Two moves in one. (1) The init no longer reads pointcloud.npz: we
reproject depth.npy directly β which is the same thing by construction
(verified at float32), so the 3DGS chain sheds a 567 MB artifact. Producer
input: frames + poses + depth + K, period. (2) Depth enters the loss:
gsplat's --depth_loss supervises disparity on the SfM points anchored to the
frames; by sampling 6,700 depth pixels per frame and registering them as COLMAP tracks, the
samples become depth anchors without touching the trainer.
| run | PSNR | depth err. p50 | p90 | splats | aniso p99 |
|---|---|---|---|---|---|
| E1 baseline | 28.70 | 12.45% | 32.2% | 1.92M | 1.6k |
| E2 depth loss | 28.63 | 4.78% | 20.5% | 1.88M | 2.2k |
E3 β Shape regularizers: half lesson, half own goal Β· Jul 2
A direct attack on the tail: depth_lambda 5Γ, scale_reg 0.01
(penalizes large gaussians), opacity_reg 0.001 (discourages floaters).
| run | PSNR | depth err. p50 | p90 | splats | aniso p99 |
|---|---|---|---|---|---|
| E3 +reg | 28.57 | 3.25% | 13.7% | 1.43M (β25%) | 289k (!) |
scale_reg penalizes
absolute scale: to pay less, the needles got thinner instead of
shorter (p99 anisotropy from 1.6k to 289k, bright streaks visible off-path). To be
replaced, if anything, with a penalty on the max/min ratio of the scales.
opacity_reg, on the other hand, holds up: β25% splats at equal PSNR, faster
rendering, smaller file.
The ablation matrix (read BEFORE launching a run)
Each run costs ~30 min of A10G. Before launching one: check here that the combination has not already been tried, and change one variable at a time. Metrics: PSNR on the 38 held-out frames; depth error = rendered depth vs LingBot depth; aniso = max/min ratio of the scales per gaussian (the p99 tail is the "spaghetti").
| run | depth_Ξ» | scale_reg | opacity_reg | PSNR | depth p50 | p90 | splats | aniso p99 | verdict |
|---|---|---|---|---|---|---|---|---|---|
| E1 | β | β | β | 28.70 | 12.45% | 32.2% | 1.92M | 1.6k | vanilla baseline |
| E2 | 1e-2 | β | β | 28.63 | 4.78% | 20.5% | 1.88M | 2.2k | β geometry 2.6Γ, photometry unchanged β clean (single variable) |
| E3 | 5e-2 | 0.01 | 0.001 | 28.57 | 3.25% | 13.7% | 1.43M | 289k | β οΈ 3 variables at once β attribution impossible; scale_reg pathological (thinner needles, not shorter) |
| E4 β open | 5e-2 | β | 0.001 | ? | the attribution test: if depth p50 β ~3.3% with healthy aniso, the credit was depth_Ξ»'s (and it goes into the producer); if it stays ~4.8%, E3's gain was a scale_reg artifact | ||||
depth_conf.npy. The producer recipe today is the "E2 +
opacity_reg" row β it is updated only with a verified row of this table.
Engineering β the chain is closed Β· Jul 2
From research result to product, two commands:
# GPU box (3d-data-reconstruction): prep β training (E2+opacity_reg) β scene.sog
python 3dgs/train_gsplat.py --demo-dir <recon> --frames-dir <frames> --intrinsics <K.npz>
# viewer (3d-viewer): bounds + camera path + manifest, no GPU
python build_scene.py <recon> --name <scene> --producer gsplat --gsplat-asset <scene.sog>
The split respects the repo separation: training (GPU, conda) lives with the
pipeline; 3d-viewer receives only the ready-made .sog and stays
display+navigation. Rendering in the viewer is Spark 2.1 on modern three.js, isolated from
Potree's ancient three; camera path and fly-through unchanged. End-to-end smoke test at
300 steps: it already paid for itself by catching a relative-path bug before the first real run.
Where we are, what remains open
Producer config: E2 recipe (init from depth + depth loss) +
opacity_reg, no scale_reg β it changes only with a verified row
of the ablation matrix.
- Confidence-weighted dense depth loss β today we use 6.7k anchors/frame
out of 1.44M pixels and
depth_conf.npyonly as a binary filter; a dense L1 on the render's depth channel is a small patch to the trainer. - Capture protocol β the structural answer to free-fly: wide-baseline passes over the same scene. To be tried as soon as the capture pipeline allows it.
- Browser verification β β
done (Jul 2):
loadGsplat()implemented in3d-viewerwith Spark 2.1 + modern three.js (engine separate from Potree's ancient three, same Z-up space: camera path and fly-through unchanged). Format surprise: splat-transform 2.7's.spzis version 4, which Spark rejects β solved by exporting.sog(27 MB, even smaller). The 1.92M-splat scene renders from the first pose with the trajectory overlaid (verified headless with SwiftShader). Thegsplatproducer is closed as well β see "Engineering" above. - Product direction β 3D spatial labeling while navigating the scene: 3DGS as the display layer, labels computed in the shared world frame (which we preserved on purpose), grounding on posed RGB-D.
Logbook convention: one page per experiment campaign, in chronological order; decisions in the terracotta callouts, abandoned paths (with reasoning) in the gray callouts, surprises in olive green. Numbers always in a table; heavy artifacts stay on the box with the path noted in the scorecard.