Software archaeology
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Field notes from systems we brought back.
reading room
research and field notes on orphaned software, integration patterns, and what resurrection actually looks like.
the orphaned-software lifecycle
how custom software gets abandoned at small and mid-sized companies, and what it costs at each stage.
the eight systems every trades company runs on
a map of the tool stack at a typical under-500-employee shop, and where the integration gaps show up.
a field guide to sql database audits
how to inventory what’s inside your legacy system’s database before deciding what to do next.
why the 60-year-old contractor is right to be afraid
the fear of another bad custom-software project is justified. here’s what makes a second attempt safer than the first.
three automations every mid-sized shop quietly needs
the three small, fixed-price automations that pay for themselves fastest at a sub-500-employee shop.
what “ai front end” actually means when your developer is gone
a plain-english walkthrough of what we mean by putting an ai-driven interface on top of your existing sql database.
How we bring old systems back

Case study: Lorem ipsum manufacturing
Join Lambda's co-founder and CEO Stephen Balaban as he unveils Neural Software: an entirely new way of thinking about software that can collaborate with humans, evolve over time, and adapt like never before.
Recent projects
Scaling laws for diffusion models
Diffusion beats autoregressive in data-constrained settings.
SpatialReasoner
Tensor decomposition for force-field prediction
Bifrost-1
BLEUBERI
OverLayBench

Case study: Lorem ipsum logistics
Last year, “the ARChitects,” a Lambda-sponsored team (including Lambda researcher David Hartmann) won the ARC Prize 2024. This year, they finished second out of 1,400+ teams with a final leaderboard score of 16.53%.

What gets reanimated
A clear, data-driven comparison of today's leading large language models. Standardized benchmark results cover top contenders like Meta's Llama 4 series, Alibaba's Qwen3, and the latest from DeepSeek, with critical performance metrics measuring everything from coding ability to general knowledge.

Field notes
Your go-to source for the latest in the field, curated by AI. Sift through the excess. Make every word count.
Common patterns we see
Diffusion from scratch
Text2Video pretraining
GPU benchmarks
Throughput GPU benchmarks for training deep neural networks.
MLCommon benchmark
What clients say
Latent thought models
Teach LLMs to “think” before they speak and improve parameter and memory efficiency.
Product of experts with LLMs
Boosting performance on the ARC-AGI challenge is a matter of perspective.
* Winner, ARC-AGI 2024
DepR
Uses depth cues and diffusion models to turn a single image into a clean, instance-level 3D scene.
Video MMLU
A benchmark that tests whether models can truly follow and understand long, multi-subject lecture videos, revealing that current VLM’s limitations.
Latent adaptive planner
Robots can learn a compact “plan space” from human demos and continually updates its plan as the scene changes, enabling faster and more reliable manipulation.
AimBot
AimBot draws visual cue onto robot camera images so policies directly see where the gripper is in 3D, boosting manipulation success.
Word salad chopper
Detects when a reasoning model has drifted into meaningless repetition and cleanly cuts away those extra tokens, saving a lot of output cost with almost no quality loss.
DEL-ToM for theory-of-mind reasoning
Breaks social reasoning into “who-knows-what” steps and uses a checker to pick the most consistent answer, improving small-models’ theory-of-mind performance.
VeriFastScore
Trains a single model to extract and verify all claims in a long answer at once using web evidence, speeding up long-form factuality evaluation.
Error typing for smarter rewards
Labeling each reasoning step’s mistake type and converts those labels into scores, giving richer feedback and improving math problem solving with less data.
Industries we have touched
SAEBench
A comprehensive benchmark for sparse autoencoders in language model interpretability
Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum McDougall, Kola Ayonrinde, Demian Till, Matthew Wearden, Arthur Conmy, Samuel Marks, and Neel Nanda — ICML 2025
VideoHallu
Evaluating and mitigating multi-modal hallucinations on synthetic video understanding
Zongxia Li, Xiyang Wu, Guangyao Shi, Yubin Qin, Hongyang Du, Tianyi Zhou, Dinesh Manocha, and Jordan Lee Boyd-Graber — NeurIPS 2025
VLM2Vec-V2
Advancing multimodal embedding for videos, images, and visual documents
Meng, Rui and Jiang, Ziyan and Liu, Ye and Su, Mingyi and Yang, Xinyi and Fu, Yuepeng and Qin, Can and Chen, Zeyuan and Xu, Ran and Xiong, Caiming, and others — arXiv preprint 2025
Think, prune, train, improve
Scaling reasoning without scaling models
Caia Costello, Simon Guo, Anna Goldie, and Azalia Mirhoseini — ICLR 2025 workshop
NeoBERT
A next-generation BERT
Lola Le Breton, Quentin Fournier, Mariam El Mezouar, and Sarath Chandar — TMLR 2025