Perception, control, state estimation, and navigation: the parts that turn a model into something that moves through the world, with projects and notes written as I build them.
Most robot manipulation policies today are learned from demonstrations: people teleoperate a robot, and a model is trained to reproduce that behavior. I wanted to work through that loop myself rather than just read about it. This project trains a diffusion policy from scratch on PushT, an open benchmark where a round pusher has to slide a T-shaped block onto a target, using Hugging Face's LeRobot library and dataset on a single consumer GPU. Because success rates over a few dozen rollouts are noisy and only matched comparisons mean much, the same evaluation harness runs the pretrained reference checkpoint under identical seeds, and every number ships with a confidence interval. The repo also includes a survey I wrote of the current open robot foundation models (GR00T, pi0, RDT2, OpenVLA, SmolVLA, and others), with the claims link-verified. Results, rollout clips, and the write-up live in the repo.
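To make the point about noisy success rates concrete, here is a minimal sketch of a confidence interval for a rollout success rate. It uses the Wilson score interval, which is an assumption on my part for illustration: the text above does not say which interval the evaluation harness computes, and the function name `wilson_interval` is hypothetical, not something from the repo or from LeRobot.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate.

    z = 1.96 corresponds to roughly 95% coverage (an illustrative default).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    # The interval is centered on a value pulled slightly toward 0.5.
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

With 30 successes in 50 rollouts, `wilson_interval(30, 50)` gives roughly 0.46 to 0.72, a span of about 26 percentage points around the 60% point estimate. That width is why a gap of a few points between two policies says little unless both were run under the same seeds.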
Produces a dense depth map from a single camera frame and uses it for obstacle awareness and visual odometry, enabling navigation without a dedicated depth sensor.
Tracks position and orientation by fusing inertial and visual measurements with an extended Kalman filter, holding an accurate pose estimate when GPS is unavailable.
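As a rough sketch of the predict/update cycle behind this kind of fusion, the example below runs an extended Kalman filter on a simplified planar pose (x, y, heading). Everything specific here is an assumption for illustration, not the project's actual design: the unicycle motion model driven by speed and yaw rate stands in for inertial propagation, a direct (x, y) position fix stands in for the visual measurement, and the function names and noise values are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def predict(state, cov, v, omega, dt, q):
    """Propagate the pose with speed v and yaw rate omega (inertial stand-in)."""
    x, y, th = state
    new_state = [x + v * math.cos(th) * dt,
                 y + v * math.sin(th) * dt,
                 th + omega * dt]
    # Jacobian of the nonlinear motion model, evaluated at the current state.
    F = [[1.0, 0.0, -v * math.sin(th) * dt],
         [0.0, 1.0, v * math.cos(th) * dt],
         [0.0, 0.0, 1.0]]
    Q = [[q, 0.0, 0.0], [0.0, q, 0.0], [0.0, 0.0, q]]  # process noise
    new_cov = mat_add(mat_mul(mat_mul(F, cov), transpose(F)), Q)
    return new_state, new_cov


def update(state, cov, z, r):
    """Correct the pose with a position fix z = (x, y) (visual stand-in)."""
    H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # measurement observes x and y only
    innovation = [z[0] - state[0], z[1] - state[1]]
    S = mat_add(mat_mul(mat_mul(H, cov), transpose(H)), [[r, 0.0], [0.0, r]])
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    K = mat_mul(mat_mul(cov, transpose(H)), S_inv)  # Kalman gain, 3x2
    new_state = [s + K[i][0] * innovation[0] + K[i][1] * innovation[1]
                 for i, s in enumerate(state)]
    KH = mat_mul(K, H)
    I_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(3)]
            for i in range(3)]
    return new_state, mat_mul(I_KH, cov)
```

Calling `predict` repeatedly between measurements lets the covariance grow, which is what happens while no external fix is available; each `update` pulls the estimate toward the measurement and shrinks the covariance again. Note that heading is never measured directly in this sketch. It is corrected only through its correlation with position, which the Jacobian `F` introduces during prediction.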
Write-ups on perception, control, and state estimation.