Posts

Featured

Show HN: PhAIL – Real-robot benchmark for AI models https://ift.tt/XJ5WsvG

I built this because I couldn't find honest numbers on how well VLA models [1] actually work on commercial tasks. I come from search ranking at Google, where you measure everything, and in robotics nobody seemed to know.

PhAIL runs four models (OpenPI/pi0.5, GR00T, ACT, SmolVLA) on bin-to-bin order picking, one of the most common warehouse operations. Same robot (Franka FR3), same objects, hundreds of blind runs. The operator doesn't know which model is running.

Best model: 64 UPH. Human teleoperating the same robot: 330. Human by hand: 1,300+.

Everything is public: every run with synced video and telemetry, the fine-tuning dataset, training scripts. The leaderboard is open for submissions. Happy to answer questions about methodology, the models, or what we observed.

[1] Vision-Language-Action: https://ift.tt/QWXeAsP

https://phail.ai

March 31, 2026 at 09:55PM
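
To put the three throughput figures side by side, here is a minimal sketch of the arithmetic. It assumes UPH means units per hour (the post does not spell it out) and treats "1,300+" as a lower bound of 1300. The dictionary keys and function names are made up for illustration and are not part of PhAIL.

```python
# Throughput figures quoted in the post. Assumption: UPH = units per hour.
# "1,300+" is treated as a lower bound of 1300.
UPH = {
    "best_model": 64,
    "human_teleop": 330,
    "human_by_hand": 1300,
}


def relative_throughput(baseline: str, other: str) -> float:
    """Return how many times the throughput of `other` exceeds `baseline`."""
    return UPH[other] / UPH[baseline]


def seconds_per_unit(name: str) -> float:
    """Return the average time per picked unit, in seconds."""
    return 3600 / UPH[name]


if __name__ == "__main__":
    for name in UPH:
        ratio = relative_throughput("best_model", name)
        print(f"{name}: {UPH[name]} UPH, "
              f"{seconds_per_unit(name):.1f} s per unit, "
              f"{ratio:.1f}x the best model")
```

Under those assumptions, the teleoperated human is roughly 5 times faster than the best model, and the human working by hand is at least 20 times faster.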

Latest Posts

Show HN: My open-world voxel game with a magic system, playable in the browser https://ift.tt/DYTzZ7y

Show HN: Rusdantic https://ift.tt/AB0ELNR

Show HN: AI Spotlight for Your Computer (natural language search for files) https://ift.tt/2VoJLtF

Show HN: Memv – Memory for AI Agents https://ift.tt/oixwz8l

Show HN: I made my fitness dashboard public and Apple Health needs an API https://ift.tt/kBnd4Mg

Show HN: I made a "programming language" looking for feedback https://ift.tt/FN4pZ2v

Show HN: Timezone App – Visual meeting scheduler for distributed teams https://ift.tt/adjxLXW

Show HN: Octopus, Open-source alternative to CodeRabbit and Greptile https://ift.tt/e5LZfg7

Show HN: GitHub Copilot Technical Writing Skill https://ift.tt/tAUiDyh

Show HN: We built a multi-agent research hub. The waitlist is a reverse-CAPTCHA https://ift.tt/Agz8E06