Lil' Fun Langs' Guts

January 19, 2026 · Li Na · Source: dev头条


Under Pass@1, the model shows strong first-attempt accuracy across all subjects. In Mathematics, it achieves a perfect 25/25. In Chemistry, it scores 23/25, with near-perfect performance on both text-only and diagram-derived questions. Physics shows similarly strong performance at 22/25, with most errors occurring in diagram-based reasoning.
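As a quick check on the arithmetic, the per-subject scores can be converted into Pass@1 rates. This is a minimal sketch that assumes Pass@1 here simply means the fraction of questions answered correctly on the first attempt. The `scores` dictionary and the `pass_at_1` helper are illustrative names, not part of the original evaluation.

```python
# Scores as reported in the text: (correct on first attempt, total questions).
scores = {
    "Mathematics": (25, 25),
    "Chemistry": (23, 25),
    "Physics": (22, 25),
}


def pass_at_1(correct, total):
    """Fraction of questions solved on the first attempt (assumed definition)."""
    return correct / total


# Per-subject rates, plus the aggregate over all three subjects.
per_subject = {name: pass_at_1(c, t) for name, (c, t) in scores.items()}
total_correct = sum(c for c, _ in scores.values())
total_questions = sum(t for _, t in scores.values())
overall = pass_at_1(total_correct, total_questions)

for name, rate in per_subject.items():
    print(f"{name}: {rate:.0%}")
print(f"Overall: {total_correct}/{total_questions} = {overall:.1%}")
```

Summed across the three subjects, the reported scores come to 70 of 75 questions correct on the first attempt.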

As part of this experiment, I decided to go all-in with the crazy idea of vibecoding a project without even looking at the code. The project I embarked on is an Emacs module to wrap a CLI ticket tracking tool designed to be used in conjunction with coding agents. Quite fitting for the journey, I’d say.

Accused tr