Lil' Fun Langs' Guts

Source: dev头条




Under Pass@1, the model shows strong first-attempt accuracy across all subjects. In Mathematics, it achieves a perfect 25/25. In Chemistry, it scores 23/25, with near-perfect performance on both text-only and diagram-derived questions. Physics shows similarly strong performance at 22/25, with most errors occurring in diagram-based reasoning.
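The Pass@1 figures above are plain first-attempt ratios. Below is a minimal Python sketch that recomputes them. It assumes each question was attempted exactly once, so the common unbiased pass@k estimator with n = 1 scores each question as 1 or 0. The function names and the `RESULTS` table are illustrative and do not come from the original evaluation.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one question: n samples generated, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset of samples contains a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

def subject_pass_at_1(correct: int, total: int) -> float:
    # Assumption: one attempt per question (n = 1), so a correct question
    # scores pass_at_k(1, 1, 1) == 1.0 and a wrong one pass_at_k(1, 0, 1) == 0.0.
    scores = [pass_at_k(1, 1, 1)] * correct + [pass_at_k(1, 0, 1)] * (total - correct)
    return sum(scores) / total

# Reported results: (correct on first attempt, total questions)
RESULTS = {
    "Mathematics": (25, 25),
    "Chemistry": (23, 25),
    "Physics": (22, 25),
}

def summarize(results: dict) -> dict:
    """Per-subject Pass@1 plus a pooled score over all questions."""
    summary = {name: subject_pass_at_1(c, t) for name, (c, t) in results.items()}
    total_c = sum(c for c, _ in results.values())
    total_t = sum(t for _, t in results.values())
    summary["Overall"] = subject_pass_at_1(total_c, total_t)
    return summary

if __name__ == "__main__":
    for name, score in summarize(RESULTS).items():
        print(f"{name}: {score:.1%}")
```

Pooling all 75 questions gives 70/75, roughly 93.3%. With a single attempt per question the estimator is just "fraction correct", but the general `pass_at_k` form shows how the same metric extends when several samples are drawn per question.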

As part of this experiment, I decided to go all-in with the crazy idea of vibecoding a project without even looking at the code. The project I embarked on is an Emacs module to wrap a CLI ticket tracking tool designed to be used in conjunction with coding agents. Quite fitting for the journey, I’d say.
