Hacker News
denysvitali
62 days ago
on:
IQuest-Coder: A new open-source code model beats C...
Better link:
https://iquestlab.github.io/
But yes, sadly it looks like the agent cheated during the eval
denysvitali
62 days ago
According to
https://github.com/IQuestLab/IQuest-Coder-V1/issues/14#issue...
the result is still good after fixing the cheating problem: 76.2% (down from 81.4%), which still beats Opus 4.5 (74.4%)!!
ipython
62 days ago
Unfortunately they seem to have neglected to update their front page readme with this information, continuing to mislead people:
https://github.com/IQuestLab/IQuest-Coder-V1
anamexis
62 days ago
It is updated on their actual home page, though. There is clearly no intent to mislead people.
https://iquestlab.github.io
alexpop80
60 days ago
What do you mean? Opus 4.5 and GPT 5.2 broke the 80% mark, and no other models seem to have passed this important milestone yet.
s-macke
62 days ago
The link didn’t get enough votes a few days ago.
denysvitali
62 days ago
I know - I posted it :)