
Would it be possible to attach an LLM to a debugger session executing all of the fuzzer seeds, and ask it to figure out how to expand coverage?


Not to dismiss the novelty and power of LLMs, but why would you turn to a black-box language interface for that?

Wouldn't you expect a purpose-built system to be more efficient, complete, reliable, and auditable? Most of those characteristics are critical to security applications, and the nature of LLMs largely runs counter to them.



These links are a little different to the GP comment, though. Both of these cases (which I agree show LLMs being an excellent choice for improving fuzzing coverage) are static analysis, going from the project source code to a new harness.

One issue with that is that the model probably doesn't have enough context to be given all the project source code, so you have to work out which subset to share, including the definitions of all relevant symbols (but not too many).

It helps a bit that the foundation models were already pre-trained on these large open source projects in oss-fuzz, so they already know something about the project's symbols and definitions from their original training sets -- and even from public discussions about the code! -- but that wouldn't work for a private codebase or for recent changes to a large public one.

Then the harness source that the LLM writes might have syntax errors or fail to compile, and you have to deal with that somehow; it might be valid but generate no coverage improvement, and you have to deal with that too; and so on.
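That feedback loop can be sketched roughly like this (everything here is hypothetical: `ask_llm`, `compile_harness`, and `measure_coverage` are stubs standing in for whatever model API, build system, and coverage tooling you actually have; only the control flow is the point):

```python
def ask_llm(context, feedback):
    # stub: a real implementation would prompt a model with the source context
    return "int LLVMFuzzerTestOneInput(const uint8_t *d, size_t n) { return 0; }"

def compile_harness(harness):
    # stub: a real implementation would invoke the build and return diagnostics
    return True, ""

def measure_coverage(harness):
    # stub: a real implementation would run the corpus and read coverage data
    return 100

def generate_harness(source_context, baseline_coverage, max_attempts=5):
    feedback = ""
    for _ in range(max_attempts):
        harness = ask_llm(source_context, feedback)
        ok, errors = compile_harness(harness)
        if not ok:
            feedback = f"Fix these compile errors:\n{errors}"
            continue  # feed the compiler output back into the next prompt
        if measure_coverage(harness) > baseline_coverage:
            return harness  # keep only harnesses that actually add coverage
        feedback = "Compiled but added no new coverage; try a different entry point."
    return None
```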

GP seems to be talking about instead some form of LLM-aided dynamic analysis, where you are probably using some kind of symbolic execution to generate new seeds, not new harnesses.

That's important work too, because I think in this case (disagreeing with the blog post author) the vulnerable function was actually reachable by existing harnesses, just not through the seed corpora (at least the public ones).

One approach could be for the LLM to act as a kind of symbolic-execution constraint solver, using the debugger as a form of path instrumentation. It would produce new seeds by predicting what an input would look like when you invert each interesting constraint that fuzzing coverage is blocked by, as the debugger hits the test for that constraint. (This is difficult because it can require actual computation, not pattern matching, and because of path explosion.)

Or perhaps more plausibly, the LLM could generate Z3 or other SMT-solver code to define and solve for those constraints to generate new seeds, replacing what is currently extremely tedious and time-consuming work when done by humans.


Those demonstrate that LLMs are capable of generating effective harnesses, which is really cool but also not surprising.

What matters for engineering is how that technique compares to others in the context of specific requirements.

A big part of "falling for hype" is mistaking a new and capable tool for being the right or optimal tool.


It's fine to have LLM skepticism as a default, but here it's not justified. Google is showing here that the LLM-written harnesses improve massively on the harnesses in oss-fuzz that were written over many years by the combined sum of open source security researchers. Most dramatically, they improved tinyxml2 fuzzing coverage by 31% compared to the existing oss-fuzz harnesses, through an entirely automated flow for harness generation by LLMs.

Whatever engineering technique you are imagining would be better is not one that humanity actually applied to the problem before the automated LLM harnesses were written. In general, writing and improving fuzzing harnesses is extremely tedious work that is not being done (or paid for) by nearly enough people to adequately protect critical open source software. The LLM approach is a legitimate breakthrough in the field of open source fuzzing.


Fair enough, interesting, and plausible! I looked at the first link and saw it as more of a capabilities demo, but didn't dig into the Google one. I'm mostly just encouraging thoughtful reflection on tool choice by raising questions, not making a case against. Very cool.



