I think the most difficult part of programming is often debugging: humans aren't very good at "seeing" how an app operated when it failed. You get some log output that may or may not reference the correct file locations.
I don't see any reason, in the long term, why humans would be better than machines at debugging. How do we make that happen? I assume somebody is building this already. Would it help if an AI with a large context window and access to the VM could see the whole call log/tree and know exactly what is going on? The AI could learn from other users, see everything that happens in a run without a debugger or console.logs, try multiple solutions in parallel, and fix issues while you sleep.
Thoughts on this?
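For concreteness, here is a minimal sketch of the "see the whole call tree" idea, in Python since the thread names no language. The tracing approach (sys.settrace) and all function names are my own illustration, not anyone's product:

```python
# Record every call and return during a run, so a human (or an LLM with
# a large context window) can inspect exactly what happened, with no
# manual console.logs or debugger sessions. Names here are illustrative.
import sys

trace_log = []   # tuples of (depth, event, function name, line number)
depth = 0

def tracer(frame, event, arg):
    global depth
    if event == "call":
        trace_log.append((depth, "call", frame.f_code.co_name, frame.f_lineno))
        depth += 1
    elif event == "return":
        depth -= 1
        trace_log.append((depth, "return", frame.f_code.co_name, frame.f_lineno))
    return tracer   # keep tracing inside each new frame

def run_traced(fn, *args):
    sys.settrace(tracer)
    try:
        return fn(*args)
    finally:
        sys.settrace(None)

# Example: trace a (hypothetical) buggy function, then dump the call tree.
def helper(x):
    return x - 1           # suppose the bug hides here

def compute(x):
    return helper(x) * 2

run_traced(compute, 5)
for d, event, name, line in trace_log:
    print("  " * d + f"{event} {name} (line {line})")
```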
In principle, LLMs could help here if they make reliable inferences and stop hallucinating facts. It'll be interesting to see how fully those preconditions get met. Without them, you'd constantly have to debug your debugger.
The times I’m most comfortable with LLMs are when I can easily check their output. Debugging will often work like that.
I agree with your first sentence but not your second. I've often made a bugfix that addressed the specific scenario I was repeatedly testing by hand, without understanding why it worked, and as a result failed to fix other situations (and often broke additional ones). My "explanations" can be rationalizations. Debugging is a process of understanding: you can't really judge whether a bug has been fixed without understanding why it happened.
My suspicion is that LLMs will be really good at making the "letter" of arbitrary tests pass without quite meeting the "spirit" of the tests.
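To make the letter/spirit distinction concrete, here is a toy example of my own (not from the thread):

```python
# A degenerate "fix" that satisfies the letter of a test while missing
# its spirit: the test pins down one case, and the implementation
# hard-codes exactly that case.
def slugify(text):
    return "hello-world"    # passes the test below, solves nothing

def test_slugify():
    assert slugify("Hello World") == "hello-world"

test_slugify()    # green, yet the function is useless in general
```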
I think I agree, but I meant “if you had an LLM that gave the right fix a significant fraction of the time.” I think that’s a high bar, and that they’re not close to meeting it yet.
But in my response, I didn’t mean “check” merely as in “the tests pass”, but that you understand the change.
And in that sense, I think you could imagine working with a not-fully-reliable AI and still getting benefits from it.
Leaving aside truly trivial bugs, I think in the majority of cases I work on, finding the fix will take me noticeably longer than understanding why a candidate fix works. That’s why I can review a coworker’s fix faster than I could make it myself (even if they don’t comment their fix).
So I think the target isn’t “you trust this to check things into your codebase”, but “it can often find the offending line, or offer a fix that you can take or leave.”
I see what you mean. The open question here seems to be about AI "pragmatics" in the sense of en.wikiversity.org/wiki/Semantics_vs_pragmatics. Perhaps I'm airing incompetence here, but I seldom check my coworkers' logic in PRs. I mostly check product concerns (are we building the right thing?), process concerns (e.g. are the right things documented?) and architecture (does this PR change roughly the places I would expect?). For logic we rely on tests, and I build up trust over time in the people (entities) I work with (or give feedback to help them earn that trust).
I worry that an AI might be good at slipping through these heuristics of mine while performing its (automated or manual) tests in an antagonistic manner. Will it share my values and those of the broader culture I'm embedded in? Can I build up confidence over time that it shares my values, the way I can with the people I work with? Perhaps I'm still prejudiced because I don't live cheek by jowl with AIs yet 😅
I was about to flippantly say that I'm best at debugging after a solid drink of beer or wine to start me off. But in fact, that's exactly what is needed, human or AI: letting go of your personal commitment to what you've done, and got wrong.
Humans can get many times better than today at figuring systems out. That is the premise of Moldable Development. By now we have good evidence that an order of magnitude is attainable without extraordinary effort (and no AI). AI has the potential of improving this even further, but not in how people use it today.
debugging: humans aren't very good at "seeing" how an app operated when it failed.
I think about this a lot, how we are usually limited to the sense of sight when coding. Unlike trying to fix a machine or figure out a problem IRL, we can’t feel our way to shift and steer the code out of the snowbank. We can’t listen to the sounds of loose parts rattling around in the code while debugging.
Kartik Agaram That’s reasonable.
I wouldn’t say I usually check my coworkers’ logic adversarially, but I usually take time to see “do I see how this fixes the bug?” More often than not I can tell quickly: not to the point that I’m sure there are no errors, but I usually get the gist of the change. When I don’t, I tend to ask, because it’s an opportunity to document/clarify.
I don’t use it often, but sound can be a really interesting debugging mechanism—play a little warble every time a particular method is entered, and you may be able to just hear when something unusual happens.
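As a hedged sketch of that idea, assuming nothing beyond the standard library, a Python decorator can ring the terminal bell on every entry to a function (the function names are illustrative):

```python
# Make unusual call patterns audible: emit the ASCII BEL character each
# time the wrapped function is entered, so you can literally hear a hot
# loop or a retry storm while debugging.
import functools
import sys

def audible(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        sys.stdout.write("\a")    # terminal bell on each entry
        sys.stdout.flush()
        return fn(*args, **kwargs)
    return wrapper

@audible
def retry_connection():
    pass    # a rapid burst of beeps here means something is retrying hard

retry_connection()    # one beep per call
```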
Among other concerns with machine debugging: humans are the arbiters of what constitutes buggy behavior. So at minimum you need a human providing a more detailed bug description than "it's not working. you know, the thingy" (dramatized version of real bug reports). I suspect that problem will propagate to deeper stages of the debugging process as well, where it can't be trivially eliminated with boring language safety/formal methods.
Duncan Cragg I can understand how the claim is bold. Perhaps a bolder claim is that the act of building custom tools for “reading” systems compresses communication, and in doing so changes the nature of programming.
Besides the links provided by others above, please take a look at Glamorous Toolkit. We built it to show how Moldable Development works in practice, and to offer an elaborate case study people can learn from, too.
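Glamorous Toolkit itself lives in a Smalltalk environment, but the flavor of a custom “reading” tool can be gestured at in any language. This Python sketch, with entirely invented names and data, shows the kind of one-off view that approach encourages:

```python
# A hypothetical, minimal custom tool for "reading" a system: instead of
# a generic object dump, a throwaway view that answers the one question
# you actually have right now. All names and fields are invented.
def show_order_timeline(order):
    """Print just the state transitions of an order, oldest first."""
    for event in sorted(order["events"], key=lambda e: e["at"]):
        print(f'{event["at"]}  {event["state"]:>8}  via {event["actor"]}')

show_order_timeline({
    "events": [
        {"at": "2024-05-02T10:01", "state": "paid",    "actor": "stripe"},
        {"at": "2024-05-01T09:30", "state": "created", "actor": "web"},
        {"at": "2024-05-02T10:05", "state": "shipped", "actor": "warehouse"},
    ],
})
```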
I believe that Tudor Girba's bold claim is justified, but I say so after a few years of exposure to, and then active use of, moldable development as supported by Glamorous Toolkit. I am not sure there is a way to judge the claim without investing some serious effort to gain personal experience. But it's one of those ideas that, in the end, make you wonder how you ever did without them. I have had only a few similar experiences in 40 years of using computers: live development, Lisp macros, immutability, version control, and most recently functional package managers. All of these have changed the way I apply computing technology to solve real-world problems (meaning problems that do not come from the computing technology itself).