Has anyone thought about or written about malleable software + LLMs? I feel that it is an area that could be quite interesting, but I haven't yet got a clear understanding of what this will lead to.
I think that pretty soon most end-user code will be generated with the help of LLMs. Some thoughts/questions I've been thinking about:
- How to generate programming languages, libraries, and abstractions that LLMs can use well? Is that different from generating libraries etc. for humans?
- LLMs are faster than humans at processing data, so APIs can be really wide and probably more complex than would be practical for human devs.
- LLMs can probably also handle more condensed/optimized representations better.
- And they may be able to infer system affordances directly from code.
- How to help LLMs produce good, maintainable code? And will that matter, or will the actual code become irrelevant?
- How to modularize apps so that they can be easily composed by LLMs?
- My hunch: making apps modular and composable would work really well with LLMs already, and even better in the future. It doesn't matter whether the style is functional or OOP, as long as the LLM can understand the logic.
- What kinds of new apps and malleable software will LLMs enable?
- Also: LLMs could finally remove some boundaries between different programming tools, library ecosystems, etc., by automatically creating translations/bridges between them.
Any thoughts?
đź“ť Malleable software in the age of LLMs
All computer users may soon have the ability to author small bits of code. What structural changes does this imply for the production and distribution of software?
One big question I see is: will anyone actually make an LLM that will do this job well? Are there economic incentives?
Today's LLMs are not up to the task. Whatever they produce, code or prose, needs to be checked by a human if precision matters (which it almost always does in code).
Also, today's LLMs are not up to the task of maintaining anything, because they themselves evolve too quickly. It's very much "move quickly and break things". If you write end-user code in a natural language and count on an LLM to turn it into precise instructions, at the very least you want those precise instructions to be reproducible over time.
Thanks Paul Sonnentag, this is pretty much exactly the kind of thing I was looking for and thinking about!
Konrad Hinsen, good points! My hunch is that LLMs will still improve a lot. Also, model versioning, open-source models, and fine-tuning will help overcome some of the other current bottlenecks. But we'll see. 🤷‍♂️
LLMs are good at producing a first draft, since their output requires a lot of human oversight anyway. I'm partial to modifying programs by describing the transformation of the code. LLMs might be useful for writing the first draft of those transformations. That helps with the reproducibility issue, because the LLM is describing the changes to the code in a more structured way, not the code itself. Unfortunately, as far as I know, there is no widely recognized syntax or system for describing program transformations.
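To make that concrete, here's a minimal sketch of what a structured transformation could look like, using Python's ast module (RenameFunction is just a hypothetical example of such a transformation, not an existing system):

```python
# Sketch: a program edit expressed as a structured, replayable transformation
# instead of as raw replacement text. "RenameFunction" is a made-up example.
import ast

class RenameFunction(ast.NodeTransformer):
    def __init__(self, old_name: str, new_name: str):
        self.old_name, self.new_name = old_name, new_name

    def visit_FunctionDef(self, node):
        if node.name == self.old_name:
            node.name = self.new_name
        return self.generic_visit(node)

    def visit_Call(self, node):
        # Also rewrite call sites that refer to the old name.
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            node.func.id = self.new_name
        return self.generic_visit(node)

source = """
def fetch(url):
    return url

print(fetch("https://example.com"))
"""

tree = ast.parse(source)
tree = RenameFunction("fetch", "fetch_page").visit(tree)
print(ast.unparse(tree))  # regenerated source with the transformation applied
```

An LLM that emits something like `RenameFunction("fetch", "fetch_page")` instead of the whole file gives you an edit you can review, version, and re-run later against newer code.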
Right. Instead of, or in addition to, editing raw low-level imperative code, I think LLMs could be useful for altering module-level code/structure, such as which objects/panels are visible where, how they are interconnected, etc., and then creating the minimal glue required to bind them together. I think this kind of use case could be interesting to explore, and perhaps even more interesting than just creating for loops, if statements, etc.
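Roughly what I have in mind, as a made-up sketch: the LLM mostly edits a declarative description of the modules and their wiring, and the only imperative code is small glue functions.

```python
# Hypothetical sketch: module-level structure as data the LLM can edit.
# All module, panel, and port names here are invented for illustration.
composition = {
    "panels": {
        "notes":    {"module": "notes_list", "visible": True},
        "calendar": {"module": "calendar",   "visible": True},
        "map":      {"module": "map_view",   "visible": False},
    },
    "connections": [
        # "when a note is selected, focus its date in the calendar"
        {"from": "notes.selected", "to": "calendar.focus_date",
         "glue": "note_to_date"},
    ],
}

# The minimal glue an LLM could also draft, reviewable in isolation.
def note_to_date(note):
    return note["created_at"].date()
```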
At least in my filter bubble, I'm seeing a lot of ferment in the area of local-only models. I think they go a long way towards addressing concerns of incentives as well as LLM evolution.
Interesting idea of having LLMs describe transformations of code, @Mark Dewing, btw. Using an LLM to create a transpiler from one language to another would be something like this. Perhaps also an LLM creating glue code between API layers. Or are you thinking of something different? 🤔
OpenAI had a first-class “edit” API where you’d send it (for example) code and a description of a change, and it would return the modified version. They deprecated it this summer because it was just as reliable to use the chat API. Even without a transformation language, I find it easy to work with because I can examine the diff just as if another human had made the change.
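Roughly what that workflow looks like, as a sketch with the current chat API (the model name, file name, and prompt are placeholders):

```python
# Sketch of "edit via chat + review the diff like a human's change".
import difflib
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

original = open("app.py").read()
instruction = "Rename the function fetch to fetch_page and update its callers."

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[
        {"role": "system", "content": "Return only the full modified file, no commentary."},
        {"role": "user", "content": f"{instruction}\n\n{original}"},
    ],
)
modified = response.choices[0].message.content

# Review the change the same way you'd review another human's edit.
print("\n".join(difflib.unified_diff(
    original.splitlines(), modified.splitlines(),
    fromfile="app.py", tofile="app.py (edited)", lineterm="",
)))
```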
You may enjoy this: nothing concrete yet but a vision for the future: "LLM As Compiler": medium.com/redsquirrel-tech/llm-as-compiler-2a2f79d30f0b
Interesting, thanks @Greg Bylenok! So if I get this correctly, the idea is that users will do their coding in a high-level pseudo-language, compile it to the actual target language, and then verify that it works as intended.
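So the loop might look something like this sketch, if I'm reading it right (generate_code() and the file names are placeholders I made up, not anything from the article):

```python
# Sketch of an "LLM as compiler" loop: high-level spec in, target-language
# code out, verified by tests. generate_code() stands in for an LLM call.
import subprocess

def generate_code(spec: str, feedback: str = "") -> str:
    raise NotImplementedError("call your LLM of choice here")

def compile_spec(spec_path: str, target_path: str, max_attempts: int = 3) -> bool:
    spec = open(spec_path).read()            # the high-level "source"
    feedback = ""
    for _ in range(max_attempts):
        code = generate_code(spec, feedback)
        open(target_path, "w").write(code)   # the "compiled" target
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True                      # verified against the tests
        feedback = result.stdout + result.stderr  # feed failures back in
    return False

# compile_spec("todo_app.spec.md", "todo_app.py")
```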
Something I'd like to see (but don't expect to) is the use of controlled natural languages as the intermediate layer. Generated by an LLM, checked by a human. Controlled natural languages are hard to write but easy to read, so they look like just the right level.
Problem: there is no big corpus of any controlled natural language that could be used to train LLMs. And such a corpus would be very expensive to create, because these languages are a pain to write.
I've seen some posts about enforcing constraints on LLM output. I'm not sure which one I saw on Twitter, but perhaps this is related: github.com/IsaacRe/Syntactically-Constrained-Sampling
Something like this might work even for a controlled natural language, as long as you can formalize/validate the language?
And I really hope that some kind of enforced constraints will be built into all LLMs soon.
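In the meantime, something like this validate-and-retry loop is the cheap version of what I mean (ask_llm() is just a placeholder, and real constrained sampling would instead mask invalid tokens during decoding):

```python
# Sketch: validate the output against a formalized format and retry on failure.
# The validator is the part that would encode your language's rules.
import json

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call your LLM of choice here")

def validate(text: str) -> str | None:
    """Return an error message if the output breaks the format, else None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        return f"not valid JSON: {e}"
    if "answer" not in data:
        return "missing required key 'answer'"
    return None

def ask_validated(prompt: str, attempts: int = 3) -> str:
    for _ in range(attempts):
        output = ask_llm(prompt)
        error = validate(output)
        if error is None:
            return output
        prompt += f"\nYour last reply was rejected ({error}). Try again."
    raise ValueError("no valid output after retries")
```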
100% on the constraints. For now, we attempt to accomplish this through prompting, like "put your answer in the following JSON structure"... When it neglects to do so, we have to remind it via "now put your answer in the following JSON structure". Obviously not robust, but so far "good enough" for our use case. Another change that would help: eliminating the non-determinism in the reply. Can anyone explain why today's LLMs are not deterministic? Has non-determinism been added as a "feature" to avoid repetitive responses? Or is it somehow fundamental to the inference algorithm itself?
It's a feature. And as long as LLM output is not reliable, I don't see the point of making it deterministic.
@Greg Bylenok writings.stephenwolfram.com/2023/05/the-new-world-of-llm-functions-integrating-llm-technology-into-the-wolfram-language touches on it.
“The basic issue is that current neural nets operate with approximate real numbers, and occasionally roundoff in those numbers can be critical to “decisions” made by the neural net (typically because the application of the activation function for the neural net can lead to a bifurcation between results from numerically nearby values). And so, for example, if different LLMFunction evaluations happen on servers with different hardware and different roundoff characteristics, the results can be different.”
The next few paragraphs go into how GPU parallelism inserts nondeterminism too.
But that’s just an explanation of why it’s difficult to remove entirely. Most systems do insert it intentionally to produce variation in the responses.
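The roundoff part is easy to see even on a CPU:

```python
# Floating-point addition isn't associative, so summing the same numbers in a
# different order (as a parallel reduction on a GPU may do) can change the result.
import random

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100_000)]

print(sum(xs) == sum(reversed(xs)))      # typically False
print(abs(sum(xs) - sum(reversed(xs))))  # a tiny but nonzero difference
```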
đź“ť The New World of LLM Functions: Integrating LLM Technology into the Wolfram Language
How to install and use Wolfram's instant LLM-powered functions. Stephen Wolfram shares dozens of examples and explains how the functions work.