New visual programming languages don't fail as visual programming languages.
They fail as programming languages, which is what most new programming languages do.
The best way I can describe it is this. Let's say that reality is like a state computation machine. How could we create a more accurate model of it than we currently have? It would have to go beyond words and include visual and kinesthetic constructs. It would have to be a higher-dimensional representation, to the extent that we as humans can perceive one. That means going beyond the 1D text we have now, and beyond a 2D UI that shows only a single frame in time. It would look like some kind of 4D temporal model, where it's possible to see the execution of a program in its entirety, all at once.
It would not only show semantic relationships in a graph sense, but also model the data structures and how they change over time. So the idea of "stocks" of data and "flows" of data (from Donella Meadows, Thinking in Systems) would apply here. React.useState might be represented as a small orb of energy, and when we call setState, it would visually change. When we assign const a = b, it would visually indicate a flow of information from b -> a. I think this would be a much clearer representation of what a program is doing, and it would reduce cognitive load, since the programmer would no longer have to hold the whole mental model in their head. No matter how good a programmer is, if they have a better, more accurate kinesthetic model of what is happening, their brain power is freed up for harder and more relevant problems.
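To make the stocks-and-flows idea a bit more concrete, here is a rough TypeScript sketch of the data a visual layer like this might sit on top of. Everything in it is made up for illustration (FlowGraph, define, assign, set, the "(external)" source label); it is not the prototype's code and not any existing API. It only records which stock fed which, and at what step, so that something else could draw it.

```typescript
// Hypothetical sketch: FlowGraph and all of its methods are illustrative names,
// not part of React, VSCode, or any real library.

// One recorded movement of information between two named stocks.
type Flow = { from: string; to: string; step: number };

class FlowGraph {
  private stocks = new Map<string, unknown>();
  readonly flows: Flow[] = [];
  private step = 0;

  // Declare a stock with an initial value (think: `const b = 42` or useState(42)).
  define(name: string, value: unknown): void {
    this.stocks.set(name, value);
  }

  // Model `to = from`: copy the value and record a flow edge from -> to.
  assign(to: string, from: string): void {
    if (!this.stocks.has(from)) {
      throw new Error(`unknown stock: ${from}`);
    }
    this.stocks.set(to, this.stocks.get(from));
    this.flows.push({ from, to, step: this.step++ });
  }

  // Model a setState-like update: the new value arrives from outside the graph.
  set(name: string, value: unknown): void {
    this.stocks.set(name, value);
    this.flows.push({ from: "(external)", to: name, step: this.step++ });
  }

  // Current contents of a stock.
  value(name: string): unknown {
    return this.stocks.get(name);
  }

  // A text stand-in for the time axis a visual tool would animate.
  timeline(): string[] {
    return this.flows.map((f) => `t${f.step}: ${f.from} -> ${f.to}`);
  }
}

const g = new FlowGraph();
g.define("b", 42);
g.assign("a", "b"); // const a = b
g.set("b", 7);      // setState-like change to b
const lines = g.timeline();
console.log(lines.join("\n"));
```

The recorded flows are the part that matters: a renderer could replay them in order, or lay them all out at once, instead of the reader reconstructing them from the source text.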
I've been mocking up a prototype of how this would work, and the basic idea is to overlay a tool like VSCode with "vibes" as well as flow relations. It's like an annotation layer over existing code, but that is only the starting point for what it would look like. Ultimately, it would look like a mini universe.
See www.eusaybia.com for a demonstration, in a non-programming context, of what I'm talking about.