A bit off-the-wall, but I’m thinking about what true “spatial computing” (or insert buzz-term here) visual programming could look like, or instead of the word “true”, interesting, pragmatic, and unique. I’ll post something if I think of anything at all, but in the meantime, I’d be curious to hear thoughts. One thing I don’t want to do is rely on wire spaghetti everywhere. To me, interesting problems arise because of the spatial component. Things going out of view.
Oh! This is one of my favorite idea playgrounds! My guiding principle, the one that keeps me away from existing programming systems, is to stay analog, contiguous, connected, smooth, linear, and to avoid language, symbology, anything brittle. Adjacency, interpolation, and context should always be meaningful and retained. Let me elaborate a little...
I'm not sure I follow what you mean by spatial, Karl. Things go out of view even in node-and-wire systems, right? Or even text-based programming, with scrolling?
Is 💬 #thinking-together@2025-01-21 related? I remember I didn't really follow the connection you were making between indentation, spatial computation and replacing functions with macros. But others seemed to get it, so I think I'm just out of my element with this topic.
What spatial computation means to me, at the risk of stating the obvious:
- Ivan's visual programming codex has some examples of past systems that are at least partly spatial.
- The infinite canvas metaphor. I've experimented with putting just textual code on an infinite canvas, and more recently debug information about the internals of a running program.
@Karl Toby Rosenberg Yes, do elaborate on what you have in mind, maybe with an example. I'll share my example in a minute.
Oh, I meant it in the Apple Vision Pro sense. Being able to manipulate objects in the real world. Understood this may be contentious. Dynamicland or AR goggles don’t really matter if they lead to useful thought experiments or prototypes. One example is how our modern (and longtime) infrastructure comprises street signs, guiding symbols, things that connect to telegraph the meanings of objects in the real world. What if we could program our environment? (Not new questions, but interesting nonetheless to revisit).
I’d argue that we already have 3D world-situated visual programming languages, just not digital or things we can manipulate.
There are a lot of research projects that just try to let you program things like IoT appliances using combinations of wire-nodes or blocks in 3D, but I’m not convinced these approaches are the best. Dynamicland also does its own thing.
So I’d say none of the above.
More along the lines of, what would it be like if we had the ability to program as part of language in a 3D space? (At a table, in a room.)
Apologies if that’s still vague.
I’d say some of what I’m interested in might be a 3D logical evolution of ideas in my previous project
Suppose you have a bunch of things: a table of records, a list of items, a sequence of keyframes, a grid of models parameterized in one way or another. Let's go with the grid. Let's say they're paper airplanes where the angle of one fold is varied along one dimension and the angle of another fold is varied along another. In the end, the grid shows different combinations. Some look good, some look stupid, some are mediocre. Pull the pretty ones toward you, push the bad ones away. Now instead of a simple "yes/no" filter, we have this third quality dimension in addition to the two parameters, and we can do spatial things with it, like rotate the whole assembly, focusing on a cross section of similar quality, or better yet, perpendiculars where the gradient is especially steep. My point is that more typical visualizations throw out the context and connectivity of things.
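To make that a bit more concrete, here's a rough sketch of the grid as data (the names and the gradient step are my own illustration, not from any actual system): two fold angles as the axes, a user-assigned quality value as the third dimension, and a crude way to find where quality changes fastest:

```python
import numpy as np

# Two fold angles as the grid axes, plus a third "quality" value assigned by
# pushing/pulling planes. Illustrative only.
fold_a = np.linspace(0, 60, 7)            # degrees, first fold
fold_b = np.linspace(0, 90, 7)            # degrees, second fold
A, B = np.meshgrid(fold_a, fold_b, indexing="ij")

quality = np.zeros_like(A)                # starts neutral

def rate(i, j, delta):
    """User gesture: pull (+delta) a plane toward you or push (-delta) it away."""
    quality[i, j] += delta

rate(2, 3, +1.0)   # this combination looked good
rate(5, 1, -1.0)   # this one looked stupid

# A cross section of similar quality: every plane within a band around some level.
level, tol = 0.5, 0.25
band = np.argwhere(np.abs(quality - level) < tol)

# Where the quality gradient is steepest -- the "perpendiculars" worth rotating
# the whole assembly to look along.
gq_a, gq_b = np.gradient(quality, fold_a, fold_b)
steepest = np.unravel_index(np.argmax(np.hypot(gq_a, gq_b)), quality.shape)
```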
It sounds like you’re focusing on the “what we can do” portion vs the “how is it visualized” or maybe both. Neat. I wish there were a drawing or storyboard to visualize that.
I’m not sure if you’re still referring to a 2D screen medium, but I can imagine this working perfectly well on a 2D surface in 3D like a table, where you push/pull using gestures.
I think connectivity is pretty important for what I wanted to explore in terms of rules and cause/effect sorts of things. (When the light is white, it’s safe to cross the street.)
I realize this kind of conflicts with what you felt is cool, or maybe it’s unrelated.
Conflict is fine. It's all about having different perspectives.
Checked on DrawTalking... cute! I love it. So much fun and so subtle!
So how does my general theme of space, curving, selection, and rotation interact with behaviors, like the boy playing ball with the dog, but without using the richness of language? Behavior is change over time, so time is a new axis we work with. Already, sketching the boy is an example. Pen motion (x, y, t) is flattened to all those (x, y) that were hit while drawing the boy. Now making a symbol (in the Flash sense) amounts to delimiting when we started drawing the boy and when we stopped. Moving the boy amounts to translating where that delimited continuation starts. Now draw a ball, great.
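In sketch form (illustrative names only, assuming a stroke is just timestamped points): the drawing keeps its (x, y, t) history, a symbol is a delimited slice of that history, and moving the symbol is a translation of the slice:

```python
from dataclasses import dataclass

# A pen stroke keeps its full (x, y, t) history; "flattening" it to a picture
# just forgets t. A symbol (in the Flash sense) is a delimited time range of
# that history, and moving the symbol translates every point in the range.
# All names here are illustrative, not from any particular system.

@dataclass
class Point:
    x: float
    y: float
    t: float

def make_symbol(history: list[Point], t_start: float, t_end: float) -> list[Point]:
    """Delimit when we started drawing the boy and when we stopped."""
    return [p for p in history if t_start <= p.t <= t_end]

def move(symbol: list[Point], dx: float, dy: float) -> list[Point]:
    """Moving the boy = translating the delimited continuation."""
    return [Point(p.x + dx, p.y + dy, p.t) for p in symbol]
```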
"The boy throws the ball." First he needs to go get it. How to I model this? Similar to the paper airplanes example, take the space of places the boy can be crossed with places the ball could be. "Better" here has the boy closer to the boy closer to the ball. In the end, we want a gradient that has the boy getting closer to the ball. For good animation velocity should play a role too, but the point is that we're picking a path through the high dimensional space and then projecting onto (boy.x, boy.y , ball.x, ball.y, t)
and watching that. At the UI level, the motion should just be to drag the boy to the ball. It's up to the system to interpolate the rest according to some least action, different actions giving different easing.
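A minimal sketch of that interpolation step, with the easing function standing in for the choice of "action" (names invented for illustration):

```python
# The user just drags the boy to the ball; the system picks the in-between
# positions. Different easing functions stand in for different least-action
# choices. Illustrative only.

def ease_in_out(u: float) -> float:
    """One possible easing: slow start, slow stop (smoothstep)."""
    return u * u * (3 - 2 * u)

def path(boy_xy, ball_xy, steps=30, ease=ease_in_out):
    """Yield (boy.x, boy.y, t) along the way from boy to ball."""
    (bx, by), (tx, ty) = boy_xy, ball_xy
    for i in range(steps + 1):
        u = ease(i / steps)
        yield (bx + u * (tx - bx), by + u * (ty - by), i / steps)

frames = list(path((0.0, 0.0), (5.0, 2.0)))
```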
And of course the real subtlety is the different states: boy going to pick up ball, ball falling into water, dog going to get ball, dog giving ball to boy.
Thanks for taking a look! (and calling it cute) I think I wasn’t focusing on animation fidelity, but I can see how you might add controls for inspecting what looks good if you were to go about using the environment to do that (based on what you said). I think when I was thinking of spatial computing, I meant some other term (there are many overloaded terms) that might capture the feeling of e.g. drawing in air or on surfaces; using 3D digital objects. More of a feel than a procedure. Most 3D “programming” uses the nodes and wires approach, but if the aim is to situate programming in an environment in which you’re communicating with other people, a sort of “design goal” might be to reduce all that visual clutter. Different/complementary problem I think.
Now... 2D vs 3D. 3D just has a lot, a lot more space to do stuff in: helpful since I dedicate a new axis to every feature and want to keep things connected. My favorite tricks are...
- sampling rather than showing the whole interpolation: a grid of airplanes rather than some 5D mess of superimposed airplanes,
- easy rotation: works way better in 3D than on screens, and
- perspective effects: put side-features on the side of a model so that those dimensions unfold when you shift your whole body in one direction or another. It's more natural than, say, the keyboard acrobatics you see in CAD.
Now, let's talk about your design goal... communicating without visual clutter.
For behavior, do you mean like storyboards? Seems like the most natural way to show something like the boy-ball example: keyframes with motion arrows.
May I interject? Is this about a cube grid? I suppose I was visualizing a whole room with different objects doing Alan Kay message passing (as just an example, not necessarily what I’d do)
For elucidation, they can be more informative than the video itself. They answer questions like: what's the trajectory of the ball, and does the dog start chasing the ball before or after it lands in the water?
Your scenario description.
I’m a little confused also about the purpose of showing storyboards of the boy and dog. The objective is to create interactive game mechanics or mechanics in general, not to describe the mechanics. Apologies, I’m probably missing a key point.
No worries. The whole point of conversation is to interactively clarify. Now what's the difference between creating interactive game mechanics and describing the mechanics? Audience. Creating is for the computer, describing is for a person, and the emphasis is liable to be different.
Suppose we wanted to play a card game right here and now using some magical programming substrate to help simulate the cards. The computer really needs to know what exactly can happen when in order to implement the rules, resolve ambiguity and contradictions. But for you, I would start by explaining the goal of the game, important interactions to look out for, then we could come to the individual steps. (Mumbles something about Aristotle and causes.)
Maybe a problem with using the procedural way to talk to the computer is that getting rules to achieve goals is tricky.
That’s part of why it’s interesting. My project touched pieces of this, but in other contexts, there’s more to try.
A storyboard is explicit about where you're starting and where you want to go. The right answer for certain kinds of interaction. Of course, there's an exploratory direction too: "Hey computer, given these rules, let's see what happens!" Tension comes from the mismatch between wanting a particular overall behavior and only being able to describe it with the primitives the machine already knows.
The talking in DrawTalking is cute and subtle because those basic words carry a lot of intention. The computer has to do a lot of guessing, and it should, but then it should be easy to understand and revise the system's assumptions.
Yep, the underlying system essentially has verb primitives with attributes that act as parameters (the adjectives and adverbs). Those are built-in. Some way to parameterize those even more fluidly would be nice. For some kinds of actions, it’d be easier. For others, not so much. It’s an interesting challenge. I’m not a fan of just generating things. Better to have a limited understandable vocabulary that is easy to change. The existing verbs have a pretty closed meaning to make that possible in theory. So yes, combining the words with other approaches (direct ones for example) or correction opportunities, or other ways to “program” the user’s changed assumptions in would be nice.
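Roughly the shape I mean, as a toy sketch (this is not DrawTalking's actual code, just an illustration of verbs-with-attribute-parameters):

```python
# Toy illustration of "verb primitives with attributes as parameters":
# adjectives/adverbs tweak a built-in verb's behavior. Not the real system,
# just a sketch of the general idea.
from dataclasses import dataclass, field

@dataclass
class Verb:
    name: str                                    # e.g. "move", "throw"
    params: dict = field(default_factory=dict)   # defaults: speed, size, ...

    def with_modifier(self, **mods):
        """An adverb/adjective like 'quickly' or 'huge' just overrides parameters."""
        return Verb(self.name, {**self.params, **mods})

move = Verb("move", {"speed": 1.0})
move_quickly = move.with_modifier(speed=3.0)     # "the boy moves quickly"
```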
Programming by demonstration’s pretty common for movement-based actions, but for more abstract kinds of actions, that’s where it gets tricky.
Well, you have to pick some primitives, right? Those best for simple animation, for example, are similar yet significantly different from the ones that work well for, say, board games. Animation is more about coordinating movement. Games are more about having spots where you can put pieces, stack them, roll, and shuffle... And then "video games" as such, which are more about collision and timing, doubly so.
I think it would be cool if things connected automatically just by putting them next to each other.
Rather than using single pictographic images you could combine them together in this way based on visual adjacency, creating a practically unlimited range of symbols with just a couple of dozen primitives.
You could then make sub-procedures by drawing little arcs either side, to create spatial depth that could visually represent branching trees. You could ornament the arcs to denote different kinds of branches. You could even supplement this branching using the horizontal spatial position on the page.
Execution could perhaps flow downwards, but you could draw the same images in different parts to create a spatial wormhole from one part of the page to another, or create a visual looping structure with the arcs.
Even better if you could visually colour different parts based on their type. I think it could work pretty well
@Alex McLean If you’re open to drawing a little sketch / storyboard of the idea, that would be cool. I think I get it, but I don’t know if I get it.
One thing I found nice about using text labels + language is that I could command and refer to objects offscreen or just connect them with a command (the block moves to <other thing>).
Yet, it’d be nice to have a light program representation to show all the different relations built-up. (The arcs / adjacency?)
I know, but I feel maybe the term was used in error. I’m really just talking about 3D interaction in the form of something like room or world-scale Dynamicland or whatever form of Augmented Reality you like.
Not spatial in the abstract sense.
I totally agree written code is “spatial.” goto / jumps are spatial.
I think "Dynamicland but over the internet in XR" is what we should all be aiming for. 🤗
I wanted to be able to have a tabletop discussion with midair graphics magic, potentially. Probably I need to create concept art. 🤣 oh well
Dave Ackley's work is a kind of "true" spatial computing that mimics the sort of "computation" you get in moulds/fungi youtube.com/watch?v=helScS3coAE
Edited version of the same video for the time poor; youtube.com/watch?v=lbgzXndaNKk&t=0s
(answering to the original prompt)
Thanks for opening this topic. I have been mulling and tinkering with those concepts for the past year or so but never felt ripe to discuss it here, so I only annoyed my non-coding friends with this so far 🙂. So here are my half-baked thoughts anyway:
I differentiate between spatial programming and spatial computing.
In spatial programming, we have more spatial, more visual ways to describe a program than a plain text file. E.g. all the nodes-and-arrows / graph representations of computation, but there are for sure other approaches as well. It is all about finding more visual, more humane ways of representing programs.
In spatial computing on the other hand, the atomic units of memory are spatial and the operations that transform them are too. E.g. the above mentioned robust-first programming, all kinds of cellular automata or Scott Kim's Viewpoint system (dissertation PDF)
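For the cellular-automata flavor of this, a minimal example of what "the memory units and the operations on them are both spatial" can mean (a plain Game of Life step, nothing specific to robust-first computing or Viewpoint):

```python
# Minimal cellular-automaton step (Conway's Game of Life): every memory cell
# has a position, and the update rule only looks at a cell's spatial neighborhood.
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    # Count the 8 neighbors of every cell by summing shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

glider = np.zeros((8, 8), dtype=int)
glider[1, 2] = glider[2, 3] = glider[3, 1] = glider[3, 2] = glider[3, 3] = 1
next_gen = life_step(glider)
```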
It seems to me that there is far more exploration in the former field and very little interesting work in the latter. Why? Here is my theory:
When you build a spatial programming system, you have a vision and an intuition on how exactly it should feel to code and then you go there, use your barely-spatial, text-based programming toolbox to make this vision come true. You have a clear goal to work towards, a problem to solve, convergent thinking to do. This style of work is taught at the engineering departments. Those engineers are the ones with the knowledge to do the exploration in the first place. Also, this serves the ego very well. You have a vision and you leave an imprint on the world by making it come true.
When you build a spatial computing system, you don't build a product, you don't work towards a clear goal. What you do is create a new medium in which to build. It is hard to predict what you or others are going to build in the new medium. The outcome and the benefits are a lot more nebulous. It requires more divergent thinking, which is more likely taught in art schools, not engineering schools. I view it more like the act of planting a seed and then modestly taking a step back and letting it grow on its own. Less ego involved here.
I think we should find other, more spatial representations of programs, but those should build upon a spatial model of computation. Trying to build a spatial programming system straight on the basis of the conventional, text/byte based tools means going against the grain of that medium. We should first build a spatial medium for computation and from there, visual/spatial representations of programs will emerge naturally.
The work of Scott Kim is an awesome step in that direction, even though the system could not be programmed "from the inside". In Kim's Viewpoint system, the smallest unit of memory is a pixel, which, even if embedded into a 2D grid, is still tightly married to the idea of a byte.
To expand upon this, I started to work on a system where the lowest unit of memory is a 2D line segment with a set of tags. Line segments are addressed not by their distance to some nullptr or their Cartesian coordinates on a pixel grid but by queries on the 2D canvas (e.g. "all lines with tag X, left of line Y" or "all lines touching line X without tag Y") and the basic operations/the instruction set are all about geometric transformation of those line segments.
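Roughly, a sketch of what addressing by query could look like (names made up for illustration, not the real instruction set):

```python
# Sketch of a memory model whose smallest unit is a tagged 2D line segment,
# addressed by spatial queries instead of byte offsets. Names are made up.
from dataclasses import dataclass, field

@dataclass
class Line:
    x1: float
    y1: float
    x2: float
    y2: float
    tags: set = field(default_factory=set)

    def midpoint(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

class Canvas:
    def __init__(self):
        self.lines: list[Line] = []

    def query(self, with_tag=None, without_tag=None, left_of: Line = None):
        """e.g. 'all lines with tag X, left of line Y'."""
        out = self.lines
        if with_tag is not None:
            out = [l for l in out if with_tag in l.tags]
        if without_tag is not None:
            out = [l for l in out if without_tag not in l.tags]
        if left_of is not None:
            out = [l for l in out if l.midpoint()[0] < left_of.midpoint()[0]]
        return out

    def translate(self, lines, dx, dy):
        """A basic 'instruction': geometric transformation of the selected segments."""
        for l in lines:
            l.x1 += dx; l.x2 += dx; l.y1 += dy; l.y2 += dy

canvas = Canvas()
canvas.lines += [Line(0, 0, 1, 1, {"X"}), Line(5, 0, 6, 1, {"Y"})]
anchor = canvas.query(with_tag="Y")[0]
canvas.translate(canvas.query(with_tag="X", left_of=anchor), dx=2, dy=0)
```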
I should have a wasm-backed demo running with a cheatsheet, example code and some basic documentation by the end of next week for you to try out 🙂
I’m still a little unsure how to convey the type of computing that’s in 3D / “augmented reality” because I don’t really like the term augmented reality. It’s just… computing in interpersonal space?
This is what I mean. 🤣 I mean virtual/augmented reality in the sense of things floating in 3D between people having a conversation, but I am unhappy with being unable to express what I really mean.
I don’t really think this is the best that can be done, but it’s A thing that can be done. Just some really really rough gif
So people would have to wear XR glasses or what technology are you thinking about?
Thanks.
I think you probably would need something like XR glasses and/or projections. I’m not exactly a fan of forcing that, but let’s say for the sake of prototyping, yes.
I actually agree that a lot of things could be 2D projected onto the table and then “pulled out” of the plane if you want to manipulate it as an object in 3D. Not everything has to be 3D.
For my case: I would love to have a line computer drive a vector display and not a pixel display. Make that a laser projection system above the table together with a camera. This is not 3D and more like a lo-fi version of dynamicland. I don't really like XR glasses. They exclude everybody who does not wear them instead of bringing people together. The whole AR corner smells too much like escapism to me.
Hmm I see the pulling out part. But then you need fancy 3D tech again which destroys accessibility.
I think of XR glasses as a prototype.
Some select things are useful to be exclusive though. For example, a moderator or teacher interacting with students might want to be able to visualize things only they can see.
Asymmetry has its uses.
Also I don’t have resources to create a laser-mediated system. :(
Hmm I think the best way to restrict information from getting to the wrong eyes/privacy is just to go to a different room. Otherwise you create weird power dynamics in the room. But maybe we are thinking about different scenarios here.
I’m absolutely not a hw person but oh well.
I think XR still lets you envision the full range of possibilities.
Actually if you very specifically want to have a private conversation with someone limited to a small room or table, then the exclusivity aspect makes a little more sense.
Yes we’re thinking of different cases. Sometimes you want power dynamics. You already have it in a classroom. The professor has more controls.
But if we’re talking about always-on situations, that’s where it gets weird.
So I would agree that asymmetric XR is weird if it’s not isolated to specific rooms or situations in which the participants haven’t agreed to a power differential. If that makes sense.
Not a big fan of power dynamics in the classroom tbh 😬. Just because that is what we are used to does not mean it is good. Of course we cannot get rid of it altogether, but we should be sensible about it, especially when building tools.
I really like the Dynamicland approach. Even though it is not 3D, it lets everyone participate equally.
In this case, I am thinking - the professor might have a list of names they can see, or a private annotation space. Controls for turning on and off simulations. Ms. Frizzle’s Magic School bus as an equivalent. She controls the journey. (If you know that series).
But students could also have their own private thinking spaces.
Sometimes we want privacy.
I’m aware this can be used for harm as well.
Not a fan of always-on surveillance cams kinds of stuff.
ah I see. For the case of the private information I would just opt for separate devices (just a dumb old tablet?)
I see no way around it though. Maybe there is a way to ensure that the video feed is live-processed in a sealed device that only outputs the metadata/distilled geometry that is needed for XR? But even that sounds sketchy and easy to abuse.
I really like the combination of the light pen and a vector display. It is so low-tech but so powerful at the same time.
I’m just not really in hw land to the point where I’d trust myself. I’d be more than happy to collaborate with people to find such solutions.
The camera’s actually being unlocked moreso now, but there’s no real way around it if you want to do some fancy inference on what you’re looking at. Another case of “this could be used for good or evil.”
XR would let me draw on any surface wherever I am, which is nicer than a tablet. But that could be solved if all the walls were made of a magic distributed multitouch surface that could be a sprayed-on coating. That’s cyber futuristic though.
I differentiate between spatial programming and spatial computing.
I really like this distinction.