Mariano Guerra 2024-10-07 08:55:45

It takes two to think

Despite the variety of creative approaches practiced by different scientists, one tried-and-true, though often overlooked, trick for generating new ideas stands out. It may sound trivial, yet it is as reliable as it is simple: talk to someone. By talking with other people, we not only pool the information or ideas that each of us individually lacks, but we are also able to improvise new thoughts that are not accessible to us alone. In this episode, Itai Yanai and Martin Lercher talk through the ideas in two of their editorials (available at nature.com/articles/s41587-023-02074-2 and genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02575-w).

GZLj117WUAAHafh.jpeg

Mariano Guerra 2024-10-07 18:13:42

do you know of a good formal specification of the semantics of spreadsheet evaluation?

Mariano Guerra 2024-10-07 19:15:32

(re)found this, which has a lot of references: lord.io/spreadsheets

Tom Larkworthy 2024-10-07 19:27:56

Topological sort of dependencies is the v0 for spreadsheets, but actual Excel supports closed loops, a.k.a. circs (basta.substack.com/p/no-sacred-masterpieces), which can be used to implement iterative converging algorithms

"We use a circular reference in Excel to do linear regression."

📝 No sacred masterpieces

Or "that time I built Excel for Uber and they ditched it like a week after launch"
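
A minimal sketch of the "v0" described above, assuming a hypothetical sheet format in which each cell lists its dependencies and a formula over them. The cell names and the `evaluate` helper are illustrative only and are not taken from any real spreadsheet engine. It uses Python's standard `graphlib` to order the cells, and it rejects circular references rather than iterating on them.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical sheet: cell name -> (dependencies, formula over those dependencies).
sheet = {
    "A1": ((), lambda: 2),
    "A2": ((), lambda: 3),
    "B1": (("A1", "A2"), lambda a1, a2: a1 + a2),
    "C1": (("B1", "A1"), lambda b1, a1: b1 * a1),
}

def evaluate(sheet):
    """Evaluate every cell exactly once, in dependency order."""
    graph = {name: set(deps) for name, (deps, _) in sheet.items()}
    values = {}
    # static_order() yields each cell after all of its dependencies and
    # raises CycleError if the sheet contains a circular reference.
    for name in TopologicalSorter(graph).static_order():
        deps, formula = sheet[name]
        values[name] = formula(*(values[d] for d in deps))
    return values

values = evaluate(sheet)
print(values["C1"])
```

A sheet with a closed loop makes `evaluate` raise `CycleError`, which is exactly the case that circular-reference support has to handle differently.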

Francisco Garau 2024-10-13 12:57:28

Have a look at the work from Umut Acar on Self Adjusting Computations.

cs.cmu.edu/~rwh/students/acar.pdf

The work is cited by the Jane Street guys who implemented Incremental based on those ideas.

janestreet.com/tech-talks/seven-implementations-of-incremental

Or the Dependency Graph implemented by the same guys first in Goldman Sachs, then JP Morgan, Bank of America and now Beacon

beacon.io/what-are-dependency-graphs-and-why-are-they-important

📝 Seven Implementations of Incremental :: Jane Street

Jane Street is a quantitative trading firm and liquidity provider with a unique focus on technology and collaborative problem solving.

📝 What Are Dependency Graphs and Why Are They Important?

Dependency graphs in financial services are powerful tools to analyze dependencies, recalculate values, and find & act on data quickly. Learn why with Google!
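
A rough sketch of the general idea behind these dependency-graph systems: cache every node's value and, when an input changes, recompute only the nodes downstream of it. This is not the API of Jane Street's Incremental or of Beacon's tooling; the `DependencyGraph` class, its method names, and the example nodes are all hypothetical. It also omits refinements such as stopping propagation when a recomputed value turns out to be unchanged.

```python
from graphlib import TopologicalSorter

class DependencyGraph:
    def __init__(self):
        self.deps = {}      # node -> tuple of nodes it reads
        self.formulas = {}  # node -> function of the dependency values
        self.values = {}    # node -> cached value
        self.log = []       # nodes recomputed so far, kept for illustration

    def add_input(self, name, value):
        self.deps[name] = ()
        self.values[name] = value

    def add_node(self, name, deps, formula):
        # Dependencies must already exist so the node can be computed now.
        self.deps[name] = tuple(deps)
        self.formulas[name] = formula
        self._recompute(name)

    def _recompute(self, name):
        args = (self.values[d] for d in self.deps[name])
        self.values[name] = self.formulas[name](*args)
        self.log.append(name)

    def set_input(self, name, value):
        self.values[name] = value
        # Mark everything that transitively reads the changed input.
        dirty = {name}
        grew = True
        while grew:
            grew = False
            for node, deps in self.deps.items():
                if node not in dirty and dirty.intersection(deps):
                    dirty.add(node)
                    grew = True
        # Recompute only the dirty formula nodes, dependencies first.
        for node in TopologicalSorter(self.deps).static_order():
            if node in dirty and node in self.formulas:
                self._recompute(node)

g = DependencyGraph()
g.add_input("price", 10)
g.add_input("qty", 3)
g.add_input("shipping", 5)
g.add_node("subtotal", ["price", "qty"], lambda p, q: p * q)
g.add_node("total", ["subtotal", "shipping"], lambda s, sh: s + sh)
g.add_node("shipping_doubled", ["shipping"], lambda sh: sh * 2)

g.log.clear()
g.set_input("price", 20)
print(g.log, g.values["total"])
```

After the price change, only `subtotal` and `total` are recomputed; `shipping_doubled` keeps its cached value because it does not read `price`.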

Alex McLean 2024-10-09 16:14:18

I've realised what dynamic media and algorave have in common - they're both 80% community development and 15% projector placement

image.png

Orion Reed 2024-10-09 16:16:50

@Andrés Cuervo @Omar Rizwan you’re being called out

Orion Reed 2024-10-09 16:20:02

What’s the other 5% ?

Alex McLean 2024-10-09 16:50:40

A small matter of programming ;)

Brian Hempel 2024-10-10 00:27:01

Is there a citation/reference or name for the idea of the gap between the power of spreadsheets and the power of general purpose programming?

image.png

Ivan Reese 2024-10-10 00:52:24

Right off the bat: I don't have a good citation. But I will allow myself to be nerd-sniped by the graphic. Please feel free to ignore this digression.

Do you have a particular meaning of "power" in mind? Like, is it about the range of computations that can be expressed? (eg: something like Chomsky Hierarchy.) Spreadsheets can be Turing complete, after all.

Between ASM and C++, which is more difficult and which is more powerful? What about TypeScript and JavaScript, or Agda and TCL?

Brian Hempel 2024-10-10 01:03:03

Haha, maybe Power = number of tasks that are solvable within a reasonable cognitive energy budget.

Tom Larkworthy 2024-10-10 04:23:33

I think maybe it's how well it "scales" to "deep" problems. Coding can coordinate large teams on year-long software projects. Spreadsheets hit a ceiling of complexity, so they are stuck, but they are very efficient at solving shallow problems.

Konrad Hinsen 2024-10-10 06:26:39

"Number of tasks" is poorly defined. Tasks are not enumerable. I'd even claim that "power" is poorly defined as a property of a tool. Tom Larkworthy's depth is a much more productive criterion for classification. Another one, somewhat similar, is problem complexity.

Tom Larkworthy 2024-10-10 06:43:44

I was thinking logical depth when I said it, which is a measure of problem complexity. Code can mirror the underlying depth, but spreadsheets have a fixed number of sub-sheets so there is a limit to the recursive depth of divide-and-conquer solutions. Notebooks that can import other notebooks seem like a good middle ground and that's why I love Observable, and why it can scale!

Denny Vrandečić 2024-10-10 07:32:47

"uncanny valley of programming"

Alex McLean 2024-10-10 07:35:43

My contribution to this debate is that spreadsheets are coding. Furthermore "general purpose programming" is a myth, perpetuated on the basis that the established way of doing things is inevitable and universal.

Stefan Lesser 2024-10-10 08:10:26

Certainly not a reference for what you’re looking for, but you just reminded me that I did a short presentation at the Recurse Center in 2017 about something very similar. Here are the slides, it’s only 6 of them.

I chose difficulty and expressiveness for the axes, and explained both briefly on slide 2. If you’re confused about the planets, so am I; I think that was part of a joke because everyone was waiting for me to declare which programming language goes where, but that would’ve obviously derailed the conversation and my point was that in the grand scheme of things their differences are just a small part of the whole design space. So I switched programming languages for something neutral.

Notably, I didn’t put spreadsheets in there. Where would they go in this?

Complexity

Dan Swirsky 2024-10-10 08:32:09

I am currently “living” in the gap, tinkering with filling it by extending the spreadsheet paradigm beyond rows and columns of text, numbers, and dates. As I envision it, cells are described using text, where each cell represents a different application component, including data, GUI widgets, and even code. Groups of cells may be arranged hierarchically, such as a window-type parent cell and its child GUI widget cells. Different cell types have different attributes. Date-type cells have a value attribute that represents the date, a date format attribute, etc. GUI widget-type cells have display-type attributes. Code-type cells have code-triggering attributes (e.g., time-based, cell attribute value-based, etc.) As in a spreadsheet, cell attribute values are defined using literals or reactive formulas that may depend on other cell attribute values. As in a spreadsheet, the application “runs” when cell attribute values change and other cell attribute values are recalculated. See github.com/Danskydan/Hilltop for a brief write-up.

Duncan Cragg 2024-10-10 09:12:24

Dan Swirsky is it circular? i.e. Turing Complete?

Duncan Cragg 2024-10-10 09:15:07

To refer back to the "gap" graphic, I think the goal would be to avoid the gap altogether, by simply going near-vertically up from that dot under the "s". There are easy first steps towards that: (1) drop the grid and have attribute-value blocks (2) allow circular dependency by default between blocks

Tomas Petricek 2024-10-10 09:24:32

I was looking for a citation for this picture the other day - I first saw this in Jonathan Edwards' video "Transcript: End-User Programming Of Social Apps." from the YOW 2015 conference. The YouTube link youtube.com/watch?v=XBpwysZtkkQ is now dead though 😞

Tomas Petricek 2024-10-10 09:27:10

(although I guess this picture shows a bit different things than the one I'm using - I have a much less nice one! My use of this is in a thing I'm writing right now - page 7 at github.com/tpetricek/habilitation/blob/master/thesis.pdf )

Denny Vrandečić 2024-10-10 09:29:27

funny, due to the URL it looks like it is a master's thesis, not a habilitation thesis 🙂

Dan Swirsky 2024-10-10 09:49:29

Duncan Cragg, at this stage I'm concentrating on fleshing out the language, which I hope will be Turing complete. I believe that Excel allows circular references if required, so this would be an implementation issue for greater minds than mine.

Duncan Cragg 2024-10-10 10:54:06

not too hard actually, just need a fixpoint detector and iteration limit
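
A minimal sketch of that approach, assuming numeric cells and a hypothetical format in which each formula reads the previous round's values. `iterate_to_fixpoint`, its parameters, and the example cells are illustrative names, not any real spreadsheet's API. Every cell is recomputed each round until no value moves by more than a tolerance (the fixpoint detector) or the iteration limit is hit.

```python
def iterate_to_fixpoint(formulas, initial, max_iterations=100, tolerance=1e-12):
    """Re-evaluate all cells until they stop changing or the limit is reached."""
    values = dict(initial)
    for iteration in range(1, max_iterations + 1):
        # Recompute every cell from the previous round's values.
        new_values = {name: f(values) for name, f in formulas.items()}
        # Fixpoint detector: no cell moved by more than the tolerance.
        if all(abs(new_values[k] - values[k]) <= tolerance for k in formulas):
            return new_values, iteration
        values = {**values, **new_values}
    raise RuntimeError(f"no fixpoint after {max_iterations} iterations")

# The cell "x" refers to itself: x = (x + n / x) / 2 converges to sqrt(n).
formulas = {
    "n": lambda v: 2.0,
    "x": lambda v: (v["x"] + v["n"] / v["x"]) / 2,
}
values, rounds = iterate_to_fixpoint(formulas, {"n": 2.0, "x": 1.0})
print(values["x"], rounds)
```

A circular formula that never settles, such as `x = 1 - x` starting from 0, runs into the iteration limit and raises `RuntimeError` instead of looping forever.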

Guyren Howe 2024-10-10 15:20:30

Although Excel is technically Turing-Complete, that’s not how folks use it. The gap is that between First Order/declarative and Higher Order/Turing Complete.

Ivan Reese 2024-10-10 16:14:29

Right, they don't use Excel as a TC lang because it's too difficult. So I think each tool deserves a curve spanning from zero to infinity along the x axis, showing how much power it gives you if you're willing to tolerate that amount of difficulty.

Brian Hempel 2024-10-10 16:53:00

There’s this fig from 1990 by MacLean et al, User-Tailorable Systems: Pressing the Issues with Buttons. Same idea but no spreadsheets (and same axes, but swapped).

image.png

Jari 2024-10-10 17:41:33

Another dimension to think about: spreadsheets are easy to understand when the amount of data is small. But what happens to your sheet when there are over one million rows? I am currently tinkering in the gap, and I find that I am mainly writing Excel-like functions which use variables. But let's make it one billion rows: my philosophical question is, why do sheets stop working when the amount of data increases?

Ivan Reese 2024-10-10 17:48:38

Great point — spreadsheets are less difficult if you mostly care about the data you have, and want to work directly with that data, and the transformations are simple. But if you need complex transformations, conventional programming languages get you there, but also make you surrender the visibility of your data.

Data visibility gives you a power boost in the low end of the difficulty spectrum. Abstraction gives you a power boost in the high end.

Jari 2024-10-10 17:55:06

Yes indeed. Another of my favorite tools is Jupyter Notebook. We can learn from it too by understanding that you can have visual clarity and stepwise feedback even if the algorithm is really complex, like ML algos

Ivan Reese 2024-10-10 17:58:31

Here's a completely uncontroversial figure. Cite as "Reese, et hell (2024)"

Screenshot 2024-10-10 at 11.56.41 AM.png

Konrad Hinsen 2024-10-11 06:47:53

@Jari Notebooks work for tasks that have a coarse-grained linear structure. The code in the cells usually calls library functions that are almost black-box code (even if it's open source, few notebook users can figure out how to see it), which is why you get only coarse-grained explanations.

Maybe more importantly, notebooks emphasize code whereas spreadsheets emphasize the input data. It would be nice to be able to mix both, at the discretion of the user.

Tom Larkworthy 2024-10-11 07:06:08

Yeah, I have been incredibly disappointed at the low number of people who edit or read the code in notebooks. It makes me question whether end-user programmability is that much of a useful direction overall. Sometimes I think it's a UX thing, but actually I think it's just the cognitive overhead of orienting yourself in a foreign codebase, or just not being able to read code (non-programmer or unfamiliar language). However, the 1% of people who do read code find it very useful. I actually think code reuse through linked notebooks is also only useful for the individual, but copying and pasting working code across notebooks between teams is more prevalent, because new contexts for code have slightly different requirements and emphasis, so it's basically better to copy and paste code and trim it to the situational need than to try to build (bloated) reusable libraries.

I don't think notebooks are code-first though, they are sparse-complex values-first (list of objects/markdown), whereas spreadsheets are dense-simple values-first (matrix of scalars). You said yourself people don't read the code in notebooks.

Jason Morris 2024-10-11 16:40:18

Ivan Reese more "envelope as infographic," please.

Jonathan Edwards 2024-10-11 20:19:39

From that Yow 2015 talk

image.png

Konrad Hinsen 2024-10-12 08:03:32

Tom Larkworthy I think a big issue with today's incarnations of Notebooks is that they don't have a clear role. I see them advertised for two very different roles, which however have conflicting requirements:

  1. An interactive tool for explorative programming.
  2. A computational document for publication.

They do a better job at 1 than at 2, largely because prominently displayed code matters for 1, whereas the reader of a computational document would prefer to understand the data-based narrative first and only then (or maybe never) look at the code for a deeper understanding.