saw this new paper by Jonathan Edwards and @Tomas Petricek
still reading it - very fun, discusses version control for structured data and schema evolution
Baseline: Operation-Based Evolution and Versioning of Data
<snipped in favor of slack provided summary below>
looks like it builds on their previous work, eg alarmingdevelopment.org/?p=1716
i think schema evolution is one of the unsolved problems so glad to see this. i'm also in the "store structures not text blobs" camp so happy to see a framework in place that doesn't require anyone to invent the entire representation stack there.
📝 Baseline: Operation-Based Evolution and Versioning of Data
Baseline is a platform for richly structured data supporting change in multiple dimensions: mutation over time, collaboration across space, and evolution through design changes. It is built upon Operational Differencing, a new technique for managing data in terms of high-level operations that include refactorings and schema changes. We use operational differencing to construct an operation-based form of version control on data structures used in programming languages and relational databases. This approach to data version control does fine-grained diffing and merging despite intervening structural transformations like schema changes. It offers users a simplified conceptual model of version control for ad hoc usage: There is no repo; Branching is just copying. The information maintained in a repo can be synthesized more precisely from the append-only histories of branches. Branches can be flexibly shared as is commonly done with document files, except with the added benefit of diffing and merging. We conjecture that queries can be operationalized into a sequence of schema and data operations. We develop that idea on a query language fragment containing selects and joins. Operationalized queries are represented as a future timeline that is speculatively executed as a branch off of the present state, returning a value from its hypothetical future. Operationalized queries get rewritten to accommodate schema change "for free" by the machinery of operational differencing. Altogether we develop solutions to four of the eight challenge problems of schema evolution identified in a recent paper.
📝 Operational Version Control
Abstract of a talk I just gave: It would be useful to have version control like git but for data structures, particularly the data structures we call spreadsheets, databases, and ASTs. Operational …
i recommend this paper to anyone doing local-first, structured-editing, or generally inventing a structured store that needs object versioning.
my comments:
- formalized timelines are really aesthetically pleasing, specifically the transfer of operations across timelines.
- it handles merging where one timeline does a value replacement while another does a type transform. i don't think crdts have anything quite like this:
  - timeline-1: `{key1: 'a1'}` → `{key1: 'a2'}`
  - timeline-2: `{key1: 'a1'}` -(wrap in a list)-> `{key1: ['a1']}`
  - merged: `{key1: ['a2']}`
  - even works if timeline-2 appends `'b1'` before merging; merge produces `{key1: ['a2', 'b1']}`.
- there is a notion of identity in this model that i'd like to examine. eg when `a1` is replaced with `a2`, the containing structure "remembers" which specific value was replaced by using a unique ID.
- even for lists, a unique ID is attached to each value so you can figure out how an operation on that value in one timeline applies to a forked list where that specific value has since moved.
- i feel the idea can be generalized to more than records and lists. consider a general object `V` containing slots that have identity. slots hold values. transformations move slots around, add and remove slots, or replace slot values. the projection and retraction methods may need to be supplied by `V` itself. anyway, haven't fully worked this out.
- the paper focuses on small data where each copy carries around the full history and doesn't need a central repo. however, the model could easily work with a central repo storing the committed or even parallel timelines for objects. local objects would only need to hold zero or small histories with a pointer back to the committed timelines. if we generalize the fork/merge model here, it could apply to even large object graphs, parts of which get forked and then recombined.
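to make the merge example concrete, here's a toy sketch (my own, not the paper's actual representation or API) of how ID-tagged values let a replace op from one timeline land correctly after another timeline has wrapped the value in a list and appended to it:

```python
import uuid

class Val:
    """Toy value wrapper: every value carries a stable unique ID so
    operations can find it even after structural transforms move it."""
    def __init__(self, data):
        self.id = uuid.uuid4()
        self.data = data

def replace(doc, target_id, new_data):
    """Replace the value with target_id, wherever it now lives."""
    for v in doc.values():
        if isinstance(v, list):
            for item in v:
                if item.id == target_id:
                    item.data = new_data
                    return
        elif v.id == target_id:
            v.data = new_data
            return

def wrap_in_list(doc, key):
    """Type transform: {key: v} -> {key: [v]}; the value keeps its ID."""
    doc[key] = [doc[key]]

def append(doc, key, data):
    doc[key].append(Val(data))

# shared base state, then two timelines fork from it
a1 = Val('a1')
t2 = {'key1': a1}

# timeline-2 applies its transform and append to its copy
wrap_in_list(t2, 'key1')
append(t2, 'key1', 'b1')

# merge: transfer timeline-1's replace op onto timeline-2's state;
# the ID locates 'a1' inside the list the transform created
replace(t2, a1.id, 'a2')
print([v.data for v in t2['key1']])  # ['a2', 'b1']
```

the point is that the replace op targets an identity, not a path, so it survives the intervening structural transformation without any manual conflict resolution.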
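and a minimal sketch of the "general object `V` with identity-bearing slots" idea above (entirely my speculation, with hypothetical names, not something from the paper): identity lives in the slots, layout/shape is a separate concern, and `V` supplies its own projection:

```python
import uuid

class Slot:
    """A slot has stable identity; ops target slots, transforms move them."""
    def __init__(self, value):
        self.id = uuid.uuid4()
        self.value = value

class Obj:
    """General container V: holds slots; supplies its own projection."""
    def __init__(self):
        self.slots = {}    # slot id -> Slot (identity)
        self.layout = []   # ordering/shape, kept separate from identity

    def add(self, value):
        s = Slot(value)
        self.slots[s.id] = s
        self.layout.append(s.id)
        return s.id

    def replace(self, slot_id, value):
        # finds the slot even if a transform has since moved it
        self.slots[slot_id].value = value

    def move(self, slot_id, index):
        self.layout.remove(slot_id)
        self.layout.insert(index, slot_id)

    def project(self):
        return [self.slots[sid].value for sid in self.layout]

v = Obj()
x = v.add('a1')
v.add('b1')
v.move(x, 1)        # structural transform from one timeline
v.replace(x, 'a2')  # value op from another timeline still lands
print(v.project())  # ['b1', 'a2']
```

records, lists, and ASTs would then just be different layout/projection policies over the same slot machinery.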
overall quite fascinating. there’s a big section on related work i haven’t caught up on.