Future of Coding History

Jason Morris 2024-05-27 18:03:17

In something like prolog, terms can be nested. So I can express the idea "Socrates believes that he is mortal" with bel ieves(socrates, mortal(socrates)). Are there any popular database types that make it easy to have relations of arbitrary arity as parameters of other relations, without unduly adding to the complexity of the schema? Preferably with ungrounded statements and open-world negation? Is there some obvious reason why not? Is there a computational complexity problem that arises in the real world? RDF allows triples to be referenced, I believe, but you are limited to arity 2, which seems needlessly limiting. Labeled graphs have arbitrary arity for non-entities, but entities are limited to two, and you usually can't refer to an edge. It seems... weird to me. Is it just that we don't really have the efficient reasoners over those kinds of expressions, so it hasn't been useful?

Nilesh Trivedi 2024-05-28 11:49:03

Sounds like a "nested hypergraph" (which I have never used before): arxiv.org/abs/2405.12235

📝 Hypergraph: A Unified and Uniform Definition with Application to Chemical Hypergraph

The conventional definition of hypergraph has two major issues: (1) there is not a standard definition of directed hypergraph and (2) there is not a formal definition of nested hypergraph. To resolve these issues, we propose a new definition of hypergraph that unifies the concepts of undirected, directed and nested hypergraphs, and that is uniform in using hyperedge as a single construct for representing high-order correlations among things, i.e., nodes and hyperedges. Specifically, we define a hyperedge to be a simple hyperedge, a nesting hyperedge, or a directed hyperedge. With this new definition, a hypergraph is nested if it has nesting hyperedge(s), and is directed if it has directed hyperedge(s). Otherwise, a hypergraph is a simple hypergraph. The uniformity and power of this new definition, with visualization, should facilitate the use of hypergraph for representing (hierarchical) high-order correlations in general and chemical systems in particular. Graph has been widely used as a mathematical structure for machine learning on molecular structures and 3D molecular geometries. However, graph has a major limitation: it can represent only pairwise correlations between nodes. Hypergraph extends graph with high-order correlations among nodes. This extension is significant or essential for machine learning on chemical systems. For molecules, this is significant as it allows the direct, explicit representation of multicenter bonds and molecular substructures. For chemical reactions, this is essential since most chemical reactions involve multiple participants. We propose the use of chemical hypergraph, a multilevel hypergraph with simple, nesting and directed hyperedges, as a single mathematical structure for representing chemical systems. We apply the new definition of hypergraph to chemical hypergraph and, as simplified versions, molecular hypergraph and chemical reaction hypergraph.

Nilesh Trivedi 2024-05-28 11:49:32

Nilesh Trivedi 2024-05-28 11:51:27

Might want to check out hypergraphdb.org

The unit of storage is a tuple made up of 0 or more other tuples.

Nilesh Trivedi 2024-05-28 12:02:12

This too seems similar: wiki.opencog.org/w/AtomSpace

It hints at one of the complexities involved. Some atoms become executable. For example: Queries themselves are graphs. Atomese language is Turing-complete.

Jason Morris 2024-05-28 14:06:49

I've looked at nested hypergraphs before, but atomspace is interesting, thanks!

Chris Knott 2024-05-28 21:00:24

Can't you do this just with a unique ID/foreign key...? BELIEVES is a relation that takes a tuple as the 2nd parameter.

(SOCRATES, IS, MORTAL), 123

(SOCRATES, BELIEVES, 123), 124

Denny Vrandečić 2024-05-29 18:09:52

RDF* seems to do it that way, putting names on subgraphs and referencing those.

Jason Morris 2024-05-29 18:12:18

@Chris, I'm not sure how that representation would distinguish between asserting the fact itself, and merely asserting that the fact is believed. Statement 124 should be possible to express even if statement 124 is not known to be true.

Chris Knott 2024-05-29 18:14:36

Yeah you would have to separately distinguish which statements are TRUE. This would just be a particular relation though? It's no different from which statements are FUNNY or INTERESTING

Chris Knott 2024-05-29 18:16:34

(124, HAS PROPERTY, TRUE/FUNNY etc), 461

Denny Vrandečić 2024-05-29 18:22:01

It is slightly different, because how do you know that 461 is true?

Chris Knott 2024-05-29 18:24:25

Yeah you're right in that if you want to do some kind of meta level inference there's a special kind of "true" which is USER thinks X is true

Chris Knott 2024-05-29 18:24:52

The system/UI would special case this

Chris Knott 2024-05-29 18:25:54

But you presumably want to preserve the ability to say that Socrates believes 1 = 2 is true etc though

Denny Vrandečić 2024-05-29 18:29:25

yes, but I am not sure it has to look the same as "these are the things the system should believe to be true"

Denny Vrandečić 2024-05-29 18:29:36

special casing the latter is probably OK

Jason Morris 2024-05-29 18:33:15

If I have to look to see if the statement exists, and then also check if it is listed as true, that's a lot of needless cognitive overhead for the human user. And as the nesting gets deeper, reading statements is going to get harder and harder.

Guyren Howe 2024-05-30 22:31:30

The reason the relational model forbids this — the reason it is restricted to first order logic — is that this runs into the halting problem.

Roughly, once relations can refer to other relations, they can form arbitrarily complex structures where queries never finish. The relational model is, almost mathematically so, the richest model you can have where queries always finish.

Note that in the above suggestions where you refer to other relations by a code, you have to go into your external, Turing-complete language to use this.

Not a reason a database should not have this feature, of course, but this might be a useful nuance to be aware of.

Jason Morris 2024-05-30 23:21:13

That helps explain it, thanks.

Denny Vrandečić 2024-05-31 00:54:12

Every programming language runs into the halting problem, and yet, we don't care. Why do we care for a knowledge language?

Guyren Howe 2024-05-31 01:00:30

SQL has poisoned the entire industry. So you’re used to thinking of a relational database as this heavyweight, fixed sort of thing with a terrible query language.

The correct way to understand the relational model is that it lets you separate your first order logic (FOL) computation from the stuff that should be (either logically or pragmatically) done in a Turing complete language.

The vast majority of the logic in just about any program can be written in FOL. Folks don’t even think about doing that because SQL is terrible.

One of the great features of the relational model is that precisely because of its limits, optimisation is a relatively simple problem. We should be able to use relations freely across all of our programming tasks. The result would be that you can just declaratively specify what comes from what, and the query optimiser will make it work efficiently.

These considerations still apply to SQL databases. If you execute a query, it will eventually finish and when it does, you’re guaranteed that if some data satisfies your query, it will be retrieved.

Guyren Howe 2024-05-31 01:02:26

If we use a database that is Turing complete, you can write something that is logically correct, but the query engine might not be able to determine that. It is a great virtue that if something is true of your data, if you can express it in FOL, your database can find it.

Jason Morris 2024-05-31 03:04:06

I have difficulty gauging how practical a virtue the guarantee of termination really is. I can end up with non-terminating code elsewhere, and I just fix it. Is the benefit perhaps that someone cannot add data that makes a query stop terminating that used to terminate? Because not a lot of people are letting users run arbitrary untested queries against their data. So it feels like you would know in advance it will terminate unless the data can stop that from happening, maybe?

Guyren Howe 2024-05-31 03:11:36

The relational model is, basically, “essence of declarative programming”. 2nd order logic, subject to halting problems, currently mostly needs to be written by humans. But the rest of it can be declarative — most of most programs, in fact. The part you can express in FOL benefits from this essence of declarative stuff insofar as you can just declare the rules of what facts follow from what other facts, and then the computer writes an efficient algorithm to implement the logic for you.

The relational model provides a theory for generating code from declarations about what is true and what follows from what.

Jason Morris 2024-05-31 03:25:48

Don't get me wrong. I'm sold on declarative logic code. Love it, terminating issues and all. Big fan. I just am not sold on the FOL limitation, and I'm having difficulty believing that anyone without a comp sci degree really cares about guaranteed termination that much, unless it is guaranteed against changes in the data.

Guyren Howe 2024-05-31 03:27:40

But that right there is one reason for the design. If you have a language in which someone ~can’t express~ a non-terminating query, they don’t have to understand any of that, but they know they have a completely reliable system that always answers their questions.

Jason Morris 2024-05-31 03:46:46

Regardless of the data, or just regardless of the query?

Tom Lieber 2024-05-31 04:10:58

I’m curious too. If every query has a 60-second deadline, valid FOL and SOL queries will both sometimes fail. How much does it matter why? Are the performance problems harder to diagnose and fix with the more complicated languages? Are they more common?

Tom Lieber 2024-05-31 04:12:07

I was playing with the idea of a query language the other day and realized that I was limiting myself to FOL. So I questioned the choice.

I came to realize that apps will need additional flexibility in how they process data before it can be presented, and limiting the query language wasn’t going to make that requirement go away. All I was doing was forcing algorithms to be split across database and client, instead of allowing the entire algorithm to be done in one language. Worse, if the seam is in the middle of an algorithm, I might be forcing multiple network calls, with all the de/serialization that entails, to satisfy one semantic query. For what? So that I can say that every operation my database supports is fast? Wouldn’t I rather say that my database is easy to use?

Tom Lieber 2024-05-31 04:14:15

I decided that I’d rather bring relational operations to a more complex language, and just isolate them enough that they can be plucked out by an optimizer.

Guyren Howe 2024-05-31 04:17:07

Aha! You discovered the weakness in my cunning ruse!

Not really. You can certainly write SQL queries that will time out. But in practice, those are rare, and if you had to solve the same problem in a Turing-complete language, there’s a good chance you have no other way to fix it.

Computation complexity is always with us, either way.

And in practice, there might be some things that are easier to write in a TC language. But I’d have to struggle with some quite unusual sort of data for a while to produce an example.

So: a relational database can represent most of most programs more cleanly and simply than any mainstream TC language can, and it does a lot of the programming for you, particularly in terms of working out what order it should process each part of your logic in. And, because it can do that dynamically and automatically, it can retain efficiencies even as you change the model, whereas the typical TC solution might need substantial re-engineering to retain efficiency.

Guyren Howe 2024-05-31 04:18:24

There is absolutely nothing wrong with a higher-order/TC query/logic language. Prolog is one such, is very useful for some types of problems, and is I believe easier for non-developers to learn than most TC languages. Mercury is another such, not so beginner friendly but very expressive and flexible.

Guyren Howe 2024-05-31 04:20:45

Codd’s original conception was that the relational database engine would be used in combination with a TC language. Predicates for filters, for example, are not part of the relational theory but are just supplied from a TC language.

So your idea for a language that contains both but separates them is not terrible. Although even with SQL, you can see the advantage of separating the relational side. One of the most underappreciated advantage of a separate relational storage engine is that you can use the same data store from multiple programs, written in different languages.

Guyren Howe 2024-05-31 05:12:54

SQLite is interesting, in that actually leans a fair way into injecting the TC language. The features are not as well known as they should be, and depend on whether the language embedding supports it, but in SQLite, the host application/language can potentially define:

regular functions;
aggregate functions; and;
(what most folks don’t know) custom type deserialisers.

Postgres has close similar features also.

Jason Morris 2024-05-31 15:39:37

The fact that all queries terminate isn't the actual benefit. As a result of that fact, the user can alter the data arbitrarily, and there is no possibility that by doing so they cause queries to become non-terminating. That's the actual upside. The user can't break it. If you are using a higher-order system, you have to either a) be careful not to use potentially infinite structures in your queries (in which case why not just use FPL), or b) guard against the user adding data that causes infinite loops. For my use case, b) was already taken care of in the context of the limited queries I have. Thanks for the enlightening chat!

Guyren Howe 2024-05-31 19:12:21

You certainly need TC behaviour to be available. But separating what you can into the relational model has enormous benefits in terms of simplicity and flexibility.

You are viewing archived messages. Go here to search the history.

You are viewing archived messages.
Go here to search the history.