Datalog in Rust (github.com)
307 points by brson 1 day ago | 34 comments
Leynos 1 day ago [-]
It's funny seeing this as the top story.

I'm in the middle of putting together a real-time strategy game using Differential Datalog[1] and Rust, with DDlog managing the game's logic. Mostly as an excuse to expose myself to new ideas and engage in a whole lot of yak shaving.

[1] https://github.com/vmware-archive/differential-datalog

lsuresh 2 hours ago [-]
That's a cool demo you're building with ddlog! FWIW, we, the ddlog team, have moved on to found Feldera (https://github.com/feldera/feldera). You could consider using DBSP directly through Rust.
foota 12 hours ago [-]
I wonder if you could make a Frankenstein version of Differential Datalog by combining the OP's repo with salsa[1] (the crate that powers rust-analyzer).

[1] https://github.com/salsa-rs/salsa

Yoric 1 day ago [-]
Oh, nice!

I'll be interested in reading how this goes!

cmrdporcupine 1 day ago [-]
Very cool. I'm curious to see what the state of that implementation is and how far you get, since DDlog is not being actively maintained anymore.
akavel 2 hours ago [-]
I had some light contact with Prolog long ago during my studies, so I have a rough idea of how it is used and what it can be useful for, but only at the surface, nothing deep. Since then I keep hearing about Datalog as some amazing thing, but I can't seem to understand what it is, i.e. to grasp the answer to a simple question:

what is it that Datalog improves over Prolog?

Just now I tried to skim the Wikipedia page on Datalog; the vague theory I'm getting from it is that maybe Prolog has relatively poor performance, whereas Datalog dramatically improves performance (presumably allowing much bigger datasets and much more parallelized processing), at the cost of reduced expressiveness and features in some other important ways (including making it no longer Turing-complete?). Is that what it's about, or am I completely missing the mark?

codesnik 2 hours ago [-]
From what I know, Prolog looked declarative, in the sense that you just encode relations and it figures out the answers, but the results really depended on the order of the rules, and on additional instructions like "cut", which not only prevented wasted computation but could affect the answers.

Datalog, on the other hand, is more or less a relational DB with a different syntax.
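
To make that concrete, here is a minimal bottom-up evaluation sketch in plain Rust (my own illustration, nothing from the linked repo) for the classic program path(X,Y) :- edge(X,Y). path(X,Z) :- path(X,Y), edge(Y,Z). Rule order doesn't matter: you just apply the rules until no new facts appear, and because the space of derivable facts is finite, the loop is guaranteed to terminate.

    use std::collections::HashSet;

    fn main() {
        // edge is the input relation (the "extensional database").
        let edge: HashSet<(u32, u32)> = [(1, 2), (2, 3), (3, 4)].into();
        // path starts with everything derivable by the first rule.
        let mut path: HashSet<(u32, u32)> = edge.clone();
        loop {
            // Apply path(X,Z) :- path(X,Y), edge(Y,Z) to all current facts.
            let new: HashSet<(u32, u32)> = path
                .iter()
                .flat_map(|&(x, y)| {
                    edge.iter()
                        .filter(move |&&(u, _)| u == y)
                        .map(move |&(_, z)| (x, z))
                })
                .filter(|fact| !path.contains(fact))
                .collect();
            if new.is_empty() {
                break; // fixpoint reached: no rule derives anything new
            }
            path.extend(new);
        }
        println!("{:?}", path); // all reachable pairs
    }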

burakemir 23 hours ago [-]
I made some progress porting Mangle Datalog to Rust (https://github.com/google/mangle/tree/main/rust); it is in the same repo as the golang implementation.

It is slow going, partly because it is not a priority and partly because I suffer from second-system syndrome. Mangle Rust should handle data of any size by reading and writing facts on disk via memory mapping; the golang implementation is in-memory.

This post is nice because it parses Datalog and mentions the LSM tree, and it is much easier to follow than the datafrog stuff.

There are many Datalog implementations in Rust (ascent, crepe) that use proc macros. The downside is that they can't handle queries arriving at runtime. For the static-analysis use case, where queries/programs are fixed, the proc-macro approach might be better.
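
For a taste of the proc-macro style, here is a small crepe example. The toy points-to analysis and the relation names are my own invention, but the @input/@output pattern is crepe's documented API:

    use crepe::crepe;

    crepe! {
        // Input facts fixed at compile time: v = &x and v = w assignments.
        @input
        struct AddrOf(u32, u32);
        @input
        struct Assign(u32, u32);

        // Derived relation: v may point to x.
        @output
        struct PointsTo(u32, u32);

        PointsTo(v, x) <- AddrOf(v, x);
        PointsTo(v, x) <- Assign(v, w), PointsTo(w, x);
    }

    fn main() {
        let mut runtime = Crepe::new();
        runtime.extend([AddrOf(0, 10), AddrOf(1, 11)]);
        runtime.extend([Assign(2, 0), Assign(3, 2)]);
        let (points_to,) = runtime.run();
        for PointsTo(v, x) in points_to {
            println!("v{} may point to x{}", v, x);
        }
    }

The macro compiles the rules into static Rust at build time, which is exactly why new queries can't show up at runtime.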

tulio_ribeiro 1 day ago [-]
"I, a notorious villain, was invited for what I was half sure was my long-due comeuppance." -- Best opening line of a technical blog post I've read all year.

The narrator's interjections were a great touch. It's rare to see a post that is this technically deep but also so fun to read. The journey through optimizing the aliasing query felt like a detective story. We, the readers, were right there with you, groaning at the 50GB memory usage and cheering when you got it down to 5GB.

Fantastic work, both on the code and the prose.

baby 1 day ago [-]
[flagged]
Philpax 1 day ago [-]
No offence, but your comment violates the HN guidelines: https://news.ycombinator.com/newsguidelines.html
maweki 23 hours ago [-]
It is nice to see a core group of Datalog enthusiasts persist, even though the current Datalog revival seems to be on the decline. The recent Datalog 2.0 conference was quite small compared to previous years, and the second HYTRADBOI conference was very light on Datalog as well, while the first one had a quarter of its submissions with a Datalog connection.

I'm encouraged by the other commenters sharing their recent Datalog projects. I am currently building a set of data-quality pipelines for a legacy SQL database in preparation for a huge software migration.

We find Datalog much more useful than SQL for identifying and looking for data-quality issues, as the queries can be incredibly readable when well-structured.

kmicinski 23 hours ago [-]
No offense, but I wouldn't take Datalog 2.0's small attendance as an exemplar of Datalog's decline, even if I agree with that high-level point. Datalog 2.0 is a satellite workshop of LPNMR, a relatively unknown European conference that was somewhat randomly held in Dallas. I attended Datalog 2.0 myself and also felt the event was relatively sparse. I had a paper at the workshop as well (not my primary work; the first author is the real wizard, of course :-). I saw relatively few folks from that space even attending the event, with the notable exception of some European folks (e.g., those introducing the Nemo solver).

All of this is to say, I think Datalog 2.0's sparse attendance this year may be more indicative of the fact that it is a satellite workshop of an already less prestigious conference (itself not even the main event! That was ICLP!) than of a lack of Datalog implementation excitement.

For what it's worth, none of what I'm saying is meant to rebut your high-level point that there is little novelty left in implementing raw Datalog engines. Of course I agree: the research space has moved far beyond that (arguably it did a while ago) and into more exotic problems involving things like streaming (HydroFlow), choice (Dusa), and things that get closer to the general chase (e.g., Egglog's chase engine). I don't think anyone disagrees that vanilla Datalog is boring; it's just that monotonic, chain-forward saturation (Horn clauses!) is a rich baseline with a well-understood engineering landscape (especially in the high-performance space) on which to build out more interesting theories (semirings, Z-sets, etc.).

BartjeD 9 hours ago [-]
The USA is a hostile environment for Europeans to visit. Avoiding it is a theme all over western and northern Europe.

If you were expecting European attendance, that would explain why it's different from the past.

It's the same in science circles, tourism and even business travel.

freilanzer 2 hours ago [-]
As a European, I'm definitely avoiding the US. I was asked whether I wanted to visit our US branch and I declined - which was simply accepted and probably expected in advance. Almost everyone I know does the same.
banana_feather 22 hours ago [-]
I like the author's Datalog work generally, but I really wish his introductory material did not teach using binary joins, which I found to get very messy internally as soon as you get away from the ideal case. I found generic-join-style methods much, much simpler to generalize in one's head (see https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...).
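
To illustrate, here is a rough sketch of the generic-join idea in Rust for the triangle query Q(a,b,c) :- R(a,b), S(b,c), T(a,c). This is my own toy code, not the author's: bind one variable at a time, intersecting the candidate sets from every relation that mentions it.

    use std::collections::{BTreeMap, BTreeSet};

    type Rel = Vec<(u32, u32)>;

    // Index a binary relation by its first column.
    fn index(rel: &Rel) -> BTreeMap<u32, BTreeSet<u32>> {
        let mut idx: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
        for &(x, y) in rel {
            idx.entry(x).or_default().insert(y);
        }
        idx
    }

    // Generic join with variable order a, b, c:
    // a must appear in R and T, b in R[a] and S, c in S[b] and T[a].
    fn triangles(r: &Rel, s: &Rel, t: &Rel) -> Vec<(u32, u32, u32)> {
        let (r_by_a, s_by_b, t_by_a) = (index(r), index(s), index(t));
        let mut out = Vec::new();
        for (a, r_bs) in &r_by_a {
            let Some(t_cs) = t_by_a.get(a) else { continue };
            for b in r_bs {
                let Some(s_cs) = s_by_b.get(b) else { continue };
                // The last variable is a plain set intersection.
                for c in s_cs.intersection(t_cs) {
                    out.push((*a, *b, *c));
                }
            }
        }
        out
    }

    fn main() {
        let edges: Rel = vec![(1, 2), (2, 3), (1, 3), (3, 1)];
        // Using the same edge relation for R, S, and T finds directed triangles.
        for (a, b, c) in triangles(&edges, &edges, &edges) {
            println!("triangle: {} {} {}", a, b, c);
        }
    }

The same variable-at-a-time pattern extends to any query shape, which is what makes it easier to hold in your head than a tree of pairwise joins.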
davery22 20 hours ago [-]
Related: McSherry's preceding blog post was all about demonstrating how binary joins can achieve worst-case optimal runtime, given suitable adjustments to the query plan.

- https://github.com/frankmcsherry/blog/blob/master/posts/2025...

kmicinski 20 hours ago [-]
For materialization-heavy workloads (program analysis, etc.), we often find that optimized binary join plans (e.g., profile-optimized, hand-optimized, etc.) beat worst-case optimal plans, due to the ability to get better scalability (less locking) without the need for a trie-based representation. Within the space of worst-case optimal plans there are still lots of choices, but even a bad worst-case optimal plan can often beat a bad (randomly chosen) binary plan. And of course (the whole point of this exercise), there are some queries where every binary plan explodes and you do need WCOJ. There's also some work on making more traditional binary joins robust (https://db.in.tum.de/people/sites/birler/papers/diamond.pdf), among other interesting work (https://arxiv.org/html/2502.15181v1). Effectively parallelizing WCOJs is still an open problem as far as I am aware (at least, this is what folks working on it tell me), but there are some exciting potential directions that several folks are working on, I believe.
rnikander 20 hours ago [-]
Some Clojure fans once told me they thought datalog was better than SQL and it was a shame that the relational DBs all used SQL. I never dug into it enough to find out why they thought that way.
kragen 14 hours ago [-]
Basically Datalog is much less verbose than SQL, imposes much lighter tariffs on factoring out views, and supports transitive closure enormously better. I started http://canonical.org/~kragen/binary-relations off with a simple nonrecursive query for which the SQL translation (given below) is already criminal and whose properly factored SQL solution merits the death penalty.

Recent additions to ANSI SQL have added the capacity for recursion, so it's no longer completely impossible. But they have three big disadvantages:

1. They accidentally made SQL Turing-complete. Datalog queries, by contrast, are guaranteed to terminate.

2. They're still extremely clumsy to use.

3. Because of #1, they're often not implemented fully, so they are hard to rely on.

ulrikrasmussen 9 hours ago [-]
Yes, #1 basically means that they screwed up the design from the get-go, since it is impossible to reap the actual benefits of Datalog when the language you evaluate is not, in fact, Datalog. Recursive queries can perform arbitrary computation in projections, so for starters any top-down evaluation strategy, or hybrid evaluation such as magic sets, is ruled out.
jitl 17 hours ago [-]
I struggle to understand the Clojure/Datomic dialect, but I agree generally. I recommend Percival for playing around with Datalog in a friendly notebook environment online: https://percival.ink/

Although there's no "ANSI SQL"-equivalent standard across Datalog implementations, once you get the hang of the core idea it's not too hard to understand another Datalog.

I started a Percival fork that compiles the Datalog to SQLite, if you want to check out how the two can express the same thing: https://percival.jake.tl/ (unfinished when it comes to aggregates and more advanced joins, but the basic forms work okay). Logica is a much more serious and complete Datalog-to-SQL compiler written by a Google researcher; it compiles to BigQuery, DuckDB, and a few other SQL dialects (https://logica.dev/).

One area where Datalog is an order of magnitude easier is recursive queries/rules; this is possible in SQL but feels a bit like drinking playdough through a straw. Frank's Materialize.com has a "WITH MUTUALLY RECURSIVE" SQL form (https://materialize.com/blog/recursion-in-materialize/) that's much nicer than the ancient ANSI SQL recursive approach; we're evaluating it for page-load queries and data sync at Notion.
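
For comparison, mutual recursion is just the default in a Datalog. Here is a sketch using the ascent crate mentioned elsewhere in the thread; the even/odd path relations are a toy example of my own, not anything from Materialize or Feldera:

    use ascent::ascent;

    ascent! {
        relation edge(u32, u32);
        // Two relations defined in terms of each other: the moral
        // equivalent of WITH MUTUALLY RECURSIVE.
        relation odd_path(u32, u32);  // paths with an odd number of edges
        relation even_path(u32, u32); // paths with an even number of edges

        odd_path(x, y) <-- edge(x, y);
        even_path(x, z) <-- odd_path(x, y), edge(y, z);
        odd_path(x, z) <-- even_path(x, y), edge(y, z);
    }

    fn main() {
        let mut prog = AscentProgram::default();
        prog.edge = vec![(1, 2), (2, 3), (3, 4)];
        prog.run();
        println!("odd-length paths:  {:?}", prog.odd_path);
        println!("even-length paths: {:?}", prog.even_path);
    }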

Feldera has a similar form for recursive views as well (https://www.feldera.com/blog/recursive-sql-queries-in-felder...). I like that Feldera lets you make each "rule" or subview its own statement rather than needing to pack everything into a single huge statement. The main downside I found when testing Feldera is that their SQL dialect has a bunch of limitations inherited from Apache Calcite; the Materialize SQL dialect tries very hard to be PostgreSQL-compatible.

ben_pfaff 14 hours ago [-]
> Main downside I found when testing Feldera is that their SQL dialect has a bunch of limitations inherited from Apache Calcite

At Feldera, we're adding features to our SQL over time by contributing them upstream to Calcite, making it better for everyone. Mihai Budiu, the author of the Feldera SQL compiler, is a Calcite committer.

jitl 11 hours ago [-]
Thanks for contributing. I see Mihai implemented the UUID type in Calcite (https://issues.apache.org/jira/browse/CALCITE-6738) back in January which is one of the issues I hit, so for sure my experience with Feldera is 6 months out of date and y'all move pretty quick.

Most of what I mean is places where Feldera/Calcite has slightly different syntax from Postgres. For example, the Postgres syntax for a cast to bigint is `some_expression::bigint`; although Postgres also supports the ANSI SQL `CAST(some_expression AS bigint)`, most examples I find in the wild, and my own Postgres SQL, use the Postgres-specific syntax. JSON syntax also differs: Feldera uses its own pretty elegant `VARIANT` type and `some_expression[key_expression]` to access properties, whereas Postgres calls this `json` or `jsonb` and uses `some_expression->key_expression` to access properties. In those cases it's not that Feldera is wrong or lacks support, but it's a bit harder for me to work with because I'm so used to Postgres syntax and I need to do some regex replacing whenever I bring a query from Postgres over to Feldera.

Definitely not a deal-breaker, and I am a Feldera enjoyer, but it does add some friction.

lsuresh 2 hours ago [-]
Thanks for the kind words. :) We hear you on the dialect differences.

An interesting case of a user dealing with this problem: they use LLMs to mass-migrate SparkSQL code over to Feldera (it's often JSON-related constructs, like the ones you ran into). They then verify that their original warehouse and Feldera compute the same results for the same inputs, to ensure correctness.

rienbdj 1 day ago [-]
A new McSherry post! Excellent.

Last I checked, VMware had moved away from Differential Datalog?

jitl 1 day ago [-]
The Differential Datalog team founded Feldera: https://www.feldera.com/

They switched from differential Datalog to differential SQL, I think because they realized Datalog is a really tough sell.

rebanevapustus 1 day ago [-]
They did, and their product is great.

It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

I have made an accessible version of a subset of Differential Dataflow (DBSP) in Python right here: https://github.com/brurucy/pydbsp

DBSP is so expressive that I have implemented a fully incremental dynamic datalog engine as a DBSP program.

Think of SQL/Datalog where the query can change at runtime, and the changes themselves (program diffs) are incrementally computed: https://github.com/brurucy/pydbsp/blob/master/notebooks/data...

gunnarmorling 23 hours ago [-]
> It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

Flink SQL also checks that box.

jitl 20 hours ago [-]
Flink SQL is quite limited compared to Feldera/DBSP or Frank's Materialize.com, and it has some correctness limitations: it's "eventually consistent", but until you stop the input data it's unlikely to ever be actually correct when working with streaming joins. https://www.scattered-thoughts.net/writing/internal-consiste...
rebanevapustus 23 hours ago [-]
Not true.

There has to be some change in the code, and the two will not share the same semantics (and it perhaps won't work when retractions/deletions also appear whilst streaming). And let's not even get into the leaky abstractions needed for good performance (watermarks et al.).

29athrowaway 1 day ago [-]
If you wish to use Datalog and Rust, cozodb is written in Rust and has a Datalog query syntax.
jitl 23 hours ago [-]
CozoDB seems cool but also inactive. I poked around in November 2024 and found some low-hanging fruit in the sqlite storage backend: https://github.com/cozodb/cozo/issues/285
29athrowaway 19 hours ago [-]
It's not a lot of code so it's easy to tinker with.
rc00 1 day ago [-]
Posted 1 day ago

https://news.ycombinator.com/item?id=44274592