logseq-sync module · Published: Feb 3, 2024 · License: MIT

Logseq Sync

An attempt at an open-source version of the Logseq Sync service, intended for individual, self-hosted use.

It's vaguely functional (see What Works? below), but decidedly pre-alpha software. Definitely don't try to point a real, populated Logseq client at it; I have no idea what will happen.

What's Done/Exists?

Right now, the repo contains (in cmd/server) a mostly implemented version of the Logseq Sync API: credentialed blob uploads, signed blob downloads, a SQLite database for persistence, and at least partial implementations of most of the rest of the API surface.
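To give a flavor of the signed-download half, here's a rough sketch using the AWS SDK for Go v2's presigner. The bucket and key are made up, and this isn't the actual code from the blob/awsblob package:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// Credentials and region come from the environment or shared AWS config.
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatalf("loading AWS config: %v", err)
	}

	// Presign a GET so the client can fetch an encrypted blob directly from
	// S3 without ever holding long-lived credentials.
	presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))
	signed, err := presigner.PresignGetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String("my-logseq-sync-bucket"),          // hypothetical bucket
		Key:    aws.String("graphs/example-graph/some-blob"), // hypothetical object key
	}, s3.WithPresignExpires(15*time.Minute))
	if err != nil {
		log.Fatalf("presigning download: %v", err)
	}

	fmt.Println("signed download URL:", signed.URL)
}
```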

Currently, running any of this requires a modified version of the Logseq codebase (here) and the @logseq/rsapi package (here).

On that note, many thanks to the Logseq Team for open-sourcing rsapi recently; it made this project significantly easier to work with.

What Works?

With a modified Logseq, you can use the local server to:

  1. Create a graph
  2. Upload (passphrase-encrypted) encryption keys
  3. Get temporary AWS credentials to upload your encrypted files to your private S3 bucket
  4. Upload your encrypted files
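To make steps 3 and 4 a bit more concrete: the temporary credentials are STS-style (access key, secret key, session token), and the client uses them to build a short-lived S3 client and PUT the encrypted file. A rough Go sketch of the mechanics (the real client-side work happens in rsapi, and every name here is made up rather than taken from the actual API response):

```go
package main

import (
	"bytes"
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// uploadEncrypted pushes an already-encrypted blob to S3 using the temporary
// credentials handed back by the sync server. Every name here is illustrative.
func uploadEncrypted(ctx context.Context, accessKey, secretKey, sessionToken, bucket, key string, ciphertext []byte) error {
	client := s3.New(s3.Options{
		Region:      "us-east-1", // whichever region the bucket lives in
		Credentials: credentials.NewStaticCredentialsProvider(accessKey, secretKey, sessionToken),
	})
	_, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   bytes.NewReader(ciphertext),
	})
	return err
}

func main() {
	// In the real flow these values come from the sync server, not literals.
	err := uploadEncrypted(context.Background(),
		"ASIA-example", "example-secret", "example-session-token",
		"my-logseq-sync-bucket", "graphs/example-graph/some-blob",
		[]byte("already-encrypted bytes"))
	if err != nil {
		log.Fatalf("upload failed: %v", err)
	}
}
```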

And that's basically the full end-to-end flow! The big remaining things are:

  • Implement the WebSockets protocol
  • Figure out how/when to increment the transaction (tx) counter

API Documentation

There's some documentation for the API in docs/API.md. This is the area where I could most use more information/help; see Contributing below.

Open Questions

S3 API

The real Logseq Sync API gets temporary S3 credentials and uploads files directly to S3. I haven't looked closely enough to see if we can swap this out for something S3-compatible like s3proxy or MinIO; see #2 for a bit more discussion.

Currently, amazonaws.com is hardcoded in the client, so that'll be part of a larger discussion on how to make all of this configurable in the long run.
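For what it's worth, on the Go server side pointing the AWS SDK at an S3-compatible store like MinIO is mostly an endpoint override; the harder part is the endpoint hardcoded in the client/rsapi. A minimal sketch, with an assumed local MinIO address and nothing this repo currently wires up:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatalf("loading AWS config: %v", err)
	}

	// Point the S3 client at a local MinIO instance instead of amazonaws.com.
	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String("http://localhost:9000") // assumed MinIO address
		o.UsePathStyle = true                                // MinIO generally expects path-style addressing
	})

	out, err := client.ListBuckets(ctx, &s3.ListBucketsInput{})
	if err != nil {
		log.Fatalf("listing buckets: %v", err)
	}
	for _, b := range out.Buckets {
		log.Printf("bucket: %s", aws.ToString(b.Name))
	}
}
```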

Associated Changes to Logseq

Being able to connect to a self-hosted sync server requires some changes to Logseq as well, namely to specify where your sync server can be accessed. Those changes are in a rough, non-functional state here: https://github.com/logseq/logseq/compare/master...bcspragu:logseq:brandon/settings-hack

Adding a database migration

The self-hosted sync backend has rudimentary support for persistence in a SQLite database. We use sqlc to generate Go code for our SQL queries, and Atlas to generate schema migration diffs.
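For a sense of what sqlc gives us, the generated code boils down to a Queries struct with one typed method per named query. A hand-written approximation of that shape (the Graph type and query here are hypothetical, not our actual schema):

```go
package sqliteexample

import (
	"context"
	"database/sql"
)

// This mirrors the shape of what sqlc generates from db/sqlite/schema.sql and
// its query files: a Queries struct wrapping a database handle, with one typed
// method per named query.

const getGraph = `SELECT id, name FROM graphs WHERE id = ?`

// Graph is a stand-in for a sqlc-generated row type.
type Graph struct {
	ID   string
	Name string
}

// Queries is a stand-in for sqlc's generated query wrapper.
type Queries struct {
	db *sql.DB
}

func New(db *sql.DB) *Queries {
	return &Queries{db: db}
}

// GetGraph is roughly what a `-- name: GetGraph :one` annotation turns into.
func (q *Queries) GetGraph(ctx context.Context, id string) (Graph, error) {
	var g Graph
	err := q.db.QueryRowContext(ctx, getGraph, id).Scan(&g.ID, &g.Name)
	return g, err
}
```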

The process for changing the database schema looks like:

  1. Update db/sqlite/schema.sql with your desired changes
  2. Run ./scripts/add_migration.sh <name of migration> to generate the relevant migration
  3. Run ./scripts/apply_migrations.sh to apply the migrations to your SQLite database

Why do it this way?

With this workflow, the db/sqlite/migrations/ directory is more or less unused by both sqlc and the actual server program. The reason it's structured this way is to keep a more reviewable audit log of changes to the database, which a single schema.sql doesn't give you.

Contributing

If you're interested in contributing, thanks! I sincerely appreciate it. There are a few main avenues for contributions:

Getting official buy-in from Logseq

The main blocker right now is getting buy-in from the Logseq team, as I don't want to do the work to add self-hosting settings to the Logseq codebase if they won't be accepted upstream. I've raised the question on the Logseq forums, as well as in a GitHub Discussion on the Logseq repo, but have received no official response.

Understanding/documenting the API

One area where I would love help is specifying the official API more accurately. My API docs are based on a dataset of one: my own account. So there are areas that are underspecified or unknown, or where I just don't understand the flow. Any help there would be great!

Specifically, I'd like to understand:

  1. The details of the WebSocket protocol (doc started here), and
  2. How and when to update the transaction counter, tx, in the API

Debugging S3 signature issues

I believe there's a bug (filed upstream, initially here) in the s3-presign crate used by Logseq's rsapi component, which handles the actual sync protocol bits (encryption, key generation, S3 upload, etc).

The bug causes flaky uploads with self-hosted, AWS-backed (i.e. S3 + STS) servers, but I haven't had the time to investigate the exact root cause. The source code for the s3-presign crate is available here, but the GitHub repo itself doesn't appear to be public.
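If you want to poke at this, the failure is easiest to see by taking a presigned PUT URL and exercising it with a plain HTTP client (cmd/tools/signtest in this repo exists for roughly this kind of testing). A minimal sketch of the exercising side only; generating the URL itself is left to rsapi or the AWS SDK:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

// Takes an already-presigned PUT URL and tries a small upload against it.
// On a signature problem, S3 typically responds with a 403 and a
// SignatureDoesNotMatch error body.
func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: presign-check <presigned-put-url>")
	}

	req, err := http.NewRequest(http.MethodPut, os.Args[1], bytes.NewReader([]byte("test payload")))
	if err != nil {
		log.Fatalf("building request: %v", err)
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatalf("performing upload: %v", err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("status: %s\n%s\n", resp.Status, body)
}
```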

Directories

  blob: Package blob defines domain types for interacting with blob storage.
  blob/awsblob: Package awsblob provides the blob operations needed by Logseq Sync, backed by AWS's S3.
  cmd/server: Command server will eventually feature a self-contained Logseq Sync service.
  cmd/tools/signtest: Command signtest is a quick tool for testing the generation and use of presigned S3 upload URLs.
  db: Package db contains domain types for working with persisted Logseq data.
  db/mem: Package mem implements an in-memory version of our DB interface, for quick iteration and local testing.
  db/sqlite: Package sqlite provides a thin wrapper over the sqlc-generated SQLite wrapper to adhere to our server's DB interface.
