Category: Linux

  • How I recovered from Mastodon/Fediverse (actually Akkoma) database corruption

    I moved domains again, this time from sour.coffee to momandpop.network. I also run Fediverse servers.

    Why so quickly? In typical AuDHD fashion, I didn’t like sour.coffee too much.

    I run a VPS host for a living, and have two VPSes that are not Tor exit relays: one for my legacy neelc.org domain (Rocky Linux 9) and one for the newer domains (Rocky Linux 10).

    I decided to “migrate” to momandpop.network by setting up Akkoma on the “legacy” VPS via Docker. Initially, it went smoothly. Well, until I wanted to migrate to the non-legacy VPS.

    What did I encounter?

    db-1      | chmod: /var/run/postgresql: Operation not permitted
    db-1      | 
    db-1      | PostgreSQL Database directory appears to contain a database; Skipping initialization
    db-1      | 
    db-1      | 2026-04-05 22:03:26.747 UTC [1] LOG:  starting PostgreSQL 14.22 on x86_64-pc-linux-musl, compiled by gcc (Alpine 15.2.0) 15.2.0, 64-bit
    db-1      | 2026-04-05 22:03:26.747 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
    db-1      | 2026-04-05 22:03:26.747 UTC [1] LOG:  listening on IPv6 address "::", port 5432
    db-1      | 2026-04-05 22:03:26.748 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
    db-1      | 2026-04-05 22:03:26.750 UTC [14] LOG:  database system was shut down at 2026-04-05 20:33:08 UTC
    db-1      | 2026-04-05 22:03:26.750 UTC [14] LOG:  invalid magic number 0000 in log segment 000000010000000000000001, offset 0
    db-1      | 2026-04-05 22:03:26.750 UTC [14] LOG:  invalid primary checkpoint record
    db-1      | 2026-04-05 22:03:26.750 UTC [14] PANIC:  could not locate a valid checkpoint record
    db-1      | 2026-04-05 22:03:26.914 UTC [1] LOG:  startup process (PID 14) was terminated by signal 6: Aborted
    db-1      | 2026-04-05 22:03:26.914 UTC [1] LOG:  aborting startup due to startup process failure
    db-1      | 2026-04-05 22:03:26.921 UTC [1] LOG:  database system is shut down

    In short, database corruption.

    Typically you could run pg_resetwal, but when I tried that, it left Postgres in a non-operable state.
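
    Since pg_resetwal rewrites control data and cannot be undone, it is worth copying the data directory and previewing the reset before running it for real. The sketch below illustrates that idea and is not the exact set of commands used here. The ./pgdata host path, the db service name (taken from the log prefix above) and the in-container path /var/lib/postgresql/data are assumptions about the Compose setup, and run and preview_wal_reset are hypothetical helper names. It also assumes the db service already runs as the non-root user that owns the data directory, because pg_resetwal refuses to run as root.

```shell
#!/bin/sh
# Sketch: snapshot the data directory, then preview pg_resetwal.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: host path of the PostgreSQL data directory (assumption, e.g. ./pgdata)
# $2: Compose service name (defaults to db, as in the log prefix)
preview_wal_reset() {
    datadir="${1%/}"        # drop a trailing slash so the .bak name is sane
    service="${2:-db}"

    # Stop the database so nothing writes during the copy.
    run docker compose stop "$service"

    # Keep an untouched copy next to the original.
    run cp -a "$datadir" "$datadir.bak"

    # -n is pg_resetwal's dry-run mode: it prints the values it would
    # write and changes nothing on disk.
    run docker compose run --rm "$service" \
        pg_resetwal -n -D /var/lib/postgresql/data
}
```

    Calling `DRY_RUN=1 preview_wal_reset ./pgdata` only prints the three commands. A real pg_resetwal run (without -n) only makes sense once the preview looks sane and the .bak copy is safe.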

    I’m just grateful that:

    • The signing keys weren’t lost
    • If you have signing keys, your “followers” will re-sync
    • Nothing of significance was posted to Fedi from my “new server”

    The second point is how some Mastodon administrators migrated to GoToSocial. In fact, I initially used GoToSocial (GTS) myself before migrating to Akkoma.

    I had a feeling this would eventually happen, but at least it didn’t happen on an “established” server. I’m not a Postgres fan myself; I actually prefer plain-old MariaDB/MySQL.

    And if you want to follow me on Fedi/Mastodon, I’m @noc@momandpop.network.

  • Rocky Linux/RHEL 10: Fixing “Invalid UID in persistent keyring name” with AD and SSSD

    I run a Samba Active Directory in my homelab, with a WireGuard VPN between my MikroTik router and the one at my dad’s house.

    I recently reinstated the HPE ProLiant ML30 Gen9 running Rocky Linux 10 colocated at his house. With that, I rejoined the server to a new AD domain I made. I wasn’t able to log in, since the SSSD cache doesn’t get flushed.

    While I used this guide on Rocky Linux, it should be the same on AlmaLinux, CentOS or RHEL.

    Going back, the error I got was:

    Feb 13 15:11:01 oldsai.sc.lan krb5_child[2258]: Invalid UID in persistent keyring name
    Feb 13 15:11:01 oldsai.sc.lan sshd-session[2254]: pam_sss(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=::1 user=blackbird
    Feb 13 15:11:01 oldsai.sc.lan sshd-session[2254]: pam_sss(sshd:auth): received for user blackbird: 4 (System error)

    To fix this, first stop sssd:

    systemctl stop sssd

    Clear the cache with sss_cache:

    sss_cache -E

    Now remove the stray cache files:

    rm -rf /var/lib/sss/db/*

    Note: this command is important, as SSSD doesn’t flush caches upon unjoining and rejoining, even with different user IDs.

    Now start sssd:

    systemctl start sssd

    The error should go away. Keep in mind that if UIDs changed for a particular user, you will need to delete or chown their home directory.
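
    The steps above can be collected into a small script. This is a minimal sketch, assuming it runs as root. run, reset_sssd_cache and reown_home are hypothetical helper names, and the home directory path in the usage note is an assumption. Setting DRY_RUN=1 prints each command instead of running it, which is a cheap way to check what will happen before touching the cache.

```shell
#!/bin/sh
# Sketch: the SSSD cache reset steps as one function, plus a helper
# to re-own a home directory after a user's UID has changed.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_sssd_cache() {
    run systemctl stop sssd
    # Invalidate all cached entries.
    run sss_cache -E
    # Remove the on-disk cache files left over from the old domain.
    run rm -rf /var/lib/sss/db/*
    run systemctl start sssd
}

# $1: user name, $2: home directory to re-own.
# The UID and GID are looked up with id, so run this after SSSD is
# back up and resolving users from the new domain.
reown_home() {
    user="$1"
    home="$2"
    uid="$(id -u "$user")" || return 1
    gid="$(id -g "$user")" || return 1
    run chown -R "$uid:$gid" "$home"
}
```

    For example, `reset_sssd_cache` followed by `reown_home blackbird /home/blackbird` would cover the user from the log above, assuming that is where the home directory lives.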

    Source. Thanks, Jarrod Farncomb.