tl;dr: to reduce federation api calls and to reduce issues with defederation, maybe some instances should only be for communities with no user signups, and some instances should be only for user signups with no communities (you would have to make a post or PM to request the admin to make a community for you)

cross-posted from: https://fanaticus.social/post/265339

DRAFT Work in Progress - Updates will be noted in the comments.

I find I have been a bit repetitive in different threads talking about Lemmy / Kbin (collectively, the Threadiverse), so I thought I would put my thoughts together in one place where I can cover it in detail, and revise my thoughts as they evolve. So, here it is…

The Problems

Why can’t we merge / sync all the communities with the same name?

This illustrates a fundamental misunderstanding of how Lemmy and ActivityPub work. It implies that local communities are somehow better than federated communities, and that synchronizing different communities would somehow be better / more efficient than just subscribing to the federated community. That’s just plain wrong.

Once a community is federated, accessing and interacting with the community is exactly the same as for a local community. The content is exactly the same, and changes are automatically shared among subscribing servers.

The real problem is every instance wanting to be the instance for Reddit knock-off communities. I won’t deny that there are significant financial and ego reasons why admins want to accomplish this end. However, this is not the best approach for Lemmy.

The admins of my instance are doing bad things!

Folks, admins need to admin. Each instance is going to have its own policies driven by their personal values and by the legalities of where the server is hosted.

I want to host an instance, but the storage & network requirements are too high

This is a genuine concern - there are two things fundamental to Lemmy that cause this:

  1. Each instance needs to keep a complete copy of every community that any user on the instance subscribes to. The storage overhead per user is especially high on instances with not a lot of users.
  2. Each community has to share its changes with every instance that has subscribed to it. So when a user on instance A makes a post to a community on instance B, A sends that info to B, then B must send a copy of that post to every other instance with subscribers.

The Solution

Communities

Communities should be spread out across multiple instances, with a small number of like-minded communities on each instance. An example of something like this this would be Discord servers with multiple channels.

  • Users on community focused instances should be limited to admins and mods. These should not be primary browsing accounts.
  • Community instances can be much more restrictive with their login & firewall policies, making these more secure. Improved remote moderation could limit logins to admins, so the UI itself could be firewalled.
  • Businesses, News Media, Celebrities, etc., should host their own community instances so that they can protect their brand and not be subject to third party content policies. Further, instances which are not compatible with the brand’s image can be defederated without disrupting the brand’s online presence.

Users

Users should congregate on user focused instances.

  • Local communities on user instances should be limited to meta topics and possibly a few broad general interest communities.
  • User instances can serve as a cache for the distributed network of communities, limiting the duplication of content.
  • User instances can be hardened for user facing security

How does this address the problems

Storage & Network Requirements

Having users concentrate on user instances reduces the storage overhead per user, because if multiple users on an instance subscribe to the same communities, there is still only one copy of the community for the instance.

On the network side of things, this reduces the amount of redistribution required by the community instance, because there would be fewer user instances to host subscribers.

In summary, the approach of split user & community instances is really optimal for ActivityPub, because user instances effectively become cacheing servers for communities. This greatly reduces the cost to host community instances.

randomly found this post, curious what other people think about this approach

this is exactly what I do with https://lemmy.mods4ever.com/

only my admin user is on there and it isn’t subscribed to any remote communities, Lemmy is barely using any resources on my server it’s basically free

I’ve actually thought about running 2 separate instances like lemmyusers.mods4ever.com and lemmycommunities.mods4ever.com or something like that

originally posted by @[email protected] aka @[email protected] aka @[email protected] (according to their profile)

  • @[email protected]
    link
    fedilink
    English
    2
    edit-2
    11 months ago

    I think this solution misses both the point of the current solution and doesn’t actually solve many of the problems the current design has. Others have made some good points, do I’ll leave that to other threads for discussion.

    I’m actually working on a project that I think will solve more of the problems (though it has a few of its own), but I think it’s incredibly unlikely that I’ll finish it, let alone have it be as popular as Lemmy. So I’ll put my notes here in case someone else wants to carry the torch (if not, I’ll post a repo once it’s ready enough).

    Architecture

    The service will be a distributed, P2P network based on distributed hash tables.

    In other words, there are no instances, and all data is stored by users.

    Communities

    Communities are a namespace, not a centralized location, so creating content is local first and eventually consistent as people pull it in (DHT will ensure high availability). This has interesting implications for moderation, which I’ll discuss later.

    In practice, there will probably be storage instances to help out with availability, at least in the early days as the software is tuned. Compute costs would be incredibly low and can run off a simple key/value lookup, so hosting should be relatively cheap.

    Authentication

    User authentication is based on a blockchain. Basically, any time you make an account or change a password, it’s verified by others. The churn here should be low, so that processing probably doesn’t need to be rewarded and users’ devices would just do it in the background.

    All posts are signed with the key that’s used on the blockchain, so you can always verify authorship of a post. And that brings us to:

    Moderation

    Moderation is also distributed, so there are no mods as such, only a web of trust. Basically, you pick certain users that you think are trustworthy, and their actions on the network will determine whether you see a post or not. For example:

    • if a post is reported as CSAM by enough of your trusted users, you never see the post
    • upvotes and downvotes of people you trust have greater weight than votes by everyone else (may not even see votes by other users)
    • you never see posts by users that enough people you trust have blocked

    And so on. This whole process will be transparent, so you can always audit what a user has reported, blocked, or voted on.

    Client

    The client will be cross platform day one, written in Rust and React using Tauri. It’ll start desktop only, though Tauri is working on mobile support (currently in beta) so that could change before it’s released.

    The reasoning here is that the networking layer i intend to use (Iroh) is written in Rust, and it’s usually pretty easy to find React devs. I also like that Tauri generally has a smaller install package, which can make the reserved storage space requirement an easier pull to swallow.

    Limitations/Problems

    Searching

    Searching is complicated in a distributed environment since you never know how many nodes you need to hit to get a satisfactory answer. This will require lots of tuning and there may need to be a full text search instance/cluster set up to help.

    Lemmy doesn’t have this, so I’ll probably put it off for later as well.

    Availability

    A lot of users will likely use phones as their primary or perhaps only interface, and phones are very sensitive to data usage. I think users may be okay with it if the app can track and limit data usage while on data, which adds complexity to the app.

    Web app

    Web apps need a server to make requests to, and that just doesn’t exist, so there would need to be some kind of bridge into the network.

    Latency

    Theoretically, searching and fetching data from a distributed network is fast, but that may not be true in practice given the target demographic for this app (i.e. lots of mobile users).

    Persistence

    Data only lives as long as someone has it on their device, so less popular content could just disappear. I think this is maybe desirable in some cases (e.g. CSAM), but losing data makes me feel a little uncomfortable, so I’ll probably build an archive insurance to store stuff encrypted (to avoid legal liability for stuff like CSAM).

    Illegal content

    There’s no way for an admin to delete something, so there’s a lot of reliance on the web of trust to prevent illegal content from getting to your device. I’m hoping that this will be a non-issue, provided people set up their WoT properly.

    Content

    It’s not designed to be an ActivityPub service, so it doesn’t get that data for free. I may look into building a bridge, but it’s not going to be designed in.

    Problems it solves

    This design solves a bunch of issues with Lemmy, enough that I think the above limitations (and others I didn’t mention) are worth it. For example:

    • communities - no more instances, so there’s only ever one community for a given name
    • costs - hosting costs are minimal (just need a few STUN servers), and even caching servers should be much cheaper than larger Lemmy instances
    • durability - since it’s distributed by nature, there’s no possibility of a a large instance going down, so no worries about communities disappearing
    • bad mods/admins - you pick your moderation, and you only saw content that people you trust are okay with
    • illegal content - illegal content should never appear in the first place, if your Web of trust is robust enough; there could even be AI users that identify CSAM and whatnot automatically so humans don’t need to see it
    • performance - instances can get busy, a P2P network is much harder to grind to a halt; there will need to be protections against DDOS attacks though

    Conclusion

    I’m still in the early days of figuring everything out, but I think a distributed design is the way forward here. I’ll continue to use and support lemmy until I find a viable alternative though, since it’s at least good enough for now. I just worry about long term viability as instance hosts get tired of hosting.

    Anyway, I’d love to hear your thoughts.