The social media platform Bluesky recently had an incident where a user created an account with a racial slur as the handle. The Bluesky team quickly removed the account but realized they should have had automated filters in place to prevent such issues. They are now implementing a two-step automated filtering and flagging system for user handles while still involving human moderators. The team acknowledges they were too slow to communicate with the community about the incident and are working to improve their Trust and Safety team and communication processes going forward. They are committed to learning from this mistake and building a safer and more resilient social media platform over time.
Previous post about this topic https://beehaw.org/post/2152596
Bluesky allowed people to include the n-word in their usernames | Engadget
Bluesky, a decentralized social network, allowed users to register usernames containing the n-word. When reports surfaced about a user with the racial slur in their name, Bluesky took 40 minutes to remove the account but did not publicly apologize. A LinkedIn post criticized Bluesky for failing to filter offensive terms from the start and for not addressing its anti-blackness problem. Bluesky later said it had invested in moderation systems, but the oversight highlighted ongoing issues, especially given that the startup is backed by Twitter co-founder Jack Dorsey. The fact that Bluesky allowed such an obvious racial slur shows it was unprepared to moderate a social network effectively.
On one hand, they definitely should have been aware of the possibility of abuse like this, especially since so many of them came from Twitter. On the other hand, I’ve always thought it was asking a lot to have developers be exposed to, and compile, a list of slurs specifically to be able to block them out. :(
They probably don’t have a list of slurs so much as they match partial variations with regular expressions for filtering, which I guess could be better or worse, depending on how you look at it. Better: they don’t have to see the whole slur. Worse: they have to think deeply about the slur and all the variations of it that might arise.
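To make that concrete, here’s a minimal sketch of what fragment-based regex filtering could look like. The pattern and handles are made-up placeholders, not anything from Bluesky’s actual rules or wordlist; the point is just that a single pattern can cover leetspeak and repeated-letter variants without the list ever spelling out the full slur.

```python
import re

# Hypothetical fragment-based filter: placeholder pattern, not Bluesky's
# actual wordlist. Character classes cover common obfuscations
# (letter/digit swaps, repeated letters) in a single expression.
BLOCKED_PATTERNS = [
    re.compile(r"b[a@4]+dw[o0]+rd", re.IGNORECASE),  # stands in for a real slur fragment
]

def handle_is_flagged(handle: str) -> bool:
    """Return True if a proposed handle matches any blocked pattern."""
    return any(p.search(handle) for p in BLOCKED_PATTERNS)

print(handle_is_flagged("b4dw00rd.bsky.social"))  # True: obfuscated variant still caught
print(handle_is_flagged("gardener.bsky.social"))  # False: no match
```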
I remember a post where someone’s username, Nasser, got censored to N***er in one of the Dark Souls games, making it look way fucking worse.
As they mentioned in the blog post though, simply matching slurs anywhere inside a string will ban a lot of innocent people.
It’s the Scunthorpe problem.
Yeah, wordlists for any kind of moderation can easily catch false positives.
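For anyone curious, here’s a toy demonstration of the Scunthorpe problem using a deliberately mild blocked term and invented handles (nothing here reflects Bluesky’s real filter). Naive substring matching flags innocent place names, while requiring word boundaries avoids those false positives at the cost of missing obfuscated spellings, which is presumably why the blog post keeps human moderators in the loop.

```python
import re

# Deliberately mild placeholder term and invented handles; not Bluesky's
# actual wordlist or data.
blocked = "sex"
handles = ["essexcricket.bsky.social", "sussexfan.bsky.social", "sex.example.social"]

# Naive substring matching flags all three handles, including the
# innocent UK place names (the classic Scunthorpe problem).
naive_hits = [h for h in handles if blocked in h.lower()]
print(naive_hits)
# ['essexcricket.bsky.social', 'sussexfan.bsky.social', 'sex.example.social']

# Word-boundary matching avoids those false positives, but it would also
# miss obfuscated variants, which is why automated flagging still needs
# human review.
pattern = re.compile(rf"\b{re.escape(blocked)}\b", re.IGNORECASE)
boundary_hits = [h for h in handles if pattern.search(h)]
print(boundary_hits)
# ['sex.example.social']
```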