Server side message filtering #38

Closed
opened 2023-09-28 21:40:01 +00:00 by kirsle · 0 comments
Owner

Currently, the chatbot can be used to help monitor conversations in public channels and look for red flag keywords to either delete the message or report it to your site admin. Private DMs between users, however, are private: they are not stored to disk anywhere and there is no admin backdoor to monitor them.

This feature idea could support server-side (automated) monitoring of DM chats for either censorship (of racial slurs for example) or reporting to your site admin.

As part of this feature:

  • The Chat Server would hold a temporary buffer of recent messages sent in DM chats (e.g., the most recent 10 messages in each DM thread) - so in case a chat needs to be reported to your site admin, surrounding context can be included.
    • The buffer would be kept in RAM only and not easily spyable by admin users.
  • The settings.toml file of BareRTC is where you would configure your keyword filters and actions: so we don't need to hard-code them and so end users can't find the list of keywords in order to self-censor and dance around them.
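The in-memory context buffer described above could look roughly like the sketch below. It is only an illustration in Python, not BareRTC's implementation: the names `DMContextBuffer`, `thread_key` and `CONTEXT_SIZE` are hypothetical, and the size of 10 simply follows the example in the text.

```python
from collections import defaultdict, deque

# Assumption: 10 messages per thread, matching the "most recent 10" example above.
CONTEXT_SIZE = 10


def thread_key(user_a: str, user_b: str) -> tuple[str, ...]:
    """Order-independent key so A->B and B->A share one DM thread."""
    return tuple(sorted((user_a, user_b)))


class DMContextBuffer:
    """Keeps the last `size` messages per DM thread, in memory only."""

    def __init__(self, size: int = CONTEXT_SIZE) -> None:
        self._threads = defaultdict(lambda: deque(maxlen=size))

    def record(self, sender: str, recipient: str, text: str) -> None:
        # deque(maxlen=N) silently drops the oldest entry once it is full.
        self._threads[thread_key(sender, recipient)].append((sender, text))

    def context(self, user_a: str, user_b: str) -> list[tuple[str, str]]:
        """Snapshot of the buffered messages, oldest first, for a report."""
        return list(self._threads.get(thread_key(user_a, user_b), ()))

    def forget(self, user_a: str, user_b: str) -> None:
        """Drop a thread's buffer, e.g. when a participant disconnects."""
        self._threads.pop(thread_key(user_a, user_b), None)
```

Because nothing is written to disk, the buffer disappears on restart, and a report would only ever see the snapshot returned by `context()` at the moment a filter trips.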

Example config file format might look like:

# An array type so you can configure multiple sets of rules
[[MessageFilters]]
Enabled = false
PublicChannels = false
PrivateChannels = true
KeywordPhrases = [
    "red flag keyword 1",
    '(red|maroon) flag keyword \d+'  # single quotes: a TOML literal string, so \d is not read as an escape
]
CensorMessage = true
ForwardMessage = false
ReportMessage = true
ChatServerResponse = "Your message has not been sent. Please make better choices."

With the meaning of the options being:

  • Enabled (bool): enable or disable this filter as a whole. The default settings.toml would include an example filter template with Enabled = false, so the chat admin can configure it if wanted.
  • PublicChannels (bool): apply the filter in public channels.
  • PrivateChannels (bool): apply the filter in private channels (DMs)
  • KeywordPhrases (string array): a list of phrases, each of which may be a regular expression, to look for in the user's message as it's being sent out.
  • CensorMessage (bool): if true, the user's message will be sent while substituting asterisks in place of the censored word/phrase that matched. e.g. "suck my ****"
  • ForwardMessage (bool): whether or not the message should appear for other parties, or only be echoed back to the sender.
    • Example: in a DM chat with ForwardMessage=false, the sender would see their own echo in chat (possibly with the keyword censored out) but the message is not given to their chat partner.
    • In a public channel: the message would not be broadcast to the whole room.
  • ReportMessage (bool): whether to deliver a report to your site's report API (if enabled).
    • The report would contain the recent context around the message (the 10 most recent messages sent in the chat before it).
    • The report would contain the raw, original text that the user wrote (not the asterisk censored copy) so the site admin can see exactly how it was written.
  • ChatServerResponse (string): optional message that ChatServer should deliver into the chat thread.
    • Example: user trips a red flag, their message is not forwarded to their partner, the chat is reported to the site admin, and ChatServer says into the DM thread (to the sender): "make better choices, your message has not been sent and this chat has been reported to the site admin".
    • An empty ChatServerResponse would result in no visible response/feedback given to the user (silent reporting, and the user might not know whether their message was forwarded or not).
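To make the interaction between these options concrete, here is a minimal sketch in Python of how one filter might be evaluated against one message. It is an interpretation of the option descriptions above, not the actual server code: `MessageFilter`, `Outcome` and `apply_filter` are hypothetical names, the field defaults are assumptions, and censoring with one asterisk per matched character is just one possible reading of "substituting asterisks".

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MessageFilter:
    """Mirrors the proposed settings.toml keys; the defaults are assumptions."""
    Enabled: bool = False
    PublicChannels: bool = False
    PrivateChannels: bool = False
    KeywordPhrases: list[str] = field(default_factory=list)
    CensorMessage: bool = False
    ForwardMessage: bool = False
    ReportMessage: bool = False
    ChatServerResponse: str = ""


@dataclass
class Outcome:
    matched: bool
    echo_text: str                # what the sender sees in their own chat
    forward_text: Optional[str]   # what other parties receive; None = not delivered
    report_text: Optional[str]    # raw original text for the report; None = no report
    server_reply: Optional[str]   # ChatServer message to the sender; None = silent


def apply_filter(f: MessageFilter, text: str, is_dm: bool) -> Outcome:
    in_scope = f.Enabled and (f.PrivateChannels if is_dm else f.PublicChannels)
    patterns = [re.compile(p) for p in f.KeywordPhrases] if in_scope else []
    if not any(p.search(text) for p in patterns):
        # No match (or filter not applicable): deliver the message untouched.
        return Outcome(False, text, text, None, None)

    shown = text
    if f.CensorMessage:
        for p in patterns:
            # Replace each match with asterisks of the same length.
            shown = p.sub(lambda m: "*" * len(m.group(0)), shown)

    return Outcome(
        matched=True,
        echo_text=shown,
        forward_text=shown if f.ForwardMessage else None,
        report_text=text if f.ReportMessage else None,  # report keeps the raw text
        server_reply=f.ChatServerResponse or None,
    )
```

Note that the report carries the original `text` while the echo and the forwarded copy carry the censored `shown`, matching the ReportMessage description.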

Example use cases/user stories:

  • You could apply a swear words filter to all channels where swears are censored with asterisks but still forwarded to the other parties.
    • PublicChannels=true, PrivateChannels=true, CensorMessage=true, ForwardMessage=true
  • You could add red flag auto-reporting to your site admin for phrases that egregiously violate your site's TOS.
    • PrivateChannels=true, CensorMessage=true, ForwardMessage=false, ReportMessage=true.
kirsle added the enhancement label 2023-09-28 21:40:01 +00:00
Reference: apps/BareRTC#38