Rebuilding My TG Forward Bot

The forwarding bot I used before was Node Forward Bot (NFD). I had used it for quite a while, but all kinds of ads kept coming through it, which got extremely annoying. When I checked again, I found that the project had already been inactive for a year. NFD2 does not appear to be open source (I could not find a repository), so I did not try it, and I do not know whether it has ad-blocking features. So I decided to reinvent the wheel for my own use.

At first, I tried stuffing it with a huge list of sensitive words so that any message containing one would be blocked automatically. The results were baffling: it blocked "x86_64" (matched "64"), "Steam platform exclusive" (in Chinese, the phrase contains the substring for "Taiwan independence"), "Python scripts" (flagged as cheat tools), and "listening port" (flagged as monitoring).
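
To make the failure mode concrete, here is a minimal sketch of a naive substring filter. The blocklist and the `findBlockedTerm` helper are hypothetical (the actual word list is not shown in this post), and the Chinese phrase is my reconstruction of the translated "Steam platform exclusive" example.

```typescript
// Hypothetical blocklist; the real word list is not part of this post.
const BLOCKLIST: string[] = ["64", "台独"];

// Naive filter: returns the first blocked term found anywhere in the message,
// or null when nothing matches. There is no word-boundary or context check.
function findBlockedTerm(
  message: string,
  blocklist: string[] = BLOCKLIST,
): string | null {
  for (const term of blocklist) {
    if (message.includes(term)) return term;
  }
  return null;
}

// Harmless technical phrases trip the filter because the blocked term
// happens to appear as a substring.
const samples = ["x86_64", "Steam平台独占", "hello world"];
for (const sample of samples) {
  console.log(`${sample} -> ${findBlockedTerm(sample)}`);
}
```

Pure substring matching has no notion of where a word starts or ends, which is especially painful in Chinese, where there are no spaces to anchor on.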

Ugh… clearly this path was not going to work. So I took the advice of some friends in a group chat and tried writing a bunch of regular expressions. But as everyone knows, Chinese spam is endlessly creative: 丅子, 微 P 嗯, weird emoji splicing… Can regex really defend against all of that? Maybe it can, but my brain obviously is not that powerful (sweat), so in the end I decided to use an LLM for moderation.

I needed a model that was as fast as possible, did not need to be too smart, but could still understand the text. As everyone knows, Google has a model called gemini-3-flash, and it felt suitable for content moderation, so I used it. I wrote a simple prompt and asked it to judge the user input, then output either SAFE or UNSAFE.

const MODERATION_PROMPT = `
# Role
Content Moderator API. Output one word only.
# Rules
UNSAFE if:
- Real human nudity/sex
- QR codes/spam/ads/gambling promotion
- Real gore/shock content
- Illegal content promotion
- Scam/phishing attempts
SAFE if:
- 2D/Anime/Cartoon (even suggestive)
- Normal photos/text/screenshots
- Regular conversation
# Output
One word: "SAFE" or "UNSAFE"
Analyze the content:`;
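
A one-word reply still has to be parsed defensively, since a model can return extra quotes, punctuation, or a full sentence. The sketch below shows one way to do that. `ModelCaller`, `parseVerdict`, and `moderate` are hypothetical names for illustration, not the project's actual code, and the model call is injected so that no particular API shape is assumed.

```typescript
type Verdict = "SAFE" | "UNSAFE";

// Hypothetical signature: sends the full prompt to the model and
// resolves with its raw text reply.
type ModelCaller = (prompt: string) => Promise<string>;

// Normalise the reply to a verdict. Anything that is not exactly one of
// the two expected words (after stripping quotes and a trailing period)
// yields null, so the caller decides how to fail.
function parseVerdict(raw: string): Verdict | null {
  const word = raw.trim().replace(/^["'`]+|["'`.]+$/g, "").toUpperCase();
  if (word === "SAFE") return "SAFE";
  if (word === "UNSAFE") return "UNSAFE";
  return null;
}

// Appends the user text to the base prompt and returns a verdict.
// On an unparseable reply or a failed call, the fallback is used.
async function moderate(
  basePrompt: string,
  userText: string,
  callModel: ModelCaller,
  fallback: Verdict = "SAFE",
): Promise<Verdict> {
  try {
    const reply = await callModel(`${basePrompt}\n${userText}`);
    return parseVerdict(reply) ?? fallback;
  } catch {
    return fallback;
  }
}
```

The `fallback` default is a design choice: falling back to `"SAFE"` means an exhausted quota never blocks a legitimate message, while `"UNSAFE"` would be stricter at the cost of dropping messages during outages.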

There is a little latency, but it is almost negligible. Compared with traditional rule matching, an LLM can understand context and semantics, so the false-positive rate is much lower. The downside is cost: Gemini has a free quota, but not a large one. Even with three API keys connected, I only get about 60 calls per day, which is barely enough for ad blocking.
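
Spreading calls across several keys can be sketched as a round-robin rotator with a per-key daily budget. `KeyRotator` is a hypothetical class, not the project's implementation. Counting by UTC day is an assumption, and the counters live in memory here, whereas on Workers they would need to sit in some persistent store.

```typescript
// Rotates across API keys, each with its own daily call budget.
class KeyRotator {
  private keys: string[];
  private perKeyDailyLimit: number;
  private used = new Map<string, number>();
  private day = "";
  private next = 0;

  constructor(keys: string[], perKeyDailyLimit: number) {
    this.keys = keys;
    this.perKeyDailyLimit = perKeyDailyLimit;
  }

  // Returns a key that still has budget for the given day,
  // or null when every key is exhausted.
  acquire(now: Date = new Date()): string | null {
    const today = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC (assumed reset boundary)
    if (today !== this.day) {
      this.day = today;
      this.used.clear();
    }
    for (let i = 0; i < this.keys.length; i++) {
      const idx = (this.next + i) % this.keys.length;
      const key = this.keys[idx];
      const count = this.used.get(key) ?? 0;
      if (count < this.perKeyDailyLimit) {
        this.used.set(key, count + 1);
        this.next = (idx + 1) % this.keys.length;
        return key;
      }
    }
    return null;
  }
}
```

With three keys and a limit of 20 each, `acquire()` hands out 60 keys in a day and then returns `null`, at which point the bot has to either skip moderation or hold the message.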

The whole project uses a modular structure:

.
├── src
│ ├── ai.ts
│ ├── config.ts
│ ├── handlers
│ │ ├── admin
│ │ │ ├── callbacks.ts
│ │ │ ├── commands.ts
│ │ │ ├── index.ts
│ │ │ ├── replies.ts
│ │ │ └── shared.ts
│ │ └── guest.ts
│ ├── i18n
│ │ ├── en.ts
│ │ └── zh.ts
│ ├── i18n.ts
│ ├── index.ts
│ ├── storage.ts
│ ├── telegram.ts
│ └── types.ts
├── tsconfig.json
└── wrangler.toml

Implemented features:

| Feature | Description |
| --- | --- |
| LLM Content Moderation | LLM-based harmful content detection |
| Ban List | View all banned users |
| Content Hash Cache | Avoid wasting tokens on repeated spam content |
| Blacklist System | Users are blacklisted after repeated blocks |
| Whitelist System | Stop moderation after consecutive clean messages |
| Stats System | Message count, user count, and AI block count |
| Multi API Key Rotation | Google’s API quota is too small |
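
The content hash cache and the blacklist/whitelist counters can be sketched roughly as follows. Everything here is illustrative. `VerdictCache`, `UserTrust`, and the thresholds (3 blocks, 5 clean messages) are assumptions, since the post does not give the real values. The hash is a small FNV-1a-style function chosen only to keep the sketch dependency-free, and a real deployment would more likely use a cryptographic digest such as SHA-256.

```typescript
type Verdict = "SAFE" | "UNSAFE";
type UserStatus = "normal" | "blacklisted" | "whitelisted";

// FNV-1a-style 32-bit hash over UTF-16 code units (non-cryptographic).
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Collapse whitespace and case so trivially varied copies share one key.
function contentKey(text: string): string {
  return fnv1a(text.trim().replace(/\s+/g, " ").toLowerCase());
}

// Remembers verdicts by content hash so repeated spam costs no extra calls.
class VerdictCache {
  private seen = new Map<string, Verdict>();
  get(text: string): Verdict | undefined {
    return this.seen.get(contentKey(text));
  }
  set(text: string, verdict: Verdict): void {
    this.seen.set(contentKey(text), verdict);
  }
}

// Tracks total blocks and the current clean streak per user.
class UserTrust {
  private blocks = new Map<number, number>();
  private cleanStreak = new Map<number, number>();
  private blockLimit: number;
  private cleanLimit: number;

  constructor(blockLimit = 3, cleanLimit = 5) {
    this.blockLimit = blockLimit;
    this.cleanLimit = cleanLimit;
  }

  // Records one moderation result and returns the user's new status.
  record(userId: number, verdict: Verdict): UserStatus {
    if (verdict === "UNSAFE") {
      this.blocks.set(userId, (this.blocks.get(userId) ?? 0) + 1);
      this.cleanStreak.set(userId, 0); // an unsafe message breaks the streak
    } else {
      this.cleanStreak.set(userId, (this.cleanStreak.get(userId) ?? 0) + 1);
    }
    return this.status(userId);
  }

  status(userId: number): UserStatus {
    if ((this.blocks.get(userId) ?? 0) >= this.blockLimit) return "blacklisted";
    if ((this.cleanStreak.get(userId) ?? 0) >= this.cleanLimit) return "whitelisted";
    return "normal";
  }
}
```

In this arrangement a message only reaches the model when the sender is neither blacklisted nor whitelisted and the cache has no entry for its content, which is what keeps the daily call count low.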

Who is 20 API calls per day supposed to be enough for? It used to be 100 calls per day. Gemini CLI and Antigravity were generous, but the API is just stingy.
Second edit: now Gemini CLI and Antigravity are not enough either, FUCKING GOOGLE.

The whole project runs on Cloudflare Workers (same as NFD, convenient, useful, and free), making it a completely zero-cost solution. The LLM is also free.

Finally, I pushed the code to GitHub and open-sourced it under the BSD 2-Clause license.

GitHub: kokosa-forward-bot