Build a Bot to Detect and Quarantine AI-Generated Images in Discord

discords · 2026-01-30

Build a Discord bot to detect, quarantine, and route AI-generated sexualized images into a mod queue with safe automated sanctions.

You're drowning in image reports — here's a bot that stops deepfakes before they spread

Every moderator knows the moment: someone drops a shocking image, the server erupts, mods scramble, and the damage is already visible in screenshots. In 2026 the problem's worse — advanced generative tools (remember the late-2025 Grok findings?) mean sexualized and nonconsensual images can be created and posted within seconds. If you run an active gaming or esports community, you need more than manual triage. You need an automated, transparent system that detects, quarantines, and routes suspicious images into a moderation queue, with safe automated sanctions and an appeals workflow.

What you'll build in this guide

This article walks you through a practical, production-ready architecture and gives code-level examples to build a Discord bot that:

  • Ingests image attachments and image links from messages
  • Calls one or more ML detection APIs (adult/sexualized content + deepfake detection)
  • Quarantines content by moving it to a private mod-queue channel
  • Applies configurable automated sanctions (timeout, role removal) when confidence is high
  • Keeps an auditable evidence log and human-in-the-loop review with appeals

Why this matters in 2026

Platforms and regulators tightened rules after high-profile 2024–2025 misuse cases, yet generative models became more accessible and realistic. The Guardian’s late-2025 coverage of Grok’s misuse highlighted how quickly AI images can hit public timelines; platform-level moderation alone doesn't protect private and community spaces.

Key 2026 trends to design for:

  • Hybrid detection pipelines — combine several detectors (adult content classifiers + deepfake models + perceptual hashing) to reduce false positives.
  • Regulatory pressure — the EU's Digital Services Act and national online-safety laws are pushing platforms to improve nonconsensual content removal and age verification.
  • More adversarial image generation — watermarks and provenance metadata aren't enough on their own; active detection is required. For provenance questions, see how a seemingly small clip can affect claims.

High-level architecture

Keep the bot modular: separate the Discord client, the worker queue, the detector adapters, and the audit store.

  • Discord Listener — Listens to messageCreate / messageUpdate events and extracts image URLs/attachments.
  • Processing Queue — Enqueue detection jobs to avoid blocking the event loop (RabbitMQ, BullMQ, or a managed queue).
  • Detector Adapters — Pluggable connectors to ML APIs (SafeSearch-like adult detectors, deepfake classifiers, CLIP-similarity checks).
  • Moderator Queue — Private channel or web dashboard where detected items are posted with evidence and one-click actions.
  • Sanction Engine — Applies sanctions according to server policy: auto-timeout, role change, message deletion, or escalation to human review.
  • Audit & Appeals Store — Immutable logs, hashes of media, metadata, and reviewer actions for transparency and appeals. Consider a scalable store such as ClickHouse for scraped/audit data when volume is high.
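
As a concrete illustration of what one audit entry might hold, here is a minimal sketch; the field names and the SHA-256-over-bytes choice are assumptions, not a fixed schema.

const crypto = require('crypto');

// Build an audit record for a flagged image. The hash lets you prove what
// was reviewed without retaining the raw media indefinitely.
function buildAuditRecord({ guildId, messageId, authorId, imageBuffer, detectorResult, finalScore }) {
  return {
    guildId,
    messageId,
    authorId,
    mediaSha256: crypto.createHash('sha256').update(imageBuffer).digest('hex'),
    finalScore,                      // 0-100 ensemble confidence
    detectorResult,                  // raw detector output, kept for reviewers
    createdAt: new Date().toISOString(),
    reviewerAction: null             // filled in later by the mod-queue handler
  };
}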

Prerequisites

  • Node.js 18+ (this guide uses discord.js v14) or Python (discord.py) if you prefer — Node.js examples below.
  • Discord Bot token and privileged intents: Message Content Intent and Guild Members if you plan to apply sanctions.
  • API keys for at least one adult-content ML API and one deepfake-detection API (examples below).
  • A hosting plan (Docker, Fly.io, Railway, or VPS) with HTTPS outbound for calling ML APIs — consider edge/offline-first nodes for low-latency or geo-distributed moderation.

Choose your detection stack (2026 recommendations)

No single API is perfect. In 2026 best practice is to combine detectors:

  1. Safe content classifier — Google Cloud Vision SafeSearch, Azure AI Content Safety, or open-source Hugging Face models to flag adult/racy content.
  2. Deepfake detector — Specialized deepfake APIs (Reality Defender, Sensity-style services, or custom models trained on deepfake datasets) that analyze facial artifacts, temporal inconsistencies for video, and compression signatures. Building your policy is part of deepfake risk management.
  3. CLIP or similarity checks — Use CLIP embeddings to detect suspicious edits relative to a suspected victim image if you have a known image set. This helps detect nonconsensual edits.
  4. Perceptual hashing — pHash or ssdeep to identify copies and near-duplicates to stop reuploads.
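
To make the perceptual-hashing idea concrete, below is a minimal average-hash (aHash) sketch. It assumes the sharp image library is installed; a dedicated pHash package would be a drop-in alternative, and near-duplicates show up as hashes with a small Hamming distance.

const sharp = require('sharp');

// Average hash: shrink to 8x8 grayscale, then set one bit per pixel
// depending on whether it is brighter than the mean.
async function averageHash(imageBuffer) {
  const pixels = await sharp(imageBuffer)
    .resize(8, 8, { fit: 'fill' })
    .grayscale()
    .raw()
    .toBuffer();
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  return Array.from(pixels, p => (p > mean ? '1' : '0')).join('');
}

// Hamming distance between two hashes; a small distance suggests a reupload.
function hammingDistance(a, b) {
  let d = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) d++;
  return d;
}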

Combine outputs with a weighting system to compute a final confidence score (0–100).
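
A rough sketch of that weighting step follows. The detector names and weights are placeholders to tune against your own reviewer feedback.

// Combine detector outputs (each 0-1) into a single 0-100 confidence score.
// Weights are illustrative; retune them as reviewer decisions accumulate.
const WEIGHTS = { adult: 0.5, deepfake: 0.35, similarity: 0.15 };

function ensembleScore(scores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    if (typeof scores[name] === 'number') {
      weighted += scores[name] * weight;
      totalWeight += weight;
    }
  }
  // Skip missing detectors instead of treating them as zero.
  return totalWeight > 0 ? Math.round((weighted / totalWeight) * 100) : 0;
}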

Designing safe policy rules

Before automating sanctions, define clear thresholds and human-in-the-loop gates:

  • Confidence ≥ 90% — auto-quarantine and a temporary sanction (e.g., a 1-hour timeout), plus flag for review.
  • Confidence 60–89% — quarantine and send to the mod queue for manual review; do not auto-sanction.
  • Confidence < 60% — log for telemetry and optionally run a slower additional detector.

Always provide a reviewer with the original message context, timestamps, and cryptographic hash of the media for audit — provenance matters when legal or trust issues arise (see example).
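
A minimal decision helper that encodes the thresholds above might look like this; the returned flags are illustrative names for whatever your sanction engine expects, and the cutoffs should be configurable per guild rather than hard-coded.

// Map an ensemble score to a policy action, mirroring the rules above.
function decideAction(finalScore) {
  if (finalScore >= 90) {
    return { quarantine: true, autoSanction: true, review: true };
  }
  if (finalScore >= 60) {
    return { quarantine: true, autoSanction: false, review: true };
  }
  return { quarantine: false, autoSanction: false, review: false, logOnly: true };
}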

Practical build: Node.js + discord.js example

Below is a compact, practical example illustrating the core flow: detect image → quarantine → mod queue → sanction. This is not a drop-in production bot, but a clear structure to implement safely.

1) Setup essentials

Install dependencies (Node.js 18+ ships a global fetch, so node-fetch isn't needed, and BullMQ requires a running Redis instance):

npm init -y
npm install discord.js bullmq dotenv

Create a .env file with:

DISCORD_TOKEN=your-bot-token
MOD_QUEUE_CHANNEL_ID=1234567890
QUARANTINE_ROLE_ID=9876543210
DETECTOR_API_KEY=your-detector-key
DETECTOR_ENDPOINT=https://api.example.com/detect

2) Listener + queueing

Key points: don’t perform slow HTTP calls in the event handler. Enqueue jobs instead.

require('dotenv').config();
const { Client, GatewayIntentBits } = require('discord.js');
const { Queue } = require('bullmq');

// BullMQ is Redis-backed; REDIS_HOST is an optional env var, defaulting to localhost.
const queue = new Queue('image-jobs', {
  connection: { host: process.env.REDIS_HOST || '127.0.0.1', port: 6379 }
});

const client = new Client({
  intents: [GatewayIntentBits.Guilds, GatewayIntentBits.GuildMessages, GatewayIntentBits.MessageContent]
});

client.on('messageCreate', async (msg) => {
  if (msg.author.bot) return;

  // Only enqueue image attachments; skip other file types.
  const images = Array.from(msg.attachments.values())
    .filter(at => at.contentType && at.contentType.startsWith('image/'));
  if (images.length === 0) return;

  for (const at of images) {
    await queue.add(
      'detect',
      { url: at.url, messageId: msg.id, channelId: msg.channel.id, guildId: msg.guildId, authorId: msg.author.id },
      // Retry transient detector failures with exponential backoff.
      { attempts: 3, backoff: { type: 'exponential', delay: 2000 } }
    );
  }
});

client.login(process.env.DISCORD_TOKEN);

3) Worker: call detectors and decide

This worker pulls jobs and calls your detector API(s). Use exponential backoff on transient errors; the attempts and backoff options set when enqueueing jobs let BullMQ handle those retries for you.

const { Worker } = require('bullmq');
// Node.js 18+ provides a global fetch, so node-fetch is not required.

const worker = new Worker('image-jobs', async job => {
  const { url, messageId, channelId, guildId, authorId } = job.data;

  // Example detector request — send the image URL rather than raw bytes
  const res = await fetch(process.env.DETECTOR_ENDPOINT, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${process.env.DETECTOR_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ image_url: url, checks: ['adult', 'deepfake'] })
  });
  if (!res.ok) throw new Error(`Detector returned ${res.status}`); // surface the failure so BullMQ can retry
  const json = await res.json();

  // Weighted scoring (example)
  const adultScore = json.scores.adult || 0;       // 0-1
  const deepfakeScore = json.scores.deepfake || 0; // 0-1
  const finalScore = Math.min(100, Math.round((adultScore * 0.6 + deepfakeScore * 0.4) * 100));

  // Decision: quarantine if >= 60
  if (finalScore >= 60) {
    // Post to mod queue and optionally apply a temporary sanction
    await postToModQueue({ url, messageId, channelId, guildId, authorId, finalScore, raw: json });

    if (finalScore >= 90) {
      await applyAutoSanction(guildId, authorId, channelId, messageId);
    }
  }

  // Store an audit record: hash, inference result, TTL
  await storeAudit({ guildId, messageId, url, finalScore, detectorResult: json });
}, { connection: { host: process.env.REDIS_HOST || '127.0.0.1', port: 6379 } });
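
storeAudit is left abstract above. One possible MVP implementation, assuming better-sqlite3 (any durable store works, and the 7-day checklist later in this guide suggests SQLite for the MVP):

const Database = require('better-sqlite3');
const crypto = require('crypto');
const db = new Database('audit.db');

db.exec(`CREATE TABLE IF NOT EXISTS audits (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  guild_id TEXT, message_id TEXT, url TEXT,
  url_sha256 TEXT, final_score INTEGER,
  detector_json TEXT, created_at TEXT
)`);

// Persist one detection result. Hash the URL (or the media bytes, if you
// downloaded them) so reviewers can verify evidence later.
// better-sqlite3 is synchronous; async here just matches the worker's call site.
async function storeAudit({ guildId, messageId, url, finalScore, detectorResult }) {
  db.prepare(`INSERT INTO audits
    (guild_id, message_id, url, url_sha256, final_score, detector_json, created_at)
    VALUES (?, ?, ?, ?, ?, ?, ?)`)
    .run(guildId, messageId, url,
         crypto.createHash('sha256').update(url).digest('hex'),
         finalScore, JSON.stringify(detectorResult), new Date().toISOString());
}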

4) Quarantine & mod-queue posting

Your mod-queue should be a private channel restricted to a mod-only role. Post the image, the confidence score, a short reason, and action buttons (Approve / Remove / Appeal). Buttons require an interaction handler, sketched after the next snippet.

// Assumes the worker shares the Discord client; in a split deployment,
// create a second client here or post via a channel webhook instead.
async function postToModQueue(payload) {
  const modChannel = await client.channels.fetch(process.env.MOD_QUEUE_CHANNEL_ID);
  await modChannel.send({
    content: `Suspected AI-generated image — score ${payload.finalScore}%\nAuthor: <@${payload.authorId}>\nSource channel: <#${payload.channelId}>`,
    files: [payload.url],
    // Don't actually ping the reported user from the mod queue.
    allowedMentions: { parse: [] }
  });
}
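
The snippet above posts plain text only. A sketch of the one-click actions using discord.js v14 button builders follows; the customId format is an arbitrary convention, not something the library requires.

const { ActionRowBuilder, ButtonBuilder, ButtonStyle } = require('discord.js');

// Attach Approve / Remove / Appeal buttons to the mod-queue post.
function buildModActions(messageId) {
  return new ActionRowBuilder().addComponents(
    new ButtonBuilder().setCustomId(`approve:${messageId}`).setLabel('Approve').setStyle(ButtonStyle.Success),
    new ButtonBuilder().setCustomId(`remove:${messageId}`).setLabel('Remove').setStyle(ButtonStyle.Danger),
    new ButtonBuilder().setCustomId(`appeal:${messageId}`).setLabel('Open appeal').setStyle(ButtonStyle.Secondary)
  );
}

// Minimal interaction handler; route the action to your review/rollback logic.
client.on('interactionCreate', async (interaction) => {
  if (!interaction.isButton()) return;
  const [action, messageId] = interaction.customId.split(':');
  await interaction.reply({ content: `Recorded "${action}" for message ${messageId}`, ephemeral: true });
  // TODO: apply the reviewer decision and update the audit log here.
});

Pass the row to the mod-queue post via the components option, for example components: [buildModActions(payload.messageId)] in the send call above.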

5) Sanctions: safety-first automation

When auto-sanctioning, choose reversible actions: temporary timeout, message deletion, and assigning a quarantine role that hides channels (instead of permanent bans).

async function applyAutoSanction(guildId, userId, channelId, messageId) {
  const guild = await client.guilds.fetch(guildId);
  try {
    const member = await guild.members.fetch(userId);
    // Timeout for 1 hour (discord.js v14 takes the duration in milliseconds)
    await member.timeout(60 * 60 * 1000, 'Auto-sanction: detected high-confidence AI sexual content');
    // Delete the offending message
    const channel = await client.channels.fetch(channelId);
    const message = await channel.messages.fetch(messageId);
    await message.delete();
  } catch (err) {
    console.error('Sanction failed', err);
  }
}

Detector integration tips & best practices

  • Use image URLs where possible — sending base64 is heavier; many APIs accept URLs or multipart uploads.
  • Fallback detectors — if a primary API is rate-limited, have a lower-cost fallback to ensure coverage. See approaches in multimodal media workflows.
  • Ensemble scoring — weight detectors differently for sexualized vs deepfake signals and tune per-community tolerance.
  • Cache and dedupe — hash images (pHash) to prevent repeated analysis on reuploads; store results for a TTL (see the caching sketch after this list).
  • Privacy by design — retain only what you need. Store cryptographic hashes instead of raw images when possible and clearly document retention policy. Also review secure-agent and policy patterns described in secure desktop AI agent guidance.
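
For the cache-and-dedupe point, here is a sketch using ioredis (which BullMQ already uses under the hood), keyed on the perceptual hash from earlier; the key prefix and 7-day TTL are arbitrary choices.

const Redis = require('ioredis');
const redis = new Redis(); // defaults to 127.0.0.1:6379

// Return a cached verdict for this image hash, or run the detectors and cache
// the result so reuploads skip the expensive API calls.
async function detectWithCache(imageHash, runDetectors) {
  const cached = await redis.get(`verdict:${imageHash}`);
  if (cached) return JSON.parse(cached);

  const verdict = await runDetectors();
  await redis.set(`verdict:${imageHash}`, JSON.stringify(verdict), 'EX', 7 * 24 * 60 * 60);
  return verdict;
}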

Handling false positives and appeals

False positives are inevitable. Your UX must be forgiving and transparent:

  • Record reviewer decisions and propagate them: if a human reviewer marks an image safe, unblock the user and mark the hash as safe to prevent future auto-sanctions.
  • Provide an appeals channel and a clear resolution window (e.g., moderators must resolve appeals within 48 hours).
  • Support rollback actions: un-timeout members, restore messages where appropriate, and give public transparency reports if your community expects it.
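
A minimal rollback sketch, assuming the same Discord client and a markHashSafe helper of your own:

// Reverse an auto-sanction after a successful appeal or reviewer override.
async function reverseSanction(guildId, userId, imageHash) {
  const guild = await client.guilds.fetch(guildId);
  const member = await guild.members.fetch(userId);

  // Passing null clears an active timeout in discord.js v14.
  await member.timeout(null, 'Appeal upheld: reversing automated sanction');

  // Remember this media as safe so the same image is not auto-actioned again.
  await markHashSafe(imageHash); // your own audit-store helper
}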

Logging, metrics, and safety signals

Monitor these KPIs to iterate your thresholds and detector mix:

  • Detection rate: percent of image messages flagged
  • Auto-sanction rate and reversal rate (false-positive indicator)
  • Time-to-resolution for mod-queue items
  • Repeat offenders per 30-day window

Legal and compliance awareness

In 2026, moderators must be aware of changing rules around nonconsensual content and age verification:

  • Document your decision matrix and keep logs to show due diligence if you need to work with platform trust & safety teams or law enforcement.
  • Respect privacy laws — remove or redact identifying info on request and keep log retention compliant with GDPR-style rules.
  • If you detect potential child sexual content, follow mandatory reporting obligations in your jurisdiction and the platform's takedown pathways. Also consider operational lessons from postmortems on high-impact outages when coordinating with platform teams.

Hardening against adversarial uploads

Bad actors try to evade detection via small edits, obfuscation, or steganography. Strengthen detection:

  • Run detectors on multiple formats and at multiple resolutions.
  • Normalize images (resize / recompress) prior to detection to reduce evasion from subtle transformations (a sketch follows this list).
  • Use temporal consistency checks for short video or GIF uploads — deepfake signals often appear across frames.
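
One way to do the normalization step, again assuming sharp; the 1024px bound and JPEG quality are arbitrary starting points.

const sharp = require('sharp');

// Re-encode uploads to a canonical size and JPEG quality so tiny pixel-level
// perturbations and exotic formats don't slip past the detectors unchanged.
async function normalizeForDetection(imageBuffer) {
  return sharp(imageBuffer)
    .resize({ width: 1024, height: 1024, fit: 'inside', withoutEnlargement: true })
    .jpeg({ quality: 85 })
    .toBuffer();
}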

Scaling & operational concerns

Large servers see thousands of images per day. Plan for:

  • Rate limits — batch requests or set conservative sampling for low-risk channels.
  • Cost controls — use tiered detectors; only run heavy deepfake models when a fast classifier raises a flag (see the sketch after this list).
  • Fail-open vs fail-closed — choose default behavior when detectors are unavailable. Fail-open (allow content) reduces wrongful automated bans but increases risk; fail-closed (block) reduces exposure but raises false positives. Operational playbooks and outage postmortems such as incident responder learnings are useful when deciding default behavior.
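
A sketch of the tiered approach from the cost-controls point; runFastClassifier and runDeepfakeModel are placeholders for your cheap and expensive adapters, and the 0.3 cut-off is illustrative.

// Run the cheap classifier on everything; only pay for the heavy deepfake
// model when the fast pass looks suspicious.
async function tieredDetect(imageUrl, { runFastClassifier, runDeepfakeModel }) {
  const fast = await runFastClassifier(imageUrl); // fields below come from the hypothetical fast classifier
  if (fast.adult < 0.3 && fast.racy < 0.3) {
    return { ...fast, deepfake: 0, tier: 'fast-only' };
  }
  const deep = await runDeepfakeModel(imageUrl);  // slow, expensive
  return { ...fast, deepfake: deep.score, tier: 'full' };
}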

Real-world example: Moderator workflow

  1. Image flagged with 72% finalScore — bot removes original message and posts to #mod-queue with the image and metadata.
  2. Two on-duty moderators inspect the evidence, see it’s AI-generated sexualized content and confirm removal. They click "Confirm" in the mod-queue UI.
  3. Bot records the review in a fast analytics store (for example, archived rows in a ClickHouse column store), applies a 24-hour temp-ban, and notifies the user with a DM explaining the reason and how to appeal.
  4. User appeals; after 1 business day a moderator reviews and finds it’s a false positive — bot reverses the temp-ban and restores privileges. The image hash is marked as safe to avoid repeat auto-actions.

Testing and deployment checklist

  • Unit tests for detector adapter responses and scoring logic.
  • Integration tests that mock API responses for edge cases (low confidence, multiple detectors disagreeing).
  • Load-test the queue with representative traffic to ensure workers scale — introduce chaos engineering and controlled failure drills as in process-roulette vs chaos engineering.
  • Deploy with feature flags: start with detection-only mode (log but don’t quarantine) before enabling automated sanctions.
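
A minimal feature-flag gate for that last point, assuming a DETECTION_ONLY environment variable of your own choosing:

// Gate all enforcement behind a flag so you can run in log-only mode first.
const DETECTION_ONLY = process.env.DETECTION_ONLY === 'true';

async function enforce(decision, context) {
  await storeAudit(context);        // always log, even in dry-run mode
  if (DETECTION_ONLY) return;       // observe, never act

  if (decision.quarantine) await postToModQueue(context);
  if (decision.autoSanction) {
    await applyAutoSanction(context.guildId, context.authorId, context.channelId, context.messageId);
  }
}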

Future-proofing and roadmap (2026+)

AI models will keep changing. Build for agility:

  • Make detector adapters pluggable — swap vendors as models evolve (a minimal interface sketch follows this list).
  • Invest in a small in-house classifier trained on community-flagged samples to boost recall on the kinds of images your server sees most; be mindful of training costs and resource constraints described in AI training pipeline best practices.
  • Provide clear transparency to members: publish your moderation rules and offer opt-in safety settings (e.g., strict image filters for younger members).
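
One way to keep adapters pluggable is a tiny shared interface; the shape below is a suggestion rather than a standard, reusing the example endpoint variables from earlier.

// Every detector adapter exposes the same shape, so vendors can be swapped
// via configuration without touching the worker.
const detectors = {
  exampleVendor: {
    name: 'example-vendor',
    checks: ['adult', 'deepfake'],
    // Returns scores in the 0-1 range, keyed by check name.
    async detect(imageUrl) {
      const res = await fetch(process.env.DETECTOR_ENDPOINT, {
        method: 'POST',
        headers: { Authorization: `Bearer ${process.env.DETECTOR_API_KEY}`, 'Content-Type': 'application/json' },
        body: JSON.stringify({ image_url: imageUrl })
      });
      const json = await res.json();
      return { adult: json.scores?.adult ?? 0, deepfake: json.scores?.deepfake ?? 0 };
    }
  }
};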

Ethics & community trust

Automated systems risk chilling legitimate expression. Keep the community involved:

  • Publicize your thresholds and allow trusted members to be part of your reviewer rotation.
  • Offer a fast appeals path and publish periodic transparency notes: removal counts, reversal rates, and improvements made.

“Platforms and tools have improved, but community moderation remains the last line of defense — give moderators the tools to act fast and fairly.”

Actionable checklist to ship in 7 days

  1. Day 1: Create bot, add intents, and spin a minimal listener that logs image messages.
  2. Day 2: Add a processing queue and a simple detector adapter that calls an adult-content API.
  3. Day 3: Implement mod-queue posting and create a private mod channel.
  4. Day 4: Add quarantine role and reversible auto-timeout for high-confidence cases.
  5. Day 5: Build an appeals command and an audit log in a small DB (SQLite for MVP) or plan for a scalable store like ClickHouse as you grow.
  6. Day 6: Run a closed beta in a test server, collect false-positive stats and tune thresholds.
  7. Day 7: Roll out to production with detection-only mode for a week, then enable automated sanctions behind a feature flag.

Final recommendations

Start conservative. Use an ensemble of detectors, keep humans in the loop at medium confidence, and make all sanctions reversible. Document everything for accountability and regulatory compliance. With this approach you protect members, reduce moderator burnout, and maintain community trust even as generative AI gets more powerful in 2026.

Call to action

Ready to build your bot? Clone a starter repo, test the detection adapters against your community’s sample data, and join other server owners at discords.pro for ready-made moderation templates and vetted detector configs. If you want a head start, scaffold a trimmed Node.js starter (discord.js + BullMQ + a detector adapter) tuned to your hosting setup and preferred detector APIs.


Related Topics

#bots #security #tutorials