Automated Intelligence

The Content Engine

A multi-stage pipeline that transforms raw RSS feeds, Reddit posts, and other data sources into semantically clustered, AI-summarized, and trend-analyzed insights, from scrape to Pulse.

1. Scrape: Readability & Markdown
2. Tag: RAKE Keyword Extraction
3. Cluster: DBSCAN & TF-IDF
4. Summarize: Gemini AI Multi-Step
5. Pulse: Velocity & Trends

Extraction Layer

ContentScraper

Fetches and extracts main article content from a URL and cleans it for downstream processing.

Key methods
  • scrapeArticleContent(string $url) — Fetches HTML, uses fivefilters/Readability to extract core content (no ads/nav), then converts to clean Markdown.
  • scrapeArticleWithMetadata(string $url) — Same plus title, author, site name, excerpt.

Content is converted to Markdown via a ContentCleaner for high-density input to clustering.

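PHP
use fivefilters\Readability\Configuration;
use fivefilters\Readability\Readability;

// $html holds the page source already fetched for $url.
$readability = new Readability(new Configuration());
$readability->parse($html); // throws ParseException when no article content is found

// Convert the extracted article HTML to clean Markdown.
$content = $readability->getContent();
$contentCleaner = new ContentCleaner();
$cleanContent = $contentCleaner->htmlToMarkdown($content);

return [
    'content'   => $cleanContent,
    'title'     => $readability->getTitle(),
    'author'    => $readability->getAuthor(),
    'site_name' => $readability->getSiteName(),
    'excerpt'   => $readability->getExcerpt(),
];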

Tagging

TagSuggester

Analyzes text and suggests relevant, SEO-friendly tags using the RAKE algorithm.

Key functionality
  • suggestTags(string $text, int $numTags = 5) — Uses donatello-za/rake-php-plus (Rapid Automatic Keyword Extraction), which scores phrases by word frequency and co-occurrence.
  • Blacklist filtering — Filters out non-descriptive words (e.g. "today", "article") so only meaningful keywords are suggested.
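PHP
use DonatelloZa\RakePlus\RakePlus;
use Illuminate\Support\Str;

$rake = RakePlus::create($text, 'en_US');
$phrases = $rake->keywords();

$filteredTags = [];
foreach ($phrases as $phrase) {
    // Normalize once so the blacklist check and the slug use the same string.
    $cleanPhrase = Str::lower(trim($phrase));
    if ($this->isBlacklisted($cleanPhrase)) {
        continue;
    }
    $filteredTags[] = Str::slug($cleanPhrase);
}
return $filteredTags;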
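The frequency and co-occurrence scoring that RAKE relies on can be sketched in plain PHP. This is a minimal illustration, not the rake-php-plus implementation: rakeScores() and the stopword list are hypothetical, and the library's own tokenization and stopword handling will differ.

```php
<?php
declare(strict_types=1);

/**
 * Minimal RAKE sketch (hypothetical helper, not the library's code).
 * Returns candidate phrases mapped to scores, highest first.
 *
 * @param string[] $stopwords
 * @return array<string, float>
 */
function rakeScores(string $text, array $stopwords): array
{
    $stop = array_flip($stopwords);
    $phrases = [];

    // 1. Split on punctuation, then break each fragment at stopwords.
    foreach (preg_split('/[.,;:!?()\n]+/', strtolower($text)) as $fragment) {
        $current = [];
        foreach (preg_split('/\s+/', trim($fragment), -1, PREG_SPLIT_NO_EMPTY) as $word) {
            if (isset($stop[$word])) {
                if ($current !== []) {
                    $phrases[] = $current;
                }
                $current = [];
            } else {
                $current[] = $word;
            }
        }
        if ($current !== []) {
            $phrases[] = $current;
        }
    }

    // 2. Word frequency and degree (co-occurrence inside candidate phrases).
    $freq = [];
    $degree = [];
    foreach ($phrases as $words) {
        foreach ($words as $word) {
            $freq[$word] = ($freq[$word] ?? 0) + 1;
            $degree[$word] = ($degree[$word] ?? 0) + count($words);
        }
    }

    // 3. Phrase score = sum of degree / frequency over its words.
    $scores = [];
    foreach ($phrases as $words) {
        $score = 0.0;
        foreach ($words as $word) {
            $score += $degree[$word] / $freq[$word];
        }
        $scores[implode(' ', $words)] = $score;
    }
    arsort($scores);

    return $scores;
}

// Usage: longer phrases built from rarely repeated words rank highest.
$scores = rakeScores(
    'Rubix ML brings density based clustering to PHP. Clustering is useful for the news.',
    ['to', 'is', 'for', 'the', 'brings']
);
```

Multi-word phrases tend to outrank single words under this scoring, which is one reason a blacklist pass afterwards is useful for removing generic terms.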

Machine Learning

ArticleClusteringService

Groups semantically similar articles into Topic Clusters—the core of trending story detection.

Key functionality
  • runFullClustering() — Fetches recent articles, categorizes them (TagSuggester + taxonomy), then runs clustering per category.
  • DBSCAN (Rubix ML) — Density-based clustering; finds clusters of arbitrary shape and treats outliers as noise.
  • TF-IDF — Titles/descriptions vectorized so DBSCAN can measure semantic distance.
  • Dynamic Epsilon — Category-specific neighborhood size (e.g. tech-news stricter, media-we-love looser).
PHP
$epsilon = match ($categorySlug) {
    'tech-news', 'artificial-intelligence' => 0.35, // Stricter
    'media-we-love' => 0.45,                       // Looser
    default => 0.4,
};
$this->epsilon = $epsilon;
$result = $this->group($articlesInCategory);
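To show how TF-IDF vectors and an epsilon radius interact, here is a self-contained sketch in plain PHP. It does not use Rubix ML: tfidfVectors(), cosineDistance(), and dbscan() are hypothetical helpers, and the choice of cosine distance and of a minimum of 2 samples per cluster are assumptions, since the service's actual kernel and density settings are not stated here.

```php
<?php
declare(strict_types=1);

/** Build L2-normalized TF-IDF vectors (term => weight) for a list of texts. */
function tfidfVectors(array $texts): array
{
    $docs = array_map(
        fn (string $t): array => array_count_values(
            preg_split('/[^a-z0-9]+/', strtolower($t), -1, PREG_SPLIT_NO_EMPTY)
        ),
        $texts
    );

    // Document frequency: in how many texts each term appears.
    $df = [];
    foreach ($docs as $counts) {
        foreach (array_keys($counts) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    $n = count($docs);
    $vectors = [];
    foreach ($docs as $i => $counts) {
        $vec = [];
        foreach ($counts as $term => $tf) {
            // Smoothed IDF so terms present in every text keep a small weight.
            $vec[$term] = $tf * (log((1 + $n) / (1 + $df[$term])) + 1);
        }
        $norm = sqrt(array_sum(array_map(fn ($w) => $w * $w, $vec)));
        $vectors[$i] = $norm > 0 ? array_map(fn ($w) => $w / $norm, $vec) : $vec;
    }

    return $vectors;
}

/** Cosine distance between two normalized sparse vectors (0 = identical). */
function cosineDistance(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $w) {
        $dot += $w * ($b[$term] ?? 0.0);
    }

    return 1.0 - $dot;
}

/** Plain DBSCAN: returns a cluster id per item, or -1 for noise. */
function dbscan(array $vectors, float $epsilon, int $minSamples): array
{
    $neighbors = fn (int $i): array => array_keys(array_filter(
        $vectors,
        fn (array $v): bool => cosineDistance($vectors[$i], $v) <= $epsilon
    ));

    $labels = array_fill(0, count($vectors), null);
    $cluster = -1;
    foreach (array_keys($vectors) as $i) {
        if ($labels[$i] !== null) {
            continue;
        }
        $seeds = $neighbors($i);
        if (count($seeds) < $minSamples) {
            $labels[$i] = -1; // provisional noise
            continue;
        }
        $labels[$i] = ++$cluster;
        while ($seeds !== []) {
            $j = array_pop($seeds);
            if ($labels[$j] === -1) {
                $labels[$j] = $cluster; // noise reached from a core point becomes a border point
            }
            if ($labels[$j] !== null) {
                continue;
            }
            $labels[$j] = $cluster;
            $next = $neighbors($j);
            if (count($next) >= $minSamples) {
                $seeds = array_merge($seeds, $next);
            }
        }
    }

    return $labels;
}

// Usage: two pairs of related titles and one unrelated outlier.
$labels = dbscan(tfidfVectors([
    'apple unveils new iphone',
    'apple unveils iphone camera',
    'rust compiler release notes',
    'rust compiler release faster',
    'bakery wins pastry award',
]), 0.4, 2);
```

Under these assumptions, lowering epsilon demands more shared vocabulary before two titles count as neighbors, which is why a stricter value suits categories where many stories reuse the same terms.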

Generative AI

AutoSummarizationService

Takes a TopicCluster and generates a human-readable summary via Gemini AI, with a full audit trail in summary_generation_steps.

Step-by-step generation
  • scrape_article_* — Scrapes each article in the cluster if needed.
  • individual_summary_* — Summary per article.
  • master_summary — Single cohesive summary of the topic.
  • suggest_title — SEO-friendly cluster title from the master summary.
Step 1: Scrape
Step 2: Individual Summaries
Step 3: Master Synthesis
Step 4: Title Generation
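The four steps and their audit trail could be wired together roughly as follows. This is a sketch under stated assumptions: summarizeCluster() is hypothetical, $scrape and $llm are injected callables standing in for ContentScraper and the Gemini client, the numeric suffix on step names is an array index chosen for illustration, and the steps are kept in memory rather than written to summary_generation_steps.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical four-step summarization flow with a recorded step log.
 *
 * @param string[] $urls
 */
function summarizeCluster(array $urls, callable $scrape, callable $llm): array
{
    $steps = [];

    // Runs one unit of work and records whether it completed or failed.
    $record = function (string $name, callable $work) use (&$steps) {
        try {
            $result = $work();
            $steps[] = ['step' => $name, 'status' => 'completed'];
            return $result;
        } catch (Throwable $e) {
            $steps[] = ['step' => $name, 'status' => 'failed'];
            return null;
        }
    };

    // Step 1: scrape every article.
    $contents = [];
    foreach (array_values($urls) as $i => $url) {
        $contents[$i] = $record("scrape_article_{$i}", fn () => $scrape($url));
    }

    // Step 2: summarize each article that was scraped successfully.
    $summaries = [];
    foreach ($contents as $i => $content) {
        if ($content === null) {
            continue;
        }
        $summaries[] = $record("individual_summary_{$i}", fn () => $llm("Summarize:\n{$content}"));
    }
    $summaries = array_filter($summaries, fn ($s) => $s !== null);

    // Steps 3 and 4: synthesize a master summary, then derive a title from it.
    $master = $record('master_summary', fn () => $llm("Combine into one summary:\n" . implode("\n", $summaries)));
    $title = $record('suggest_title', fn () => $llm("Suggest a title for:\n{$master}"));

    return ['summary' => (string) $master, 'title' => (string) $title, 'steps' => $steps];
}

// Usage with stub callables, so no network calls are made.
$result = summarizeCluster(
    ['https://example.com/a', 'https://example.com/b'],
    fn (string $url): string => "content of {$url}",
    fn (string $prompt): string => 'stub:' . strlen($prompt)
);
```

Recording each step separately means a failed scrape or summary shows up in the log without aborting the remaining articles.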

Trend Intelligence

PulseService

Generates daily Pulse stats per category: velocity, trend lines, trending topics, and the SVG charts used on the homepage and category pages.

Key functionality
  • generateDailyPulse() — Runs per category with configurable time windows (7–21 days).
  • Smart trend analysis — Compares current vs previous window; fast categories (e.g. Tech) use 7-day, slower (e.g. Digital Art) use 21-day.
  • Pulse type: trend_line (meaningful activity) or minimal_activity (low activity).
  • SVG generation: generateTrendGraph() (sparklines), generateLargeTrendGraph() (area charts with gradients/glow).
PHP
// Category-specific time windows, in days.
// $timeWindows is an illustrative name for the surrounding array.
$timeWindows = [
    'tech-news' => 7,
    'artificial-intelligence' => 7,
    'webdev' => 10,
    'hardware' => 14,
    'digital-art' => 21,
];
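A window comparison of this kind might look like the sketch below. computePulse() is hypothetical, and several details are assumptions rather than the service's actual rules: velocity is taken to mean articles per day, the minimal-activity cutoff of 5 articles is invented, and an empty previous window is reported as 0% change.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pulse calculation: compares article counts in the current
 * window with the window immediately before it.
 *
 * @param array<string, int> $dailyCounts date (Y-m-d) => article count
 */
function computePulse(array $dailyCounts, string $today, int $windowDays, int $minArticles = 5): array
{
    // Sums the counts for $windowDays days, starting $startOffset days before $today.
    $sumWindow = function (int $startOffset) use ($dailyCounts, $today, $windowDays): int {
        $total = 0;
        for ($d = $startOffset; $d < $startOffset + $windowDays; $d++) {
            $date = (new DateTimeImmutable($today))->sub(new DateInterval("P{$d}D"))->format('Y-m-d');
            $total += $dailyCounts[$date] ?? 0;
        }
        return $total;
    };

    $current = $sumWindow(0);
    $previous = $sumWindow($windowDays);

    // Percentage change; an empty previous window yields 0 to avoid dividing by zero.
    $change = $previous > 0 ? round(($current - $previous) / $previous * 100, 1) : 0.0;

    return [
        'velocity' => round($current / $windowDays, 2),
        'change_percent' => $change,
        'direction' => $change >= 0 ? '↗' : '↘',
        'pulse_type' => $current >= $minArticles ? 'trend_line' : 'minimal_activity',
    ];
}

// Usage: a 7-day window ending on 2024-05-10.
$counts = [
    '2024-05-01' => 5, '2024-05-02' => 5,
    '2024-05-08' => 4, '2024-05-09' => 6, '2024-05-10' => 10,
];
$pulse = computePulse($counts, '2024-05-10', 7);
```

A longer window smooths out quiet days, which fits the idea of giving slower categories 14 or 21 days before judging their trend.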

Frontend

Pulse Cards & Category Stats

Pulse cards (pulse-card.blade) show velocity, trend lines or minimal-activity view, activity trend (↗/↘), and trending topics. Category stats (category-stats.blade) show N-day velocity, trending topic badges, and total articles. Both use stats_json from DailyPulse (velocity, svg_points, trending_topics, confirmation_score, etc.).
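As a rough illustration of how a series of values can become the points of an SVG sparkline, consider the sketch below. sparklineSvg() is a hypothetical helper, not the engine's generateTrendGraph(), and the 100×30 viewBox and styling attributes are arbitrary choices.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sparkline helper: scales a series into a fixed viewBox and
 * returns an SVG polyline.
 *
 * @param array<int, int|float> $values
 */
function sparklineSvg(array $values, int $width = 100, int $height = 30): string
{
    $values = array_values($values);
    $count = count($values);
    if ($count === 0) {
        return '';
    }

    $min = min($values);
    $range = max($values) - $min;

    $points = [];
    foreach ($values as $i => $value) {
        // Spread points evenly along the x axis; center a single point.
        $x = $count > 1 ? $i / ($count - 1) * $width : $width / 2;
        // SVG's y axis points down, so larger values get smaller y coordinates.
        $y = $range > 0 ? $height - ($value - $min) / $range * $height : $height / 2;
        $points[] = round($x, 1) . ',' . round($y, 1);
    }

    return sprintf(
        '<svg viewBox="0 0 %d %d" xmlns="http://www.w3.org/2000/svg">'
        . '<polyline fill="none" stroke="currentColor" stroke-width="2" points="%s"/></svg>',
        $width,
        $height,
        implode(' ', $points)
    );
}

// Usage: four data points rendered into one polyline.
$svg = sparklineSvg([2, 4, 3, 8]);
```

Precomputing the points once and storing them alongside the other stats lets the Blade templates render charts without recalculating anything per request.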

Phase 2

Deep Intelligence

  • Novelty: ClusterInsightService::computeNovelty() compares each cluster against historical clusters; labels: New, Recurring, Ongoing.
  • Controversy: computeControversyScore() scores keywords in titles/descriptions; labels: low, medium, high.
  • Hidden trends — Small clusters with high novelty/momentum or rapid 24h growth; "Early Trend Alert" in pulse modal.
  • Coverage bias — Diversity (unique feeds / total articles); labels e.g. Industry-driven vs Grassroots.
  • Narrative shifts: NarrativeShiftService and "What Changed Today" on the homepage.
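The coverage-bias ratio is simple enough to sketch directly. coverageBias() is hypothetical, as are the feed_id field name, the 0.5 threshold, and the mapping of low diversity to Industry-driven and high diversity to Grassroots; only the ratio itself (unique feeds / total articles) comes from the description above.

```php
<?php
declare(strict_types=1);

/**
 * Coverage diversity = unique feeds / total articles.
 * Threshold and label mapping are illustrative assumptions.
 *
 * @param array<int, array{feed_id: int}> $articles
 */
function coverageBias(array $articles, float $threshold = 0.5): array
{
    $total = count($articles);
    if ($total === 0) {
        return ['diversity' => 0.0, 'label' => null];
    }

    $uniqueFeeds = count(array_unique(array_column($articles, 'feed_id')));
    $diversity = $uniqueFeeds / $total;

    return [
        'diversity' => round($diversity, 2),
        'label' => $diversity >= $threshold ? 'Grassroots' : 'Industry-driven',
    ];
}

// Usage: five articles, four of them from the same feed.
$bias = coverageBias([
    ['feed_id' => 1], ['feed_id' => 1], ['feed_id' => 1], ['feed_id' => 1], ['feed_id' => 2],
]);
```

A ratio near 1 means almost every article comes from a different feed, while a ratio near 0 means a handful of feeds account for most of the coverage.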

Admin & UX

Admin Dashboard & User Flow

Admin (/admin/content-engine): dashboard (last run, last pulse, tokens, limits), Content Engine settings (DB overrides), and Runs list with Livewire. Clustering and summarization are triggered from here.

User experience: Homepage pulse cards (e.g. "Tech News ↗ +45%") → category deep-dive with trending topics and SVG trend charts → time windows and thresholds adapt per category for meaningful insights.

The Trending Update Debate: IOS Vs Android Edition

Activity in apple-android is stable.

+0% Apple Android
Hot Take: Trending Update Is The Future (Or Just More Hype)

Activity in artificial-intelligence is stable.

+0% Artificial Intelligence
The Trending Update Showcase: Technology Meets Creativity

Activity in digital-art is stable.

+0% Digital Art
Hot Take: Trending Update Is The Hardware We've Been Waiting For

Activity in hardware is stable.

+0% Hardware
Why Trending Update Is Peak Internet Culture

Activity in internet-culture is stable.

+0% Internet Culture
The Trending Update Experience: Art, Music, Film, And More

Activity in media-we-love is stable.

+0% Media We Love
Breaking: Trending Update Developments (And Why They Matter)

Activity in tech-news is stable.

-100% Tech News
The Trending Update Approach: Building Better, Not Just Faster

Activity in webdev is stable.

+0% Webdev
Engine
  • Last Run: 00:23
  • Total Clusters (7d): 142
  • AI Tokens Used: 4.2M