Automated Intelligence

The Content Engine

A multi-stage pipeline that transforms raw RSS feeds, Reddit posts, and other data sources into semantically clustered, AI-summarized, and trend-analyzed insights, from scrape to Pulse.

1. Scrape: Readability & Markdown
2. Tag: RAKE Keyword Extraction
3. Cluster: DBSCAN & TF-IDF
4. Summarize: Gemini AI Multi-Step
5. Pulse: Velocity & Trends

Extraction Layer

ContentScraper

Fetches and extracts main article content from a URL and cleans it for downstream processing.

Key methods

scrapeArticleContent(string $url): Fetches the HTML, uses fivefilters/readability.php to extract the core content (no ads or navigation), then converts it to clean Markdown.
scrapeArticleWithMetadata(string $url): Same as above, and also returns the title, author, site name, and excerpt.

Content is converted to Markdown via a ContentCleaner for high-density input to clustering.

PHP

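use fivefilters\Readability\Configuration;
use fivefilters\Readability\Readability;

// $html holds the fetched page source; default configuration is assumed here.
$readability = new Readability(new Configuration());
$readability->parse($html); // throws ParseException when no article body is found

// Convert the extracted article HTML to clean Markdown.
$content = $readability->getContent();
$contentCleaner = new ContentCleaner;
$cleanContent = $contentCleaner->htmlToMarkdown($content);

return [
    'content'   => $cleanContent,
    'title'     => $readability->getTitle(),
    'author'    => $readability->getAuthor(),
    'site_name' => $readability->getSiteName(),
    'excerpt'   => $readability->getExcerpt(),
];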
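
The ContentCleaner class itself is not shown above. As a rough illustration of what an HTML-to-Markdown pass might do, here is a minimal regex-based sketch. The function name htmlToMarkdownSketch() is hypothetical, and it covers only headings, links, emphasis, list items, and paragraphs; a real cleaner would likely rely on a DOM parser or a dedicated converter library.

```php
<?php
// Hypothetical stand-in for ContentCleaner::htmlToMarkdown(); not the project's actual code.
function htmlToMarkdownSketch(string $html): string
{
    // Drop script and style blocks, including their contents.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);

    // Headings h1..h6 become "#", "##", and so on.
    $text = preg_replace_callback(
        '#<h([1-6])\b[^>]*>(.*?)</h\1>#is',
        fn (array $m) => "\n\n" . str_repeat('#', (int) $m[1]) . ' ' . trim(strip_tags($m[2])) . "\n\n",
        $text
    );

    // Links become [text](href).
    $text = preg_replace_callback(
        '#<a\b[^>]*\bhref=["\']([^"\']*)["\'][^>]*>(.*?)</a>#is',
        fn (array $m) => '[' . trim(strip_tags($m[2])) . '](' . $m[1] . ')',
        $text
    );

    // Bold and italic.
    $text = preg_replace('#<(strong|b)\b[^>]*>(.*?)</\1>#is', '**$2**', $text);
    $text = preg_replace('#<(em|i)\b[^>]*>(.*?)</\1>#is', '*$2*', $text);

    // List items become "- item"; paragraph ends and <br> become blank lines.
    $text = preg_replace('#<li\b[^>]*>(.*?)</li>#is', "\n- " . '$1', $text);
    $text = preg_replace('#</p>|<br\s*/?>#i', "\n\n", $text);

    // Strip whatever markup is left, decode entities, and normalize whitespace.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/[ \t]+/', ' ', $text);
    $text = preg_replace('/ *\n */', "\n", $text);
    $text = preg_replace('/\n{3,}/', "\n\n", $text);

    return trim($text);
}
```

Regexes are fragile on nested or malformed markup, so this only works as a sketch because Readability has already reduced the page to a small, well-formed article fragment.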

Tagging

TagSuggester

Analyzes text and suggests relevant, SEO-friendly tags using the RAKE algorithm.

Key functionality

suggestTags(string $text, int $numTags = 5): Uses donatello-za/rake-php-plus (Rapid Automatic Keyword Extraction), which scores candidate phrases by word frequency and co-occurrence.
Blacklist filtering: Removes non-descriptive words (e.g. "today", "article") so only meaningful keywords are suggested.

PHP

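use DonatelloZa\RakePlus\RakePlus;
use Illuminate\Support\Str;

$rake = RakePlus::create($text, 'en_US');
$phrases = $rake->keywords();

$filteredTags = [];
foreach ($phrases as $phrase) {
    // Normalize once; the blacklist check and the slug both use the lowercased phrase.
    $cleanPhrase = Str::lower(trim($phrase));
    if ($this->isBlacklisted($cleanPhrase)) {
        continue;
    }
    $filteredTags[] = Str::slug($cleanPhrase);
}
return $filteredTags;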
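
To make the "word frequency and co-occurrence" scoring concrete, the following is a minimal, dependency-free sketch of the RAKE idea. It is not the library's implementation: rakeSketch() and its stopword argument are hypothetical, and the tokenization is deliberately naive. Candidate phrases are the runs of words between stopwords and punctuation; each word scores degree / frequency, and a phrase scores the sum of its words.

```php
<?php
// Illustrative RAKE sketch; function name and parameters are assumptions.
function rakeSketch(string $text, array $stopwords, int $numTags = 5): array
{
    $stop = array_flip($stopwords);
    $phrases = [];

    // Punctuation ends a candidate phrase, and so does any stopword.
    foreach (preg_split('/[.,;:!?()\n]+/', strtolower($text)) as $fragment) {
        $current = [];
        foreach (preg_split('/\s+/', trim($fragment), -1, PREG_SPLIT_NO_EMPTY) as $word) {
            if (isset($stop[$word])) {
                if ($current) {
                    $phrases[] = $current;
                }
                $current = [];
            } else {
                $current[] = $word;
            }
        }
        if ($current) {
            $phrases[] = $current;
        }
    }

    // frequency = occurrences of a word; degree = total length of the phrases it appears in.
    $freq = [];
    $degree = [];
    foreach ($phrases as $words) {
        foreach ($words as $w) {
            $freq[$w] = ($freq[$w] ?? 0) + 1;
            $degree[$w] = ($degree[$w] ?? 0) + count($words);
        }
    }

    // A phrase's score is the sum of degree/frequency over its words.
    $scores = [];
    foreach ($phrases as $words) {
        $score = 0.0;
        foreach ($words as $w) {
            $score += $degree[$w] / $freq[$w];
        }
        $scores[implode(' ', $words)] = $score;
    }

    arsort($scores);
    return array_map('strval', array_slice(array_keys($scores), 0, $numTags));
}
```

Longer phrases made of words that co-occur with many others rise to the top, which is why RAKE tends to surface multi-word tags rather than single common terms.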

Machine Learning

ArticleClusteringService

Groups semantically similar articles into Topic Clusters—the core of trending story detection.

Key functionality

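runFullClustering(): Fetches recent articles, categorizes them (TagSuggester + taxonomy), then runs clustering per category.
DBSCAN (Rubix ML): Density-based clustering that finds clusters of arbitrary shape and treats outliers as noise, so the number of clusters does not have to be chosen in advance.
TF-IDF: Titles and descriptions are vectorized so DBSCAN can measure how far apart two articles are; articles that share distinctive terms end up close together.
Dynamic Epsilon: Category-specific neighborhood radius. A smaller epsilon requires articles to be more similar before they are grouped (e.g. tech-news is stricter, media-we-love looser).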
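
As a rough illustration of how TF-IDF turns titles into something a distance threshold can act on, here is a small self-contained sketch. It is not the Rubix ML pipeline: tfidfVectors() and cosineDistance() are hypothetical helpers, the smoothed IDF formula is one common variant, and cosine distance is an assumed metric (the service may well use a different one).

```php
<?php
// Illustrative only; helper names, IDF smoothing, and the cosine metric are assumptions.
function tfidfVectors(array $docs): array
{
    // Lowercase and split on anything that is not a letter or digit.
    $tokenized = array_map(
        fn (string $d) => preg_split('/[^a-z0-9]+/', strtolower($d), -1, PREG_SPLIT_NO_EMPTY),
        $docs
    );

    // Document frequency: in how many documents each term appears.
    $n = count($docs);
    $df = [];
    foreach ($tokenized as $tokens) {
        foreach (array_unique($tokens) as $t) {
            $df[$t] = ($df[$t] ?? 0) + 1;
        }
    }

    // Weight = term count * smoothed inverse document frequency.
    $vectors = [];
    foreach ($tokenized as $tokens) {
        $vec = [];
        foreach (array_count_values($tokens) as $term => $count) {
            $vec[$term] = $count * (log((1 + $n) / (1 + $df[$term])) + 1);
        }
        $vectors[] = $vec;
    }
    return $vectors;
}

// 0.0 means same direction (very similar), 1.0 means no shared terms.
function cosineDistance(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $w) {
        $dot += $w * ($b[$term] ?? 0.0);
    }
    $normA = sqrt(array_sum(array_map(fn ($w) => $w * $w, $a)));
    $normB = sqrt(array_sum(array_map(fn ($w) => $w * $w, $b)));
    if ($normA == 0.0 || $normB == 0.0) {
        return 1.0;
    }
    return 1.0 - $dot / ($normA * $normB);
}
```

Under this sketch, two articles are neighbors when their distance is at most epsilon, so lowering epsilon for a category means fewer, tighter clusters.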

PHP

// Epsilon is DBSCAN's neighborhood radius: smaller values require closer matches.
$epsilon = match ($categorySlug) {
    'tech-news', 'artificial-intelligence' => 0.35, // Stricter
    'media-we-love' => 0.45,                        // Looser
    default => 0.4,
};
$this->epsilon = $epsilon;
$result = $this->group($articlesInCategory);
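
To show what the epsilon value actually controls, here is a minimal DBSCAN sketch over a precomputed distance matrix. It is a teaching version, not the Rubix ML implementation: dbscanSketch(), the $minPoints parameter, and the use of -1 for noise are assumptions made for this example.

```php
<?php
/**
 * Illustrative DBSCAN over a symmetric distance matrix.
 * Returns one label per point: 0, 1, 2, ... for clusters, -1 for noise.
 */
function dbscanSketch(array $dist, float $epsilon, int $minPoints): array
{
    $n = count($dist);
    $labels = array_fill(0, $n, null); // null = not visited yet

    // All points within epsilon of point $i (the point itself included).
    $neighbors = function (int $i) use ($dist, $epsilon, $n): array {
        $out = [];
        for ($j = 0; $j < $n; $j++) {
            if ($dist[$i][$j] <= $epsilon) {
                $out[] = $j;
            }
        }
        return $out;
    };

    $cluster = -1;
    for ($i = 0; $i < $n; $i++) {
        if ($labels[$i] !== null) {
            continue;
        }
        $seeds = $neighbors($i);
        if (count($seeds) < $minPoints) {
            $labels[$i] = -1; // too sparse: noise, unless a later cluster claims it
            continue;
        }

        // $i is a core point: start a new cluster and grow it outward.
        $cluster++;
        $labels[$i] = $cluster;
        while ($seeds) {
            $j = array_shift($seeds);
            if ($labels[$j] === -1) {
                $labels[$j] = $cluster; // former noise becomes a border point
                continue;
            }
            if ($labels[$j] !== null) {
                continue; // already assigned
            }
            $labels[$j] = $cluster;
            $reachable = $neighbors($j);
            if (count($reachable) >= $minPoints) {
                $seeds = array_merge($seeds, $reachable); // $j is also a core point
            }
        }
    }
    return $labels;
}
```

Points that never gather enough neighbors keep the noise label, which is what lets one-off articles stay out of every topic cluster.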

Generative AI

AutoSummarizationService

Takes a TopicCluster and generates a human-readable summary via Gemini AI, with a full audit trail in summary_generation_steps.

Step-by-step generation

scrape_article_*: Scrapes each article in the cluster if needed.
individual_summary_*: One summary per article.
master_summary: A single cohesive summary of the topic.
suggest_title: An SEO-friendly cluster title derived from the master summary.

Step 1: Scrape
Step 2: Individual Summaries
Step 3: Master Synthesis
Step 4: Title Generation
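
The four steps and their audit trail can be sketched as a small orchestration function. Everything here is an assumption made for illustration: summarizeClusterSketch(), the shape of the article arrays, the prompt wording, and the two callables ($scrape standing in for ContentScraper, $generate standing in for the Gemini call). Only the step names come from the list above.

```php
<?php
/**
 * Illustrative multi-step summarization with a recorded trail.
 *
 * @param array    $articles each item: ['url' => string, 'content' => ?string]
 * @param callable $scrape   fn(string $url): string, hypothetical scraper hook
 * @param callable $generate fn(string $prompt): string, hypothetical model hook
 */
function summarizeClusterSketch(array $articles, callable $scrape, callable $generate): array
{
    $steps = [];

    // Record every step's output so the run can be audited later.
    $record = function (string $name, string $output) use (&$steps): string {
        $steps[] = ['step' => $name, 'output' => $output];
        return $output;
    };

    $summaries = [];
    foreach (array_values($articles) as $i => $article) {
        $content = $article['content'] ?? null;
        if ($content === null || $content === '') {
            // Step 1: scrape only when no content is stored yet.
            $content = $record("scrape_article_{$i}", $scrape($article['url']));
        }
        // Step 2: one summary per article.
        $summaries[] = $record(
            "individual_summary_{$i}",
            $generate("Summarize this article:\n\n" . $content)
        );
    }

    // Step 3: merge the individual summaries into one.
    $master = $record(
        'master_summary',
        $generate("Combine these summaries into one:\n\n" . implode("\n\n", $summaries))
    );

    // Step 4: derive a title from the master summary.
    $title = $record('suggest_title', $generate("Suggest a title for:\n\n" . $master));

    return ['title' => $title, 'summary' => $master, 'steps' => $steps];
}
```

Passing the scraper and the model as callables keeps the flow testable with stubs, and the returned steps array has the same role as the rows in summary_generation_steps.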

Trend Intelligence

PulseService

Generates daily Pulse stats per category: velocity, trend lines, trending topics, and the SVG charts used on the homepage and category pages.

Key functionality

generateDailyPulse(): Runs per category with configurable time windows (7–21 days).
Smart trend analysis: Compares the current window against the previous one; fast-moving categories (e.g. Tech) use a 7-day window, slower ones (e.g. Digital Art) a 21-day window.
Pulse type: trend_line (meaningful activity) or minimal_activity (low activity).
SVG generation: generateTrendGraph() (sparklines) and generateLargeTrendGraph() (area charts with gradients and glow).

PHP

// Category-specific time windows, in days (variable name is illustrative)
$timeWindows = [
    'tech-news' => 7,
    'artificial-intelligence' => 7,
    'webdev' => 10,
    'hardware' => 14,
    'digital-art' => 21,
];
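
A window-over-window comparison of this kind might look like the sketch below. It is an assumption-laden illustration rather than PulseService itself: pulseStatsSketch(), the $minArticles threshold, the handling of an empty previous window, and the returned keys are all hypothetical.

```php
<?php
/**
 * Illustrative velocity calculation.
 *
 * @param int[] $dailyCounts articles per day, oldest first; should cover
 *                           at least 2 * $windowDays days
 */
function pulseStatsSketch(array $dailyCounts, int $windowDays, int $minArticles = 3): array
{
    // Current window = the most recent N days; previous window = the N days before that.
    $current = array_sum(array_slice($dailyCounts, -$windowDays));
    $previous = array_sum(array_slice($dailyCounts, -2 * $windowDays, $windowDays));

    if ($previous > 0) {
        $velocity = (float) (($current - $previous) / $previous * 100);
    } else {
        // No baseline to compare against: treat any activity as +100%.
        $velocity = $current > 0 ? 100.0 : 0.0;
    }

    return [
        'velocity'  => round($velocity, 1),
        'direction' => $velocity > 0 ? 'up' : ($velocity < 0 ? 'down' : 'stable'),
        'type'      => $current >= $minArticles ? 'trend_line' : 'minimal_activity',
    ];
}
```

With a 7-day window, 20 articles in the previous week and 29 in the current one give a velocity of +45%; a category that drops from 5 articles to none shows -100% and falls back to the minimal-activity view.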

Frontend

Pulse Cards & Category Stats

Pulse cards (pulse-card.blade) show velocity, trend lines or minimal-activity view, activity trend (↗/↘), and trending topics. Category stats (category-stats.blade) show N-day velocity, trending topic badges, and total articles. Both use stats_json from DailyPulse (velocity, svg_points, trending_topics, confirmation_score, etc.).
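
For readers curious how a field like svg_points could be produced, here is a minimal sketch that scales a series of daily counts into polyline coordinates. The function names, the 100x30 default canvas, and the markup are assumptions for illustration; the real generateTrendGraph() is not shown in this document.

```php
<?php
// Illustrative sparkline helpers; names, canvas size, and markup are assumptions.
function sparklinePointsSketch(array $values, int $width = 100, int $height = 30): string
{
    $values = array_values($values);
    $n = count($values);
    if ($n === 0) {
        return '';
    }

    $min = min($values);
    $range = max($values) - $min;

    $points = [];
    foreach ($values as $i => $v) {
        // Spread the points evenly across the width.
        $x = $n > 1 ? $i * $width / ($n - 1) : $width / 2;
        // SVG's y axis grows downward, so larger values get smaller y.
        $y = $range > 0 ? $height - ($v - $min) / $range * $height : $height / 2;
        $points[] = round($x, 1) . ',' . round($y, 1);
    }
    return implode(' ', $points);
}

function sparklineSvgSketch(array $values, int $width = 100, int $height = 30): string
{
    return sprintf(
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 %d %d">'
        . '<polyline fill="none" stroke="currentColor" points="%s"/></svg>',
        $width,
        $height,
        sparklinePointsSketch($values, $width, $height)
    );
}
```

Storing only the points string in stats_json would let the Blade templates decide on stroke, gradient, and size at render time.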

Phase 2

Deep Intelligence

Novelty: ClusterInsightService::computeNovelty() compares a cluster against historical clusters and labels it New, Recurring, or Ongoing.
Controversy: computeControversyScore() scores a cluster from keywords in its titles and descriptions; labels: low, medium, high.
Hidden trends: Small clusters with high novelty or momentum, or rapid 24h growth, surface as an "Early Trend Alert" in the pulse modal.
Coverage bias: Diversity is measured as unique feeds / total articles; labels include Industry-driven and Grassroots.
Narrative shifts: NarrativeShiftService, surfaced as "What Changed Today" on the homepage.
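
Two of these signals are simple enough to sketch directly. The code below is illustrative only: the function names, the keyword-matching rule, and the 0.2 / 0.5 label thresholds are assumptions, not values taken from ClusterInsightService.

```php
<?php
// Illustrative insight helpers; names and thresholds are assumptions.

// Diversity = unique feeds / total articles. Values near 1.0 mean many
// independent sources; values near 0 mean a few feeds dominate the cluster.
function coverageDiversitySketch(array $feedIds): float
{
    $total = count($feedIds);
    return $total === 0 ? 0.0 : count(array_unique($feedIds)) / $total;
}

// Share of titles/descriptions containing at least one controversy keyword.
function controversyScoreSketch(array $texts, array $keywords): array
{
    $hits = 0;
    foreach ($texts as $text) {
        $lower = strtolower($text);
        foreach ($keywords as $keyword) {
            if (str_contains($lower, strtolower($keyword))) {
                $hits++;
                break; // count each text at most once
            }
        }
    }

    $score = count($texts) > 0 ? $hits / count($texts) : 0.0;
    $label = $score >= 0.5 ? 'high' : ($score >= 0.2 ? 'medium' : 'low');

    return ['score' => (float) $score, 'label' => $label];
}
```

For example, four articles from three distinct feeds give a diversity of 0.75, and a cluster where two of four titles mention a listed keyword scores 0.5.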

Admin & UX

Admin Dashboard & User Flow

Admin (/admin/content-engine): dashboard (last run, last pulse, tokens, limits), Content Engine settings (DB overrides), and Runs list with Livewire. Clustering and summarization are triggered from here.

User experience: Homepage pulse cards (e.g. "Tech News ↗ +45%") → category deep-dive with trending topics and SVG trend charts → time windows and thresholds adapt per category for meaningful insights.

Real-time Pulse

The Trending Update Debate: IOS Vs Android Edition

Activity in apple-android is stable.

+0% Apple Android

Hot Take: Trending Update Is The Future (Or Just More Hype)

Activity in artificial-intelligence is stable.

+0% Artificial Intelligence

The Trending Update Showcase: Technology Meets Creativity

Activity in digital-art is stable.

+0% Digital Art

Hot Take: Trending Update Is The Hardware We've Been Waiting For

Activity in hardware is stable.

+0% Hardware

Why Trending Update Is Peak Internet Culture

Activity in internet-culture is stable.

+0% Internet Culture

The Trending Update Experience: Art, Music, Film, And More

Activity in media-we-love is stable.

+0% Media We Love

Breaking: Trending Update Developments (And Why They Matter)

Activity in tech-news is stable.

-100% Tech News

The Trending Update Approach: Building Better, Not Just Faster

Activity in webdev is stable.

+0% Webdev

Engine

Last Run 00:23
Total Clusters (7d) 142
AI Tokens Used 4.2M
