Automated Intelligence

The Content Engine

A multi-stage pipeline that transforms raw RSS feeds into semantically clustered, AI-summarized insights. It also detects repeat stories, publishes automatically to Statamic, analyzes trends via Pulse, and tracks how stories evolve with Narrative Shifts.

1. Scrape

Readability & Markdown

2. Tag

RAKE Keyword Extraction

3. Cluster

DBSCAN & Merge Step

4. Summarize

Gemini AI Multi-Step

5. Pulse

Velocity & Trends

Extraction Layer

ContentScraper

Fetches and extracts main article content from a URL and cleans it for downstream processing.

Key methods
  • scrapeArticleContent(string $url) — Fetches HTML, uses fivefilters/Readability to extract core content (no ads/nav), then converts to clean Markdown.
  • scrapeArticleWithMetadata(string $url) — Same plus title, author, site name, excerpt.
  • Data Retention — Automatically cleans up articles older than 6 months unless they are pinned to a Summary Object.

Content is converted to Markdown via a ContentCleaner for high-density input to clustering.

PHP
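use fivefilters\Readability\Configuration;
use fivefilters\Readability\Readability;

// $html is the raw page source fetched from $url
$readability = new Readability(new Configuration());
$readability->parse($html); // throws ParseException when no article body is found

// Convert the extracted HTML to Markdown for downstream steps
$content = $readability->getContent();
$contentCleaner = new ContentCleaner;
$cleanContent = $contentCleaner->htmlToMarkdown($content);

return [
  'content'   => $cleanContent,
  'title'     => $readability->getTitle(),
  'author'    => $readability->getAuthor(),
  'site_name' => $readability->getSiteName(),
  'excerpt'   => $readability->getExcerpt(),
];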
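The data retention rule in the method list can be read as a simple predicate. The following is a minimal sketch rather than the engine's actual cleanup code: the function name `shouldPurge`, the boolean `$pinned` flag, and measuring age from the publish date are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper: decides whether an article may be deleted.
 * Assumes "older than 6 months" is measured from the publish date and
 * that pinning to a Summary Object is exposed as a boolean.
 */
function shouldPurge(DateTimeImmutable $publishedAt, bool $pinned, DateTimeImmutable $now): bool
{
    if ($pinned) {
        return false; // pinned articles are always kept
    }
    $cutoff = $now->modify('-6 months');
    return $publishedAt < $cutoff;
}

$now = new DateTimeImmutable('2025-07-01');
$oldUnpinned = shouldPurge(new DateTimeImmutable('2024-11-15'), false, $now);
$oldPinned   = shouldPurge(new DateTimeImmutable('2024-11-15'), true, $now);
$recent      = shouldPurge(new DateTimeImmutable('2025-05-01'), false, $now);
```

Passing `$now` in explicitly keeps the rule easy to test against fixed dates.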
Tagging

TagSuggester

Analyzes text and suggests relevant, SEO-friendly tags using the RAKE algorithm.

Key functionality
  • suggestTags(string $text, int $numTags = 5) — Uses donatelloza/rake-plus (Rapid Automatic Keyword Extraction), which scores candidate phrases by word frequency and co-occurrence.
  • Blacklist filtering — Filters out non-descriptive words (e.g. "today", "article") so only meaningful keywords are suggested.
PHP
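$rake = RakePlus::create($text, 'en_US');
$phrases = $rake->keywords();

$filteredTags = [];
foreach ($phrases as $phrase) {
  // Normalize once, then use the same value for filtering and slugging
  $cleanPhrase = Str::lower(trim($phrase));
  if ($this->isBlacklisted($cleanPhrase)) continue;
  $filteredTags[] = Str::slug($cleanPhrase);
}
return $filteredTags;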
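To make "word frequency and co-occurrence" concrete, here is a small RAKE-style scorer in plain PHP. It is an illustrative sketch, not the rake-plus implementation: the function name `rakeScores` and the four-word stoplist are hypothetical, and real stoplists are far larger.

```php
<?php
/**
 * Minimal RAKE-style scorer (illustrative only).
 * Returns phrase => score, highest score first.
 */
function rakeScores(string $text, array $stopwords): array
{
    $stop = array_flip($stopwords);
    $phrases = [];
    // Split into fragments at punctuation, then into phrases at stopwords.
    foreach (preg_split('/[.,;:!?]+/', strtolower($text)) as $fragment) {
        $current = [];
        foreach (preg_split('/\s+/', trim($fragment), -1, PREG_SPLIT_NO_EMPTY) as $word) {
            if (isset($stop[$word])) {
                if ($current) { $phrases[] = $current; }
                $current = [];
            } else {
                $current[] = $word;
            }
        }
        if ($current) { $phrases[] = $current; }
    }

    // frequency = occurrences of a word;
    // degree = total length of the phrases the word appears in (co-occurrence).
    $freq = [];
    $degree = [];
    foreach ($phrases as $words) {
        foreach ($words as $word) {
            $freq[$word] = ($freq[$word] ?? 0) + 1;
            $degree[$word] = ($degree[$word] ?? 0) + count($words);
        }
    }

    // Phrase score = sum of degree / frequency over its words.
    $scores = [];
    foreach ($phrases as $words) {
        $score = 0.0;
        foreach ($words as $word) {
            $score += $degree[$word] / $freq[$word];
        }
        $scores[implode(' ', $words)] = $score;
    }
    arsort($scores);
    return $scores;
}

$scores = rakeScores(
    'Open source models are catching up. Open source tooling is the reason.',
    ['are', 'is', 'the', 'up']
);
```

Longer phrases built from words that co-occur often score highest, which is why multi-word phrases tend to rank above single words such as "reason".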
Machine Learning

Rubix ML Implementation

We use the rubix/ml package to handle the heavy lifting of vectorization and density-based clustering.

PHP
// Build dataset from article title + description
$dataset = Unlabeled::quick($samples);
$dataset
  ->apply(new TextNormalizer())
  // Vocabulary capped at 5000 terms; drop terms found in more than 99% of documents
  ->apply(new WordCountVectorizer(5000, 1, 0.99, new Word()))
  ->apply(new TfIdfTransformer());

// DBSCAN: radius 0.4, minimum density 2, Cosine distance (0–1 scale for non-negative TF-IDF vectors)
$estimator = new DBSCAN(0.4, 2, new BallTree(20, new Cosine()));
$predictions = $estimator->predict($dataset);
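The radius passed to DBSCAN is a cosine distance, so it may help to see how that distance behaves on small vectors. This is a plain-PHP sketch independent of Rubix ML; the function name `cosineDistance` and the sample weights are made up for illustration.

```php
<?php
/** Cosine distance (1 - cosine similarity) between two sparse term-weight vectors. */
function cosineDistance(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        $dot += $weight * ($b[$term] ?? 0.0);
    }
    $normA = sqrt(array_sum(array_map(fn ($w) => $w * $w, $a)));
    $normB = sqrt(array_sum(array_map(fn ($w) => $w * $w, $b)));
    if ($normA == 0.0 || $normB == 0.0) {
        return 1.0; // treat an empty vector as maximally distant
    }
    return 1.0 - $dot / ($normA * $normB);
}

// Hypothetical TF-IDF weights for three article titles.
$a = ['gemini' => 0.8, 'image' => 0.6];
$b = ['gemini' => 0.6, 'image' => 0.8];
$c = ['gpu' => 1.0];

$radius = 0.4; // same value as the DBSCAN radius above
$abNeighbors = cosineDistance($a, $b) <= $radius; // shared vocabulary
$acNeighbors = cosineDistance($a, $c) <= $radius; // no shared terms
```

Two articles count as neighbors only when their distance is within the radius, so a smaller radius demands more shared vocabulary before articles are grouped.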
Clustering Logic

ArticleClusteringService

Groups semantically similar articles into Topic Clusters—the core of trending story detection.

Advanced Features
  • Merge Step — Post-clustering, we merge clusters that are semantically identical but were separated by DBSCAN due to slight density variations.
  • Hot Clusters — When multiple related stories break in one category, we aggregate them into a "Hot Story" cluster for a single comprehensive report.
  • Dynamic Epsilon — Category-specific neighborhood size (e.g. tech-news stricter, media-we-love looser).
  • TF-IDF & Cosine Similarity — Used to measure the semantic distance between articles and clusters.
PHP
// Post-clustering merge threshold
'cluster_merge_similarity_threshold' => 0.82,
'cluster_merge_max_size' => 8,

// Hot cluster detection
if ($run->repeatClusters()->count() >= 2) {
  $this->createHotStoryClustersForRun($run);
}
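One way the merge step could work is a greedy pass over cluster centroids. The sketch below is an assumption about the approach, not the service's actual code: the cluster array shape, the function names, and the reading of `cluster_merge_max_size` as a cap on the combined size are all hypothetical.

```php
<?php
function cosineSimilarity(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }
    return ($normA > 0 && $normB > 0) ? $dot / sqrt($normA * $normB) : 0.0;
}

/**
 * Greedy merge pass (illustrative).
 * Each cluster is ['ids' => int[], 'centroid' => float[]] with equal-length centroids.
 */
function mergeClusters(array $clusters, float $threshold = 0.82, int $maxSize = 8): array
{
    $merged = [];
    foreach ($clusters as $cluster) {
        $target = null;
        foreach ($merged as $i => $existing) {
            $fits = count($existing['ids']) + count($cluster['ids']) <= $maxSize;
            if ($fits && cosineSimilarity($existing['centroid'], $cluster['centroid']) >= $threshold) {
                $target = $i;
                break;
            }
        }
        if ($target === null) {
            $merged[] = $cluster; // nothing similar enough: keep as its own cluster
            continue;
        }
        // Size-weighted average keeps the centroid representative of all members.
        $n1 = count($merged[$target]['ids']);
        $n2 = count($cluster['ids']);
        foreach ($merged[$target]['centroid'] as $k => $v) {
            $merged[$target]['centroid'][$k] = ($v * $n1 + $cluster['centroid'][$k] * $n2) / ($n1 + $n2);
        }
        $merged[$target]['ids'] = array_merge($merged[$target]['ids'], $cluster['ids']);
    }
    return $merged;
}

$result = mergeClusters([
    ['ids' => [1, 2], 'centroid' => [1.0, 0.0]],
    ['ids' => [3],    'centroid' => [0.9, 0.1]], // nearly identical to the first
    ['ids' => [4, 5], 'centroid' => [0.0, 1.0]], // unrelated topic
]);
```

The size cap stops a chain of small merges from producing one oversized cluster.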
Generative AI

AutoSummarizationService

Takes a TopicCluster and generates a human-readable summary via Gemini AI, with a full audit trail in summary_generation_steps.

Step-by-step generation
  • scrape_article_* — Scrapes each article in the cluster if needed.
  • individual_summary_* — Summary per article (normal clusters).
  • repeat_summary_* — Specialized steps for Hot Clusters; summarizes each unique story within the aggregate.
  • master_summary — Cohesive synthesis of all articles or repeat summaries.
  • suggest_title — AI proposes Huement-style titles from category templates.

1. Scrape
2. Individual Summaries
3. Master Synthesis
4. Title Generation
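The order of recorded steps can be sketched as a small planning function. This is illustrative only: `planSteps` is a hypothetical name, and it assumes the `*` suffix in the step names stands for an article or story ID.

```php
<?php
/**
 * Lists the step names a run might record in summary_generation_steps, in order.
 * Passing $repeatStoryIds switches to the Hot Cluster path.
 */
function planSteps(array $articleIds, array $repeatStoryIds = []): array
{
    $steps = [];
    foreach ($articleIds as $id) {
        $steps[] = "scrape_article_{$id}";
    }
    if ($repeatStoryIds) {
        // Hot Cluster: one summary per unique story in the aggregate.
        foreach ($repeatStoryIds as $id) {
            $steps[] = "repeat_summary_{$id}";
        }
    } else {
        foreach ($articleIds as $id) {
            $steps[] = "individual_summary_{$id}";
        }
    }
    $steps[] = 'master_summary';
    $steps[] = 'suggest_title';
    return $steps;
}

$normal = planSteps([101, 102]);
$hot = planSteps([101, 102, 103], [7, 8]);
```

Recording one named row per step is what makes the audit trail useful: a failed run shows exactly which article or stage stopped it.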
Trend Intelligence

PulseService

Generates daily Pulse stats per category: velocity, trend lines, trending topics, and the SVG charts used on the homepage and category pages.

Key functionality
  • generateDailyPulse() — Runs per category with configurable time windows (7–21 days).
  • Smart trend analysis — Compares the current window against the previous one; fast-moving categories (e.g. Tech) use a 7-day window, slower ones (e.g. Digital Art) a 21-day window.
  • Pulse type — trend_line (meaningful activity) or minimal_activity (low activity).
  • SVG generation — generateTrendGraph() (sparklines), generateLargeTrendGraph() (area charts with gradients/glow).
PHP
// Category-specific time windows
'tech-news' => 7,
'artificial-intelligence' => 7,
'webdev' => 10,
'hardware' => 14,
'digital-art' => 21,
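The current-versus-previous comparison can be sketched as follows. This is a minimal example under stated assumptions: `windowTrend` is a hypothetical name, velocity is taken to mean articles per day, and the `$minArticles` cutoff for minimal_activity is invented for illustration.

```php
<?php
/**
 * Compares the most recent window with the one before it.
 * $dailyCounts is ordered oldest to newest and should hold at least
 * 2 * $windowDays entries.
 */
function windowTrend(array $dailyCounts, int $windowDays, int $minArticles = 3): array
{
    $current = array_sum(array_slice($dailyCounts, -$windowDays));
    $previous = array_sum(array_slice($dailyCounts, -2 * $windowDays, $windowDays));

    // Percent change is undefined when the previous window is empty.
    $change = $previous > 0 ? round(($current - $previous) / $previous * 100) : null;

    return [
        'velocity' => $current / $windowDays, // articles per day (assumed definition)
        'change_pct' => $change,
        'type' => $current >= $minArticles ? 'trend_line' : 'minimal_activity',
    ];
}

$windows = ['tech-news' => 7, 'digital-art' => 21];
$counts = [2, 3, 1, 4, 2, 3, 5, 4, 5, 3, 6, 4, 3, 4]; // 14 days of tech-news
$pulse = windowTrend($counts, $windows['tech-news']);
```

A longer window for slow categories avoids reporting large percentage swings that come from only one or two articles.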
Frontend

Pulse Cards & Category Stats

Pulse cards (pulse-card.blade) show velocity, trend lines or minimal-activity view, activity trend (↗/↘), and trending topics. Category stats (category-stats.blade) show N-day velocity, trending topic badges, and total articles. Both use stats_json from DailyPulse (velocity, svg_points, trending_topics, confirmation_score, etc.).
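A sparkline of this kind boils down to scaling a series of counts into SVG coordinates. The sketch below assumes a 100×30 viewBox and a plain `points` string; the real svg_points field and the generateTrendGraph() output may be structured differently, and `sparklinePoints` is a hypothetical name.

```php
<?php
/** Scales a series of daily counts into an SVG polyline "points" string. */
function sparklinePoints(array $values, int $width = 100, int $height = 30): string
{
    $count = count($values);
    if ($count === 0) {
        return '';
    }
    $max = max($values);
    $min = min($values);
    $range = $max - $min ?: 1; // avoid division by zero on a flat series
    $stepX = $count > 1 ? $width / ($count - 1) : 0;

    $points = [];
    foreach (array_values($values) as $i => $value) {
        $x = round($i * $stepX, 1);
        // SVG's y axis grows downward, so invert the scaled value.
        $y = round($height - ($value - $min) / $range * $height, 1);
        $points[] = "{$x},{$y}";
    }
    return implode(' ', $points);
}

$svg = sprintf(
    '<svg viewBox="0 0 100 30"><polyline fill="none" stroke="currentColor" points="%s"/></svg>',
    sparklinePoints([0, 10, 5])
);
```

Precomputing the points on the server means the Blade templates only have to echo a string.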

Evolutionary Intelligence

NarrativeShiftService

Detects how a story evolves over time by comparing the semantic summaries of current clusters against historical ones in the same category.

Key functionality
  • What Changed Today — Surfaced on the homepage when a meaningful shift in the narrative is detected.
  • Semantic Comparison — Uses AI to compare "Window A" vs "Window B" and identify new developments, rebuttals, or resolution of events.
PHP
// Detection window (default 7 days)
$shifts = $narrativeService->runDetectionAndRecordShifts();
// Persisted to narrative_shifts table
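Before any AI comparison can run, summaries have to be split into the two windows. A minimal sketch of that partitioning, assuming a 7-day default and a hypothetical array shape with `date` and `text` keys:

```php
<?php
/**
 * Splits dated summaries into the previous window ("A") and the current
 * window ("B"). Anything older than Window A is ignored.
 */
function splitWindows(array $summaries, DateTimeImmutable $now, int $days = 7): array
{
    $startB = $now->modify("-{$days} days");
    $startA = $startB->modify("-{$days} days");
    $windows = ['A' => [], 'B' => []];

    foreach ($summaries as $summary) {
        $date = new DateTimeImmutable($summary['date']);
        if ($date >= $startB && $date <= $now) {
            $windows['B'][] = $summary['text'];
        } elseif ($date >= $startA && $date < $startB) {
            $windows['A'][] = $summary['text'];
        }
    }
    return $windows;
}

$windows = splitWindows([
    ['date' => '2025-02-20', 'text' => 'Older coverage'],
    ['date' => '2025-03-03', 'text' => 'Initial report'],
    ['date' => '2025-03-12', 'text' => 'Vendor responds'],
], new DateTimeImmutable('2025-03-15'));
```

The two resulting lists would then be handed to the model as "Window A" and "Window B".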
Visual Intelligence

NanoBananaService

Automatically generates featured images for blog posts using Gemini 2.5 Flash Image.

Key functionality
  • Prompt Synthesis — Builds descriptive image prompts from the article's summary, category, and tags.
  • Style Injection — Ensures a consistent Huement visual style across all generated assets.
PHP
$result = $nanoBananaService->generateImage($prompt);
// Returns local storage path and token usage metadata
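Prompt synthesis and style injection can be pictured as simple string assembly. The builder below is a sketch only: the function name, the sentence layout, and the style text are placeholders, not the service's actual prompt.

```php
<?php
/** Builds an image prompt from a post's summary, category, and tags. */
function buildImagePrompt(string $summary, string $category, array $tags, string $style): string
{
    $parts = [
        "Featured image for a {$category} blog post.",
        'Subject: ' . trim($summary),
    ];
    if ($tags) {
        $parts[] = 'Themes: ' . implode(', ', $tags) . '.';
    }
    $parts[] = 'Style: ' . $style; // style injection: the same suffix on every prompt
    return implode(' ', $parts);
}

$prompt = buildImagePrompt(
    'Chipmakers announce new low-power GPUs.',
    'hardware',
    ['gpu', 'efficiency'],
    'flat vector illustration, muted palette' // placeholder style text
);
```

Keeping the style text in one place is what makes the generated images look consistent across posts.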
Publishing Layer

Statamic Integration

Seamlessly bridges the AI pipeline with our Statamic CMS via the Area of Interest (a_o_i) taxonomy.

  • Automated Tagging — Maps AI-suggested tags to Statamic taxonomy terms.
  • Blueprint Mapping — Ensures generated content fits the strict structural requirements of our blog blueprints.
Phase 2

Deep Intelligence

  • Novelty — ClusterInsightService::computeNovelty() vs historical clusters; labels: New, Recurring, Ongoing.
  • Controversy — computeControversyScore() from keywords in titles/descriptions; labels: low, medium, high.
  • Hidden trends — Small clusters with high novelty/momentum or rapid 24h growth; "Early Trend Alert" in pulse modal.
  • Coverage bias — Diversity (unique feeds / total articles); labels e.g. Industry-driven vs Grassroots.
  • Narrative shifts — NarrativeShiftService and "What Changed Today" on the homepage.
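Two of these signals are simple enough to sketch directly. The diversity ratio follows the formula given in the list; the controversy label is a guess at the general approach, with a hypothetical keyword list and hypothetical cutoffs.

```php
<?php
/** Diversity = unique feeds / total articles. */
function coverageDiversity(array $feedIds): float
{
    return $feedIds ? count(array_unique($feedIds)) / count($feedIds) : 0.0;
}

/**
 * Keyword-based controversy label. The cutoffs (share of articles
 * with at least one keyword hit) are assumptions.
 */
function controversyLabel(array $texts, array $keywords): string
{
    $hits = 0;
    foreach ($texts as $text) {
        foreach ($keywords as $keyword) {
            if (stripos($text, $keyword) !== false) {
                $hits++;
                break; // count each article at most once
            }
        }
    }
    $share = $texts ? $hits / count($texts) : 0.0;
    if ($share >= 0.5) { return 'high'; }
    if ($share >= 0.2) { return 'medium'; }
    return 'low';
}

$diversity = coverageDiversity(['feed-a', 'feed-a', 'feed-b', 'feed-c']);
$label = controversyLabel(
    ['Lawsuit filed over dataset', 'New model released', 'Critics slam backlash response', 'Benchmarks updated'],
    ['lawsuit', 'backlash', 'ban']
);
```

A diversity value near 1 means nearly every article came from a different feed, while a low value means a few feeds dominate the cluster.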
Admin & UX

Admin Dashboard & User Flow

Admin (/admin/content-engine): dashboard (last run, last pulse, tokens, limits), Content Engine settings (DB overrides), and Runs list with Livewire. Clustering and summarization are triggered from here.

User experience: Homepage pulse cards (e.g. "Tech News ↗ +45%") → category deep-dive with trending topics and SVG trend charts → time windows and thresholds adapt per category for meaningful insights.

Engine
  • Last Run 07:03
  • Total Clusters (7d) 142
  • AI Tokens Used 4.2M