How Flat9 Intelligence works
A working explanation of what this platform does, how it produces its sentences, what the confidence numbers mean, and where it can be trusted.
1. Overview
Flat9 Intelligence reads the world's news as a stream of structured events, builds a graph of relationships from it, and surfaces a daily feed of AI-written sentences describing where things are going. Each sentence carries a numeric confidence and an honest calibration tier.
What it is not: a news aggregator, a sentiment dashboard, a Polymarket viewer, or a chatbot. It is a calibrated reasoning engine over a news entity-graph, with prose generated by a large language model under tight constraints.
The core question this system tries to answer: given everything that just happened in the world, what shifted, and how confident should we be that those shifts mean something?
2. Architecture
Three input streams feed a graph database. Pattern detectors read the graph state and produce candidate claims. Each claim's confidence is computed numerically; its sentence is written by an LLM. The two paths never cross.
3. Data pipeline
Three event sources land in the same events table, distinguished
by the source column:
- GDELT [1] — a global CSV every 15 minutes, ~2,000 rows in 61 tab-delimited columns, drawn from news media in 100+ languages and tagged with the CAMEO ontology of conflict and cooperation [2]. The dominant volume.
- RSS feeds — hourly pulls from think-tank, OSINT, and defense outlets that GDELT under-represents (CSIS, Brookings, CFR, RUSI, Defense One, War on the Rocks, Bellingcat, The Diplomat, Al-Monitor, plus BBC / NPR / Guardian / NYT World as ground truth). Item titles run through the same name + person + topic matchers as GDELT, so an RSS row links to seeded entities the same way a GDELT row does.
- FRED indicators [11] — daily pull of ~10 economic time-series (Fed funds rate, 10-year Treasury, CPI, unemployment, oil, FX, VIX). Each indicator is also a node in the entity graph, so a headline that mentions Powell alongside the FOMC produces an edge between Jerome Powell and the Fed Funds Rate indicator.
Every event row carries a publisher_tier (1=global wire, 2=national press / think tank, 3=regional / state-aligned, 4=aggregator / content farm, 5=unknown), an alert flag if the title matches the escalation keyword list, and a list of topic tags from a curated catalog (tariffs, fed-rates, iran-nuclear, election, …). All four are precomputed at ingest so downstream pattern detection is a fast SQL join.
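A minimal Python sketch of that ingest-time tagging; the keyword list and topic terms here are illustrative placeholders, not the production catalog:

```python
ESCALATION_KEYWORDS = {"strike", "mobilize", "ultimatum", "incursion"}
TOPIC_CATALOG = {
    "tariffs":      {"tariff", "trade war", "import duty"},
    "fed-rates":    {"fomc", "rate hike", "fed funds"},
    "iran-nuclear": {"enrichment", "iaea", "centrifuge"},
}

def enrich(title: str) -> dict:
    """Precompute the alert flag and topic tags for one event title."""
    lowered = title.lower()
    return {
        "alert": any(kw in lowered for kw in ESCALATION_KEYWORDS),
        "topics": [topic for topic, terms in TOPIC_CATALOG.items()
                   if any(term in lowered for term in terms)],
    }
```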
A real GDELT event
To make this concrete, here is one event from the live database (a real
row, ingested 2026-03-10), shown in three views: as it lands in the raw
GDELT export, as it ends up in our events table, and as it
links into the entity graph.
1. Raw GDELT row (key columns of 61)
GLOBALEVENTID 1293508008
SQLDATE 20260310
Actor1Name ISRAELI MILITARY
Actor1CountryCode ISR (CAMEO)
Actor1Geo_CountryCode IS (FIPS, geo-confirmed)
Actor2Name HAMAS
Actor2CountryCode PSE
Actor2Geo_CountryCode GZ
EventCode 112 (ACCUSE, within DISAPPROVE root)
EventRootCode 11
QuadClass 4 (material conflict)
GoldsteinScale -2.0
AvgTone -7.434 (sentence-level sentiment)
ActionGeo_CountryCode GZ
ActionGeo_FullName Gaza Strip, Gaza, GZ
ActionGeo_Lat 31.500
ActionGeo_Long 34.750
DATEADDED 20260310203000
SOURCEURL https://www.newcastleherald.com.au/story/9195158/israeli-army-kills-three-in-southern-gaza-strip-tunnel/
2. After ingestion: row in our events table
id                659907
source            gdelt                (the data provider, not the publisher)
source_id         1293508008           (the GDELT GLOBALEVENTID)
occurred_at       2026-03-10 20:30:00
cameo_code        112
tone              -7.434
location_lat      31.500
location_lon      34.750
url               https://www.newcastleherald.com.au/story/9195158/...
publisher_tier    3                    (regional Australian press)
publisher_weight  0.600                (tier 3 → 0.6)
The actor codes, geo-confirmation, and Goldstein scale are read at parse time
for filtering and entity-resolution decisions, but only the columns above are
persisted: enough to reason about the event, recover the source, and join to
entities. source here means the data provider (gdelt, treendly,
polymarket), not the publisher of the article. The publisher is captured
separately as publisher_tier (1=global wire / major mainstream,
2=national press, 3=regional or specialty, 4=aggregator or content farm,
5=unknown), which derives a publisher_weight used to weight
the event's contribution to graph edges. The Newcastle Herald is a regional
Australian daily, so this row carries weight 0.60 rather than the 1.00 a
Reuters or AP wire would.
Why have a tier at all: a story that was simmering in regional outlets and
then breaks into Reuters/BBC/NYT is qualitatively different from one that
has been in the wires all along. The mainstream pickup is an early signal
that editors who cover the region seriously have decided this matters, and
it is exactly what the mainstream_crossing pattern detector
looks for. Tier 5 (unknown) defaults to 0.50 so unfamiliar publishers
participate in edges without pulling the model in either direction.
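In code, the tier-to-weight mapping is just a lookup. A minimal Python sketch; the tier-1, tier-3, and tier-5 values are the ones quoted above, while the tier-2 and tier-4 values are assumptions for illustration:

```python
# Publisher tier → edge weight. Tiers 1, 3, and 5 match the values quoted
# in the text (1.00, 0.60, 0.50); tiers 2 and 4 are ASSUMED here.
PUBLISHER_WEIGHT = {
    1: 1.00,  # global wire / major mainstream
    2: 0.80,  # national press / think tank   (assumed)
    3: 0.60,  # regional / specialty
    4: 0.40,  # aggregator / content farm     (assumed)
    5: 0.50,  # unknown publisher: neutral, pulls in neither direction
}

def publisher_weight(tier: int) -> float:
    """Weight an event's contribution to graph edges by publisher tier."""
    return PUBLISHER_WEIGHT.get(tier, PUBLISHER_WEIGHT[5])
```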
3. After entity linking: rows in entity_event
entity_event:
(entity=Israel,     event=659907, role=actor1)     ← Actor1CountryCode=ISR, geo IS confirmed
(entity=Palestine,  event=659907, role=actor2)     ← Actor2CountryCode=PSE, geo GZ confirmed
(entity=Hamas,      event=659907, role=actor2)     ← Actor2Name="HAMAS" matched org alias
(entity=Gaza Strip, event=659907, role=proximity)  ← Action lat/lon within Gaza radius
Three of those links come from GDELT's own data (actor country codes, actor
name matching). The fourth, Gaza Strip, was added by the
proximity-linking pass: the event's lat/lon falls within Gaza's seeded
radius, so the place entity is attached even though GDELT didn't name it.
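A minimal sketch of that proximity pass, assuming a seeded place list with a centre point and radius (the Gaza coordinates and 40 km radius here are illustrative, not the production seed):

```python
import math

# One seeded place entity; centre point and radius are ILLUSTRATIVE values.
PLACES = [{"entity": "Gaza Strip", "lat": 31.42, "lon": 34.37, "radius_km": 40}]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def proximity_links(event_lat, event_lon):
    """Attach every place entity whose seeded radius covers the event."""
    return [p["entity"] for p in PLACES
            if haversine_km(event_lat, event_lon, p["lat"], p["lon"]) <= p["radius_km"]]

proximity_links(31.500, 34.750)   # → ["Gaza Strip"] for the event above
```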
From there the event contributes edges to the recursive PageRank graph: one co-mention edge for each pair of its linked entities.
Pipeline steps
- The ingest step downloads the latest GDELT export, filters rows to Middle East scope using a CAMEO/FIPS country-code cross-check (it skips name-based false positives, e.g. a Texas county geocoded as "Jordan" because the actor's name happens to be Jordan), and writes event records.
- For each event, actor names and country codes are linked to canonical entities (people, organizations, places, topics). Country actors link via CAMEO codes; named persons and orgs link via name + alias substring match against the seeded entity dictionary.
- The edge build step emits a co-mention edge between every pair of entities that appear in the same event (sketched after this list). Edges carry a weight (event source weight × time decay) and a timestamp.
- The gravity recompute step runs a recursive PageRank iteration over the edge set and persists each entity's gravity and momentum.
- The pattern detection step scans the current graph state for triggered patterns; the claim generation step turns triggered patterns into claims.
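The edge build step, sketched in Python; `co_mention_edges` is an illustrative name, and the decay helper anticipates the §5 formula:

```python
from itertools import combinations
import math

HALFLIFE_DAYS = 14.0

def decay(age_days: float) -> float:
    """Exponential time decay with the 14-day half-life from §5."""
    return math.exp(-age_days * math.log(2) / HALFLIFE_DAYS)

def co_mention_edges(entity_ids, publisher_weight, age_days):
    """One undirected edge per pair of entities in the same event.
    Pairs are sorted so (A, B) and (B, A) land on the same edge key."""
    w = publisher_weight * decay(age_days)
    return {tuple(sorted(pair)): w for pair in combinations(entity_ids, 2)}

# The Gaza event above: four linked entities → six pairwise edges at 0.60.
edges = co_mention_edges(["Israel", "Palestine", "Hamas", "Gaza Strip"], 0.60, 0.0)
```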
Articles are not re-scraped at ingest; the seed entity catalogue is partly imported from hipcityreg/situation-monitor [3], which has already done the curation work. Polymarket markets [4] are pulled from the public Gamma API and matched to entities by question-text substring against the entity name dictionary.
4. The entity graph
Nodes are entities with stable identity; every entity carries a Wikidata QID [5] when one exists. Edges are co-mentions: pairs of entities that appeared in the same news event within a recent window. The graph is undirected — an edge means "these two were talked about together," not "X did something to Y."
GDELT carries direction at the event level (Actor1 → Actor2 with a CAMEO
action code), but we deliberately flatten that into an undirected co-mention
edge for the graph layer. News direction is noisy: the same conflict produces
paired rows like "Iran ACCUSED Israel" and "Israel STRUCK Iran" with opposite
directions, and naïvely summing them washes out the underlying signal that
both entities are entangled. Direction itself isn't lost: each event row
keeps its CAMEO code and tone, which feed the tone_shift pattern
and the publisher-tier weighting; the graph's job is to answer "who is
entangled with whom," not "who is acting on whom." If a future pattern needs
explicit directionality we'll add a directed edge type alongside
co_mention rather than rebuild the existing one.
Wikidata QIDs are non-negotiable. Following Sahu et al. [6], who showed that LLM-generated knowledge graphs from GDELT suffer from entity inconsistency (DALI and THE DALI as separate nodes; 435/968 isolated vertices in GraphRAG), the system uses an ontology-grounded canonical ID for every entity. The LLM never assigns identity.
5. Recursive PageRank
Influence flows through the graph the way the original PageRank algorithm [7] proposed: an entity is important when other important entities point to it, recursively. The same idea was popularised for AI account influence ranking by the Digg AI 1000 system [8], which directly inspired this design.
The iteration:
gravity(t+1)[i] = (1 − d) / N
                + d · Σⱼ gravity(t)[j] · w(j → i)
                + d · dangling_mass / N

w(j → i)  = weight(j↔i) · decay(Δt_ji)  /  Σ_k [ weight(j↔k) · decay(Δt_jk) ]
decay(Δt) = exp(−Δt · ln 2 / halflife)
Where d = 0.85 (standard damping), halflife = 14 days (so a 30-day-old edge contributes ~23% of a fresh edge), N is the entity count, and dangling mass is redistributed uniformly so isolated entities don't bleed mass to themselves. Convergence is reached in 20–30 iterations on a graph of our size; the entire daily recompute runs in well under a second in pure PHP.
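A runnable Python sketch of the same iteration (the production recompute is pure PHP, as noted); the edge representation is an assumption for illustration:

```python
import math

D, HALFLIFE = 0.85, 14.0          # damping and half-life from the text

def decay(age_days):
    return math.exp(-age_days * math.log(2) / HALFLIFE)

def recompute_gravity(entities, edges, iters=30):
    """edges: {(a, b): (weight, age_days)} undirected co-mention edges.
    Returns {entity: gravity} after `iters` iterations."""
    n = len(entities)
    adj = {e: [] for e in entities}   # neighbour → decayed edge weight
    out = {e: 0.0 for e in entities}  # per-node normaliser Σ_k w(j↔k)
    for (a, b), (w, age) in edges.items():
        dw = w * decay(age)
        adj[a].append((b, dw)); adj[b].append((a, dw))
        out[a] += dw; out[b] += dw
    gravity = {e: 1.0 / n for e in entities}
    for _ in range(iters):
        # Dangling (edgeless) mass is redistributed uniformly.
        dangling = sum(g for e, g in gravity.items() if out[e] == 0.0)
        nxt = {e: (1 - D) / n + D * dangling / n for e in entities}
        for j in entities:
            for i, dw in adj[j]:                  # empty for dangling nodes
                nxt[i] += D * gravity[j] * dw / out[j]
        gravity = nxt
    return gravity
```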
Why time decay matters. Without it, the graph fossilizes: yesterday's big news stays at the same gravity forever. The 14-day half-life is an opinion, not a measured optimum: long enough that a brief news cycle doesn't immediately outweigh structural relationships, short enough that this week's events dominate. Tunable.
6. Pattern detection
Patterns are finite, pre-defined graph phenomena. The LLM does not choose what to write about; that would create variance and unaccountable output. Each pattern type defines a numeric detector, an LLM prompt template, and a calibration tier.
| Pattern | Triggers when | Sentence shape | Tier |
|---|---|---|---|
| rising_edge | edge weight A↔B in last 30d ≥ 2× prior 30d, ≥3 events corroborating | "{A}–{B} cooperation/tension is intensifying" | medium |
| falling_edge | edge weight A↔B in last 30d ≤ 0.5× prior 30d, ≥3 prior events | "{A}–{B} relationship is cooling / backchannel going quiet" | medium |
| gravity_surge | entity gravity momentum > +20% AND gravity > 1.5× baseline | "{Entity} is becoming a focal point" | medium |
| gravity_collapse | entity gravity momentum < −20% AND gravity > 1.5× baseline | "{Entity}'s influence is declining" | medium |
| tone_shift | mean GDELT tone for entity shifts ≥ 2.0 between 14d windows | "{Topic} sentiment is hardening / softening" | medium |
| cluster_formation | event count for entity in last 14d ≥ 3× prior 14d, ≥5 events | "{Entity} is becoming a flashpoint" | low |
| triangular_tightening | three entities A,B,C with all 3 pairwise edges rising ≥1.5× | "{A}, {B}, {C} are converging" | low |
The 1.5× baseline gate on gravity surge/collapse is load-bearing. With N
entities the uniform gravity is 1/N; in a sparse graph, PageRank
redistributes mass onto the few nodes with edges, mathematically squeezing
everyone else by ~20% as a regression-to-mean artifact. Without the 1.5/N
floor, every other entity would fire as a "collapse" the moment two of them
surged. The threshold scales with graph size automatically.
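Two detectors written out as Python sketches; the signatures and window bookkeeping are simplified, with the SQL aggregation assumed to have happened upstream:

```python
def rising_edge(recent_30d: float, prior_30d: float, recent_events: int) -> bool:
    """Recent window at least doubles the prior one, with >= 3 events.
    A brand-new edge (no prior baseline) is not 'rising'."""
    return prior_30d > 0 and recent_30d >= 2.0 * prior_30d and recent_events >= 3

def gravity_collapse(gravity: float, momentum_pct: float, n_entities: int) -> bool:
    """Momentum below -20%, gated by the 1.5/N baseline floor so that
    regression-to-mean shrinkage in a sparse graph cannot fire it."""
    return momentum_pct < -20.0 and gravity > 1.5 / n_entities
```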
Two further pattern types exist in the catalogue but are dormant until their inputs are wired:
- polymarket_trajectory — fires when graph state matches a pattern that historically resolved YES on Polymarket. Activates once the calibration model is trained (§8).
- mainstream_crossing — fires when a narrative previously confined to fringe-tier sources begins appearing in mainstream-tier sources. Borrows the source-tier classification from hipcityreg/situation-monitor [3]. Activates once we ingest sources beyond GDELT.
7. Prose vs. number: the strict separation
This is the hard architectural rule that distinguishes this system from a generic "LLM reads news and writes commentary" tool: the LLM never produces, sees, or influences the confidence number.
The order of operations matters. When a pattern fires, the system first
computes the numeric confidence from the signal magnitude (and, for tiers
that have it, the Polymarket-trained mapping in §8).
Then the LLM is handed the pattern type, the entities, and
the structured signals — but never the resulting number — and asked to write
a sentence. Two outputs come back: a sentence (from the LLM) and a confidence
(from the formula). They land in the same claims row. The LLM
doesn't know what confidence its sentence will be paired with, and the
confidence math doesn't know what sentence will be paired with it.
The reasoning, taken seriously:
- If the LLM produced the probability, we would get confident-sounding numbers with no grounding. MIRAI [9] showed that even GPT-4o agents reach only F1 ≈ 32.6 on relation forecasting from GDELT, and degrade sharply at long horizons. Letting the model self-report would mask that ceiling.
- If the system produced the prose from raw numbers, output would be unreadable and formulaic. Each pattern type would need a hard-coded sentence template per entity-type combination. The LLM does this well; the formula does not.
- The LLM is told what kind of pattern fired and given the signal facts, not the confidence value. Its job is to write a short, neutral sentence with no numbers, no hedging, no quotation marks. Temperature is low (0.2). The prompt is hashed and stored on every claim so prose drift is auditable.
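A structural sketch of the separation in Python. Every function name here is illustrative, and the confidence placeholder stands in for the formula (or the §8 model); the point is what each path can and cannot see:

```python
import hashlib

def confidence_from_signals(pattern: str, signals: dict) -> float:
    """Numeric path. PLACEHOLDER mapping from raw signal magnitude;
    the real system uses the formula or the trained model from §8."""
    return min(0.99, max(0.01, signals.get("magnitude", 0.0)))

def build_prompt(pattern: str, entities: list, signals: dict) -> str:
    """Prose path. The prompt carries the pattern type and the structured
    facts; the confidence number is deliberately never included."""
    return (f"Pattern: {pattern}\nEntities: {', '.join(entities)}\n"
            f"Signals: {signals}\n"
            "Write one short, neutral sentence. No numbers, no hedging, "
            "no quotation marks.")

def generate_claim(pattern, entities, signals, llm_complete):
    confidence = confidence_from_signals(pattern, signals)  # path 1: formula
    prompt = build_prompt(pattern, entities, signals)       # path 2: LLM
    sentence = llm_complete(prompt, temperature=0.2)
    return {"sentence": sentence, "confidence": confidence, # meet in one row
            "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest()}
```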
8. Polymarket calibration
Polymarket [4] is the calibration substrate, never the user-facing primitive. Users never see a market or a question. Polymarket's role is purely upstream of the confidence model: it provides ground-truth outcomes against which the system's signals can be checked.
The mechanic, in three steps:
- Polymarket has thousands of resolved binary markets ("Will X happen by Y date — YES/NO") with a known, money-backed outcome.
- For each resolved market whose entities/topics overlap our graph, we replay what our graph signals were saying in the days before resolution. The market_features table accumulates that time series per market: gravity, momentum, edge weight, event count.
- A logistic regression learns the mapping graph state → resolution:

  P(resolution = YES) = f(graph state at time t, for the entities mentioned in the question)
That trained f is what produces the confidence number for live claims whose pattern shape is well-represented in the training data. For shapes with little or no Polymarket coverage, the confidence falls back to a function of raw signal magnitude — and the calibration tier reflects that.
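A toy version of that calibration step in Python with scikit-learn; the feature rows and the feature order are invented for illustration, not real market_features values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: graph features for one resolved market in the days before
# resolution. ASSUMED feature order: [gravity, momentum, edge_weight, events].
# y: 1 = the market resolved YES. All numbers below are toy values.
X = np.array([[0.012,  0.31, 4.2, 17],
              [0.004, -0.05, 1.1,  3],
              [0.019,  0.44, 6.0, 25],
              [0.006,  0.02, 0.9,  4]])
y = np.array([1, 0, 1, 0])

calibrator = LogisticRegression().fit(X, y)   # this is the trained f

# A live claim whose pattern shape is well-represented in training data:
live = np.array([[0.015, 0.28, 3.9, 14]])
p_yes = calibrator.predict_proba(live)[0, 1]  # the confidence number
```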
Every claim carries a calibration tier badge (HIGH / MEDIUM / LOW). The tier reflects how the pattern shape behind the claim relates to Polymarket history, not whether any specific bet was validated TRUE for these entities:
- HIGH — this pattern shape (e.g. polymarket_trajectory) has many resolved markets backing it. The confidence number is the trained model's output and is empirically grounded. Read 70% as "70% chance of YES on similar historical markets."
- MEDIUM — some Polymarket coverage exists for this shape but with limited resolutions. The number is directional rather than precisely calibrated. Read it as "strong, but treat the percentage with a wide error bar."
- LOW — no Polymarket history for this shape (most geo_cluster, triangular_tightening, cluster_formation claims). The 70% is a function of raw signal magnitude only, not a calibrated probability. Read it as "magnitude of shift," not "chance of outcome." The system is being honest that it has nothing to check itself against.
So a claim like The Strait of Hormuz is experiencing a sharp escalation
in military and geopolitical activity
at 70% LOW means
"the underlying signal is strong (a clear geographic event cluster), but no
resolved market has a shape we can map this to, so we won't pretend the
number is a probability." That distinction is the antidote to the
"confident-sounding sentence" failure mode — the system's job
includes telling the user when it is guessing.
9. Three independent witnesses
Every claim's confidence integrates three sources, computed numerically:
- News graph (GDELT 15-min events + RSS feeds from think tanks and OSINT outlets) — is this entity gaining centrality, is co-occurrence tightening, are the headlines using escalation language?
- Polymarket calibration — does the resulting feature pattern resemble historical patterns that resolved YES on money-backed binary markets?
- FRED economic indicators — Fed funds rate, CPI, unemployment, oil, FX, VIX. The macro substrate against which rate-, inflation-, and energy-flavoured claims can be sanity-checked rather than left to news framing alone.
Where they agree, the signal is trusted. Where they disagree, the divergence is itself the interesting datum. A graph signal that's invisible in the macro indicators may be a real geopolitical story not yet priced in; a Polymarket move uncorroborated by news flow may be informed traders front-running a leak; an indicator shift unaccompanied by news is the market expecting something coverage hasn't caught up to. All three are surfaced rather than collapsed.
Treendly's attention-pulse layer (search-and-social rising-trend signal) is in the architecture but not yet wired in — planned as a fourth corroborating witness for the claims that depend on whether the public is paying attention, distinct from whether elites or markets are.
10. User-facing surfaces
The system has four surfaces a reader actually opens, and each one is meant to answer a different question.
The feed
The homepage is a list of AI-written sentences with a confidence percentage,
a calibration tier badge, a 7-day delta, and a 14-day inline sparkline. The
sparkline reads from the claim_history table that records each
claim's confidence on every generation pass; this is what turns "78% right
now" into "78% and rising for the past week" at a glance. Above the feed sits
a heatmap that aggregates ~500K events into ~2K geographic cells server-side,
so the density is real (every ingested event contributes) without shipping a
megabyte-scale payload.
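The server-side aggregation is plain grid binning. A minimal Python sketch, with an illustrative cell size:

```python
from collections import Counter

CELL_DEG = 2.0   # illustrative cell size, not the production value

def cell(lat: float, lon: float) -> tuple:
    """Snap coordinates to a grid cell (floor division handles negatives)."""
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

def heatmap(events):
    """Aggregate (lat, lon) pairs into per-cell counts server-side, so
    every ingested event contributes without shipping raw points."""
    return Counter(cell(lat, lon) for lat, lon in events)
```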
The claim detail
Clicking any sentence opens a fuller reasoning view. At the top is a cached AI-written 2–3 paragraph summary that frames what is happening, why it matters, and what to watch next. Below it sits the structured evidence: the entities involved (each linkable to its own dossier), the numeric signals that produced the confidence, the contributing events with reconstructed-article titles when available, the calibration tier with an explanation, and a confidence-over-time chart. The summary is regenerated only when the underlying signals change (we hash the signals JSON and compare), so visiting a claim does not burn LLM credits.
The entity dossier
For any entity in the graph, the dossier shows its current gravity and momentum, its top connected entities (time-decayed edge weights), the recent events it appears in, and the active claims it features in. It is the Digg-AI-1000 [8] view applied to news, with geography on top: a markers map shows where this entity's events happen spatially.
The /review page
An admin-only QA view listing every claim generated in a recent window, with the structured signals expandable per claim. Used to spot prose drift, mismatched signals, or "confident-sounding nonsense" failures before a broader audience does. Filters by pattern type and time window.
Article-text reconstruction
Earlier in the data pipeline, each event lands with only a URL and metadata,
never a real article body. To ground LLM prose with actual headlines and
specifics, the system runs Bertaglia et al.'s gdeltnews package
[10] as a Python sidecar: every hour it groups recent
events by their 15-min GDELT publication window, downloads the matching
Web News NGrams 3.0 files, and reconstructs each article's full text by
merging overlapping n-grams (~95% similarity vs. the original). The
reconstructed text feeds the prose layer and lives in the articles
table; it is the difference between "Iran is intensifying nuclear focus" and
"Iran is intensifying nuclear focus following the Bushehr drill on March 8."
11. Limitations
The honest list of where this system is weak:
- Forecast horizons. MIRAI [9] showed LLM-agent F1 collapses from 1-day to 90-day horizons. Even with calibration, this system is most useful at 1–14 day horizons. Past that, treat output as qualitative.
- GDELT noise. CAMEO codes are coarse, and the geocoder makes name-based mistakes (a Texas locality of Jordan getting tagged as the country). The system filters this with a CAMEO/FIPS cross-check, but noise remains.
- Entity coverage is curated, not learned. Each new domain requires a seed entity set. The architecture is general, but each port is real work: categories are units, not labels.
- Source-tier dynamics aren't modelled yet. Until we ingest beyond GDELT, the mainstream_crossing pattern is dormant and we can't separate fringe narratives from established ones.
- The LLM is an unreliable narrator. Even with low temperature and hashed prompts, prose can drift. Weekly review (/review) surfaces every generated sentence with its underlying signals so a human can spot prose that doesn't match the data.
- Cold-start period. Pattern detectors that rely on "recent vs. prior" windows need ~30 days of GDELT history before they fire reliably. During the cold-start, only single-window patterns (like gravity_surge) can produce claims, and those are inherently more abstract.
12. References
1. Leetaru, K. and Schrodt, P. A. (2013). GDELT: Global Data on Events, Location and Tone, 1979–2012. International Studies Association Annual Convention. gdeltproject.org
2. Schrodt, P. A. (2012). CAMEO: Conflict and Mediation Event Observations Event and Actor Codebook. Pennsylvania State University.
3. hipcityreg (2026). situation-monitor, real-time dashboard for global news, markets, and geopolitical events. github.com/hipcityreg/situation-monitor. Source-tier classification, leader catalogues, and geographic seed data are imported from this project's open configs.
4. Polymarket. Polymarket Gamma API. docs.polymarket.com
5. Vrandečić, D. and Krötzsch, M. (2014). Wikidata: A free collaborative knowledgebase. Communications of the ACM 57(10): 78–85. wikidata.org
6. Sahu, A. et al. (2025). Talking to GDELT through Knowledge Graphs. arXiv:2503.07584. arxiv.org/abs/2503.07584
7. Page, L., Brin, S., Motwani, R. and Winograd, T. (1998). The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab Technical Report.
8. Digg AI 1000 (2026). Recursive PageRank ranking of AI accounts on X. digg.com/ai-1000. The recursive-influence framing in this system is directly inspired by Digg's approach.
9. Sun, C. et al. (2024–2025). MIRAI: Evaluating LLM Agents for International Event Forecasting. NeurIPS Datasets & Benchmarks 2025, arXiv:2407.01231. arxiv.org/abs/2407.01231
10. Bertaglia, T. et al. (2026). Free Access to World News: Reconstructing Full-Text Articles from GDELT. Big Data and Cognitive Computing 10(2): 45. mdpi.com/2504-2289/10/2/45. The gdeltnews Python package referenced as the v1.5 ingestion path.
11. Federal Reserve Bank of St. Louis. FRED Economic Data API. fred.stlouisfed.org/docs/api. Free programmatic access to 800,000+ macroeconomic time-series. We pull ~10 series daily as the third independent witness alongside the news graph and Polymarket calibration.