How Flat9 Intelligence works
A working explanation of what this platform does, how it produces its sentences, what the confidence numbers mean, and where it can be trusted.
1. Overview
Flat9 Intelligence reads the world's news as a stream of structured events, builds a graph of relationships from it, and surfaces a daily feed of AI-written sentences describing where things are going. Each sentence carries a numeric confidence and an honest calibration tier.
What it is not: a news aggregator, a sentiment dashboard, a Polymarket viewer, or a chatbot. It is a calibrated reasoning engine over a news entity-graph, with prose generated by a large language model under tight constraints.
The core question this system tries to answer: given everything that just happened in the world, what shifted, and how confident should we be that those shifts mean something?
2. Architecture
Three input streams feed a graph database. Pattern detectors read the graph state and produce candidate claims. Each claim's confidence is computed numerically; its sentence is written by an LLM. The two paths never cross.
3. Data pipeline
Three event sources land in the same events table, distinguished
by the source column:
- GDELT [1] — a global CSV every 15 minutes, ~2,000 rows in 61 tab-delimited columns, drawn from news media in 100+ languages and tagged with the CAMEO ontology of conflict and cooperation [2]. The dominant volume.
- RSS feeds — hourly pulls from think-tank, OSINT, and defense outlets that GDELT under-represents (CSIS, Brookings, CFR, RUSI, Defense One, War on the Rocks, Bellingcat, The Diplomat, Al-Monitor, plus BBC / NPR / Guardian / NYT World as ground truth). Item titles run through the same name + person + topic matchers as GDELT, so an RSS row links to seeded entities the same way a GDELT row does.
- FRED indicators [11] — daily pull of ~10 economic time-series (Fed funds rate, 10-year Treasury, CPI, unemployment, oil, FX, VIX). Each indicator is also a node in the entity graph, so a headline that mentions Powell alongside the FOMC produces an edge between Jerome Powell and the Fed Funds Rate indicator.
Every event row carries a publisher_tier (1=global wire, 2=national press / think tank, 3=regional / state-aligned, 4=aggregator / content farm, 5=unknown), an alert flag if the title matches the escalation keyword list, and a list of topic tags from a curated catalog (tariffs, fed-rates, iran-nuclear, election, …). All four are precomputed at ingest so downstream pattern detection is a fast SQL join.
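A minimal Python sketch of that ingest-time tagging; the keyword list and topic terms here are illustrative placeholders, not the production catalog:

```python
ESCALATION_KEYWORDS = {"strike", "mobilize", "ultimatum", "incursion"}
TOPIC_CATALOG = {
    "tariffs":      {"tariff", "trade war", "import duty"},
    "fed-rates":    {"fomc", "rate hike", "fed funds"},
    "iran-nuclear": {"enrichment", "iaea", "centrifuge"},
}

def enrich(title: str) -> dict:
    """Precompute the alert flag and topic tags for one event title."""
    lowered = title.lower()
    return {
        "alert": any(kw in lowered for kw in ESCALATION_KEYWORDS),
        "topics": [topic for topic, terms in TOPIC_CATALOG.items()
                   if any(term in lowered for term in terms)],
    }
```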
A real GDELT event
To make this concrete, here is one event from the live database (a real
row, ingested 2026-03-10), shown in three views: as it lands in the raw
GDELT export, as it ends up in our events table, and as it
links into the entity graph.
1. Raw GDELT row (key columns of 61)
GLOBALEVENTID 1293508008
SQLDATE 20260310
Actor1Name ISRAELI MILITARY
Actor1CountryCode ISR (CAMEO)
Actor1Geo_CountryCode IS (FIPS, geo-confirmed)
Actor2Name HAMAS
Actor2CountryCode PSE
Actor2Geo_CountryCode GZ
EventCode 112 (ACCUSE, within DISAPPROVE root)
EventRootCode 11
QuadClass 4 (material conflict)
GoldsteinScale -2.0
AvgTone -7.434 (sentence-level sentiment)
ActionGeo_CountryCode GZ
ActionGeo_FullName Gaza Strip, Gaza, GZ
ActionGeo_Lat 31.500
ActionGeo_Long 34.750
DATEADDED 20260310203000
SOURCEURL https://www.newcastleherald.com.au/story/9195158/israeli-army-kills-three-in-southern-gaza-strip-tunnel/
2. After ingestion: row in our events table
id                659907
source            gdelt                (the data provider, not the publisher)
source_id         1293508008           (the GDELT GLOBALEVENTID)
occurred_at       2026-03-10 20:30:00
cameo_code        112
tone              -7.434
location_lat      31.500
location_lon      34.750
url               https://www.newcastleherald.com.au/story/9195158/...
publisher_tier    3                    (regional Australian press)
publisher_weight  0.600                (tier 3 → 0.6)
The actor codes, geo-confirmation, and Goldstein scale are read at parse time
for filtering and entity-resolution decisions, but only the columns above are
persisted: enough to reason about the event, recover the source, and join to
entities. source here means the data provider (gdelt, treendly,
polymarket), not the publisher of the article. The publisher is captured
separately as publisher_tier (1=global wire / major mainstream,
2=national press, 3=regional or specialty, 4=aggregator or content farm,
5=unknown), which derives a publisher_weight used to weight
the event's contribution to graph edges. The Newcastle Herald is a regional
Australian daily, so this row carries weight 0.60 rather than the 1.00 a
Reuters or AP wire would.
Why have a tier at all: a story that was simmering in regional outlets and
then breaks into Reuters/BBC/NYT is qualitatively different from one that
has been in the wires all along. The mainstream pickup is an early signal
that editors who cover the region seriously have decided this matters, and
it is exactly what the mainstream_crossing pattern detector
looks for. Tier 5 (unknown) defaults to 0.50 so unfamiliar publishers
participate in edges without pulling the model in either direction.
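In code, the tier-to-weight mapping is just a lookup. A minimal Python sketch; the tier-1, tier-3, and tier-5 values are the ones quoted above, while the tier-2 and tier-4 values are assumptions for illustration:

```python
# Publisher tier → edge weight. Tiers 1, 3, and 5 match the values quoted
# in the text (1.00, 0.60, 0.50); tiers 2 and 4 are ASSUMED here.
PUBLISHER_WEIGHT = {
    1: 1.00,  # global wire / major mainstream
    2: 0.80,  # national press / think tank   (assumed)
    3: 0.60,  # regional / specialty
    4: 0.40,  # aggregator / content farm     (assumed)
    5: 0.50,  # unknown publisher: neutral, pulls in neither direction
}

def publisher_weight(tier: int) -> float:
    """Weight an event's contribution to graph edges by publisher tier."""
    return PUBLISHER_WEIGHT.get(tier, PUBLISHER_WEIGHT[5])
```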
3. After entity linking: rows in entity_event
entity_event:
(entity=Israel,     event=659907, role=actor1)     ← Actor1CountryCode=ISR, geo IS confirmed
(entity=Palestine,  event=659907, role=actor2)     ← Actor2CountryCode=PSE, geo GZ confirmed
(entity=Hamas,      event=659907, role=actor2)     ← Actor2Name="HAMAS" matched org alias
(entity=Gaza Strip, event=659907, role=proximity)  ← Action lat/lon within Gaza radius
Three of those links come from GDELT's own data (actor country codes, actor
name matching). The fourth, Gaza Strip, was added by the
proximity-linking pass: the event's lat/lon falls within Gaza's seeded
radius, so the place entity is attached even though GDELT didn't name it.
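A minimal sketch of that proximity pass, assuming a seeded place list with a centre point and radius (the Gaza coordinates and 40 km radius here are illustrative, not the production seed):

```python
import math

# One seeded place entity; centre point and radius are ILLUSTRATIVE values.
PLACES = [{"entity": "Gaza Strip", "lat": 31.42, "lon": 34.37, "radius_km": 40}]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def proximity_links(event_lat, event_lon):
    """Attach every place entity whose seeded radius covers the event."""
    return [p["entity"] for p in PLACES
            if haversine_km(event_lat, event_lon, p["lat"], p["lon"]) <= p["radius_km"]]

proximity_links(31.500, 34.750)   # → ["Gaza Strip"] for the event above
```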
From there the event contributes edges to the recursive PageRank graph: one co-mention edge for each pair of its linked entities.
Pipeline steps
- The ingest step downloads the latest GDELT export, filters rows to Middle East scope using a CAMEO/FIPS country-code cross-check (it skips name-based false positives, e.g. a Texas county geocoded as "Jordan" because the actor's name happens to be Jordan), and writes event records.
- For each event, actor names and country codes are linked to canonical entities (people, organizations, places, topics). Country actors link via CAMEO codes; named persons and orgs link via name + alias substring match against the seeded entity dictionary.
- The edge build step emits a co-mention edge between every pair of entities that appear in the same event (sketched after this list). Edges carry a weight (event source weight × time decay) and a timestamp.
- The gravity recompute step runs a recursive PageRank iteration over the edge set and persists each entity's gravity and momentum.
- The pattern detection step scans the current graph state for triggered patterns; the claim generation step turns triggered patterns into claims.
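The edge build step, sketched in Python; `co_mention_edges` is an illustrative name, and the decay helper anticipates the §5 formula:

```python
from itertools import combinations
import math

HALFLIFE_DAYS = 14.0

def decay(age_days: float) -> float:
    """Exponential time decay with the 14-day half-life from §5."""
    return math.exp(-age_days * math.log(2) / HALFLIFE_DAYS)

def co_mention_edges(entity_ids, publisher_weight, age_days):
    """One undirected edge per pair of entities in the same event.
    Pairs are sorted so (A, B) and (B, A) land on the same edge key."""
    w = publisher_weight * decay(age_days)
    return {tuple(sorted(pair)): w for pair in combinations(entity_ids, 2)}

# The Gaza event above: four linked entities → six pairwise edges at 0.60.
edges = co_mention_edges(["Israel", "Palestine", "Hamas", "Gaza Strip"], 0.60, 0.0)
```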
Articles are not re-scraped at ingest; the seed entity catalogue is partly imported from hipcityreg/situation-monitor [3], which has already done the curation work. Polymarket markets [4] are pulled from the public Gamma API and matched to entities by question-text substring against the entity name dictionary.
4. The entity graph
Nodes are entities with stable identity; every entity carries a Wikidata QID [5] when one exists. Edges are co-mentions: pairs of entities that appeared in the same news event within a recent window. The graph is undirected — an edge means "these two were talked about together," not "X did something to Y."
GDELT carries direction at the event level (Actor1 → Actor2 with a CAMEO
action code), but we deliberately flatten that into an undirected co-mention
edge for the graph layer. News direction is noisy: the same conflict produces
paired rows like "Iran ACCUSED Israel" and "Israel STRUCK Iran" with opposite
directions, and naïvely summing them washes out the underlying signal that
both entities are entangled. Direction itself isn't lost: each event row
keeps its CAMEO code and tone, which feed the tone_shift pattern
and the publisher-tier weighting; the graph's job is to answer "who is
entangled with whom," not "who is acting on whom." If a future pattern needs
explicit directionality we'll add a directed edge type alongside
co_mention rather than rebuild the existing one.
Wikidata QIDs are non-negotiable. Following Sahu et al. [6], who showed that LLM-generated knowledge graphs from GDELT suffer from entity inconsistency (DALI and THE DALI as separate nodes; 435/968 isolated vertices in GraphRAG), the system uses an ontology-grounded canonical ID for every entity. The LLM never assigns identity.
5. Recursive PageRank
Influence flows through the graph the way the original PageRank algorithm [7] proposed: an entity is important when other important entities point to it, recursively. The same idea was popularised for AI account influence ranking by the Digg AI 1000 system [8], which directly inspired this design.
The iteration:
gravity(t+1)[i] = (1 − d) / N
                + d · Σⱼ gravity(t)[j] · w(j → i)
                + d · dangling_mass / N

w(j → i)  = weight(j↔i) · decay(Δt_ji)  /  Σ_k [ weight(j↔k) · decay(Δt_jk) ]
decay(Δt) = exp(−Δt · ln 2 / halflife)
Where d = 0.85 (standard damping), halflife = 14 days (so a 30-day-old edge contributes ~23% of a fresh edge), N is the entity count, and dangling mass is redistributed uniformly so isolated entities don't bleed mass to themselves. Convergence is reached in 20–30 iterations on a graph of our size; the entire daily recompute runs in well under a second in pure PHP.
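A runnable Python sketch of the same iteration (the production recompute is pure PHP, as noted); the edge representation is an assumption for illustration:

```python
import math

D, HALFLIFE = 0.85, 14.0          # damping and half-life from the text

def decay(age_days):
    return math.exp(-age_days * math.log(2) / HALFLIFE)

def recompute_gravity(entities, edges, iters=30):
    """edges: {(a, b): (weight, age_days)} undirected co-mention edges.
    Returns {entity: gravity} after `iters` iterations."""
    n = len(entities)
    adj = {e: [] for e in entities}   # neighbour → decayed edge weight
    out = {e: 0.0 for e in entities}  # per-node normaliser Σ_k w(j↔k)
    for (a, b), (w, age) in edges.items():
        dw = w * decay(age)
        adj[a].append((b, dw)); adj[b].append((a, dw))
        out[a] += dw; out[b] += dw
    gravity = {e: 1.0 / n for e in entities}
    for _ in range(iters):
        # Dangling (edgeless) mass is redistributed uniformly.
        dangling = sum(g for e, g in gravity.items() if out[e] == 0.0)
        nxt = {e: (1 - D) / n + D * dangling / n for e in entities}
        for j in entities:
            for i, dw in adj[j]:                  # empty for dangling nodes
                nxt[i] += D * gravity[j] * dw / out[j]
        gravity = nxt
    return gravity
```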
Why time decay matters. Without it, the graph fossilizes: yesterday's big news stays at the same gravity forever. The 14-day half-life is an opinion, not a measured optimum: long enough that a brief news cycle doesn't immediately outweigh structural relationships, short enough that this week's events dominate. Tunable.
6. Pattern detection
Patterns are finite, pre-defined graph phenomena. The LLM does not choose what to write about; that would create variance and unaccountable output. Each pattern type defines a numeric detector, an LLM prompt template, and a calibration tier.
| Pattern | Triggers when | Sentence shape | Tier |
|---|---|---|---|
| rising_edge | edge weight A↔B in last 30d ≥ 2× prior 30d, ≥3 events corroborating | "{A}–{B} cooperation/tension is intensifying" | medium |
| falling_edge | edge weight A↔B in last 30d ≤ 0.5× prior 30d, ≥3 prior events | "{A}–{B} relationship is cooling / backchannel going quiet" | medium |
| gravity_surge | entity gravity momentum > +20% AND gravity > 1.5× baseline | "{Entity} is becoming a focal point" | medium |
| gravity_collapse | entity gravity momentum < −20% AND gravity > 1.5× baseline | "{Entity}'s influence is declining" | medium |
| tone_shift | mean GDELT tone for entity shifts ≥ 2.0 between 14d windows | "{Topic} sentiment is hardening / softening" | medium |
| cluster_formation | event count for entity in last 14d ≥ 3× prior 14d, ≥5 events | "{Entity} is becoming a flashpoint" | low |
| triangular_tightening | three entities A,B,C with all 3 pairwise edges rising ≥1.5× | "{A}, {B}, {C} are converging" | low |
The 1.5× baseline gate on gravity surge/collapse is load-bearing. With N
entities the uniform gravity is 1/N; in a sparse graph, PageRank
redistributes mass onto the few nodes with edges, mathematically squeezing
everyone else by ~20% as a regression-to-mean artifact. Without the 1.5/N
floor, every other entity would fire as a "collapse" the moment two of them
surged. The threshold scales with graph size automatically.
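Two detectors written out as Python sketches; the signatures and window bookkeeping are simplified, with the SQL aggregation assumed to have happened upstream:

```python
def rising_edge(recent_30d: float, prior_30d: float, recent_events: int) -> bool:
    """Recent window at least doubles the prior one, with >= 3 events.
    A brand-new edge (no prior baseline) is not 'rising'."""
    return prior_30d > 0 and recent_30d >= 2.0 * prior_30d and recent_events >= 3

def gravity_collapse(gravity: float, momentum_pct: float, n_entities: int) -> bool:
    """Momentum below -20%, gated by the 1.5/N baseline floor so that
    regression-to-mean shrinkage in a sparse graph cannot fire it."""
    return momentum_pct < -20.0 and gravity > 1.5 / n_entities
```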
Two further pattern types exist in the catalogue but are dormant until their inputs are wired:
- polymarket_trajectory — fires when graph state matches a pattern that historically resolved YES on Polymarket. Activates once the calibration model is trained (§8).
- mainstream_crossing — fires when a narrative previously confined to fringe-tier sources begins appearing in mainstream-tier sources. Borrows the source-tier classification from hipcityreg/situation-monitor [3]. Activates once we ingest sources beyond GDELT.
7. Prose vs. number: the strict separation
This is the hard architectural rule that distinguishes this system from a generic "LLM reads news and writes commentary" tool: the LLM never produces, sees, or influences the confidence number.
The order of operations matters. When a pattern fires, the system first
computes the numeric confidence from the signal magnitude (and, for tiers
that have it, the Polymarket-trained mapping in §8).
Then the LLM is handed the pattern type, the entities, and
the structured signals — but never the resulting number — and asked to write
a sentence. Two outputs come back: a sentence (from the LLM) and a confidence
(from the formula). They land in the same claims row. The LLM
doesn't know what confidence its sentence will be paired with, and the
confidence math doesn't know what sentence will be paired with it.
The reasoning, taken seriously:
- If the LLM produced the probability, we would get confident-sounding numbers with no grounding. MIRAI [9] showed that even GPT-4o agents reach only F1 ≈ 32.6 on relation forecasting from GDELT, and degrade sharply at long horizons. Letting the model self-report would mask that ceiling.
- If the system produced the prose from raw numbers, output would be unreadable and formulaic. Each pattern type would need a hard-coded sentence template per entity-type combination. The LLM does this well; the formula does not.
- The LLM is told what kind of pattern fired and given the signal facts, not the confidence value. Its job is to write a short, neutral sentence with no numbers, no hedging, no quotation marks. Temperature is low (0.2). The prompt is hashed and stored on every claim so prose drift is auditable.
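A structural sketch of the separation in Python. Every function name here is illustrative, and the confidence placeholder stands in for the formula (or the §8 model); the point is what each path can and cannot see:

```python
import hashlib

def confidence_from_signals(pattern: str, signals: dict) -> float:
    """Numeric path. PLACEHOLDER mapping from raw signal magnitude;
    the real system uses the formula or the trained model from §8."""
    return min(0.99, max(0.01, signals.get("magnitude", 0.0)))

def build_prompt(pattern: str, entities: list, signals: dict) -> str:
    """Prose path. The prompt carries the pattern type and the structured
    facts; the confidence number is deliberately never included."""
    return (f"Pattern: {pattern}\nEntities: {', '.join(entities)}\n"
            f"Signals: {signals}\n"
            "Write one short, neutral sentence. No numbers, no hedging, "
            "no quotation marks.")

def generate_claim(pattern, entities, signals, llm_complete):
    confidence = confidence_from_signals(pattern, signals)  # path 1: formula
    prompt = build_prompt(pattern, entities, signals)       # path 2: LLM
    sentence = llm_complete(prompt, temperature=0.2)
    return {"sentence": sentence, "confidence": confidence, # meet in one row
            "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest()}
```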
8. Polymarket calibration
Polymarket [4] is the calibration substrate, never the user-facing primitive. Users never see a market or a question. Polymarket's role is purely upstream of the confidence model: it provides ground-truth outcomes against which the system's signals can be checked.
The mechanic, in three steps:
- Polymarket has thousands of resolved binary markets ("Will X happen by Y date — YES/NO") with a known, money-backed outcome.
- For each resolved market whose entities/topics overlap our graph, we replay what our graph signals were saying in the days before resolution. The market_features table accumulates that time series per market: gravity, momentum, edge weight, event count.
- A logistic regression learns the mapping graph state → resolution:

  P(resolution = YES) = f(graph state at time t, for the entities mentioned in the question)
That trained f is what produces the confidence number for live claims whose pattern shape is well-represented in the training data. For shapes with little or no Polymarket coverage, the confidence falls back to a function of raw signal magnitude — and the calibration tier reflects that.
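A toy version of that calibration step in Python with scikit-learn; the feature rows and the feature order are invented for illustration, not real market_features values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: graph features for one resolved market in the days before
# resolution. ASSUMED feature order: [gravity, momentum, edge_weight, events].
# y: 1 = the market resolved YES. All numbers below are toy values.
X = np.array([[0.012,  0.31, 4.2, 17],
              [0.004, -0.05, 1.1,  3],
              [0.019,  0.44, 6.0, 25],
              [0.006,  0.02, 0.9,  4]])
y = np.array([1, 0, 1, 0])

calibrator = LogisticRegression().fit(X, y)   # this is the trained f

# A live claim whose pattern shape is well-represented in training data:
live = np.array([[0.015, 0.28, 3.9, 14]])
p_yes = calibrator.predict_proba(live)[0, 1]  # the confidence number
```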
Every claim carries a calibration tier badge (HIGH / MEDIUM / LOW). The tier reflects how the pattern shape behind the claim relates to Polymarket history, not whether any specific bet was validated TRUE for these entities:
- HIGH — this pattern shape (e.g. polymarket_trajectory) has many resolved markets backing it. The confidence number is the trained model's output and is empirically grounded. Read 70% as "70% chance of YES on similar historical markets."
- MEDIUM — some Polymarket coverage exists for this shape but with limited resolutions. The number is directional rather than precisely calibrated. Read it as "strong, but treat the percentage with a wide error bar."
- LOW — no Polymarket history for this shape (most geo_cluster, triangular_tightening, cluster_formation claims). The 70% is a function of raw signal magnitude only, not a calibrated probability. Read it as "magnitude of shift," not "chance of outcome." The system is being honest that it has nothing to check itself against.
So a claim like The Strait of Hormuz is experiencing a sharp escalation
in military and geopolitical activity
at 70% LOW means
"the underlying signal is strong (a clear geographic event cluster), but no
resolved market has a shape we can map this to, so we won't pretend the
number is a probability." That distinction is the antidote to the
"confident-sounding sentence" failure mode — the system's job
includes telling the user when it is guessing.
9. Three independent witnesses
Every claim's confidence integrates three sources, computed numerically:
- News graph (GDELT 15-min events + RSS feeds from think tanks and OSINT outlets) — is this entity gaining centrality, is co-occurrence tightening, are the headlines using escalation language?
- Polymarket calibration — does the resulting feature pattern resemble historical patterns that resolved YES on money-backed binary markets?
- FRED economic indicators — Fed funds rate, CPI, unemployment, oil, FX, VIX. The macro substrate against which rate-, inflation-, and energy-flavoured claims can be sanity-checked rather than left to news framing alone.
Where they agree, the signal is trusted. Where they disagree, the divergence is itself the interesting datum. A graph signal that's invisible in the macro indicators may be a real geopolitical story not yet priced in; a Polymarket move uncorroborated by news flow may be informed traders front-running a leak; an indicator shift unaccompanied by news is the market expecting something coverage hasn't caught up to. All three are surfaced rather than collapsed.
Treendly's attention-pulse layer (search-and-social rising-trend signal) is in the architecture but not yet wired in — planned as a fourth corroborating witness for the claims that depend on whether the public is paying attention, distinct from whether elites or markets are.
10. User-facing surfaces
The system has four surfaces a reader actually opens, and each one is meant to answer a different question.
The feed
The homepage is a list of AI-written sentences with a confidence percentage,
a calibration tier badge, a 7-day delta, and a 14-day inline sparkline. The
sparkline reads from the claim_history table that records each
claim's confidence on every generation pass; this is what turns "78% right
now" into "78% and rising for the past week" at a glance. Above the feed sits
a heatmap that aggregates ~500K events into ~2K geographic cells server-side,
so the density is real (every ingested event contributes) without shipping a
megabyte-scale payload.
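The server-side aggregation is plain grid binning. A minimal Python sketch, with an illustrative cell size:

```python
from collections import Counter

CELL_DEG = 2.0   # illustrative cell size, not the production value

def cell(lat: float, lon: float) -> tuple:
    """Snap coordinates to a grid cell (floor division handles negatives)."""
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

def heatmap(events):
    """Aggregate (lat, lon) pairs into per-cell counts server-side, so
    every ingested event contributes without shipping raw points."""
    return Counter(cell(lat, lon) for lat, lon in events)
```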
The claim detail
Clicking any sentence opens a fuller reasoning view. At the top is a cached AI-written 2–3 paragraph summary that frames what is happening, why it matters, and what to watch next. Below it sits the structured evidence: the entities involved (each linkable to its own dossier), the numeric signals that produced the confidence, the contributing events with reconstructed-article titles when available, the calibration tier with an explanation, and a confidence-over-time chart. The summary is regenerated only when the underlying signals change (we hash the signals JSON and compare), so visiting a claim does not burn LLM credits.
The entity dossier
For any entity in the graph, the dossier shows its current gravity and momentum, its top connected entities (time-decayed edge weights), the recent events it appears in, and the active claims it features in. It is the Digg-AI-1000 [8] view applied to news, with geography on top: a markers map shows where this entity's events happen spatially.
The /review page
An admin-only QA view listing every claim generated in a recent window, with the structured signals expandable per claim. Used to spot prose drift, mismatched signals, or "confident-sounding nonsense" failures before a broader audience does. Filters by pattern type and time window.
Article-text reconstruction
Earlier in the data pipeline, each event lands with only a URL and metadata,
never a real article body. To ground LLM prose with actual headlines and
specifics, the system runs Bertaglia et al.'s gdeltnews package
[10] as a Python sidecar: every hour it groups recent
events by their 15-min GDELT publication window, downloads the matching
Web News NGrams 3.0 files, and reconstructs each article's full text by
merging overlapping n-grams (~95% similarity vs. the original). The
reconstructed text feeds the prose layer and lives in the articles
table; it is the difference between "Iran is intensifying nuclear focus" and
"Iran is intensifying nuclear focus following the Bushehr drill on March 8."
11. Limitations
The honest list of where this system is weak:
- Forecast horizons. MIRAI [9] showed LLM-agent F1 collapses from 1-day to 90-day horizons. Even with calibration, this system is most useful at 1–14 day horizons. Past that, treat output as qualitative.
- GDELT noise. CAMEO codes are coarse, and the geocoder makes name-based mistakes (a Texas locality of Jordan getting tagged as the country). The system filters this with a CAMEO/FIPS cross-check, but noise remains.
- Entity coverage is curated, not learned. Each new domain requires a seed entity set. The architecture is general, but each port is real work: categories are units, not labels.
- Source-tier dynamics aren't modelled yet. Until we ingest beyond GDELT, the mainstream_crossing pattern is dormant and we can't separate fringe narratives from established ones.
- The LLM is an unreliable narrator. Even with low temperature and hashed prompts, prose can drift. Weekly review (/review) surfaces every generated sentence with its underlying signals so a human can spot prose that doesn't match the data.
- Cold-start period. Pattern detectors that rely on "recent vs. prior" windows need ~30 days of GDELT history before they fire reliably. During the cold-start, only single-window patterns (like gravity_surge) can produce claims, and those are inherently more abstract.
12. References
1. Leetaru, K. and Schrodt, P. A. (2013). GDELT: Global Data on Events, Location and Tone, 1979–2012. International Studies Association Annual Convention. gdeltproject.org
2. Schrodt, P. A. (2012). CAMEO: Conflict and Mediation Event Observations Event and Actor Codebook. Pennsylvania State University.
3. hipcityreg (2026). situation-monitor, real-time dashboard for global news, markets, and geopolitical events. github.com/hipcityreg/situation-monitor. Source-tier classification, leader catalogues, and geographic seed data are imported from this project's open configs.
4. Polymarket. Polymarket Gamma API. docs.polymarket.com
5. Vrandečić, D. and Krötzsch, M. (2014). Wikidata: A free collaborative knowledgebase. Communications of the ACM 57(10): 78–85. wikidata.org
6. Sahu, A. et al. (2025). Talking to GDELT through Knowledge Graphs. arXiv:2503.07584. arxiv.org/abs/2503.07584
7. Page, L., Brin, S., Motwani, R. and Winograd, T. (1998). The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab Technical Report.
8. Digg AI 1000 (2026). Recursive PageRank ranking of AI accounts on X. digg.com/ai-1000. The recursive-influence framing in this system is directly inspired by Digg's approach.
9. Sun, C. et al. (2024–2025). MIRAI: Evaluating LLM Agents for International Event Forecasting. NeurIPS Datasets & Benchmarks 2025, arXiv:2407.01231. arxiv.org/abs/2407.01231
10. Bertaglia, T. et al. (2026). Free Access to World News: Reconstructing Full-Text Articles from GDELT. Big Data and Cognitive Computing 10(2): 45. mdpi.com/2504-2289/10/2/45. The gdeltnews Python package referenced as the v1.5 ingestion path.
11. Federal Reserve Bank of St. Louis. FRED Economic Data API. fred.stlouisfed.org/docs/api. Free programmatic access to 800,000+ macroeconomic time-series. We pull ~10 series daily as the third independent witness alongside the news graph and Polymarket calibration.