Detecting and Attributing Submarine-Cable Latency Anomalies with Two-Signal Cross-Validation
Evgeny Korolev - GeoCables · Technical note, June 2026 · v1.1
The key contribution here is not the detector itself but a two-signal cross-validation layer that grades each alert against two independent signals: a change in the autonomous-system path (AS-path) at the time of the event, and whether other probes whose route physically traverses the same cable corridor observe the same degradation.
The two signals reflect different processes, a change in routing topology versus the geographic breadth of a degradation, and are largely independent of one another.
Their combined analysis separates genuine topology-level cable incidents from routing changes and from single-vantage noise. The numbers below are deliberately undramatic: the observation window contained no large-scale cable break - and the method correctly declines to manufacture one.
1. Data Foundation
| Asset | Scale |
|---|---|
| Submarine cables (with landing-point geometry) | 703 |
| Landing points (geo-coded) | 1,932 |
| Backbone / cable segments | 26,053 |
| Own probes (Minsk, Almaty, Tbilisi, Jerusalem) + RIPE Atlas | 12 + RIPE Atlas |
| Completed health checks (since 2026-03-01) | 168,699 |
| Cables under active measurement | 691 |
The method runs on a curated topology graph and a continuously growing measurement archive. Raw per-measurement RTT is stored append-only: new records are added, and existing ones are never modified or deleted, much like a ship's log, which guarantees the integrity of the historical record. Any future detector can therefore be re-run over the full history.
Such a history cannot be reconstructed retroactively from public cable maps alone.
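A minimal sketch of such an append-only archive, assuming a JSON Lines file (one JSON object per line). The file layout, the record fields and the function names `append_measurement` and `replay` are illustrative, not the production schema. Opening the file in append mode means every write lands at the end, and `replay` streams the full history back in order so a revised detector can be run over it.

```python
import json
from pathlib import Path
from typing import Iterator


def append_measurement(archive: Path, record: dict) -> None:
    """Append one measurement as a single JSON line; earlier lines are never touched."""
    # Mode "a" positions every write at the end of the file, so history is only extended.
    with archive.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def replay(archive: Path) -> Iterator[dict]:
    """Yield every stored measurement in insertion order, e.g. to re-run a new detector."""
    with archive.open("r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate a trailing blank line
                yield json.loads(line)
```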
2. Detection (Baseline + Haversine Attribution)
Each check pings (and traceroutes) a target near a cable landing point.
A measurement is treated as a candidate anomaly when its RTT rises materially above that route's adaptive baseline, the statistically “normal” latency computed from the route's measurement history. As with a body's normal temperature, only a genuinely significant deviation is flagged.
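The exact baseline model is not published (see Section 6), so the following only illustrates the idea under an assumed robust baseline: the median of the route's RTT history plus a multiple of the median absolute deviation (MAD), with an absolute floor so that very stable routes are not flagged for sub-millisecond jitter. The function name `is_spike` and the parameters `k`, `min_rise_ms` and `min_history` are hypothetical.

```python
import statistics
from typing import Sequence


def is_spike(
    history_ms: Sequence[float],
    rtt_ms: float,
    k: float = 4.0,            # hypothetical sensitivity multiplier
    min_rise_ms: float = 5.0,  # hypothetical absolute floor on the required rise
    min_history: int = 20,     # hypothetical minimum baseline size
) -> bool:
    """Flag rtt_ms as a candidate anomaly if it sits well above the route's baseline."""
    if len(history_ms) < min_history:
        return False  # not enough history to define "normal" for this route
    baseline = statistics.median(history_ms)
    # MAD scaled by 1.4826 so it estimates the standard deviation for Gaussian data.
    mad = 1.4826 * statistics.median(abs(x - baseline) for x in history_ms)
    threshold = baseline + max(k * mad, min_rise_ms)
    return rtt_ms > threshold
```

Median and MAD are used instead of mean and standard deviation because a few earlier spikes in the history would otherwise inflate the baseline and mask the next one.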
Candidates pass through a staged filtering funnel before any alert is raised - the first layer of false-positive suppression:
| Stage | Meaning | Count |
|---|---|---|
| spike | single measurement above the baseline | 624 |
| anomaly_confirmed | spike confirmed by recurrence or corroborating data | 189 |
| alert | promoted to a tracked incident | 114 |
Only ~18% of raw spikes (114 of 624) reach alert status.
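The funnel can be sketched as a small per-route check. Purely for illustration, we assume that a spike is confirmed when at least `confirm_k` of the route's recent checks were spikes (or when external data corroborates it), and that a confirmed spike is promoted to an alert once it persists for `alert_run` consecutive checks. `funnel_stage` and both thresholds are hypothetical.

```python
from typing import Optional, Sequence


def funnel_stage(
    recent_spikes: Sequence[bool],
    corroborated: bool = False,
    confirm_k: int = 2,  # hypothetical: spikes in the window needed to confirm
    alert_run: int = 3,  # hypothetical: consecutive spikes needed to promote
) -> Optional[str]:
    """Return the funnel stage reached by the newest check on one route.

    recent_spikes: spike flags for the route's last few checks, oldest first.
    corroborated: whether independent data already backs the newest spike.
    Returns None when the newest check is not a spike at all.
    """
    if not recent_spikes or not recent_spikes[-1]:
        return None
    # Count the trailing run of consecutive spikes, walking back from the newest.
    run = 0
    for flag in reversed(recent_spikes):
        if not flag:
            break
        run += 1
    confirmed = corroborated or sum(recent_spikes) >= confirm_k
    if confirmed and run >= alert_run:
        return "alert"
    if confirmed:
        return "anomaly_confirmed"
    return "spike"
```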
Cable attribution is geometric: the latency jump is associated with the nearest cable segment by the haversine distance between the suspected hop and the candidate cables' landing points. The haversine formula is a classical navigation method for computing the great-circle (shortest) distance between two points on the surface of a sphere; it was used by 19th-century navigators and is today a staple of GPS and geospatial analysis.
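A minimal sketch of haversine attribution, assuming each candidate cable is represented by its landing-point coordinates and that the nearest landing point stands in for the nearest segment. `attribute_cable` and the input layout are hypothetical; 6,371 km is the conventional mean Earth radius.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # conventional mean Earth radius

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp against floating-point drift slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def attribute_cable(hop: LatLon, landings: Dict[str, List[LatLon]]) -> Tuple[str, float]:
    """Return (cable_id, km) for the cable whose closest landing point is nearest the hop."""
    best: Optional[Tuple[str, float]] = None
    for cable_id, points in landings.items():
        d = min(haversine_km(hop, p) for p in points)
        if best is None or d < best[1]:
            best = (cable_id, d)
    if best is None:
        raise ValueError("no candidate cables given")
    return best
```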
3. Two-Signal Cross-Validation
3.1 Signal A - AS-Path Reroute
When a submarine cable degrades, traffic frequently reroutes around it, changing the autonomous-system path.
An autonomous system (AS) is an independently managed network, or group of networks, under a single administrative authority, such as a large ISP or cloud operator.
The internet as a whole comprises roughly 80,000 such systems, which exchange routes via BGP (the Border Gateway Protocol), the “diplomatic language” spoken between networks.
For each alert we compare the AS-path before the event (the modal path for that probe→target pair) with the path at the event.
| Verdict | Meaning | Count |
|---|---|---|
| route_change_break | AS-path changed, with a large further latency rise on the new path | 4 |
| route_change | AS-path changed | 5 |
| same_path | RTT rose, path unchanged; consistent with congestion | 57 |
| no routing history | insufficient routing history | 48 |
During algorithm calibration, of the 66 alerts with sufficient routing history, 9 (13.6%) were independently corroborated by a measured AS-path change.
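The per-alert comparison can be sketched as follows, assuming path fingerprints are hashable values (for example, the AS-set fingerprint described below) and that “a large further latency rise” means a fixed rise over the route's baseline. `as_path_verdict`, `min_history` and `break_rise_ms` are hypothetical; the production thresholds are not published.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def modal_path(history: Sequence[Hashable]) -> Optional[Hashable]:
    """Most frequent pre-event path fingerprint for one probe->target pair."""
    if not history:
        return None
    return Counter(history).most_common(1)[0][0]


def as_path_verdict(
    history: Sequence[Hashable],
    path_at_event: Hashable,
    baseline_rtt_ms: float,
    event_rtt_ms: float,
    min_history: int = 5,         # hypothetical: fewer pre-event paths -> no verdict
    break_rise_ms: float = 50.0,  # hypothetical: what counts as a "large further rise"
) -> str:
    """Classify one alert using the verdict labels of the Signal A table."""
    if len(history) < min_history:
        return "no routing history"
    if path_at_event == modal_path(history):
        return "same_path"  # RTT rose but the path did not change
    if event_rtt_ms - baseline_rtt_ms >= break_rise_ms:
        return "route_change_break"
    return "route_change"
```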
A naïve fingerprint over the raw IP path is far too noisy: ECMP load balancing (Equal-Cost Multi-Path routing, which spreads traffic across several equally valid paths to a destination, typically by hashing per-flow header fields, much like cars spreading across highway lanes) and intermittent timeouts produce ~18 distinct IP paths per probe→target pair.
The signal only becomes stable on the AS-set fingerprint (~1.5 distinct per pair), which is what we use.
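To see why the AS-set fingerprint is stable where the IP-path one is not, consider two traceroutes of the same probe→target pair that differ only in an ECMP-chosen hop and a timeout. A minimal sketch, assuming an IP-to-ASN lookup table is already available and reading “AS-set” literally as an unordered set; the function names are hypothetical.

```python
from typing import FrozenSet, Mapping, Optional, Sequence, Tuple

Hops = Sequence[Optional[str]]  # traceroute hop IPs; None marks a timeout


def ip_path_fingerprint(hops: Hops) -> Tuple[Optional[str], ...]:
    """Naive fingerprint: the raw hop sequence, timeouts included."""
    return tuple(hops)


def as_set_fingerprint(hops: Hops, ip_to_asn: Mapping[str, int]) -> FrozenSet[int]:
    """AS-set fingerprint: the unordered set of ASNs the path passes through.

    Timeouts and hops without a known ASN are dropped, so ECMP variants that stay
    inside the same ASes collapse to a single fingerprint.
    """
    return frozenset(ip_to_asn[h] for h in hops if h is not None and h in ip_to_asn)
```

An ordered, de-duplicated AS sequence would be a reasonable alternative if the order of transit networks matters for a given analysis.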
3.2 Signal B - Segment-Aware Multi-Probe Consensus
If only the detecting probe sees a degradation, the key question is whether the other probes are silent witnesses or simply not on the affected cable. A probe routing around the cable is not a witness - its silence proves nothing.
This is the classical logical trap: absentia probationis non est probatio absentiae - absence of evidence is not evidence of absence.
We therefore count a probe as an eligible witness only when its actual AS-path geo-traverses the same cable corridor as the alerting probe's path.
This geo-corridor check reclassified 30% of the naïvely assumed same-cable “witnesses” as off-corridor.
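A minimal sketch of the corridor check at country granularity, assuming each hop of a witness path has already been geolocated to a country code (or `None` when geolocation is missing or untrusted) and that a corridor is identified by its two landing countries. A path counts as eligible when it steps directly between those two countries; an unlocated hop yields `unknown` rather than a mismatch, matching the handling described under Limitations. `corridor_status` and this matching rule are illustrative, not the production logic.

```python
from typing import List, Optional, Sequence, Tuple


def corridor_status(path_countries: Sequence[Optional[str]], corridor: Tuple[str, str]) -> str:
    """Classify a witness path against a cable corridor at country granularity.

    path_countries: geolocated country code per hop, None if unknown or untrusted.
    corridor: the two (distinct) landing countries of the cable corridor.
    Returns "eligible", "off_corridor" or "unknown".
    """
    ends = set(corridor)
    # Collapse consecutive hops in the same country, so DE, DE, DK becomes DE, DK.
    seq: List[Optional[str]] = []
    for c in path_countries:
        if not seq or c != seq[-1]:
            seq.append(c)
    # Eligible if the path steps directly from one landing country to the other.
    for x, y in zip(seq, seq[1:]):
        if {x, y} == ends:
            return "eligible"
    # An unlocated hop might hide the crossing: report unknown, not a mismatch.
    if None in seq:
        return "unknown"
    return "off_corridor"
```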
| Verdict | Meaning | Count |
|---|---|---|
| widespread | majority of corridor witnesses also degraded - real cable event | 1 |
| mixed | some corridor witnesses degraded | 4 |
| routing_event_non_cable | alerting probe rerouted, corridor witnesses healthy → BGP/peering event, not a cable cut | 7 |
| probe_specific_likely_fp | corridor witnesses healthy, no reroute → local artifact | 44 |
| narrow_path_event_possible | single-probe cable - cannot rule out a narrow event | 1 |
| insufficient_witness_context | no concurrent corridor witness - status unknown | 57 |
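The verdict table can be read as a small decision function over the eligible corridor witnesses. A sketch, assuming “majority” means strictly more than half of the witnesses and that the single-probe-cable case is known from the topology graph; `consensus_verdict` and its arguments are hypothetical.

```python
from typing import Sequence


def consensus_verdict(
    witness_degraded: Sequence[bool],
    alerting_probe_rerouted: bool,
    single_probe_cable: bool = False,
) -> str:
    """Signal B verdict for one alert.

    witness_degraded: one flag per eligible corridor witness measured concurrently
    (True = that witness also saw the degradation).
    single_probe_cable: True if no other probe can reach this cable at all.
    """
    if single_probe_cable:
        return "narrow_path_event_possible"
    if not witness_degraded:
        return "insufficient_witness_context"
    share = sum(witness_degraded) / len(witness_degraded)
    if share > 0.5:  # assumption: "majority" means strictly more than half
        return "widespread"
    if share > 0:
        return "mixed"
    # All corridor witnesses healthy: distinguish a reroute from a local artifact.
    if alerting_probe_rerouted:
        return "routing_event_non_cable"
    return "probe_specific_likely_fp"
```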
3.3 Confidence Ladder
The two signals are largely independent: an AS-path change reflects a topological event at the routing layer, whereas multi-probe consensus reflects the geographic breadth of the degradation, i.e. how widely the problem has spread. The two need not co-occur, and very few alerts satisfy both criteria simultaneously. Combining them:
| Tier | Definition | Count (of 114) |
|---|---|---|
| Dual-confirmed | AS-path change and multi-probe corroboration simultaneously | 0 |
| Single-signal | corroborated by exactly one of the two signals | 14 |
| Defensible false positive | corridor witnesses healthy and no reroute | 44 |
| Unclassified (coverage) | no witness present on the corridor - a probe-coverage limit, not method uncertainty | 56 |
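The ladder can be sketched by mapping each Signal A and Signal B verdict onto “signal present” or not. Which verdicts count as corroboration is our reading of the two verdict tables, not a published rule: here a reroute (`route_change` or `route_change_break`) is Signal A evidence, and `widespread` or `mixed` is Signal B evidence. The tier labels in the code are hypothetical identifiers for the rows of the table.

```python
# Assumed mapping of verdicts to "signal present"; not a published rule.
SIGNAL_A_POSITIVE = {"route_change", "route_change_break"}
SIGNAL_B_POSITIVE = {"widespread", "mixed"}


def confidence_tier(as_verdict: str, consensus: str) -> str:
    """Combine a Signal A verdict and a Signal B verdict into a ladder tier."""
    a = as_verdict in SIGNAL_A_POSITIVE
    b = consensus in SIGNAL_B_POSITIVE
    if a and b:
        return "dual_confirmed"
    if a or b:
        return "single_signal"
    # Both signals negative and the corridor witnesses were actually consulted.
    if consensus == "probe_specific_likely_fp":
        return "defensible_false_positive"
    # Remaining cases lack usable corridor witnesses: a coverage gap, not a verdict.
    return "unclassified"
```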
The dual-confirmed count is zero, and that is the correct result: the March–June 2026 window contained no large-scale submarine-cable break.
A genuine major cut would light both signals simultaneously - just as an earthquake registers on multiple independent seismographs at once.
The value of the method lies in its ability to discriminate: it confidently labels 44 alerts (38.6%) as defensible false positives and isolates 7 events as routing changes that are not cable faults, rather than producing a dramatic body count in a quiet period.
4. False-Positive Analysis (Honest)
- The funnel already discards 82% of raw spikes before alerting (624 → 114).
- Of the 114 alerts, 44 (38.6%) are defensible false positives: both independent signals are negative. These are now suppressed from user notifications but remain visible in the dashboard.
- A naïve “single-probe ⇒ false positive” rule would have over-counted false positives by ~45%: the geo-corridor refinement reclassifies many of them as unknown rather than confirmed-FP, because the silence of a witness that never used the cable in the first place is not evidence of anything.
- 56/114 alerts remain unclassified due to the structural absence of a witness on the corridor - a probe-coverage limitation, not method uncertainty. The classifier is well-defined for these alerts; the data to evaluate them simply does not yet exist.
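The percentages in this section follow directly from the stage and tier counts; a few lines of arithmetic reproduce them. The dictionary keys are illustrative labels for the table rows.

```python
stages = {"spike": 624, "anomaly_confirmed": 189, "alert": 114}
tiers = {"dual_confirmed": 0, "single_signal": 14,
         "defensible_false_positive": 44, "unclassified": 56}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Every alert lands in exactly one tier.
assert sum(tiers.values()) == stages["alert"]

discarded = pct(stages["spike"] - stages["alert"], stages["spike"])  # spikes never alerted
fp_share = pct(tiers["defensible_false_positive"], stages["alert"])
unclassified_share = pct(tiers["unclassified"], stages["alert"])
print(discarded, fp_share, unclassified_share)  # 81.7 38.6 49.1
```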
5. Limitations
- Probe geography. A small fixed fleet plus RIPE Atlas (the world's largest distributed internet measurement network, with 12,000+ independent probes across 180+ countries, operated by RIPE NCC) still leaves many cables without a second vantage point on the relevant corridor; for those, the consensus signal is structurally unavailable. Expanding the probe fleet directly removes the structural cause of most unclassified cases.
- Geo-resolution. Corridor matching is at country granularity from AS-path geolocation; some networks mis-geolocate, which we treat as unknown rather than mismatch.
- Attribution is geometric. Cable assignment is by haversine proximity; the corridor cross-check confirms it where path data exists (64 of 66 cases confirmed), but it is a heuristic, not a claim of ground truth.
- Quiet observation window. The reported period saw no major cable break. The method is validated by its discrimination, not by detecting a disaster, much as a smoke detector's reliability is checked with test smoke rather than an actual fire.
6. Reproducibility & Data
Detection uses an adaptive threshold calibrated to each route's baseline distribution rather than a fixed multiplier; the consensus tiers reflect the share of eligible corridor witnesses that corroborate the degradation.
The exact parameterisation is intentionally omitted here and is available in a forthcoming complete publication or on request.
Raw per-measurement RTT and AS-paths are retained, so the classifier can be re-run end-to-end over the full archive under any revised method. Source signals: RIPE Atlas (ping + traceroute), own probes, and the curated cable/landing-point graph.
Method and figures may be cited with attribution. Live monitoring: Cable Health Monitor.
Figures are from the production system as of 2026-06-17 and will evolve as the archive deepens.