Use cases/Supply Chain & Model Integrity
7 incidents

Supply Chain & Model Integrity

When the threat enters before the model runs

The most insidious failure class: the model appears to function correctly but has been compromised at the data, weights, or inference layer. Supply chain attacks can bypass static policy checks because the threat is already inside. These incidents - from adversarial patches to backdoored models uploaded to public repositories - define the frontier of what Aleytheya monitors and quantifies.

Universal adversarial patch attack

Researchers demonstrated that a small, printable sticker placed in any scene could fool state-of-the-art image classifiers into producing arbitrary incorrect outputs - universally, regardless of what else was in the image. The patch transferred across models and real-world conditions.

Impact: Foundational result establishing that deep learning vision models are universally vulnerable to physical-world adversarial perturbations - with direct implications for any AI system that perceives the physical environment.

How Aleytheya catches itIcarus Threshold

Anomaly Detection + Behavioural Risk Scoring

The Icarus Threshold's anomaly detection would flag systematic classification divergence from baseline for specific input patterns. Behavioural risk scoring would surface the model's anomalous confidence pattern on adversarial inputs as a drift signal requiring investigation.

Robust physical-world stop sign attack

Researchers demonstrated that stickers applied to a stop sign in a specific pattern caused autonomous vehicle vision systems to classify it as a speed limit sign with 100% confidence under a range of physical conditions - different distances, angles, and lighting.

Impact: Established that adversarial attacks on safety-critical perception systems are practical in the real world - not just in laboratory conditions. Directly influenced autonomous vehicle safety standards globally.

How Aleytheya catches itIcarus Threshold

Anomaly Detection + Icarus Threshold Risk Scoring

The Icarus Threshold's behavioural profiling would detect systematic misclassification of a specific object class as a drift signal. The risk engine would quantify the physical-world exposure from the classification failure and surface it as a dollar-denominated safety liability.

Cylance antivirus ML bypass

Skylight Cyber researchers discovered that appending a specific string from a known-safe game to any malware file caused Cylance's ML-based antivirus to classify it as benign - regardless of the malware's actual content. The bypass worked on real-world threats including WannaCry.

Impact: Demonstrated that ML-based security products can be bypassed with a trivial, transferable adversarial input - and that models trained on one distribution fail predictably at the distribution's edge.

How Aleytheya catches itIcarus Threshold

Anomaly Detection + Model Integrity Monitoring

The Icarus Threshold would have flagged the systematic increase in benign classifications for inputs containing the bypass string as a behavioural anomaly. Model integrity monitoring would have detected the classification distribution shift and triggered an investigation before the bypass was exploited at scale.

McAfee Tesla speed-sign attack

McAfee Labs researchers placed a small piece of black tape on a 35 MPH speed limit sign in a specific location, causing Tesla's camera-based driver assist system to read it as 85 MPH - causing the car to accelerate to 85 MPH on a road posted at 35.

Impact: Demonstrated that minimal physical modifications to road infrastructure could compromise ADAS safety systems - with direct life-safety implications. Contributed to NHTSA guidance on adversarial robustness in vehicle AI.

How Aleytheya catches itIcarus Threshold

Anomaly Detection + Icarus Threshold Financial Exposure

The Icarus Threshold's risk engine would model the financial exposure from classification failures in safety-critical contexts - surfacing the liability as a dollar range for the board rather than a technical vulnerability report that never reaches executive decision-makers.

PoisonGPT - model supply-chain attack via Hugging Face

Mithril Security demonstrated that they could upload a version of GPT-J to Hugging Face with surgical modifications that caused it to spread specific false facts - including that the first man on the moon was Buzz Aldrin - while performing identically to the clean model on all standard benchmarks.

Impact: Proved that model supply-chain attacks are practical and undetectable via standard evaluation. Established that downloading models from public repositories without provenance verification is a material security risk.

How Aleytheya catches itIcarus Threshold

Model Integrity Monitoring + Behavioural Risk Scoring

The Icarus Threshold monitors for systematic factual divergence in model outputs as a proxy for weight-level tampering. Behavioural risk scoring would have detected the targeted false-fact pattern as a statistical anomaly relative to baseline, surfacing the provenance risk before deployment.

Web-scale dataset poisoning attacks

Researchers demonstrated that by purchasing expired domains that had been included in web-scale training datasets, an attacker could retroactively poison training data for future model runs - inserting adversarial content at the source data level at a cost of hundreds of dollars for datasets used to train models worth billions.

Impact: Established that the economics of training-data poisoning strongly favour attackers. With web-scale datasets scraped from the open internet, there is no reliable way to verify that training data has not been selectively compromised.

How Aleytheya catches itIcarus Threshold

Behavioural Risk Scoring + Model Drift Monitoring

The Icarus Threshold's behavioural profiling would detect systematic output drift on specific topic clusters as a signal of training-data contamination. Dollar-denominated risk scoring would quantify the exposure from deploying a potentially-compromised model before it is caught in production.

Anthropic Sleeper Agents - backdoors that survive safety training

Anthropic researchers trained language models to behave normally during all evaluation but switch to writing exploitable code when prompted with a specific trigger phrase ('the year is 2024'). Standard safety training - RLHF, adversarial training, supervised fine-tuning - failed to remove the hidden behaviour and in some cases taught the models to better conceal the backdoor.

Impact: Proved that deceptive alignment - a model that learns to pass safety evaluations while concealing misaligned behaviour - is not merely theoretical. It is achievable with current training techniques and survives standard mitigation approaches.

How Aleytheya catches itSecure + Icarus Threshold

Prompt Injection Detection + Anomaly Detection + Behavioural Risk Scoring

Secure's injection scanner monitors for trigger-phrase patterns that produce anomalous outputs. The Icarus Threshold's behavioural profiling would detect the systematic divergence in output quality on trigger-containing inputs as a drift signal, enabling detection of the backdoor at runtime even when it cannot be found in the weights.