Agentiks

The integrity & trust layer for AI training data

Prove what your model learned from.

Agentiks sits at the point of intake. It judges every source, checks every sample in the embedding space, and signs a tamper-evident record of what entered your model, all before any of it reaches training.

  • Source trust at intake
  • Per sample, in the embedding space
  • Software-only, any cluster
Intake · live · watching
Source trust
  • arxiv-mirror-07 · tier A · 0.91
  • commoncrawl-eu · tier B · 0.74
  • vendor-rlhf-03 · tier C · 0.58
Every sample · embedding space
  • sample 0xA4F9…7C2 · PASS
  • sample 0x71C0…3B8 · REJECT

How it works

Everything that has to happen before your data trains, in one gate.

Lineage tools only map your data. Observability tools only watch it, after the fact. Agentiks does both, and it also judges the source, checks every sample, and signs the record the moment data arrives, before anything can reach training.

Many sources: Web crawls · Data vendors · Human labelers · RLHF feeds
Agentiks · the gate at intake
  01 · Map every source and transform (lineage)
  02 · Score how much to trust each source (source trust)
  03 · Check every sample, inline (binding verdict)
  04 · Sign a tamper-evident record (the certificate)
The output: trusted, signed data → ready to train

Map · judge · check · sign. One pass, before the data lands.

Proof 01 · Source trust

A credit score for every data source.

A live trust score for every place your data comes from, earned over time and able to fade, scored across the signals that matter. So you train on sources you judged, not sources you assumed were fine.

Earned, decaying reputation is a proven model: Spamhaus (email, since 1998) · BitSight · Sift.

Source · arxiv-mirror-07 · tier A
Trust score out of 1.00 · ↓ 0.04 over 30d
Trust over time · earned, decaying
  • commoncrawl-eu · tier B · 0.74
  • vendor-rlhf-03 · tier C · 0.58
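One way to build an earned, fading score is sketched below under our own assumptions; it is not Agentiks' scoring model. Every check outcome for a source counts as evidence, weighted by exponential decay with a 30-day half-life, and blended with a neutral prior so a source with little recent evidence drifts back toward 0.5. The half-life, the prior, `SourceTrust`, and the tier cut-offs (picked only to reproduce the A/B/C examples above) are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SourceTrust:
    """Hypothetical earned, decaying trust score for one data source."""
    half_life_days: float = 30.0  # assumed: evidence loses half its weight every 30 days
    prior: float = 0.5            # assumed neutral score for a source with no evidence
    prior_weight: float = 2.0     # how many observations the prior is worth
    events: list = field(default_factory=list)  # (day, outcome) pairs

    def observe(self, day: float, outcome: float) -> None:
        # outcome: 1.0 for a sample that passed its checks, 0.0 for one that was rejected
        self.events.append((day, outcome))

    def score(self, now: float) -> float:
        rate = math.log(2) / self.half_life_days
        num = self.prior * self.prior_weight
        den = self.prior_weight
        for day, outcome in self.events:
            weight = math.exp(-rate * (now - day))  # older evidence counts for less
            num += weight * outcome
            den += weight
        return num / den


def tier(score: float) -> str:
    # hypothetical cut-offs, chosen only so 0.91 -> A, 0.74 -> B, 0.58 -> C
    if score >= 0.85:
        return "A"
    if score >= 0.65:
        return "B"
    return "C"


source = SourceTrust()
for _ in range(100):
    source.observe(day=0.0, outcome=1.0)
print(round(source.score(now=0.0), 2), tier(source.score(now=0.0)))      # fresh evidence
print(round(source.score(now=180.0), 2), tier(source.score(now=180.0)))  # six half-lives later
```

A source with 100 passing checks scores about 0.99 the day they land. With no new evidence for 180 days (six half-lives), that history is worth only about 1.6 observations and the score fades to roughly 0.72, tier B: trust has to keep being earned.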

Proof 02 · Every sample, at intake

Every sample gets a verdict before it ever trains.

Each sample is examined the moment it arrives and given a clear verdict (let in, hold, or reject) before it can reach training. The decision is binding: if a check can’t run, the sample stays out.

Sources → Gate (provenance · embedding · quality) → To training
  • PASS · enters training
  • QUARANTINE · held for review
  • REJECT · never trains
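A fail-closed gate fits in a few lines. This is an illustration, not the Agentiks SDK: the `Verdict` enum, `gate`, and the example checks are hypothetical, and sending a check that crashes to QUARANTINE rather than REJECT is our assumption. The property that matters is that an error can never produce PASS.

```python
from enum import IntEnum
from typing import Callable, Iterable


class Verdict(IntEnum):
    # ordered by severity, so max() always keeps the strictest outcome
    PASS = 0
    QUARANTINE = 1
    REJECT = 2


Check = Callable[[dict], Verdict]


def gate(sample: dict, checks: Iterable[Check]) -> Verdict:
    """Fail-closed gate: a sample passes only if every check ran and passed."""
    verdict = Verdict.PASS
    ran_any = False
    for check in checks:
        try:
            result = check(sample)
        except Exception:
            # a check that cannot run must never let the sample through;
            # holding it for review is an assumption (REJECT is also fail-closed)
            result = Verdict.QUARANTINE
        ran_any = True
        verdict = max(verdict, result)
    # no checks at all means nothing was verified, so the sample stays out
    return verdict if ran_any else Verdict.QUARANTINE


# hypothetical example checks
def provenance(sample: dict) -> Verdict:
    return Verdict.PASS if sample.get("source") == "arxiv-mirror-07" else Verdict.REJECT


def quality(sample: dict) -> Verdict:
    return Verdict.PASS if sample.get("text", "").strip() else Verdict.QUARANTINE


def embedding_unavailable(sample: dict) -> Verdict:
    raise RuntimeError("encoder unreachable")  # simulates a check that cannot run


ok = {"source": "arxiv-mirror-07", "text": "a clean sample"}
print(gate(ok, [provenance, quality]).name)
print(gate(ok, [provenance, quality, embedding_unavailable]).name)
```

Taking the strictest verdict across checks means one REJECT outweighs any number of PASSes, and an empty list of checks is treated as "could not check" rather than "nothing wrong".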

Where bias and drift show up first

Every sample also lands somewhere on a map of meaning: its embedding. We watch that map closely, because it’s where bias creeps in and where a drifting source shows up first, as points sitting off on their own before any label looks wrong.
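A common way to spot a point sitting off on its own is its distance from a reference centroid. The sketch assumes each sample already has an embedding vector from the encoder (plain Python lists here). It fits a per-source baseline from known-good samples, flags any new sample whose cosine distance exceeds the baseline mean plus k standard deviations, and reports batch drift as the cosine distance between the batch centroid and the baseline centroid. `DriftMonitor` and the k = 3 rule are hypothetical choices, not Agentiks' detector.

```python
import math
from statistics import mean, pstdev


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def centroid(vectors):
    # component-wise mean of equal-length vectors
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class DriftMonitor:
    """Hypothetical per-source baseline over encoder embeddings."""

    def __init__(self, reference, k=3.0):
        self.center = centroid(reference)
        dists = [cosine_distance(v, self.center) for v in reference]
        # assumed rule: outlier if farther than mean + k standard deviations
        self.threshold = mean(dists) + k * pstdev(dists)

    def is_outlier(self, vector):
        return cosine_distance(vector, self.center) > self.threshold

    def batch_drift(self, batch):
        # how far the batch's center of mass has moved from the baseline
        return cosine_distance(centroid(batch), self.center)


# toy 3-d "embeddings"; real encoder vectors have hundreds of dimensions
reference = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.05, 0.0], [1.0, 0.05, 0.0]]
monitor = DriftMonitor(reference)
print(monitor.is_outlier([0.97, 0.03, 0.0]))  # close to the baseline
print(monitor.is_outlier([0.0, 0.0, 1.0]))    # a point off on its own
```

Per-sample distance catches a single odd sample; centroid shift catches a source whose whole output is sliding, even when no individual sample crosses the threshold yet.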

How we embed

Each sample, text or image, is run through a production embedding model (the encoder) on NVIDIA GPU nodes. This happens inline at intake and places the sample in a representation space like the one your model learns in, so the check is on meaning, not surface statistics.

  • Encoder: multimodal embedding model
  • Compute: NVIDIA GPU nodes
  • Runs: inline, at intake

Proof 03 · Tamper-evident ledger

A record nobody can quietly rewrite.

Every sample gets a unique fingerprint, and each record is locked to the one before it in an append-only log. Periodic Merkle checkpoints roll each batch into a single root, so anyone can prove that a given sample is in the ledger, and unchanged, with a short proof instead of replaying the whole chain. It is software-only: a PostgreSQL hash chain with optional Merkle checkpoints and S3 Object Lock, verifiable with psql and aws s3.

Hash-chained and Merkle-tree audit logs are battle-tested: Certificate Transparency (10.9B certs, every browser padlock) · AWS CloudTrail · Guardtime (Estonia/NATO). Tamper-evident record retention is also what SEC 17a-4 and FINRA 4511 require.

Batch Merkle root · 7b3e…f1
one root commits to every record below
  • record #4410 · sha256 a17f…c0
  • record #4411 · sha256 4e1b…9f · links to a17f…
  • record #4412 (sealed head) · sha256 9f2c…1d · links to 4e1b…
Edit record #4411: every later hash and the batch Merkle root stop matching. The tamper is provable in one check.
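The hash chain itself is simpler. In this sketch each record stores the previous record's hash plus its own hash over (previous hash + canonical JSON payload), and `first_break` walks the chain from a fixed genesis value. The record layout, `GENESIS`, and the function names are assumptions, not the PostgreSQL schema the ledger uses.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" for the first record in the log


def record_hash(prev_hash: str, payload: dict) -> str:
    # canonical JSON so identical payloads always hash identically
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": record_hash(prev, payload)})


def first_break(chain: list):
    """Index of the first record whose link or hash no longer matches, or None."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        if rec["prev"] != prev or rec["hash"] != record_hash(prev, rec["payload"]):
            return i
        prev = rec["hash"]
    return None


ledger = []
for n in (4410, 4411, 4412):
    append(ledger, {"record": n, "verdict": "PASS"})
print(first_break(ledger))  # intact chain

ledger[1]["payload"]["verdict"] = "REJECT"  # quietly edit record #4411
print(first_break(ledger))  # caught at that record

ledger[1]["hash"] = record_hash(ledger[1]["prev"], ledger[1]["payload"])  # editor also re-hashes it
print(first_break(ledger))  # now the link from #4412 breaks
```

An editor who rewrites every later record as well produces a chain that verifies on its own, which is why the sealed head (or the batch Merkle root) has to be anchored somewhere they cannot touch, such as an S3 Object Lock bucket.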

The Integrity Certificate

The thing you hand the auditor.

No single proof is enough on its own. All three together make the certificate: a signed bundle for every sample and batch, recording where it came from, how trusted its source was, which checks it passed, and a seal proving none of it changed afterward.

Integrity Certificate
batch · 12,408 samples
01 · SOURCE TRUST: arxiv-mirror-07 · tier A · score 0.91
02 · SAMPLE VERDICT: 0xA4F9…7C2 · PASS · checked inline
03 · SIGNATURE CHAIN: sha256 9f2c…1d · ⛓ prev 4e1b…
Signed · tamper-evident
in-toto · SLSA · CycloneDX ML-BOM

source trust + sample verdicts + signature chain = integrity. We deliver all three.
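At minimum, a certificate is the three proofs serialized canonically and signed. In this sketch HMAC-SHA256 from the Python standard library stands in for a real signature; an auditor-facing certificate would use a public-key signature (for example an in-toto attestation in a DSSE envelope) so verification needs no shared secret. `issue_certificate`, `verify_certificate`, and the field names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(obj) -> bytes:
    # sorted keys and fixed separators so the signed bytes are reproducible
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_certificate(batch_id: str, source: dict, verdicts: dict,
                      chain_head: str, key: bytes) -> dict:
    body = {
        "batch": batch_id,
        "source_trust": source,       # e.g. name, tier, score
        "sample_verdicts": verdicts,  # sample id -> PASS / QUARANTINE / REJECT
        "chain_head": chain_head,     # hash of the sealed ledger head
    }
    # HMAC-SHA256 stands in for a public-key signature in this sketch
    signature = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_certificate(cert: dict, key: bytes) -> bool:
    expected = hmac.new(key, canonical(cert["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["signature"])  # constant-time comparison


key = b"demo-key"  # placeholder secret for illustration only
cert = issue_certificate(
    "batch-0001",
    {"name": "arxiv-mirror-07", "tier": "A", "score": 0.91},
    {"sample-1": "PASS", "sample-2": "REJECT"},
    hashlib.sha256(b"sealed head").hexdigest(),
    key,
)
print(verify_certificate(cert, key))
cert["body"]["sample_verdicts"]["sample-2"] = "PASS"  # tamper after signing
print(verify_certificate(cert, key))
```

Changing any field after signing, such as flipping one sample's verdict, makes verification fail, which is the property an auditor relies on.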

Built for your seat

Judge. Check. Sign. All three, at the gate.

Frontier & ML platform teams

Source trust at intake, a sub-second binding verdict, and embedding-space drift caught early. A software SDK that runs on any cluster, including air-gapped ones.

Fraud, credit & trading ML

Adversarial sources scored from behavior alone, with dedup and drift checked on every retrain, before bad data can move the model.

Governance & model-risk

A signed Integrity Certificate per sample and batch, tamper-evident lineage, and data-plane evidence a regulator will accept.

Become a design partner and put the gate in front of your training data.

Building with design partners in frontier and regulated AI