Technical Architecture

How Protecto Works

Intelligence and control for the AI context layer. A layered architecture that classifies, tokenizes, and governs sensitive data as it flows through enterprise AI systems.

01 / Introduction

The Problem:
Data Moves Differently in AI

In traditional systems, data lived in structured environments: databases, applications, APIs. Security controls were designed around those boundaries. Access was applied to tables, files, or services.

AI systems operate differently.

Modern AI workflows assemble context dynamically. A single inference request may combine information from structured databases, PDFs, knowledge bases, SaaS systems, and external tools before sending that context to a model for reasoning.

Once data becomes part of that context, the boundaries that traditional security relied on disappear. Sensitive information can appear anywhere inside the prompt, the retrieved documents, or the generated response.

Protecto was built to operate at this moment. It sits between enterprise data and AI reasoning systems, providing intelligence and control over how sensitive information flows through AI applications.

02 / Architecture

The Protecto Architecture

Protecto is designed as a layered platform that analyzes data, understands context, and applies policy during AI workflows. Each layer contributes a specific capability that allows AI systems to safely operate on enterprise information.

Enterprise Users
  Employees · Partners · Applications · AI Agents

Agentic AI Systems
  LLMs · Agents · RAG Workflows

Protecto Platform (AI Context Control Layer)
  Agentic Data Classification
  Context Intelligence
  CBAC Decision Engine
  Identity & Access Integration
  Secure Tokenization
  Vault Infrastructure
  Secure RAG (GPTGuard)
  Enterprise Platform Services

Data Sources
  Databases · Documents · Files · SaaS APIs · Vector DBs · MCPs

2.1 / Classification

Agentic Data Classification

Enterprise data rarely exists in clean, structured formats. Sensitive information appears across documents, PDFs, emails, scanned records, and conversational text.

Protecto uses AI-driven classification to detect sensitive entities across both structured and unstructured content. Instead of relying on pattern matching, the classification engine analyzes the semantic meaning of data to accurately identify sensitive information.

The system identifies a broad range of sensitive categories, from personal identifiers to enterprise-confidential content.

This matters because sensitive meaning is context-dependent. Protecto’s classification engine handles the hard cases: sensitive meaning spread across sentences, expressed in mixed languages, or obscured by inconsistent formatting and typos.

The result is a persistent, structured understanding of where sensitive data exists within enterprise content.
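As an illustration, a structured classification record of this kind might look like the toy sketch below. The names, shapes, and the hard-coded detection rule are hypothetical stand-ins, not Protecto's API; a real semantic engine reasons over meaning rather than matching a fixed phrase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    """One detected sensitive entity within a piece of content."""
    category: str      # e.g. "PERSON", "CARD"
    text: str          # surface form as it appears in the content
    start: int         # character offset where the entity begins
    end: int           # character offset where it ends
    confidence: float  # classifier confidence in this label

def classify(text: str) -> list[Entity]:
    """Toy stand-in for the semantic classification engine.

    This sketch only flags one hard-coded pattern so the record
    shape is concrete; it is not how the real engine works.
    """
    entities = []
    marker = "credit card ending in "
    idx = text.find(marker)
    if idx != -1:
        start = idx + len(marker)
        end = start + 4
        entities.append(Entity("CARD", text[start:end], start, end, 0.98))
    return entities

found = classify("John Smith used a credit card ending in 4321.")
print(found)  # one CARD entity spanning the last four digits
```

The persistent output is a list of typed, offset-anchored records, which is what later stages (tokenization, policy) consume.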

Why this is hard

Sensitive meaning rarely appears as a single, cleanly formatted field. User prompts often mix languages and contain inconsistent formatting or typos, making simple pattern detection or rule-based NLP approaches insufficient.

2.2 / Tokenization

Context Preserving Tokenization

Sensitive values must often be protected without removing them entirely from the data. This is where traditional approaches break down.

Conventional masking replaces values with random characters or generic placeholders. This works for storage-level protection, but it destroys the semantic structure that language models depend on. Masked data produces degraded reasoning, broken retrieval, and incoherent responses.

Protecto uses semantic tokenization: sensitive values are replaced with structured tokens that preserve the type, position, and referential relationships of the original data.

tokenization-example.txt
// Original text
John Smith used a credit card ending in 4321. John has requested a refund.

// After semantic tokenization
<PERSON>GHJ6M7 HWE12K</PERSON> used a credit card ending in <CARD>Q19K02</CARD>. <PERSON>GHJ6M7</PERSON> has requested a refund.

// Referential integrity is maintained across the full document:
// every reference to the same entity maps to the same token.

The AI system continues to reason over the document. It understands that the same person who used the credit card is requesting the refund. Summarization, extraction, and question-answering continue to work. But the underlying sensitive values are never exposed to the model.

Tokenization services operate at high throughput and maintain referential integrity across documents and datasets.
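The referential-integrity property above can be sketched in a few lines: the same (category, value) pair always yields the same token. This is a toy illustration under assumed names, not Protecto's tokenizer; it uses plain string replacement, whereas a production system would work from the classifier's character offsets.

```python
import secrets
import string

class SemanticTokenizer:
    """Toy sketch: replace sensitive values with typed tokens, reusing
    the same token for every occurrence of the same value so that
    referential integrity is preserved."""

    def __init__(self):
        self._tokens = {}  # (category, value) -> token string

    def _token_for(self, category, value):
        key = (category, value)
        if key not in self._tokens:
            alphabet = string.ascii_uppercase + string.digits
            self._tokens[key] = "".join(secrets.choice(alphabet) for _ in range(6))
        return self._tokens[key]

    def tokenize(self, text, entities):
        # entities: (category, surface value) pairs from classification
        for category, value in entities:
            token = f"<{category}>{self._token_for(category, value)}</{category}>"
            text = text.replace(value, token)
        return text

tok = SemanticTokenizer()
out = tok.tokenize(
    "John used card 4321. John requested a refund.",
    [("PERSON", "John"), ("CARD", "4321")],
)
print(out)  # both mentions of "John" carry the same <PERSON> token
```

Because the mapping table persists across calls, the same entity also maps to the same token across documents, which is what keeps cross-document reasoning intact.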

Why this is hard

Traditional masking breaks AI reasoning because entity relationships disappear. Tokenization must preserve semantic structure and referential integrity across entire documents while still protecting the underlying sensitive values. Getting this wrong degrades reasoning quality and model output.

2.3 / Vault

Vault Infrastructure

Tokenized values are mapped to original sensitive data within a secure vault. This creates a strict architectural separation between the AI processing layer and the sensitive data storage layer.

Encrypted Storage

Sensitive values are encrypted at rest with enterprise-grade key management.

Complete Audit Logs

Every token access event is logged for compliance, forensics, and traceability.

Processing Isolation

AI systems operate on tokenized data. Sensitive values never enter the inference path.

Policy-Controlled Restoration

Token-to-value mapping is governed by access policies. Only authorized workflows can restore original values.

This architecture means that even if a model, prompt, or response is logged, intercepted, or cached, the underlying regulated data is not present in those artifacts.
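The vault's two load-bearing properties, policy-controlled restoration and complete audit logging, can be sketched as below. Encryption at rest and key management are out of scope here; all names and the role-based policy are illustrative assumptions, not Protecto's implementation.

```python
class Vault:
    """Toy sketch of the token-to-value vault: restoration is gated by
    policy, and every access attempt (allowed or denied) is audited."""

    def __init__(self, authorized_roles):
        self._store = {}                  # token -> original value
        self._authorized = set(authorized_roles)
        self.audit_log = []               # (token, role, allowed) events

    def put(self, token, value):
        self._store[token] = value

    def restore(self, token, role):
        allowed = role in self._authorized
        self.audit_log.append((token, role, allowed))  # log every attempt
        return self._store.get(token) if allowed else None

vault = Vault(authorized_roles={"billing-workflow"})
vault.put("GHJ6M7", "John Smith")
print(vault.restore("GHJ6M7", "chat-assistant"))    # None: denied, but audited
print(vault.restore("GHJ6M7", "billing-workflow"))  # John Smith
```

The key point is the separation of concerns: AI pipelines only ever hold tokens, and the vault alone decides, per policy, when a token becomes a value again.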

Why this is hard

Tokens must be reversible for authorized workflows while remaining cryptographically secure and fully isolated from AI processing pipelines. The vault must provide strict ACID guarantees while handling large data volumes.

2.4 / Access Control

Access Control and Identity Integration

Protecto integrates with enterprise identity systems including Active Directory and modern IAM platforms. These integrations allow the system to understand who is interacting with an AI system and what permissions they hold.

Traditional Role-Based Access Control (RBAC) governs access to systems and datasets. A user either has access to a table or they don’t.

But AI systems create a new problem. A model that has access to a knowledge base can generate responses that combine information from multiple sources. The output might contain data elements that the requesting user should not see, even though the model itself was authorized to access the source material.

Protecto introduces an additional layer of control that operates on the information contained inside AI-generated responses, not just on the systems that produced them.
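Operating on the response itself might look like the following sketch, which redacts tokenized entities whose category the requesting user is not cleared for. The policy table, role names, and token format are assumptions for illustration, not Protecto's actual policy model.

```python
import re

# Hypothetical policy table: categories each role may NOT see.
POLICY = {
    "analyst": {"CARD"},            # analysts may not see card data
    "support": {"CARD", "PERSON"},  # support sees neither names nor cards
}

def filter_response(response, role):
    """Redact tokens whose category the role is not cleared for.
    Tokens use the <CATEGORY>...</CATEGORY> form from tokenization."""
    blocked = POLICY.get(role, set())
    def redact(match):
        return "[REDACTED]" if match.group(1) in blocked else match.group(0)
    return re.sub(r"<([A-Z]+)>[^<]*</\1>", redact, response)

resp = "<PERSON>GHJ6M7</PERSON> paid with <CARD>Q19K02</CARD>."
print(filter_response(resp, "analyst"))
# <PERSON>GHJ6M7</PERSON> paid with [REDACTED].
```

Because the filter runs on the generated output rather than the source systems, it catches sensitive elements regardless of which data source contributed them.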

Why this is hard

AI systems can generate responses that combine multiple data sources, meaning access control must operate not only on datasets but also on the information contained inside generated responses. Traditional RBAC was never designed for this.

2.5 / Context Intelligence

Understanding the Data and Its Context

Identifying sensitive data is necessary but not sufficient. The same data element may be appropriate to surface in one context and restricted in another. Enterprise-confidential information, such as discount details or intellectual property, is often spread across many parts of a document.

The Context Intelligence layer evaluates multiple signals during each AI interaction: the intent of the prompt, the data sources involved, the identity of the user, and applicable enterprise policy.

These signals are combined to determine the exposure risk associated with each request. A request to summarize a document may be appropriate for one user and restricted for another.

Rather than relying on static access rules, Protecto evaluates each interaction dynamically, allowing AI systems to operate with the correct level of visibility per user.
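Combining these signals into a per-request exposure decision might look like the toy scoring function below. The categories, weights, and intent factors are illustrative assumptions, not Protecto's real risk model; the point is that the score depends on data labels, user clearance, and prompt intent evaluated together.

```python
def exposure_risk(prompt_intent, data_labels, user_clearance):
    """Toy sketch: combine per-request signals into one risk value.
    All weights and category names are illustrative assumptions."""
    weights = {"PHI": 0.9, "PII": 0.6, "CONFIDENTIAL": 0.7}
    # Only labels the user is NOT cleared for contribute risk.
    uncleared = set(data_labels) - set(user_clearance)
    base = max((weights.get(label, 0.3) for label in uncleared), default=0.0)
    # Bulk-export intents raise risk more than summarization or Q&A.
    intent_factor = {"export": 1.0, "summarize": 0.5, "qa": 0.4}.get(prompt_intent, 0.7)
    return round(base * intent_factor, 2)

# Same data, same request type, different users -> different risk:
print(exposure_risk("summarize", {"PII", "PHI"}, {"PII"}))         # 0.45
print(exposure_risk("summarize", {"PII", "PHI"}, {"PII", "PHI"}))  # 0.0
```

The two calls illustrate the document's point directly: a request to summarize the same document is low-risk for one user and elevated for another.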

Why this is hard

The correct exposure decision depends on multiple signals at once: the prompt intent, the data sources used, the identity of the user, and enterprise policy. All of these must be evaluated dynamically during each AI interaction, not preconfigured in a static rule table.

2.6 / CBAC Context-Based Access Control

Apply Access Control Based on Context

Protecto evaluates each AI interaction through a policy decision engine called the CBAC agent. This is the runtime enforcement layer: where classification, context, identity, and policy converge into an access decision.

The CBAC agent analyzes several factors simultaneously:

Input: User Identity
Analysis: Prompt Intent
Evaluation: Data Sensitivity
Enforcement: Policy Rules

Based on this analysis, Protecto determines what information should be visible to the requesting user. Two users asking the same question may receive responses with different levels of detail depending on their role and the sensitivity of the data involved.

This decision process occurs in real time during AI inference. There is no batch review, no manual approval queue. The CBAC agent evaluates and enforces policy inline, allowing enterprises to deploy AI assistants and agents without exposing restricted information.
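The inline decision the CBAC agent makes can be sketched as a single function over identity, intent, data sensitivity, and policy. The decision values, rule schema, and role names below are hypothetical, not Protecto's policy language; notably, the middle outcome (allow, but only in tokenized form) is what lets two users ask the same question and receive different levels of detail.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    ALLOW_TOKENIZED = "allow_tokenized"  # visible only in tokenized form
    DENY = "deny"

@dataclass
class Request:
    user_roles: set     # identity: roles from the IAM integration
    prompt_intent: str  # analysis: e.g. "summarize", "export"
    data_labels: set    # evaluation: sensitivity labels in the context

def cbac_decide(req, policy):
    """Toy inline policy decision evaluating all signals per request."""
    for label in req.data_labels:
        rule = policy.get(label, {})
        if req.prompt_intent in rule.get("forbidden_intents", set()):
            return Decision.DENY
        if not (req.user_roles & rule.get("clear_roles", set())):
            return Decision.ALLOW_TOKENIZED
    return Decision.ALLOW

policy = {"PHI": {"clear_roles": {"clinician"},
                  "forbidden_intents": {"export"}}}
print(cbac_decide(Request({"analyst"}, "summarize", {"PHI"}), policy))
# Decision.ALLOW_TOKENIZED: the analyst sees the answer, but on tokens
print(cbac_decide(Request({"clinician"}, "summarize", {"PHI"}), policy))
# Decision.ALLOW
```

Because the function is pure and fast, it can run inline on every inference call with no batch review or approval queue, as described above.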

Why this is hard

Security decisions must be made in real time during AI inference while simultaneously evaluating identity, intent, data sensitivity, and policy rules. The controls must be intelligent enough to adapt to new contexts, data, and requests.

2.7 / Secure RAG

Secure Retrieval-Augmented Generation

Many enterprise AI applications rely on retrieval-augmented generation (RAG) to answer questions using internal knowledge bases. This is where risk concentrates: the retrieval step pulls documents from storage, and the generation step surfaces their contents to end users.

Protecto includes GPTGuard, a secure RAG layer that protects sensitive information across this entire pipeline.

Document Evaluation

Documents retrieved from vector databases are evaluated for sensitive content before they enter the generation context.

Semantic Preservation

Tokenized data maintains semantic meaning so model reasoning quality is preserved even on protected content.

Response Filtering

Generated responses are filtered according to user permissions before delivery, ensuring restricted data is never surfaced.

This allows organizations to build AI assistants that operate on internal knowledge without exposing confidential information embedded inside documents.
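The end-to-end shape of such a pipeline can be sketched as a flow of stages: retrieve, evaluate and tokenize, generate over protected context, then filter per user. Every stage below is injected as a callable and every name is a hypothetical stand-in; this shows the flow GPTGuard is described as enforcing, not its actual interface.

```python
def secure_rag_answer(question, user_role, *, retrieve, classify,
                      tokenize, generate, filter_response):
    """Toy sketch of a secure RAG flow: the model only ever sees
    tokenized context, and the answer is permission-filtered."""
    docs = retrieve(question)                  # 1. pull candidate documents
    safe_docs = []
    for doc in docs:                           # 2. evaluate before generation
        entities = classify(doc)
        safe_docs.append(tokenize(doc, entities))
    draft = generate(question, safe_docs)      # 3. model reasons on tokens
    return filter_response(draft, user_role)   # 4. permission-aware delivery

# Trivial stand-ins so the flow runs end to end:
answer = secure_rag_answer(
    "Who requested a refund?",
    "support",
    retrieve=lambda q: ["John requested a refund."],
    classify=lambda d: [("PERSON", "John")],
    tokenize=lambda d, ents: d.replace("John", "<PERSON>GHJ6M7</PERSON>"),
    generate=lambda q, docs: docs[0],
    filter_response=lambda r, role: r,
)
print(answer)  # <PERSON>GHJ6M7</PERSON> requested a refund.
```

Structuring the pipeline this way means the raw value "John" never reaches the generation step, so logs, caches, or prompt leaks at that layer expose only tokens.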

Why this is hard

Retrieval systems may pull documents containing mixed-sensitivity information. The platform must selectively protect sensitive segments within those documents without degrading retrieval accuracy or model reasoning quality.

2.8 / Platform

Enterprise Platform Infrastructure

Protecto is designed to operate at enterprise scale. The platform provides services that support high-volume AI workloads while maintaining strict security guarantees.

Multi-Tenant Architecture

Strict data isolation between tenants with independent policy configurations, encryption keys, and audit trails.

High-Performance APIs

Real-time inference APIs designed for low-latency inline processing during AI workflows.

Async Processing Pipelines

Batch document classification and tokenization for large-scale data preparation across enterprise content stores.

Compliance & Audit

Comprehensive logging for every classification, tokenization, and policy decision for regulatory traceability.

Deployment environments include:

Private VPC
On-Premises
Air-Gapped Networks
Multi-Cloud
Hybrid Architectures

The platform processes millions of documents and AI interactions per day while maintaining consistent security guarantees across all deployment models.

Why this is hard

The system must enforce classification, tokenization, and policy decisions across millions of documents and real-time AI interactions simultaneously, while maintaining low latency, strict tenant isolation, and consistent security guarantees across every deployment model.


Summary

Enabling Safe Enterprise AI

AI systems are becoming the primary interface to enterprise knowledge. As models gain access to more context, the risk of exposing sensitive information increases proportionally.

Protecto introduces intelligence and control at the point where enterprise data becomes AI context.

By combining classification, contextual understanding, tokenization, and policy enforcement into a unified runtime architecture, the platform enables organizations to deploy AI systems that reason over enterprise information while keeping sensitive data protected.

AI operates with the context it needs. Enterprises maintain the safeguards required to operate responsibly.
