Agent Discovery
MeshGuard's discovery module finds and tracks AI service usage across your enterprise, including unauthorized "shadow AI" that bypasses governance controls. It integrates with network proxies, SIEM systems, API gateways, and MeshGuard agents to provide a comprehensive view of which AI services are in use, who is using them, and whether they comply with your policies.
Architecture
The discovery system has four layers:
- Connectors -- Data source integrations that ingest traffic events (proxy logs, SIEM, API gateways, agent reports).
- Registry -- A database of known AI services with domain patterns, risk profiles, and compliance flags.
- Scanner -- The orchestrator that runs discovery scans, matches traffic to known services, and triggers enforcement.
- Enforcement -- Integration with the MeshGuard policy engine to block, warn, or log discovered AI usage.
Proxy Logs ─┐
SIEM ───────┤
API Gateway ┤──→ Connectors ──→ Scanner ──→ Registry Match ──→ Enforcement
Agents ─────┘ ↓
Discovery DBConnectors
MeshGuard ships with four built-in connector types:
Proxy Log Connector
Parses logs from corporate proxies (Squid, Zscaler, BlueCoat).
{
id: 'corp-proxy',
name: 'Corporate Proxy',
type: 'proxy-log',
enabled: true,
config: {
logPath: '/var/log/squid/access.log'
},
syncIntervalMinutes: 15
}API Gateway Connector
Ingests logs from API gateways (Kong, Apigee, AWS API Gateway).
{
id: 'api-gw',
name: 'API Gateway',
type: 'api',
enabled: true,
config: {
endpoint: 'https://kong-admin.internal:8001',
apiKey: 'your-api-key'
},
syncIntervalMinutes: 10
}SIEM Connector
Pulls events from SIEM systems. Supports Splunk and Elasticsearch natively.
{
id: 'siem-splunk',
name: 'Splunk SIEM',
type: 'siem',
enabled: true,
config: {
endpoint: 'https://splunk.internal:8089',
apiKey: 'your-splunk-token',
siemType: 'splunk'
},
syncIntervalMinutes: 30
}Splunk -- Searches the proxy index for traffic to known AI service domains and extracts source IP, destination host, path, and byte counts.
Elasticsearch -- Queries proxy-* indices with wildcard domain matches and returns structured traffic events.
Agent Report Connector
Receives real-time reports from MeshGuard agents about AI service access. This connector requires no external configuration -- agents report directly.
{
id: 'agent-reports',
name: 'Agent Reports',
type: 'agent',
enabled: true,
config: {},
syncIntervalMinutes: 5
}Known AI Service Registry
MeshGuard includes a built-in registry of 25+ known AI services across nine categories:
| Category | Services |
|---|---|
llm | ChatGPT, OpenAI API, Claude, Anthropic API, Google Gemini, Mistral AI, Cohere |
code-assistant | GitHub Copilot, Cursor, Tabnine, Codeium, Amazon CodeWhisperer, Sourcegraph Cody |
image-gen | DALL-E, Midjourney, Stability AI, Leonardo.AI |
voice | ElevenLabs, Murf AI, AssemblyAI |
automation | Zapier AI, Make AI, n8n AI |
analytics | Databricks AI, Snowflake Cortex |
search | Perplexity |
Each service entry includes:
- Domain patterns -- Exact and wildcard matches (e.g.,
api.openai.com,*.cloud.databricks.com) - Path patterns -- Endpoint-specific matching (e.g.,
/v1/chat/completions) - Header signatures -- API key format detection (e.g.,
sk-*,sk-ant-*) - Risk assessment -- Default risk level, data exfiltration risk, code execution risk
- Compliance flags -- Relevant standards (SOC2, HIPAA, GDPR, FedRAMP, ISO27001)
Adding Custom Services
Register additional AI services specific to your environment:
import { addCustomService } from './discovery';
addCustomService({
id: 'internal-llm',
name: 'Internal LLM',
vendor: 'YourCompany',
category: 'llm',
domains: ['llm.internal.company.com'],
pathPatterns: ['/v1/generate', '/v1/chat'],
defaultRisk: 'low',
dataExfiltrationRisk: false,
codeExecutionRisk: false,
complianceFlags: ['SOC2'],
description: 'Self-hosted internal LLM service',
});Shadow AI Detection
Shadow AI refers to unauthorized use of AI services that bypasses your governance controls. MeshGuard detects shadow AI by:
- Matching network traffic -- Connectors capture outbound traffic and the scanner matches destination hosts against the registry.
- Classifying risk -- Each discovery is assigned a risk level based on the matched service's profile.
- Tracking status -- Discoveries move through a lifecycle:
discovered>monitoring/sanctioned/blocked/retired.
Discovery Statuses
| Status | Description |
|---|---|
discovered | Newly found, not yet reviewed |
sanctioned | Reviewed and approved for use |
blocked | Explicitly blocked by policy |
monitoring | Under observation, not yet decided |
retired | Previously active, now inactive |
Risk Levels
| Risk | Description |
|---|---|
critical | Immediate action required. High data exfiltration and compliance risk. |
high | Significant risk. Code execution possible, sensitive data exposure likely. |
medium | Moderate risk. Data exfiltration possible but service has compliance certifications. |
low | Minimal risk. Limited data exposure, well-known vendor. |
unknown | Unclassified. Needs manual review. |
Running Scans
On-Demand Scan
import { runScan } from './discovery';
const result = await runScan({
connectorIds: ['siem-splunk', 'corp-proxy'],
timeRange: {
from: new Date('2026-04-21'),
to: new Date('2026-04-22'),
},
orgId: 'org_abc',
fullScan: false, // Incremental scan
});Scan result:
{
"id": "scan_abc123",
"startedAt": "2026-04-22T10:00:00.000Z",
"completedAt": "2026-04-22T10:00:45.000Z",
"status": "success",
"newDiscoveries": 3,
"updatedDiscoveries": 12,
"totalServicesFound": 8,
"criticalFindings": 0,
"highRiskFindings": 2,
"connectorResults": [
{ "connectorId": "siem-splunk", "status": "success", "recordsProcessed": 4521 },
{ "connectorId": "corp-proxy", "status": "success", "recordsProcessed": 1203 }
]
}Periodic Scans
Enable automatic periodic scanning:
import { startPeriodicScans, stopPeriodicScans } from './discovery';
// Scan every 60 minutes
startPeriodicScans(60);
// Stop periodic scanning
stopPeriodicScans();Real-Time Agent Events
Agents can report AI service access in real time without waiting for the next scan cycle:
import { ingestAgentEvent } from './discovery';
const discovery = ingestAgentEvent({
agentId: 'agent_123',
destinationHost: 'api.openai.com',
destinationPath: '/v1/chat/completions',
bytesSent: 4096,
orgId: 'org_abc',
});
if (discovery) {
console.log(`Detected: ${discovery.serviceName} (${discovery.risk} risk)`);
}Policy Enforcement
When a new AI service is discovered through an agent, MeshGuard automatically evaluates it against the agent's policy:
- Actions are formatted as
ai:{category}:{service-id}(e.g.,ai:llm:openai-api) - The policy engine returns
allow,deny, or a default - The enforcement result is recorded as an
EnforcementEventand logged in the audit trail
Enforcement actions:
| Action | When Applied |
|---|---|
block | Policy explicitly denies the AI service |
allow | Policy allows and the service is sanctioned |
warn | Policy allows but the service is not yet sanctioned |
log | Default action for unmatched policies |
TIP
Create policy rules using the ai:* action prefix to control AI service access. For example, ai:code-assistant:* matches all code assistant services.
Sanctioning and Blocking
Sanction a Service
Mark a discovered service as approved:
import { sanctionService } from './discovery';
sanctionService('disc_abc123', 'admin@company.com', 'Approved for engineering team');Block a Service
Explicitly block a discovered service:
import { blockService } from './discovery';
blockService('disc_xyz789', 'admin@company.com', 'Not compliant with HIPAA requirements');Summary Dashboard
Get a quick overview of your organization's shadow AI posture:
import { getShadowAISummary } from './discovery';
const summary = getShadowAISummary('org_abc');The summary includes:
- Discovery statistics -- Total discoveries, active/sanctioned/blocked counts, by category and risk
- Recent discoveries -- Last 10 discovered services
- Unreviewed count -- How many discoveries are still in
discoveredstatus - High-risk count -- Count of critical and high-risk discoveries that are not yet blocked
Initialization
Initialize the discovery module at application startup:
import { initDiscovery } from './discovery';
initDiscovery({
dbPath: '/var/data/meshguard/discovery.db',
enablePeriodicScans: true,
scanIntervalMinutes: 60,
});Best Practices
- Start with SIEM and proxy connectors -- These provide the broadest visibility into AI service usage across the enterprise.
- Enable agent reporting -- Real-time agent events catch AI usage that batch scans might miss.
- Review discoveries promptly -- Unreviewed discoveries are a blind spot. Assign someone to triage new findings weekly.
- Sanction approved services explicitly -- A sanctioned service will not trigger warnings in enforcement. This reduces noise.
- Create AI-specific policies -- Use
ai:*action patterns in your MeshGuard policies to control which agents can access which AI services. - Monitor high-risk services -- Set up alerts for discoveries with
criticalorhighrisk levels. - Add custom services -- Register internal AI services and industry-specific tools that are not in the built-in registry.
