NotionAlpha is now an OSS AI Lab
Enterprise agentic AI is evolving through open-source components, yet a unified, production-ready architecture is missing. NotionAlpha aims to bridge this gap for organizations.
Enterprise agentic AI is emerging as a collection of open-source components that are not yet integrated into a cohesive solution. One vendor provides the runtime that isolates an agent from the host; another provides the assurance suite for prompt injection and tool misuse. Other components, such as the trajectory store, identity model, policy DSL, and orchestration layer, are open-sourced in separate repositories, each following its own roadmap and its own definition of key concepts like "policy" or "session." Each is powerful individually, but together they are not yet a production-ready system.
Collaboration with enterprise leaders has revealed a consistent need: organizations seek an agentic stack they can own, audit, and operate without vendor lock-in. The foundational components are emerging, but the architecture that integrates them into a deployable system is lacking. NotionAlpha aims to address this gap.
The Pivot
Over the past year, NotionAlpha has operated as an AI transformation advisory platform, conducting readiness assessments and guiding enterprise adoption. That work has shown that the primary challenge for enterprises is not whether to adopt AI, but what foundation to build on and how to maintain flexibility. This is fundamentally an architectural issue, not a strategic one.
We are transitioning NotionAlpha into an open-source AI Lab: a neutral platform for the architecture that supports enterprise agentic AI. The lab is open by default, permissively licensed, and free from lock-in. It focuses on three core activities:
- Contributing to the open-source projects that underpin enterprise agentic AI.
- Publishing a capability-first reference architecture that transforms these projects into deployable systems.
- Maintaining an evaluation methodology in which recommendations for each layer are dated, evidence-based, and replaceable, rather than opinion-driven.
The first tangible result of these efforts is available today.
The reference architecture
The architecture consists of three layers and seven capabilities. Each capability is defined by its required function in a production agent-native system, while the implementation fulfilling it is a separate, dated, and replaceable selection.
- Capabilities are permanent, while implementations are interchangeable. Each project recommendation for a layer is dated and includes a re-evaluation trigger. If a superior implementation emerges in the future, only the recommendation changes, not the architecture itself.
- Governance is integrated centrally, not added as an afterthought. Runtime isolation and assurance are managed in the control plane, above the agent. Observation is enforced by the platform by default, rather than being optional for the agent.
- The trajectory is treated as a first-class object. Every action taken by an agent is recorded as a durable, replayable, and inspectable artifact. Logs and traces alone are insufficient; auditors rely on the trajectory for a comprehensive review.
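The first principle above, fixed capabilities with replaceable implementations, can be sketched as a small data model. This is a minimal illustration, not code from the lab: the class names, field names, and the free-text trigger format are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Recommendation:
    """A dated, replaceable implementation choice (hypothetical shape)."""
    project: str
    evaluated_on: date
    reevaluate_when: str  # free-text re-evaluation trigger; format is an assumption


class ReferenceArchitecture:
    """Holds a fixed set of capabilities; only the recommendation behind each changes."""

    def __init__(self, capabilities):
        self._capabilities = frozenset(capabilities)
        self._current = {}

    @property
    def capabilities(self):
        return self._capabilities

    def recommend(self, capability, recommendation):
        # Recommendations may only target capabilities the architecture defines.
        if capability not in self._capabilities:
            raise KeyError(f"unknown capability: {capability}")
        self._current[capability] = recommendation

    def current(self, capability):
        return self._current[capability]
```

Calling `recommend` a second time for the same capability replaces the stored project while `capabilities` stays the same, which is the property the architecture relies on.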
The seam artifact: Gauntlet v0.1.0
Two recommended open-source projects are Microsoft RAMPART (Assurance, Evaluation & Forensics) and NVIDIA OpenShell (Runtime Isolation & Governance). Both are prominent projects, but neither is designed to integrate with the other: the integration point lies between them, and neither vendor addresses it.
Once installed, Gauntlet provides this integration with a single command:
pip install notionalpha-gauntlet
gauntlet run \
--agent-image my-agent:latest \
--policy policy.yaml
Gauntlet initiates an OpenShell sandbox with a deny-by-default policy, launches the agent image within the sandbox, exposes its HTTP endpoint to the host via the gateway, and directs RAMPART's pytest-native assurance suite to it. The output is a two-part report: one section details the sandbox guarantees, and the other presents RAMPART's verdict, allowing you to assess safety results against the isolation contract.
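For CI use, the invocation above can be wrapped in a few lines of Python. This is a sketch under stated assumptions: it uses only the two flags shown above, and it assumes that a non-zero exit code signals a failed run, which this post does not document. The function names are hypothetical.

```python
import subprocess


def gauntlet_command(agent_image, policy_path):
    """Build the argv for the `gauntlet run` invocation shown above."""
    return [
        "gauntlet", "run",
        "--agent-image", agent_image,
        "--policy", policy_path,
    ]


def run_gauntlet(agent_image, policy_path):
    """Run Gauntlet and return its exit code.

    Assumption: a non-zero exit code means the run failed. Gauntlet's
    exit-code contract is not described in this post.
    """
    completed = subprocess.run(
        gauntlet_command(agent_image, policy_path),
        check=False,  # let the caller decide how to react to failures
    )
    return completed.returncode
```

A CI step could call `run_gauntlet("my-agent:latest", "policy.yaml")` and fail the build on any non-zero result.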
The primary demonstration runs RAMPART against the included Qwen 3 agent. The agent is designed to fail the test_send_email_xpia_resistance cross-prompt-injection probe, illustrating that the harness accurately detects genuine failures rather than only manufactured successes.
Gauntlet is licensed under Apache-2.0. It is intentionally minimal, serving as the simplest tool to demonstrate that the architecture is functional code rather than a conceptual diagram. Gauntlet runs locally and in CI within your environment, without reliance on any hosted service.
- PyPI: pypi.org/project/notionalpha-gauntlet
- Repository: github.com/NotionAlpha/gauntlet
- Architecture site: notionalpha.com#gauntlet
Contribution, not declaration
Contribution is central to the lab's approach.
Developing Gauntlet with the actual upstream SDKs revealed nine necessary workarounds for gaps and packaging issues in the OpenShell Python client. Each workaround was minor, but together they covered problems that every downstream user would otherwise have to solve independently: auto-discovery helpers for gateway mTLS material, import indirection for proto stubs, daemon-thread wrappers for long-running processes, and custom host:port parsers that lacked IPv6 support.
So we patched the SDK.
Eight commits are now live on a gauntlet-bindings branch of NotionAlpha/OpenShell, tagged v0.0.47-gauntlet-2:
- Top-level proto aliases
- Linux wheel includes the generated _pb2.* stubs
- A public Sandbox.expose_http(port) convenience
- A non-blocking Sandbox.exec_detached(command) with an ExecHandle for error capture
- An openshell.policy_from_network_allow(destinations) builder that handles URL, host:port, bare-hostname, and bracketed-IPv6 forms
- An openshell.http_client_for_sandbox(target) helper that returns a requests.Session pre-configured with the active gateway's mTLS material
- A clearer SandboxError with a remediation hint when no active gateway is configured
- Sandbox(start_command=, start_env=) convenience kwargs that auto-launch a detached process after wait_ready
Each commit represents a single-purpose change prepared for upstream submission. These changes are not yet included in the upstream NVIDIA repository, as OpenShell's contributor program requires endorsement from an existing maintainer. Currently, Gauntlet relies on the fork; once the upstream pull requests are accepted, it will transition accordingly.
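To show why the destination forms handled by the policy builder (URL, host:port, bare hostname, bracketed IPv6) are awkward to parse by hand, here is a minimal standard-library sketch. It is not the OpenShell implementation; the function name `parse_destination` and the `default_port` fallback are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def parse_destination(destination, default_port=443):
    """Return (host, port) for a URL, host:port, bare hostname, or [IPv6]:port.

    Illustrative only. default_port is a hypothetical fallback for inputs
    that carry neither a port nor an http/https scheme.
    """
    text = destination.strip()
    if "://" in text:
        parts = urlsplit(text)
        fallback = {"http": 80, "https": 443}.get(parts.scheme, default_port)
    else:
        # Prefixing "//" makes urlsplit treat the text as a network location,
        # so bracketed IPv6 literals are split correctly from the port.
        parts = urlsplit("//" + text)
        fallback = default_port
    if not parts.hostname:
        raise ValueError(f"no host in destination: {destination!r}")
    port = parts.port if parts.port is not None else fallback
    return parts.hostname, port
```

Splitting on the last colon would break on `[2001:db8::1]:8443`; delegating to `urlsplit` avoids that class of bug.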
The lab's approach is to earn the architecture through contribution, not merely declare it. A reference architecture without supporting contributions is only a manifesto, while one published alongside merged upstream code is a tangible asset. We are currently between these two states: the eight commits exist, but upstream merges are pending. We disclose this status openly, as transparency is essential.
The methodology, briefly
Each implementation recommendation is evaluated using a consistent rubric: six criteria, each scored from 0 to 3, for a total of 18 points, followed by layer-specific assessments. Scores are dated, interpretation bands are documented, and re-evaluation triggers are specified in advance.
A few sample scores from the current recommendation table:
- Identity & Delegation → SPIRE (Apache-2.0): 17 / 18 — strong candidate.
- Tools & Effectors → MCP Python SDK (MIT): 17 / 18 — strong candidate.
- Runtime Isolation & Governance → OpenShell (Apache-2.0): 12 / 18 — viable with caveats (single-vendor risk; addressed by fork + drafted upstream PRs).
- Assurance, Evaluation & Forensics → RAMPART (MIT): 13 / 18 — viable with caveats.
Most of the lab's contribution efforts focus on projects rated as "viable with caveats." These projects do not fail the rubric; they are recommended, and they need further work to score higher.
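The rubric arithmetic can be sketched as follows. The six-criteria, 0-to-3 structure comes from the description above; the band cutoffs (`strong_at`, `viable_at`) and the label for the lowest band are hypothetical, chosen only so that the sample totals land in the bands stated above. The documented interpretation bands may differ.

```python
CRITERIA_COUNT = 6
MAX_PER_CRITERION = 3
MAX_TOTAL = CRITERIA_COUNT * MAX_PER_CRITERION  # 18 points


def total_score(scores):
    """Sum six criterion scores, each an integer from 0 to 3."""
    if len(scores) != CRITERIA_COUNT:
        raise ValueError(f"expected {CRITERIA_COUNT} scores, got {len(scores)}")
    for score in scores:
        if not isinstance(score, int) or not 0 <= score <= MAX_PER_CRITERION:
            raise ValueError(f"score out of range: {score!r}")
    return sum(scores)


def band(total, strong_at=15, viable_at=9):
    """Map a total to an interpretation band.

    The cutoffs are assumptions for this sketch, not the lab's
    published thresholds.
    """
    if not 0 <= total <= MAX_TOTAL:
        raise ValueError(f"total out of range: {total}")
    if total >= strong_at:
        return "strong candidate"
    if total >= viable_at:
        return "viable with caveats"
    return "fails the rubric"  # hypothetical label for the lowest band
```

Under these assumed cutoffs, a total of 17 maps to "strong candidate" and totals of 12 or 13 map to "viable with caveats", matching the sample table.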
If you build, evaluate, or purchase enterprise agentic AI, we encourage you to use the artifact, test it with your own agent image, and share your feedback. Issues, pull requests, contradictions, and disagreements are all valuable. The most effective way to improve the architecture is to identify its current limitations.
- pip install notionalpha-gauntlet
- github.com/NotionAlpha/gauntlet
- notionalpha.com
— Murali