Platform Improvement Data Policy

Glass Box Solutions, Inc.

Effective Date: February 23, 2026

Last Updated: February 18, 2026

Document Version: 1.0 (DRAFT — Pending Legal Review)

Preamble: Why This Policy Exists

Adjudica exists to make attorneys better at their jobs. We believe that an AI platform that does not learn and improve is a platform that quietly becomes less useful over time — and less useful to attorneys means less effective representation for injured workers.

We also operate in one of the most sensitive data environments that exist: healthcare and legal information combined. The data that flows through Adjudica — medical records, QME reports, depositions, wage histories — is among the most legally protected data in the country.

These two facts create a responsibility to draw a precise line: we will collect every signal we can use to make Adjudica better, and we will never allow a single piece of Protected Health Information or attorney-client privileged content to cross into the improvement pipeline.

This policy defines exactly where that line falls and how it is enforced. We publish it because we are not hiding the ball. Improving Adjudica serves attorneys. Serving attorneys serves their clients. That is why we collect improvement data, and that is a purpose we are willing to defend publicly.


1. Scope

This Policy governs the collection, use, and protection of Platform Improvement Data — behavioral signals, feedback, telemetry, and observability data generated by user interactions with Adjudica.AI.

This Policy applies to:

  • All Glass Box Solutions employees, contractors, and data processors who handle Platform Improvement Data
  • All third-party observability and analytics tools used by Glass Box Solutions
  • All Adjudica.AI product features and services

This Policy supplements and should be read alongside:

  • Data Handling Policy — governs PHI and case content
  • AI Governance Policy — governs responsible AI development
  • Privacy Notice — governs what is disclosed to users
  • Business Associate Agreements — govern PHI obligations with AI providers

2. Foundational Principle: The PHI Firewall

All Platform Improvement Data collection is governed by a single non-negotiable rule:

PHI and attorney-client privileged content never leave the secure, HIPAA-compliant data perimeter for improvement purposes.

This means:

  • No document content
  • No OCR-extracted text
  • No AI queries or responses that contain PHI
  • No case identifiers, client names, dates of injury, diagnoses, treatment information, or any other PHI as defined by HIPAA

What travels beyond the secure perimeter for improvement purposes is behavioral metadata — signals derived from user interactions, stripped of all PHI before transmission. The pattern of how a user interacted. Never the content of what they were working on.


3. Distinguishing "Customer Data" from "Platform Improvement Signals"

Existing Glass Box policies state: "We do not use your data to train our AI models."

That commitment stands. "Your data" means the documents you upload, the case information you enter, the PHI processed on your behalf. None of that is used for model training or improvement without explicit written consent.

Platform Improvement Signals are a distinct category. They are de-identified behavioral metadata generated by your use of the platform — not the content of your work. A thumbs-down on a Matter Chat response tells us the user judged the response unsatisfactory; it does not tell us what the question was or what the response said. A document classification correction tells us the AI guessed incorrectly; it does not tell us what was in the document.

|                            | Customer Data (PHI/Content)                              | Platform Improvement Signals                            |
|----------------------------|----------------------------------------------------------|---------------------------------------------------------|
| Examples                   | Medical records, QME reports, case queries, AI responses | Correction pairs, quality ratings, latency, error codes |
| Contains PHI               | Yes — protected by HIPAA and BAA                         | No — PHI-stripped before collection                     |
| Used for service delivery  | Yes                                                      | No                                                      |
| Used for improvement       | No                                                       | Yes                                                     |
| Leaves secure perimeter    | Never                                                    | Only after PHI stripping                                |
| Stored with case data      | Yes                                                      | No — separate, isolated store                           |

4. What We Collect for Platform Improvement

The following categories of Platform Improvement Signals are permitted and are actively collected:

4.1 Document Classification Feedback

When a user corrects an AI-generated document classification, we collect:

| Signal                   | Example                          | Contains PHI |
|--------------------------|----------------------------------|--------------|
| Original AI prediction   | "MEDICAL_REPORT"                 | No           |
| User correction          | "PANEL_QME_REPORT"               | No           |
| AI confidence score      | 0.73                             | No           |
| Document metadata        | Page count, file type, file size | No           |
| Prediction model version | "doc-classifier-v4.2"            | No           |
| Session timestamp        | 2026-02-18T14:32:00Z             | No           |

What we do NOT collect: Document content, OCR text, file name (which may contain client name), case number, or any identifying information from the document.

Purpose: To retrain and improve document classification accuracy across document types common in California Workers' Compensation practice (QME reports, Panel QME reports, medical records, depositions, wage histories, legal filings, etc.).

4.2 AI Response Quality Feedback

When a user provides explicit feedback on an AI-generated output (Matter Chat responses, document summaries, drafted content, form suggestions), we collect:

| Signal                          | Example                                                   | Contains PHI |
|---------------------------------|-----------------------------------------------------------|--------------|
| Feedback type                   | thumbs_up, thumbs_down, correction_submitted              | No           |
| Feature context                 | "matter_chat", "document_summary", "form_fill"            | No           |
| Feedback category (if provided) | "citation_error", "incomplete", "hallucination", "accurate" | No         |
| Prompt template ID              | "matter-chat-v3"                                          | No           |
| Response latency                | 2,340 ms                                                  | No           |
| Token count                     | 1,847 tokens                                              | No           |
| AI provider used                | "google"                                                  | No           |
| Session timestamp               | 2026-02-18T14:33:00Z                                      | No           |

What we do NOT collect: The actual question posed, the actual AI response, any matter content, any case references, or any text that may contain PHI.

Purpose: To improve AI response quality, identify prompt templates that underperform, detect hallucination patterns, and improve the accuracy of legal analysis outputs over time.

4.3 Form Auto-Fill Correction Signals

When a user corrects an AI-suggested value in a form field, we collect:

| Signal                               | Example                                          | Contains PHI |
|--------------------------------------|--------------------------------------------------|--------------|
| Form type                            | "DWC-AD 10133.36"                                | No           |
| Field identifier                     | "disability_rating_percentage"                   | No           |
| Correction type                      | "value_changed", "field_cleared", "value_accepted" | No         |
| AI confidence score                  | 0.84                                             | No           |
| Whether source citation was provided | true                                             | No           |

What we do NOT collect: The original AI-suggested value (which may be derived from PHI), the user-entered value (which IS PHI), the patient name, or any case-specific data.

Purpose: To improve form-filling accuracy, identify which fields have high correction rates, and prioritize engineering effort on high-error-rate form sections.

4.4 Feature Usage Telemetry

We collect aggregate usage patterns to understand how the platform is being used:

| Signal                          | Example                                            | Contains PHI |
|---------------------------------|----------------------------------------------------|--------------|
| Feature activated               | "document_ingestion", "timeline_view", "matter_chat" | No         |
| Session duration                | 47 minutes                                         | No           |
| Documents processed per session | 12 documents                                       | No           |
| Feature completion rate         | User started form fill, completed or abandoned     | No           |
| Error encounters                | Feature errors, timeouts, failures                 | No           |
| Navigation patterns             | Order of feature usage                             | No           |

What we do NOT collect: Which matter was open, client identifiers, case names, or any content viewed.

Purpose: To prioritize feature development, identify usability problems, and understand which capabilities deliver the most value to attorneys.

4.5 Platform Reliability and Performance Telemetry

| Signal                      | Example                                        | Contains PHI |
|-----------------------------|------------------------------------------------|--------------|
| Request latency             | 1,240 ms                                       | No           |
| Error codes and types       | "timeout", "model_overload", "validation_error" | No          |
| API endpoint performance    | /api/documents/classify — p95 latency          | No           |
| System resource utilization | CPU, memory, queue depth                       | No           |
| AI provider availability    | Uptime, error rates per provider               | No           |

Purpose: System reliability, performance optimization, and SLA compliance.


5. What We Will Never Collect for Improvement

The following are absolutely prohibited from the Platform Improvement pipeline:

| Prohibited Data                                    | Why              |
|----------------------------------------------------|------------------|
| Document content or OCR text                       | Contains PHI     |
| AI queries or prompts containing case information  | Contain PHI      |
| AI responses containing case-specific information  | Contain PHI      |
| Patient names, dates of birth, SSNs                | PHI              |
| Diagnoses, treatment history, medications          | PHI              |
| Dates of injury or disability ratings              | PHI              |
| Attorney-client privileged communications          | Privileged       |
| Case numbers or matter identifiers                 | Case identifiers |
| Client names or claimant identifiers               | PHI-adjacent     |
| File names (which frequently contain client names) | PHI-adjacent     |
| User-entered text in free-form fields              | May contain PHI  |

6. Observability Tools: Langfuse and AI Tracing

6.1 What Langfuse Does

Adjudica uses Langfuse as its LLM observability platform. Langfuse captures AI interaction traces — the inputs and outputs of AI calls — along with latency, token usage, and user feedback. This gives our engineering team visibility into how AI features are performing in production.

6.2 The PHI Problem with Observability

By default, an LLM observability tool captures full prompt content and full response content. In a standard SaaS application, this is unproblematic. In Adjudica, the prompt to an AI model may contain PHI extracted from medical records, and the response may contain analysis of that PHI. Capturing this in an observability tool would constitute an impermissible disclosure of PHI to a third party.

6.3 Our Solution: PHI Stripping Before Trace Transmission

Before any AI interaction trace is transmitted to Langfuse, it passes through a PHI stripping layer:

  1. PHI Detection: Every prompt and response is scanned for PHI using automated redaction (leveraging phileas, an open-source PII/PHI redaction library by Philterd, LLC, forked and adapted by Glass Box Solutions, which covers 30+ entity types including names, dates, diagnoses, and identifiers).
  2. Redaction: All detected PHI is replaced with typed placeholders (e.g., [PATIENT_NAME], [DATE_OF_INJURY], [DIAGNOSIS]) before the trace is prepared for transmission.
  3. Template Extraction: Where possible, we transmit the prompt template (the structural prompt without variable substitution) rather than the rendered prompt, so Langfuse receives prompt engineering patterns, not patient data.
  4. Metadata Only: For high-sensitivity operations, we transmit only metadata (latency, token counts, model used, feedback scores) and no prompt/response content at all.
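The four-step flow above can be sketched in code. This is an illustrative sketch, not the production implementation: the function and type names (`prepare_for_langfuse`, `Sensitivity`, `redact_phi`) are hypothetical, and the two-pattern toy redactor merely stands in for the phileas-based engine, which covers 30+ entity types.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto

class Sensitivity(Enum):
    STANDARD = auto()  # redacted prompt/response content may be transmitted
    HIGH = auto()      # metadata only; no content leaves the perimeter

@dataclass
class Trace:
    prompt: str
    response: str
    template_id: str
    latency_ms: int
    token_count: int

# Toy stand-in for the phileas-based redaction engine (illustration only).
PHI_PATTERNS = [
    (re.compile(r"John Smith"), "[PATIENT_NAME]"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "[DATE]"),
]

def redact_phi(text: str) -> str:
    """Replace detected PHI with typed placeholders (step 2)."""
    for pattern, placeholder in PHI_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

def prepare_for_langfuse(trace: Trace, sensitivity: Sensitivity) -> dict:
    """Build the only payload allowed to leave the secure perimeter."""
    # Step 4 baseline: metadata travels regardless of sensitivity.
    payload = {
        "template_id": trace.template_id,  # step 3: template ID, not rendered prompt
        "latency_ms": trace.latency_ms,
        "token_count": trace.token_count,
    }
    if sensitivity is Sensitivity.STANDARD:
        # Steps 1-2: detect and redact PHI before any content is attached.
        payload["prompt_redacted"] = redact_phi(trace.prompt)
        payload["response_redacted"] = redact_phi(trace.response)
    return payload
```

For a high-sensitivity operation the payload never gains a content field at all, which is what makes the metadata-only tier a structural guarantee rather than a filtering step.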

What Langfuse receives:

  • Prompt templates and structures (not rendered with PHI)
  • Response quality scores
  • Latency and token counts
  • User feedback signals (ratings, correction flags)
  • Error codes and model identifiers
  • Redacted/placeholder versions of prompts and responses

What Langfuse never receives:

  • PHI in any form
  • Actual case content
  • Patient identifiers
  • Attorney-client communications

6.4 Langfuse Deployment and Data Agreements

Glass Box Solutions operates Langfuse in a configuration that ensures:

  • [Self-hosted / Cloud with BAA — to be determined by engineering team at implementation]
  • Data transmitted to Langfuse does not leave U.S. jurisdictions
  • Langfuse is contractually prohibited from using Glass Box data for its own model training or product improvement
  • Data retention in Langfuse is limited to 90 days for trace data

7. Technical Implementation Requirements

The following technical controls are mandatory before any improvement data collection is deployed:

7.1 PHI Firewall (Pre-transmission)

All improvement data pipelines must implement a PHI firewall that:

  • Runs PHI detection against all data before it leaves the secure perimeter
  • Replaces PHI with typed placeholders rather than simply deleting (preserving structure for analysis)
  • Logs all PHI detections for audit purposes
  • Rejects transmission if PHI detection fails (fail-closed, not fail-open)
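The fail-closed rule in the last bullet can be made concrete with a short sketch. Everything here is an assumption about shape, not the actual firewall: `firewall`, `TransmissionRejected`, and the `redactor(text) -> (redacted_text, findings)` interface are hypothetical names chosen for illustration.

```python
import logging

logger = logging.getLogger("phi_firewall")

class TransmissionRejected(Exception):
    """Raised when a payload must not leave the secure perimeter."""

def firewall(payload: dict, redactor) -> dict:
    """Apply `redactor` to every string field before transmission.

    `redactor(text) -> (redacted_text, findings)` is a hypothetical
    interface; `findings` lists the PHI entity types that were replaced.
    Any redactor failure blocks the whole payload (fail-closed).
    """
    out = {}
    for key, value in payload.items():
        if not isinstance(value, str):
            out[key] = value  # numeric metadata passes through unchanged
            continue
        try:
            redacted, findings = redactor(value)
        except Exception as exc:
            # Fail closed: a detection error rejects transmission outright.
            logger.error("PHI detection failed on field %r: %s", key, exc)
            raise TransmissionRejected(key) from exc
        if findings:
            # Audit trail: log which entity types were found, never the raw value.
            logger.warning("PHI redacted in field %r: %s", key, findings)
        out[key] = redacted
    return out
```

The design point is that the `except` branch raises rather than passing the field through: an outage in the detector degrades to "nothing is transmitted", not "everything is transmitted".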

7.2 Data Isolation

Platform Improvement Signals must be stored in a data store that is:

  • Logically and physically isolated from the case data and PHI stores
  • Not accessible to the same application roles that access PHI
  • Subject to separate access controls and audit logging
  • Governed by the shorter retention periods defined in Section 8

7.3 Anonymization Before Aggregation

Before improvement signals are used in aggregate analysis (model retraining, dashboard reporting), signals must be:

  • Stripped of any user or firm identifiers (anonymized at the firm level, not just patient level)
  • Aggregated across a minimum cohort size to prevent re-identification
  • Reviewed by the Privacy Officer before use in model fine-tuning
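The minimum-cohort requirement above can be sketched as a suppression step in the aggregation job. The `MIN_COHORT` value and function name below are illustrative assumptions; the actual floor is a policy decision, not a constant in this document.

```python
from collections import Counter

MIN_COHORT = 10  # illustrative threshold; the real minimum is set by policy

def aggregate_correction_rates(signals: list[dict]) -> dict:
    """Correction rate per (form_type, field), suppressing small cohorts.

    Signals arrive already stripped of user and firm identifiers; the
    cohort floor guards against re-identification through rare
    form/field combinations.
    """
    totals: Counter = Counter()
    corrected: Counter = Counter()
    for s in signals:
        key = (s["form_type"], s["field"])
        totals[key] += 1
        corrected[key] += int(s["corrected"])
    return {
        key: corrected[key] / n
        for key, n in totals.items()
        if n >= MIN_COHORT  # groups below the floor are dropped entirely
    }
```

A group below the floor is omitted from the output rather than reported with a caveat, so a rarely used form field cannot single out the firm that used it.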

7.4 No Cross-Matter Correlation

Platform Improvement Signals must not be linkable back to a specific matter or client. No improvement data collection system may record matter identifiers, case numbers, or any field that would allow correlation with a specific legal matter.


8. Retention of Platform Improvement Signals

| Signal Type                         | Retention Period | Basis                       |
|-------------------------------------|------------------|-----------------------------|
| Document classification corrections | 3 years          | Model improvement value     |
| AI response feedback signals        | 2 years          | Model improvement value     |
| Form-fill correction signals        | 3 years          | Model improvement value     |
| Feature usage telemetry             | 1 year           | Product analytics           |
| Langfuse traces (redacted)          | 90 days          | Operational observability   |
| System performance telemetry        | 1 year           | Reliability engineering     |

Improvement data that has been incorporated into model training artifacts or aggregate analytics may be retained as part of those artifacts under the retention schedules applicable to model artifacts.


9. Transparency: Our Commitment to Openness

Glass Box Solutions operates on the principle that transparency is not a compliance checkbox — it is the product. Adjudica's commercial brand proposition is that we are not a black box. We apply that same principle to our own data practices.

We make the following transparency commitments:

9.1 We Tell You What We Collect

This Policy is publicly accessible. We describe, in plain language, exactly what behavioral signals we collect, what they contain, what they do not contain, and what we use them for. There are no hidden categories of improvement data collection.

9.2 We Tell You Why We Collect It

Every data collection practice in this Policy has a stated purpose tied directly to improving Adjudica, which improves attorney capability, which improves client representation. We do not collect improvement data speculatively or for purposes unrelated to platform improvement.

9.3 We Tell You What We Will Never Do

We have enumerated, explicitly, the data we will never collect for improvement purposes. That list is a commitment, not an aspiration.

9.4 The Purpose We Are Proud Of

We are willing to state publicly: Adjudica collects behavioral feedback and usage signals — stripped of all PHI — because improving Adjudica helps attorneys do their jobs better. Better-equipped attorneys achieve better outcomes for injured workers navigating the California Workers' Compensation system. That is the downstream beneficiary of our data collection. We think that is worth being transparent about.


10. User Rights Regarding Improvement Data

10.1 Right to Know

Users may request a description of the categories of Platform Improvement Signals collected through their use of Adjudica. Contact: privacy@adjudica.ai

10.2 Opt-Out of Improvement Data Collection

Users or firms may request to opt out of Platform Improvement Data collection. Upon opt-out:

  • No improvement signals will be captured from that user's or firm's sessions
  • Existing improvement signals (which contain no PHI) will be retained as part of aggregate model improvement datasets
  • Service quality will not be affected by opting out

To opt out, contact: privacy@adjudica.ai

10.3 No Right to Deletion of Aggregate Signals

Because Platform Improvement Signals are anonymized and may be incorporated into aggregate model training datasets, it may not be technically feasible to identify and delete signals attributable to a specific user once aggregated. We will honor deletion requests to the extent technically feasible.


11. Roles and Responsibilities

| Role                                  | Responsibility                                                          |
|---------------------------------------|-------------------------------------------------------------------------|
| Privacy Officer (Alexander Brewsaugh) | Policy ownership, user rights requests, PHI firewall audits             |
| Security Officer (Stephen Cefali)     | Technical implementation of PHI stripping, access controls, monitoring  |
| Legal Counsel (Sarah Brewsaugh)       | Policy compliance review, regulatory alignment, vendor contract review  |
| Engineering Lead                      | Implementation of PHI firewall, data isolation, observability tool configuration |
| AI/ML Lead                            | Oversight of how improvement signals are used in model development      |

12. Third-Party Observability and Analytics Tools

All third-party tools used for platform improvement data collection must:

  1. Be approved in writing by the Privacy Officer and Security Officer
  2. Receive only PHI-stripped data (enforced technically, not contractually alone)
  3. Be contractually prohibited from using Glass Box data for their own model training or product improvement
  4. Operate within U.S. jurisdictions only
  5. Be listed in the Approved Observability Tool Register maintained by the Security Officer

Current approved tools:

  • Langfuse (LLM observability) — see Section 6 for PHI handling specifics
  • [Additional tools to be added upon approval]

13. Policy Review

This Policy will be reviewed:

  • Annually: By Privacy Officer, Security Officer, and Legal Counsel
  • Upon deployment of new improvement data collection: Prior to each new collection category going live
  • Upon adoption of new observability tools: Before the tool is connected to production data
  • Following any PHI breach involving improvement pipelines: Immediate review

14. Relationship to Other Policies

This Policy introduces no exceptions to or overrides of the following:

| Policy                       | Relationship                                                                                                                                                                  |
|------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Data Handling Policy         | This Policy is additive. PHI handling rules in the Data Handling Policy apply without modification.                                                                           |
| AI Governance Policy         | The principle "we do not use customer data to train AI models" is preserved. Customer data (PHI, case content) is not used. Platform Improvement Signals are a distinct category that is not "customer data" as defined. |
| Privacy Notice               | The Privacy Notice should be updated to reference this Policy and describe Platform Improvement Signals as a disclosed category of data collection.                           |
| BAA / AI Provider Agreements | Our AI provider (Google) is prohibited from training on PHI. PHI stripping ensures no PHI reaches observability tools; BAA requirements for the underlying AI services remain unchanged. |

15. Definitions

Platform Improvement Signals: Anonymized, de-identified behavioral metadata generated by user interactions with Adjudica, collected for the purpose of improving platform performance, AI accuracy, and user experience. Does not include PHI or attorney-client privileged content.

PHI Firewall: The technical and procedural controls that prevent Protected Health Information from leaving the secure HIPAA-compliant data perimeter and entering the Platform Improvement data pipeline.

PHI Stripping: The automated process of detecting and redacting Protected Health Information from data before it is transmitted to observability, analytics, or improvement tools.

Observability Tool: A software system used to capture, analyze, and visualize how the Adjudica platform is operating, including AI model performance, latency, error rates, and user feedback. Langfuse is the primary observability tool.

Customer Data: As used in other Glass Box Solutions policies, refers to PHI, case content, documents, and client information processed through Adjudica on behalf of a user's clients. Distinct from Platform Improvement Signals.

Correction Signal: A behavioral signal generated when a user modifies or corrects an AI-generated output, capturing the nature of the correction (e.g., classification changed, rating submitted) but not the content involved.

Prompt Template: The structural framework of a prompt to an AI model, prior to substitution of case-specific variables. Prompt templates do not contain PHI.

phileas: An open-source PII/PHI redaction library developed by Philterd, LLC (Apache License 2.0), forked and adapted by Glass Box Solutions. Covers 30+ entity types and is used as the primary PHI detection and redaction engine in the PHI Firewall. Upstream: github.com/philterd/phileas.


Appendix A: PHI Stripping — What Goes In, What Comes Out

The following examples illustrate how the PHI Firewall transforms data before it leaves the secure perimeter:

Document Classification Correction

Raw internal event (stays inside secure perimeter):

{
  "matter_id": "MTR-2024-0847",
  "document_id": "DOC-98234",
  "original_filename": "Smith_John_QME_2024_01_15.pdf",
  "ai_prediction": "MEDICAL_REPORT",
  "user_correction": "PANEL_QME_REPORT",
  "confidence_score": 0.73,
  "page_count": 24,
  "file_size_kb": 4821,
  "model_version": "doc-classifier-v4.2"
}

PHI-stripped signal (transmitted to improvement pipeline):

{
  "ai_prediction": "MEDICAL_REPORT",
  "user_correction": "PANEL_QME_REPORT",
  "confidence_score": 0.73,
  "page_count": 24,
  "file_size_kb": 4821,
  "model_version": "doc-classifier-v4.2"
}

Removed: matter_id, document_id, original_filename (which contained patient name)
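One possible way to implement this transformation is an allowlist copy: only fields known to be PHI-free are carried over, and everything else is dropped by default. The field names below come from the example above; the function name and the allowlist approach itself are an illustrative sketch, not a description of the shipped pipeline.

```python
# Fields approved to leave the secure perimeter for this event type.
# Anything not listed (matter_id, document_id, original_filename, ...)
# is dropped by default.
CLASSIFICATION_SIGNAL_FIELDS = frozenset({
    "ai_prediction", "user_correction", "confidence_score",
    "page_count", "file_size_kb", "model_version",
})

def strip_classification_event(raw_event: dict) -> dict:
    """Produce the PHI-stripped signal from a raw internal event.

    Allowlisting (copy known-safe fields) is preferable to denylisting
    (delete known-bad fields): a field added to the raw event tomorrow
    is blocked until it is explicitly approved.
    """
    return {k: v for k, v in raw_event.items()
            if k in CLASSIFICATION_SIGNAL_FIELDS}
```

Applied to the raw event above, this yields exactly the PHI-stripped signal shown, with `matter_id`, `document_id`, and `original_filename` removed.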


AI Response Quality Signal

Raw internal event (stays inside secure perimeter):

{
  "matter_id": "MTR-2024-0847",
  "user_id": "USR-4421",
  "feature": "matter_chat",
  "prompt": "What was the treating physician's assessment of permanent partial disability for the lumbar spine for claimant John Smith?",
  "response": "According to Dr. Martinez's report dated...",
  "feedback": "thumbs_down",
  "feedback_note": "Citation was wrong",
  "latency_ms": 2340,
  "model": "claude-opus-4",
  "prompt_template_id": "matter-chat-v3",
  "token_count": 1847
}

PHI-stripped signal (transmitted to improvement pipeline):

{
  "feature": "matter_chat",
  "feedback": "thumbs_down",
  "feedback_category": "citation_error",
  "latency_ms": 2340,
  "model": "claude-opus-4",
  "prompt_template_id": "matter-chat-v3",
  "token_count": 1847
}

Removed: matter_id, user_id, full prompt (contains patient name and PHI), full response (contains PHI)


Appendix B: Frequently Asked Questions

Q: Does Adjudica use my clients' medical records to train AI?

A: No. Document content — including all medical records, QME reports, and other PHI — never enters the improvement pipeline. Only behavioral metadata (e.g., "the user corrected the document classification") is collected, and it contains no document content.

Q: Does Langfuse see my case queries?

A: No. Queries to Adjudica's AI features are PHI-stripped before any data is transmitted to Langfuse. Langfuse sees prompt templates and quality metrics — not the actual questions you ask about your clients' cases.

Q: What happens to improvement data if I cancel my account?

A: Because improvement signals contain no PHI and are anonymized, they cannot be linked back to your account after anonymization. Signals collected prior to cancellation may be retained as part of aggregate training datasets.

Q: Can I opt my firm out of improvement data collection?

A: Yes. Contact privacy@adjudica.ai to opt out. Service quality will not be affected.

Q: Does Glass Box benefit financially from improvement data?

A: Improvement data is used exclusively to improve Adjudica. It is not sold, licensed, or used to generate revenue independent of improving the platform. A better Adjudica serves more attorneys more effectively — that is the commercial interest this data serves.


For questions about this Policy, contact:

Privacy Officer: Alexander Brewsaugh — Alex@Adjudica.ai

Security Officer: Stephen Cefali — Steve@Brightdock.com

Legal Counsel: Sarah Brewsaugh — Sarah@Adjudica.ai


Developed & Documented by Glass Box Solutions, Inc. using human ingenuity and modern technology