Prompt Guardrails: When the Data Model Is the Real Fix

When an LLM produces wrong output, the first instinct is to fix the prompt.

Add more instructions. Be more specific. Add examples.

This works sometimes.

But there is a class of errors where the prompt is not the problem. The problem is in the data model the LLM is reasoning about.

Here is a real case from a production marketing AI system.

The symptom

The system detects marketing performance issues and writes human-readable recommendations.

One issue type - call it FATIGUED_OR_RED_MEDIA - covered two distinct situations:

Creative fatigue. Ads seen too many times by the same audience, leading to declining engagement.

Low-quality or disapproved media. Ads being suppressed by the platform because of quality signals, policy flags, or audience rejection rates.

The problem: the LLM was regularly confusing the two.

A recommendation for a creative being suppressed for quality reasons would describe it as "fatigued" - pointing the team toward refreshing the creative, when the actual fix was reviewing the quality signals.

The inverse also happened. Fatigued creatives got described with language about "platform suppression."

Wrong direction every time.

The wrong fix

First attempt was a prompt change:

IMPORTANT: Do not describe frequency-based fatigue as platform suppression.
Do not describe platform-suppressed media as fatigued.
If the issue is frequency, use the word "fatigue".
If the issue is quality/rejection, use "quality signals" or "disapproval".

This helped on straightforward cases.

It failed on edge cases where the evidence was mixed or the context was long.

Prompt instructions are not machine code. They are suggestions with varying compliance.

The deeper issue: the prompt instruction was trying to do the work that the data model was supposed to do.

FATIGUED_OR_RED_MEDIA was ambiguous by design. The original detection logic could not reliably distinguish the two situations, so it combined them into one issue type.

The LLM was being asked to distinguish what the data model refused to distinguish.

The real fix

Split FATIGUED_OR_RED_MEDIA into two distinct issue types:

FATIGUED_MEDIA: frequency-driven fatigue (high avg frequency, declining CTR over time)
RED_MEDIA_CLASSIFICATION: platform quality/rejection signals (low quality score, disapproval flags, high audience rejection rate)

Each issue type now has distinct detection criteria, different evidence fields, and different recommended actions.

@dataclass
class FatiguedMediaEvidence:
    avg_frequency: float
    frequency_benchmark: float
    ctr_delta_7d: float
    engagement_rate_delta_7d: float
    # No quality_score, no disapproval_flags
 
@dataclass
class RedMediaEvidence:
    quality_score: float
    quality_benchmark: float
    disapproval_flags: list[str]
    audience_rejection_rate: float
    # No frequency fields

The LLM prompt for FATIGUED_MEDIA does not mention platform suppression because the evidence object does not contain suppression signals.

The LLM prompt for RED_MEDIA_CLASSIFICATION does not mention frequency because the evidence object does not contain frequency data.

The evidence object makes conflation impossible.

Deterministic post-processing as a guardrail

Even with the split data model, I added one more check.

A deterministic keyword validation runs on LLM output before it writes to the database:

FATIGUE_TERMS = {"fatigued", "fatigue", "frequency", "overexposed"}
SUPPRESSION_TERMS = {"suppressed", "disapproved", "rejected", "quality score", "policy"}
 
def validate_recommendation(
    issue_type: str,
    recommendation_text: str,
) -> tuple[bool, str | None]:
    text_lower = recommendation_text.lower()
 
    if issue_type == "FATIGUED_MEDIA":
        found = SUPPRESSION_TERMS & set(text_lower.split())
        if found:
            return False, f"Suppression language in fatigue recommendation: {found}"
 
    if issue_type == "RED_MEDIA_CLASSIFICATION":
        found = FATIGUE_TERMS & set(text_lower.split())
        if found:
            return False, f"Fatigue language in suppression recommendation: {found}"
 
    return True, None

If the check fails, the recommendation is flagged for review or regeneration.

After the data model split, this check fires near zero times in normal operation. But the flag rate is a useful signal for prompt quality. If it starts rising, something changed.

The general pattern

A few things this case makes clear.

Ambiguous issue types produce ambiguous outputs. If your data model has catch-all categories, your LLM will produce catch-all recommendations. The fix is in the schema, not the prompt.

Evidence objects prevent cross-contamination. When the evidence object for an issue type only contains fields relevant to that type, the LLM cannot reason about irrelevant signals. This is more effective than telling the LLM what to ignore.

Deterministic post-processing is cheap insurance. A keyword check on the output catches the rare cross-contamination that slips through.

Prompts handle language, not logic. When you find yourself writing long prompt instructions to compensate for an underspecified data model, the data model is the problem.

What changed

After splitting the issue type:

FATIGUED_MEDIA recommendations consistently describe frequency trends, audience saturation, and creative refresh direction
RED_MEDIA_CLASSIFICATION recommendations consistently describe quality signals, platform feedback, and remediation steps
Post-processing validation fires near zero times

The creative team now gets recommendations that point to the correct fix without having to re-diagnose the issue themselves.

Related posts in this series:

If you are building AI systems that generate recommendations from structured data and want to discuss the architecture, get in touch.