GMS Inventory Health Analyzer — Eval Review

Skill 6 of 6 • Iteration 1 • 2026-03-28 • 20 assertions across 3 evals

With Skill: 90% (18 / 20 assertions passed)
Without Skill (Baseline): 15% (3 / 20 assertions passed)
Delta: +75 percentage points (15 additional assertions passed)
Note: The with-skill agent missed 2 of 7 assertions in Eval 1. It labeled the alert severity HIGH instead of MEDIUM (a threshold-mapping error, although its 15 < 38.5 comparison was correct), and it did not assign a velocity tier (it said "Accelerating" rather than one of Fast/Normal/Slow/Dead). Both are labeling issues; the underlying arithmetic was correct. Eval 2 and Eval 3 scored 100%.
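
The headline figures can be reproduced from the per-eval tallies reported in each eval section. A minimal sketch, assuming nothing beyond those tallies; the variable and function names are illustrative only:

```python
# Per-eval tallies as reported in this review: (assertions passed, assertions total)
with_skill = {"Eval 1": (5, 7), "Eval 2": (7, 7), "Eval 3": (6, 6)}
baseline = {"Eval 1": (1, 7), "Eval 2": (1, 7), "Eval 3": (1, 6)}


def pass_rate(tallies):
    """Return (passed, total, percent) summed across all evals."""
    passed = sum(p for p, _ in tallies.values())
    total = sum(t for _, t in tallies.values())
    return passed, total, 100 * passed / total


w_passed, w_total, w_pct = pass_rate(with_skill)
b_passed, b_total, b_pct = pass_rate(baseline)

# The delta is a difference of two percentages, so it is in percentage points.
delta_points = w_pct - b_pct

print(f"With skill: {w_passed}/{w_total} ({w_pct:.0f}%)")
print(f"Baseline:   {b_passed}/{b_total} ({b_pct:.0f}%)")
print(f"Delta:      +{delta_points:.0f} points, {w_passed - b_passed} additional assertions")
```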

Eval 1: Reorder Recommendation Analysis

With Skill: 5/7 • Without Skill: 1/7
Prompt: A product has 15 units in stock. Over the last 90 days it sold 270 units (3 units/day average). The 30-day velocity is 4 units/day, 60-day is 3.5 units/day, 90-day is 3 units/day. Lead time from the supplier is 14 days. Walk me through the reorder analysis.
| Assertion | Type | With Skill | Without Skill |
| --- | --- | --- | --- |
| Weighted velocity: (4x0.5)+(3.5x0.3)+(3x0.2) = 3.65 | CRITICAL | PASS: Exact: 2.0 + 1.05 + 0.6 = 3.65 units/day | FAIL: Wrong weights (40/35/25), gets 3.575 |
| Reorder point: 3.65 x (14+7) = ~77 units | CRITICAL | PASS: 3.65 x 21 = 76.65 ~ 77 units | FAIL: Gets 126.4 using safety factor 1.5x multiplier |
| Days on hand: 15 / 3.65 ~ 4.1 days | CRITICAL | PASS: 15 / 3.65 = 4.11 days | PASS: 15 / 3.6 = 4.2 days (close with rounding) |
| Alert severity: MEDIUM (15 < 38.5 but > 7.7) |  | FAIL: Correctly checks 15 < 38.5 but labels it HIGH instead of MEDIUM | FAIL: Says CRITICAL, massively overclassified |
| Order qty: max(154, 219) - 15 = 204 units |  | PASS: max(77x2, 3.65x30x2) - 15 = 204 units exact | FAIL: Gets 234 using ad-hoc formula |
| Uses 7-day safety stock buffer |  | PASS: "Default safety stock: 7 days"; 3.65 x 7 = 25.55 units | FAIL: Uses "safety factor 1.5" not 7-day buffer |
| Classifies velocity as 'Normal' (1-5 units/day) |  | FAIL: Doesn't use velocity tier classification. Says "Accelerating" but not Fast/Normal/Slow/Dead | FAIL: No velocity classification system used |
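
The expected values in the assertions above can be reproduced with a short calculation. This is a sketch under stated assumptions, not the skill's actual implementation. The function names are hypothetical. The severity check assumes that 38.5 and 7.7 are 50% and 10% of the 77-unit reorder point. Only the Normal and Fast velocity bounds are stated in this review, so the Slow and Dead cutoffs are left out.

```python
def weighted_velocity(v30, v60, v90):
    """Blend the 30/60/90-day velocities with the 50/30/20 weights from the assertion."""
    return 0.5 * v30 + 0.3 * v60 + 0.2 * v90


def reorder_point(velocity, lead_time_days, safety_days=7):
    """Demand over lead time plus the 7-day safety buffer, rounded to whole units."""
    return round(velocity * (lead_time_days + safety_days))


def days_on_hand(stock, velocity):
    """How many days the current stock lasts at the weighted velocity."""
    return stock / velocity


def order_quantity(stock, rp, velocity):
    """max(2 x reorder point, 2 x 30 days of demand) minus stock on hand."""
    return round(max(rp * 2, velocity * 30 * 2) - stock)


def is_medium_severity(stock, rp):
    # Assumption: the 38.5 and 7.7 bounds are 50% and 10% of the reorder point.
    return 0.1 * rp < stock < 0.5 * rp


def velocity_tier(velocity):
    # Only Normal (1-5 units/day) and Fast (> 5 units/day) appear in this review.
    if velocity > 5:
        return "Fast"
    if velocity >= 1:
        return "Normal"
    return None  # Slow/Dead cutoffs are not stated here


# Inputs from the Eval 1 prompt: 15 units in stock, 14-day lead time.
velocity = weighted_velocity(4, 3.5, 3)
rp = reorder_point(velocity, lead_time_days=14)
doh = days_on_hand(15, velocity)
qty = order_quantity(15, rp, velocity)
print(f"velocity={velocity:.2f} rp={rp} doh={doh:.2f} qty={qty}")
print("MEDIUM" if is_medium_severity(15, rp) else "other", velocity_tier(velocity))
```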

Eval 2: Dead Stock Identification & Action Plan

With Skill: 7/7 • Without Skill: 1/7
Prompt: Explain how to identify dead stock at GMS. We have about 25,000 active products. What's the definition, how do I find it, what metrics should I look at, and what should I do with it? Give me the full framework including the action matrix.
| Assertion | Type | With Skill | Without Skill |
| --- | --- | --- | --- |
| Dead stock: stock > 0, zero sales in 180 days, not discontinued | CRITICAL | PASS: Exact 3-part definition with field names (decUnitsInStock, ysnDiscontinued) | FAIL: Uses "12+ months" not 180 days, no ysnDiscontinued |
| Filter ysnDiscontinued to exclude 46,606 discontinued | CRITICAL | PASS: "Start with 25,283 active products (filter ysnDiscontinued = False)" | FAIL: Never mentions ysnDiscontinued or product count split |
| Action matrix: >$5K/$500-$5K/<$500 x 180-365d/>365d |  | PASS: Full matrix with all 6 cells and specific actions per cell | FAIL: Has matrix but wrong tiers (>$500/<$500 and month-based) |
| Value at Risk = Dead Stock Qty x curCost |  | PASS: "Value at Risk = Dead Stock Quantity x Current Cost" | FAIL: Uses carrying cost calc, not "Value at Risk" concept |
| Cut-to-order warning: check parent COIL |  | PASS: "Low panel inventory is often normal. Check the parent COIL." | FAIL: Vague mention of panel niche, no coil-checking guidance |
| Paradigm ERP source of truth, API down since 2026-03-14 |  | PASS: "Paradigm API is currently down (as of 2026-03-14)" | FAIL: Mentions Paradigm but doesn't know API status |
| Does NOT recommend /api/user/Inventory/Adjust | CRITICAL | PASS: Uses Physical Count Worksheets and IADJ 2-step only | PASS: Doesn't mention the forbidden endpoint |
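
The dead-stock definition, Value at Risk formula, and action-matrix tiers asserted above can be sketched as a filter plus a classifier. The field names decUnitsInStock, ysnDiscontinued, and curCost come from the assertions. Everything else is hypothetical: the Product class, the sku and last_sale fields, the function names, the choice to tier by Value at Risk, and the treatment of boundary values (exactly 180 days, exactly $500 or $5K). The per-cell actions are not reproduced in this review, so the sketch returns only the cell labels.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Product:
    sku: str                    # hypothetical identifier
    decUnitsInStock: float      # field name from the assertions
    ysnDiscontinued: bool       # field name from the assertions
    curCost: float              # field name from the assertions
    last_sale: Optional[date]   # hypothetical field; None means no recorded sale


def days_idle(p, today):
    """Days since the last sale, or None if the product has never sold."""
    return None if p.last_sale is None else (today - p.last_sale).days


def is_dead_stock(p, today, window_days=180):
    """Stock > 0, not discontinued, and zero sales within the window."""
    if p.ysnDiscontinued or p.decUnitsInStock <= 0:
        return False
    idle = days_idle(p, today)
    # Assumption: a last sale exactly window_days ago counts as outside the window.
    return idle is None or idle >= window_days


def value_at_risk(p):
    """Value at Risk = dead stock quantity x current cost."""
    return p.decUnitsInStock * p.curCost


def matrix_cell(p, today):
    """Return the (value tier, age tier) cell; call only for dead-stock products."""
    var = value_at_risk(p)
    # Assumption: tiers are applied to Value at Risk; boundary handling is a guess.
    value_tier = ">$5K" if var > 5000 else "$500-$5K" if var >= 500 else "<$500"
    idle = days_idle(p, today)
    age_tier = ">365d" if idle is None or idle > 365 else "180-365d"
    return value_tier, age_tier


today = date(2026, 3, 28)
stale = Product("EXAMPLE-1", 10, False, 600.0, date(2025, 1, 1))
if is_dead_stock(stale, today):
    print(stale.sku, value_at_risk(stale), matrix_cell(stale, today))
```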

Eval 3: Portfolio Health Score Calculation

With Skill: 6/6 • Without Skill: 1/6
Prompt: Explain the health score system. How is the 0-100 score calculated? Walk through scoring a product with 200 units, RP of 150, 8 units/day steady, $12/unit cost, 25 days at current levels.
| Assertion | Type | With Skill | Without Skill |
| --- | --- | --- | --- |
| All 5 factors with correct weights | CRITICAL | PASS: Stock Level 30%, Velocity 25%, VAR 20%, DOH 15%, Consistency 10% | FAIL: Wrong names and weights (20-25% ranges, guessed factors) |
| Stock Level = HEALTHY (100) since 200 > 150 | CRITICAL | PASS: "HEALTHY (100 pts): At or above reorder point" → 30 points | FAIL: "200/150 = 1.33x = ~80-85/100"; wrong scoring method |
| Velocity = 'Fast Mover' (>5 units/day) |  | PASS: "8 units/day falls in the Fast Mover category (> 5 units/day)" | FAIL: "8 units/day is healthy"; no tier system |
| DOH: 200/8 = 25 days, within target range |  | PASS: "DOH = 200 / 8 = 25 days" referencing 14-30 day target | FAIL: Says "25-day supply" but no DOH calc or benchmarks |
| Status tiers: EXCELLENT/GOOD/FAIR/POOR/CRITICAL |  | PASS: All 5 tiers with exact score ranges (80-100, 60-79, etc.) | FAIL: Just says "77/100 Healthy"; no tier system |
| Product identified as healthy fast mover |  | PASS: Score 100/100 EXCELLENT: "optimal health, well-stocked fast mover" | PASS: "~77/100 Healthy"; directionally correct |