GMS Supplier & Purchasing Intelligence — Eval Review

Skill 9 of 11 • Iteration 1 • 2026-03-28 • 21 assertions across 3 evals

With Skill: 100% (21 / 21 assertions passed)
Baseline: 10% (2 / 21 assertions passed)
Delta: +90 percentage points (19 additional assertions passed)
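
The headline figures can be cross-checked against the per-eval counts reported in this review. The following is a minimal sketch; the `RESULTS` dictionary and the `summarize` helper are illustrative names, not part of the eval harness.

```python
# (passed with skill, passed without skill, total assertions) per eval,
# copied from the per-eval "With / Without" lines in this review.
RESULTS = {
    "Eval 1": (7, 1, 7),
    "Eval 2": (7, 0, 7),
    "Eval 3": (7, 1, 7),
}


def summarize(results):
    """Aggregate per-eval counts into the headline pass rates."""
    with_skill = sum(r[0] for r in results.values())
    baseline = sum(r[1] for r in results.values())
    total = sum(r[2] for r in results.values())
    with_pct = 100 * with_skill / total
    baseline_pct = 100 * baseline / total
    return {
        "with_skill": with_skill,
        "baseline": baseline,
        "total": total,
        "with_pct": with_pct,
        "baseline_pct": baseline_pct,
        "delta_points": with_pct - baseline_pct,
        "additional_passed": with_skill - baseline,
    }


summary = summarize(RESULTS)
print(summary)
```

The baseline rate is 2/21, about 9.5%, which the report rounds to 10%; the delta is therefore about 90.5 points, shown as +90.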

Eval 1: Supplier Routing Decision for 26ga Order

With: 7/7 • Without: 1/7
Prompt: 26ga SSQ550 Dark Bronze, architect spec requires 40-year warranty. Which supplier, cost, lead time? What changes if warranty dropped?
| Assertion | W | With Skill | Without Skill |
|---|---|---|---|
| Routes to CMG for 40-year Kynar | CRIT | P: "CMG (Coated Metals Group)"; Kynar only option for 40-year spec | P: Correctly identifies CMG for 40-year Kynar |
| CMG 26ga: $4.37/LF for 43" | CRIT | P: "$4.37 per linear foot (43" width)" | F: "$2.80-$3.20/LF" (27-36% below actual) |
| CMG lead time: 4-6 weeks | CRIT | P: "Lead Time: 4-6 weeks" | F: "8-12 weeks" (2× too long) |
| If warranty dropped: USS SMP at $2.59-2.65/LF | | P: "$2.59/LF (41.5625" width)" | F: "$2.30-$2.60" (at or below the low end of the actual range) |
| USS lead time: 2-3 weeks | | P: "Lead Time: 2-3 weeks" | F: "4-8 weeks" (2-4× too long) |
| SSQ550 = standing seam, 16" coverage | | P: Correctly references it as a standing seam panel | F: No panel-specific details |
| CMG is 65-69% more expensive | | P: "USS costs 65% less than CMG" | F: "~18-20%" (far below the actual 65-69%) |

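The routing and premium figures asserted in Eval 1 can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the prices quoted in this review; `route_26ga`, `premium_pct`, and `shortfall_pct` are hypothetical helpers, and the single-flag routing rule is a simplification, not the skill's actual logic.

```python
# Prices per linear foot for 26ga, as quoted in this review.
CMG_KYNAR = 4.37
USS_SMP_RANGE = (2.59, 2.65)


def route_26ga(requires_40_year_kynar: bool) -> str:
    """Simplified rule: a 40-year Kynar spec goes to CMG, otherwise USS SMP."""
    return "CMG" if requires_40_year_kynar else "USS"


def premium_pct(higher: float, lower: float) -> float:
    """Percent by which `higher` exceeds `lower` (base = lower price)."""
    return (higher - lower) / lower * 100


def shortfall_pct(estimate: float, actual: float) -> float:
    """Percent by which `estimate` falls below `actual` (base = actual)."""
    return (actual - estimate) / actual * 100


# CMG premium over each end of the USS range.
premiums = [premium_pct(CMG_KYNAR, p) for p in USS_SMP_RANGE]

# How far the baseline's $2.80-$3.20 guess sits below the CMG price.
baseline_gap = [shortfall_pct(g, CMG_KYNAR) for g in (2.80, 3.20)]

print(route_26ga(True), [round(p, 1) for p in premiums])
print([round(g, 1) for g in baseline_gap])
```

The base of the percentage matters here: CMG is roughly 65-69% more expensive than USS, while the same gap expressed as USS's discount from the CMG price is roughly 39-41%.
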
Eval 2: Reorder Point and PO Recommendation

With: 7/7 • Without: 0/7
Prompt: CO4129ARW at 45 LF. Velocities: 30d=8.2, 60d=6.5, 90d=5.8 LF/day. Calculate reorder point, urgency, and PO recommendation.
| Assertion | W | With Skill | Without Skill |
|---|---|---|---|
| Routes to USS (29ga only) | CRIT | P: "Supplier: USS (only source for 29ga)" | F: No supplier routing mentioned |
| Weighted velocity = 7.21 LF/day | CRIT | P: "(8.2×0.5)+(6.5×0.3)+(5.8×0.2) = 7.21 LF/day" | F: Used only the 30-day velocity (8.2), not the weighted value |
| USS: 14-21d lead + 7d safety = 21-28d | CRIT | P: "14-21 days lead time + 7 days safety stock = 21-28 days" | F: "10 business days" + 15% buffer (wrong) |
| Reorder point: 151-202 LF | | P: "Conservative: 151.41, Standard: 176.64, Aggressive: 201.88" | F: "94.3 LF" (47-53% below the standard and aggressive values) |
| Urgency: HIGH | | P: "HIGH: 45 < 88.32 (RP × 0.50)" with threshold math | F: Said "CRITICAL" (wrong urgency level) |
| Order qty: max(RP×2, monthly×2) - stock | | P: "max(353.28, 432.6) = 432.6, Qty = 432.6 - 45 = 388 LF" | F: Generic EOQ: 340 LF (wrong formula) |
| Base cost: $2.17/LF for 29ga | | P: "$2.17/LF × 388 = $841.96" | F: No cost mentioned |
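
The chain of calculations asserted in Eval 2 can be sketched end to end as follows. The weights, lead time, safety stock, threshold factor, and unit cost are taken from the quoted evidence. Two details are assumptions inferred from the quoted numbers rather than stated in this review: the "Standard" reorder point uses the midpoint lead time (17.5 days), and "monthly" means 30 days of weighted velocity. Function names are illustrative.

```python
import math

# Inputs from the Eval 2 prompt.
STOCK_LF = 45
VELOCITY = {"30d": 8.2, "60d": 6.5, "90d": 5.8}  # LF/day

# Factors quoted in the passing answer.
WEIGHTS = {"30d": 0.5, "60d": 0.3, "90d": 0.2}
LEAD_DAYS = (14, 21)   # USS lead time range, in days
SAFETY_DAYS = 7
HIGH_FACTOR = 0.50     # HIGH when stock < standard reorder point x 0.50
COST_PER_LF = 2.17     # 29ga base cost, USD


def weighted_velocity(v):
    """Blend the 30/60/90-day velocities using the quoted weights."""
    return sum(WEIGHTS[k] * v[k] for k in WEIGHTS)


def reorder_points(velocity):
    """Reorder point = velocity x (lead days + safety days)."""
    low, high = LEAD_DAYS
    mid = (low + high) / 2  # assumption: "Standard" uses the midpoint
    return {
        "conservative": velocity * (low + SAFETY_DAYS),
        "standard": velocity * (mid + SAFETY_DAYS),
        "aggressive": velocity * (high + SAFETY_DAYS),
    }


def order_quantity(velocity, standard_rp, stock):
    """max(RP x 2, monthly x 2) - stock, rounded up to a whole LF."""
    monthly = velocity * 30  # assumption: a 30-day month
    return math.ceil(max(standard_rp * 2, monthly * 2) - stock)


velocity = weighted_velocity(VELOCITY)
rp = reorder_points(velocity)
is_high = STOCK_LF < rp["standard"] * HIGH_FACTOR
qty = order_quantity(velocity, rp["standard"], STOCK_LF)
po_cost = qty * COST_PER_LF

print(f"velocity={velocity:.2f} LF/day")
print({name: round(value, 2) for name, value in rp.items()})
print(f"high_urgency={is_high}, qty={qty} LF, cost=${po_cost:.2f}")
```

With the prompt's inputs this reproduces the figures in the table: 7.21 LF/day, reorder points of roughly 151, 177, and 202 LF, a HIGH flag, 388 LF, and $841.96. The thresholds for other urgency levels are not given in this review, so the sketch only tests the HIGH condition.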

Eval 3: Supplier Cost Analysis for Quote

With: 7/7 • Without: 1/7
Prompt: 500 LF of 26ga Charcoal Gray, no paint system specified. Compare USS SMP vs CMG Kynar — cost, total, lead time, and what to ask.
| Assertion | W | With Skill | Without Skill |
|---|---|---|---|
| USS SMP 26ga: $2.59-2.65/LF | CRIT | P: "USS: $2.65/LF, finish +$0.00, Total: $2.65" | F: Refused: "need current spot pricing" |
| CMG Kynar 26ga: $4.37/LF | CRIT | P: "CMG: $4.37/LF, finish +$0.00, Total: $4.37" | F: Refused: "need current spot pricing" |
| Total difference ~$860 for 500 LF | | P: "$1,325 vs $2,185 = +$860 (65% premium)" | F: No calculation possible |
| Lead: USS 2-3wk vs CMG 4-6wk | | P: "USS: 2-3 weeks, CMG: 4-6 weeks" | F: No lead times provided |
| Warranty: Lifetime vs 40-Year | | P: "Lifetime Paint & Fade vs 40-Year Paint & Fade" | F: Said "30-year" for SMP (wrong) |
| Customer questions: application, spec, warranty | | P: 7 detailed questions + recommendation framework | P: 6 solid questions about application and environment |
| 26ga = crossover gauge from both suppliers | | P: Explicitly noted as key routing decision | F: Not mentioned |
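
The quote comparison asserted in Eval 3 reduces to a per-LF total multiplied by the order length. The sketch below uses only the prices, finish adders, lead times, and warranties quoted in this review; the `Option` dataclass and `compare` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    supplier: str
    paint: str
    price_per_lf: float   # USD per linear foot
    finish_adder: float   # USD per linear foot
    lead_weeks: tuple     # (min, max) lead time in weeks
    warranty: str

    def total(self, linear_feet):
        """Material total for the given order length."""
        return (self.price_per_lf + self.finish_adder) * linear_feet


# Values quoted in the passing Eval 3 answer.
USS = Option("USS", "SMP", 2.65, 0.00, (2, 3), "Lifetime Paint & Fade")
CMG = Option("CMG", "Kynar", 4.37, 0.00, (4, 6), "40-Year Paint & Fade")


def compare(cheaper, pricier, linear_feet):
    """Return totals, the dollar gap, the premium, and the extra lead time."""
    base = cheaper.total(linear_feet)
    other = pricier.total(linear_feet)
    return {
        "cheaper_total": base,
        "pricier_total": other,
        "difference": other - base,
        "premium_pct": (other - base) / base * 100,
        "extra_lead_weeks": (
            pricier.lead_weeks[0] - cheaper.lead_weeks[0],
            pricier.lead_weeks[1] - cheaper.lead_weeks[1],
        ),
    }


result = compare(USS, CMG, 500)
print({k: (round(v, 2) if isinstance(v, float) else v) for k, v in result.items()})
```

For 500 LF this yields $1,325 against $2,185, a gap of $860 (about a 65% premium), with CMG adding two to three weeks of lead time.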