ykachala/evals

Prompt and LLM-output regression testing for PHP. Define golden datasets, assert on model output (schema, semantic similarity, LLM-as-judge), and gate CI on quality scores.
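
As a rough illustration of the idea (not this package's actual API, which is not shown here), the sketch below uses plain PHP to run a schema check over a tiny golden dataset and turn the pass rate into a CI exit code. Every name in it (`$goldenCases`, `passesSchema`, `qualityScore`, `gateExitCode`), the recorded outputs, and the 0.9 threshold are hypothetical and chosen only for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical golden dataset: each case pairs a prompt with a recorded model
// output and the keys that output is expected to contain.
// Illustration only; this is not the ykachala/evals data format.
$goldenCases = [
    ['prompt' => 'Extract the city', 'required' => ['city'], 'output' => '{"city":"Kyiv"}'],
    ['prompt' => 'Extract the date', 'required' => ['date'], 'output' => '{"day":"Monday"}'],
];

/** Returns true when $output decodes to an array containing every required key. */
function passesSchema(string $output, array $required): bool
{
    $decoded = json_decode($output, true);
    if (!is_array($decoded)) {
        return false; // invalid JSON or a scalar value
    }
    foreach ($required as $key) {
        if (!array_key_exists($key, $decoded)) {
            return false;
        }
    }
    return true;
}

/** Fraction of cases whose output passes the schema check (0.0 to 1.0). */
function qualityScore(array $cases): float
{
    if ($cases === []) {
        return 0.0;
    }
    $passed = 0;
    foreach ($cases as $case) {
        if (passesSchema($case['output'], $case['required'])) {
            $passed++;
        }
    }
    return $passed / count($cases);
}

/** Maps a score to a process exit code: 0 passes the gate, 1 fails it. */
function gateExitCode(float $score, float $threshold): int
{
    return $score >= $threshold ? 0 : 1;
}

$score = qualityScore($goldenCases);
printf("score=%.2f exit=%d\n", $score, gateExitCode($score, 0.9));
```

In a CI job, a script along these lines would end with `exit(gateExitCode($score, $threshold));` so that a score below the threshold fails the build. Semantic-similarity and LLM-as-judge assertions would slot in next to `passesSchema` as additional per-case checks.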