Add faithfulness check by josediegorobles · Pull Request #2556 · Giskard-AI/giskard-oss

josediegorobles · 2026-06-22T23:41:15Z

Adds a Faithfulness check for evaluating whether an answer is grounded in the provided source material.

The check follows the existing LLM judge pattern, uses a structured judge response with a 0-1 faithfulness score, and supports a configurable passing threshold. Source material can be passed directly or resolved from the trace with source_key; answers can also be passed directly or resolved with answer_key.

Includes unit tests for passing and failing cases, threshold handling, trace-based inputs, prompt contents, serialization, and exports.

Validation:

make test-unit PACKAGE=giskard-checks
make check

Closes #2368

gemini-code-assist

Code Review

This pull request introduces a new LLM-based check, Faithfulness, along with its structured output model FaithfulnessCheckResult, a Jinja2 prompt template, and corresponding unit tests. The check evaluates whether an AI agent's answer faithfully represents the provided source material. The feedback recommends validating that both the answer and source are successfully resolved before running the check to prevent sending unresolved inputs (like None or NoMatch) to the LLM, and suggests adding a unit test to verify this error handling.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-22T23:42:35Z

+    async def get_inputs(self, trace: Trace[InputType, OutputType]) -> dict[str, Any]:
+        answer = provided_or_resolve(
+            trace,
+            key=self.answer_key,
+            value=provide_not_none(self.answer),
+        )
+        source = self._resolve_source(trace)
+
+        return {
+            "answer": answer,
+            "source": self._format_source(source),
+            "threshold": self.threshold,
+        }


If answer or source cannot be resolved (resulting in None or NoMatch), the check currently proceeds and passes the stringified representation (e.g., "None" or "NoMatch(...)") to the LLM. This leads to incorrect evaluations and wasted API calls.

We should validate that both answer and source are successfully resolved and are not None before proceeding.

@override async def get_inputs(self, trace: Trace[InputType, OutputType]) -> dict[str, Any]: answer = provided_or_resolve( trace, key=self.answer_key, value=provide_not_none(self.answer), ) if answer is None or isinstance(answer, NoMatch): raise ValueError( f"Could not resolve answer from trace using key '{self.answer_key}' " "and no direct answer was provided." ) source = self._resolve_source(trace) if source is None or isinstance(source, NoMatch): raise ValueError( f"Could not resolve source from trace using key '{self.source_key}' " "and no direct source was provided." ) return { "answer": answer, "source": self._format_source(source), "threshold": self.threshold, }

Closing to consolidate efforts. The feedback here is solid and actionable — will revisit in the next iteration if we return to faithfulness evals. For now, focusing on #2555 which has a strategic duplicate scenario to resolve first. Thanks for the thorough review.—

gemini-code-assist · 2026-06-22T23:42:35Z

+    assert data["kind"] == "faithfulness"
+    assert isinstance(restored, Faithfulness)
+    assert restored.answer_key == "trace.last.outputs.answer"
+    assert restored.source_key == "trace.last.metadata.source"
+    assert restored.threshold == 0.9


It would be beneficial to add a unit test to verify that a ValueError is raised when the answer or source cannot be resolved from the trace.

Suggested change

assert data["kind"] == "faithfulness"

assert isinstance(restored, Faithfulness)

assert restored.answer_key == "trace.last.outputs.answer"

assert restored.source_key == "trace.last.metadata.source"

assert restored.threshold == 0.9

assert data["kind"] == "faithfulness"

assert isinstance(restored, Faithfulness)

assert restored.answer_key == "trace.last.outputs.answer"

assert restored.source_key == "trace.last.metadata.source"

assert restored.threshold == 0.9

async def test_unresolved_inputs_raise_value_error() -> None:

check = Faithfulness(

answer_key="trace.last.outputs.non_existent",

source_key="trace.last.metadata.non_existent",

)

with pytest.raises(ValueError, match="Could not resolve answer"):

await check.run(Trace())

feat(checks): add faithfulness check

fba35dc

josediegorobles mentioned this pull request Jun 22, 2026

Add faithfulness check #2368

Open

4 tasks

github-actions Bot added the Scope: Checks label Jun 22, 2026

gemini-code-assist Bot reviewed Jun 22, 2026

View reviewed changes

josediegorobles closed this Jun 24, 2026

fix(checks): validate faithfulness inputs

b4f81ab

josediegorobles reopened this Jun 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add faithfulness check#2556

Add faithfulness check#2556
josediegorobles wants to merge 2 commits into
Giskard-AI:mainfrom
josediegorobles:add-faithfulness-check

josediegorobles commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

josediegorobles Jun 24, 2026

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Uh oh!

Uh oh!

Conversation

josediegorobles commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

josediegorobles Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant