Add faithfulness check #2484
Open
DarioDiPalma-DDP wants to merge 1 commit into
Code Review
This pull request introduces a new Faithfulness check to the Giskard library, designed to evaluate whether an AI agent's response accurately represents provided source material. The implementation includes the Faithfulness class, a dedicated Jinja2 prompt template, and comprehensive unit tests. Feedback was provided regarding the get_inputs method to improve extensibility by calling the base class, enhance LLM readability by joining list-based sources with newlines, and ensure type consistency with the base class signature.
Comment on lines +74 to +104
async def get_inputs(self, trace: Trace[InputType, OutputType]) -> dict[str, str]:
    """Build template variables from resolved inputs.

    Parameters
    ----------
    trace : Trace
        Trace for resolving inputs.

    Returns
    -------
    dict[str, str]
        Template variables with ``answer`` and ``source`` keys.
    """
    answer = provided_or_resolve(
        trace,
        key=self.answer_key,
        value=provide_not_none(self.answer),
    )

    source: Any
    if self.source is not None:
        source = self.source
    elif self.source_key is not None:
        source = provided_or_resolve(trace, key=self.source_key)
    else:
        source = ""

    return {
        "answer": str(answer),
        "source": str(source),
    }
The `get_inputs` implementation can be improved in several ways:
- Extensibility: It should call `await super().get_inputs(trace)` to include base inputs (like the `trace` object itself). This ensures that users can reference the trace in custom prompt templates if they choose to override the default prompt.
- List Handling: Since `source` can be a `list[str]`, using `str(source)` results in a Python list representation (e.g., `['doc1', 'doc2']`) being injected into the prompt. Joining the list with double newlines provides a much more natural and effective format for LLM evaluation.
- Type Consistency: The return type should be `dict[str, Any]` to match the base class signature and accommodate the `trace` object in the returned dictionary.
async def get_inputs(self, trace: TraceType) -> dict[str, Any]:
    """Build template variables from resolved inputs.

    Parameters
    ----------
    trace : Trace
        Trace for resolving inputs.

    Returns
    -------
    dict[str, Any]
        Template variables with ``answer`` and ``source`` keys.
    """
    answer = provided_or_resolve(
        trace,
        key=self.answer_key,
        value=provide_not_none(self.answer),
    )

    source: Any
    if self.source is not None:
        source = self.source
    elif self.source_key is not None:
        source = provided_or_resolve(trace, key=self.source_key)
    else:
        source = ""

    # Join list-based sources for better LLM readability
    if isinstance(source, list):
        source = "\n\n".join(map(str, source))

    inputs = await super().get_inputs(trace)
    inputs.update({
        "answer": str(answer),
        "source": str(source),
    })
    return inputs
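The difference between the two formatting approaches, and the effect of merging into the base inputs, can be seen in a small standalone sketch. Everything here is a stand-in for illustration: `BaseCheck`, `FaithfulnessSketch`, and `format_source` are hypothetical names, not Giskard classes or helpers, and the assumption that the base `get_inputs` returns a `trace` key is taken from the review comment above rather than verified against the library.

```python
import asyncio
from typing import Any


def format_source(source: Any) -> str:
    """Render a source value as prompt text (hypothetical helper, not part of Giskard)."""
    if isinstance(source, list):
        # Separate documents with a blank line instead of emitting a Python list repr.
        return "\n\n".join(map(str, source))
    return str(source)


class BaseCheck:
    """Stand-in base class; assumed (not verified) to expose the trace to templates."""

    async def get_inputs(self, trace: Any) -> dict[str, Any]:
        return {"trace": trace}


class FaithfulnessSketch(BaseCheck):
    """Simplified stand-in for the check under review."""

    def __init__(self, answer: str, source: Any) -> None:
        self.answer = answer
        self.source = source

    async def get_inputs(self, trace: Any) -> dict[str, Any]:
        # Start from the base inputs so custom templates can still reference them.
        inputs = await super().get_inputs(trace)
        inputs.update(
            {"answer": str(self.answer), "source": format_source(self.source)}
        )
        return inputs


docs = ["doc1", "doc2"]
naive = str(docs)             # "['doc1', 'doc2']"
joined = format_source(docs)  # "doc1\n\ndoc2"
inputs = asyncio.run(
    FaithfulnessSketch("an answer", docs).get_inputs(trace="trace-placeholder")
)
```

With a list source, `naive` carries brackets and quotes into the prompt, while `joined` reads as two plain paragraphs; `inputs` keeps the base `trace` entry alongside `answer` and `source`.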
Description
Adds a built-in `Faithfulness` LLM-based check for evaluating whether a generated answer faithfully represents the provided source material.

The check is intended for RAG and source-grounded LLM workflows, where the answer should preserve the meaning, scope, nuance, and factual content of the source without distortion, misrepresentation, or unsupported claims.

Changes included:
- `Faithfulness` check.
- `faithfulness.j2` judge prompt.

Related Issue
Closes #2368
Type of Change
Checklist
- `CODE_OF_CONDUCT.md` document.
- `CONTRIBUTING.md` guide.
- `uv.lock` running `uv lock` (only applicable when `pyproject.toml` has been modified)