Add faithfulness check #2484

Open
DarioDiPalma-DDP wants to merge 1 commit into Giskard-AI:main from DarioDiPalma-DDP:feat/faithfulness-check

Conversation

@DarioDiPalma-DDP

Description

Adds a built-in Faithfulness LLM-based check for evaluating whether a generated answer faithfully represents the provided source material.

The check is intended for RAG and source-grounded LLM workflows, where the answer should preserve the meaning, scope, nuance, and factual content of the source without distortion, misrepresentation, or unsupported claims.

Changes included:

  • Added the Faithfulness check.
  • Added the faithfulness.j2 judge prompt.
  • Registered and exported the new check.
  • Added tests for faithful, distorted, partially faithful, trace-based extraction, direct-value priority, and list-based source material.

Related Issue

Closes #2368

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

Checklist

  • I've read the CODE_OF_CONDUCT.md document.
  • I've read the CONTRIBUTING.md guide.
  • I've written tests for all new methods and classes that I created.
  • I've written the docstring in NumPy format for all the methods and classes that I created or modified.
  • I've updated uv.lock by running uv lock (only applicable when pyproject.toml has been modified).

@gemini-code-assist (Bot) left a comment

Contributor

Code Review

This pull request introduces a new Faithfulness check to the Giskard library, designed to evaluate whether an AI agent's response accurately represents provided source material. The implementation includes the Faithfulness class, a dedicated Jinja2 prompt template, and comprehensive unit tests. Feedback was provided regarding the get_inputs method to improve extensibility by calling the base class, enhance LLM readability by joining list-based sources with newlines, and ensure type consistency with the base class signature.

Comment on lines +74 to +104
    async def get_inputs(self, trace: Trace[InputType, OutputType]) -> dict[str, str]:
        """Build template variables from resolved inputs.

        Parameters
        ----------
        trace : Trace
            Trace for resolving inputs.

        Returns
        -------
        dict[str, str]
            Template variables with ``answer`` and ``source`` keys.
        """
        answer = provided_or_resolve(
            trace,
            key=self.answer_key,
            value=provide_not_none(self.answer),
        )

        source: Any
        if self.source is not None:
            source = self.source
        elif self.source_key is not None:
            source = provided_or_resolve(trace, key=self.source_key)
        else:
            source = ""

        return {
            "answer": str(answer),
            "source": str(source),
        }

medium

The get_inputs implementation can be improved in several ways:

  1. Extensibility: It should call await super().get_inputs(trace) to include base inputs (like the trace object itself). This ensures that users can reference the trace in custom prompt templates if they choose to override the default prompt.
  2. List Handling: Since source can be a list[str], using str(source) results in a Python list representation (e.g., ['doc1', 'doc2']) being injected into the prompt. Joining the list with double newlines provides a much more natural and effective format for LLM evaluation.
  3. Type Consistency: The return type should be dict[str, Any] to match the base class signature and accommodate the trace object in the returned dictionary.
    async def get_inputs(self, trace: TraceType) -> dict[str, Any]:
        """Build template variables from resolved inputs.

        Parameters
        ----------
        trace : Trace
            Trace for resolving inputs.

        Returns
        -------
        dict[str, Any]
            Template variables with ``answer`` and ``source`` keys.
        """
        answer = provided_or_resolve(
            trace,
            key=self.answer_key,
            value=provide_not_none(self.answer),
        )

        source: Any
        if self.source is not None:
            source = self.source
        elif self.source_key is not None:
            source = provided_or_resolve(trace, key=self.source_key)
        else:
            source = ""

        # Join list-based sources for better LLM readability
        if isinstance(source, list):
            source = "\n\n".join(map(str, source))

        inputs = await super().get_inputs(trace)
        inputs.update({
            "answer": str(answer),
            "source": str(source),
        })
        return inputs
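
To make the list-handling point concrete, here is a minimal standalone sketch of the difference between `str(source)` and a newline join. The helpers `resolve_source` and `render_source` and the plain-dict `trace_values` are hypothetical stand-ins for illustration only; they are not part of the Giskard API, and the real `provided_or_resolve` may resolve keys differently.

```python
from typing import Any, Mapping, Optional


def resolve_source(
    source: Optional[Any],
    source_key: Optional[str],
    trace_values: Mapping[str, Any],
) -> Any:
    """Hypothetical stand-in for the resolution order used in get_inputs.

    A directly provided value wins, then a lookup by key, then "".
    The dict lookup is an assumption replacing trace-based resolution.
    """
    if source is not None:
        return source
    if source_key is not None:
        return trace_values[source_key]
    return ""


def render_source(source: Any) -> str:
    """Render a source for a prompt, joining lists with blank lines."""
    if isinstance(source, list):
        return "\n\n".join(map(str, source))
    return str(source)


docs = ["Paris is the capital of France.", "France is in Europe."]

# str() on a list yields the Python repr, brackets and quotes included.
naive = str(docs)
# -> "['Paris is the capital of France.', 'France is in Europe.']"

# Joining yields plain text, one document per paragraph.
joined = render_source(docs)
# -> "Paris is the capital of France.\n\nFrance is in Europe."

# A directly provided value takes priority over the key lookup.
picked = resolve_source("direct text", "context", {"context": docs})
# -> "direct text"
```

With the join in place, the judge prompt receives the documents as separate paragraphs rather than a bracketed, quoted Python literal.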

Development

Successfully merging this pull request may close these issues.

Add faithfulness check

1 participant