Answer context similarity
AnswerContextSimilarityEvaluator #
Bases: BaseEvaluator
Measures how closely the contexts relate to the given answer. A higher value suggests that a greater proportion of the context is present in the LLM's response.
Attributes:

| Name | Type | Description |
|---|---|---|
| `embed_model` | `BaseEmbedding` | The embedding model used to compute vector representations. |
| `similarity_mode` | `SimilarityMode` | Similarity strategy to use. Supported options are defined by `SimilarityMode`. |
| `score_threshold` | `float` | Threshold determining whether a context segment "passes". Must be between 0.0 and 1.0. |
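Assuming `score_threshold` acts as a simple cutoff (an assumption; the exact semantics are defined by the library), the pass decision could be sketched as:

```python
def passes(score: float, score_threshold: float = 0.5) -> bool:
    """Hypothetical rule: a result "passes" when the similarity score
    meets or exceeds the threshold. The default value here is assumed."""
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0.0 and 1.0")
    return score >= score_threshold
```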
Example

```python
from beekeeper.core.evaluation import AnswerContextSimilarityEvaluator
from beekeeper.embedding.huggingface import HuggingFaceEmbedding

embedding = HuggingFaceEmbedding()
answer_ctx_evaluator = AnswerContextSimilarityEvaluator(embed_model=embedding)
```
evaluate #
```python
evaluate(query: str | None = None, generated_text: str | None = None, contexts: list[str] | None = None, **kwargs: Any) -> dict
```
Evaluate the given inputs and return evaluation results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `generated_text` | `str` | LLM response based on the given context. | `None` |
| `contexts` | `list[str]` | List of contexts used to generate the LLM response. | `None` |
Example

```python
evaluation_result = answer_ctx_evaluator.evaluate(
    contexts=["context 1", "context 2"],
    generated_text="The capital of France is Paris.",
)
print(f"Score: {evaluation_result['score']}")
print(f"Passing: {evaluation_result['passing']}")
```
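To make the shape of the returned dict concrete, here is a minimal self-contained sketch of what such an evaluation might compute. A toy character-count embedding and averaged cosine similarity stand in for the real embedding model and similarity strategy; `embed`, `cosine`, and `evaluate_sketch` are hypothetical names for illustration, not beekeeper APIs:

```python
import math


def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: counts of each
    # lowercase letter. Illustration only.
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors; 0.0 for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def evaluate_sketch(
    generated_text: str, contexts: list[str], score_threshold: float = 0.5
) -> dict:
    # Average the answer-to-context similarities, then apply the
    # threshold to decide "passing". The default threshold is assumed.
    ans = embed(generated_text)
    score = sum(cosine(ans, embed(c)) for c in contexts) / len(contexts)
    return {"score": score, "passing": score >= score_threshold}
```

The real evaluator delegates the embedding step to `embed_model` and the comparison to `similarity_mode`, but the score/passing structure of the result mirrors the example above.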