Benchmarks
Module for benchmarking language models.
This module provides an Evaluator class to run evaluations with various metrics.
Evaluator
Evaluator class to execute language model evaluations using various metrics.
Attributes:
Name | Type | Description |
---|---|---|
metrics | list | List of metric classes to evaluate outputs. |
online | bool | Flag indicating if evaluation should be run online using a pipeline. |
pipeline | callable | A callable that runs the model inference on input text. |
Source code in langbench/benchmarks.py
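A minimal end-to-end sketch of how the class is typically used. The import path, the plain-string pipeline, and the metric placeholder are assumptions; substitute whichever metric classes your langbench installation provides.

```python
import pandas as pd

from langbench.benchmarks import Evaluator  # assumed import path


def my_pipeline(text):
    # Stand-in for real model inference (an LLM call, chain, etc.).
    return "model answer for: " + text


evaluator = Evaluator(pipeline=my_pipeline, online=True)
# evaluator.add_metric(SomeMetric)  # register the metric classes you want to score with

data = pd.DataFrame({"input": ["Summarize this paragraph.", "Translate 'bonjour' to English."]})
results = evaluator.evaluate(data)  # runs the pipeline and all registered metrics
evaluator.generate_report(results)  # writes report.html to the current directory
```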
__init__(pipeline=None, online=False)
Initializes the Evaluator instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline | callable | A callable function for processing text. Required if 'online' is True. | None |
online | bool | If True, executes pipeline processing during evaluation. | False |
Raises:
Type | Description |
---|---|
ValueError | If 'online' is True but pipeline is not provided. |
Source code in langbench/benchmarks.py
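A sketch of the two construction modes and the documented error case (import path assumed):

```python
from langbench.benchmarks import Evaluator  # assumed import path

offline = Evaluator()                                        # no pipeline is run during evaluation
online = Evaluator(pipeline=lambda text: text, online=True)  # pipeline is called during evaluation

try:
    Evaluator(online=True)  # online mode without a pipeline
except ValueError as err:
    print(err)              # documented to raise ValueError
```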
add_metric(metric)
Adds a metric to the evaluator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metric | class | A metric class to be added to the evaluator. | required |
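A sketch of registering metrics. Per the documented signature a metric class (not an instance) is passed; PlaceholderMetric below only illustrates the call and is not a working metric.

```python
from langbench.benchmarks import Evaluator  # assumed import path


class PlaceholderMetric:
    """Stand-in for a real metric class shipped with langbench."""


evaluator = Evaluator()
evaluator.add_metric(PlaceholderMetric)  # added to the evaluator's metrics list
```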
call_pipeline(text)
Calls the pipeline function on the provided text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The input text to evaluate. | required |
Returns:
Type | Description |
---|---|
str | The output content from the pipeline's processing. |
Source code in langbench/benchmarks.py
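A sketch of invoking the pipeline directly (only meaningful when a pipeline was provided). How call_pipeline extracts the output content from the pipeline's return value is internal, so the plain-string pipeline here is an assumption:

```python
from langbench.benchmarks import Evaluator  # assumed import path


def my_pipeline(text):
    return "model answer for: " + text  # stand-in for real model inference


evaluator = Evaluator(pipeline=my_pipeline, online=True)
print(evaluator.call_pipeline("What is the capital of France?"))
```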
evaluate(input_data)
Evaluates the input data using all added metrics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_data
|
DataFrame
|
A pandas DataFrame with at least an 'input' column. |
required |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing original data along with evaluation metric outputs. |
Source code in langbench/benchmarks.py
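A sketch of the expected input and output shapes; everything beyond the documented 'input' column is an assumption:

```python
import pandas as pd

from langbench.benchmarks import Evaluator  # assumed import path

evaluator = Evaluator(pipeline=lambda text: "stub answer", online=True)
# evaluator.add_metric(SomeMetric)  # register real metric classes here

data = pd.DataFrame({"input": ["List three prime numbers.", "Define entropy."]})
results = evaluator.evaluate(data)

print(results.columns)  # original columns plus the metric outputs (assumed column layout)
```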
execute(data)
Executes the pipeline on the 'input' column of the DataFrame and calculates latency.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | A pandas DataFrame containing the 'input' column. | required |
Source code in langbench/benchmarks.py
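execute is documented without a return value, so the sketch below assumes the DataFrame is updated with the pipeline outputs and per-row latency; the names of the added columns are implementation-defined:

```python
import pandas as pd

from langbench.benchmarks import Evaluator  # assumed import path

evaluator = Evaluator(pipeline=lambda text: "stub answer", online=True)
data = pd.DataFrame({"input": ["Hello", "Goodbye"]})

evaluator.execute(data)  # runs the pipeline on each 'input' row and records latency
print(data.head())       # inspect the columns added by execute
```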
generate_report(data)
Generates an HTML report of evaluation metrics using Plotly box plots. The report is saved as 'report.html' in the current directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | The evaluated DataFrame containing metrics. | required |
Returns:
Type | Description |
---|---|
str | An HTML string containing the report with embedded Plotly graphs. |
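A sketch of report generation on an evaluated DataFrame; the report.html file name comes from the description above, while the pipeline and input are placeholders:

```python
import pandas as pd

from langbench.benchmarks import Evaluator  # assumed import path

evaluator = Evaluator(pipeline=lambda text: "stub answer", online=True)
results = evaluator.evaluate(pd.DataFrame({"input": ["Hello"]}))

html = evaluator.generate_report(results)  # also writes report.html to the current directory
print(len(html))                           # the returned HTML string embeds the Plotly box plots
```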