
Amazon SageMaker Clarify makes it easier to evaluate and select foundation models (preview)



I'm happy to share that Amazon SageMaker Clarify now supports foundation model (FM) evaluation (preview). As a data scientist or machine learning (ML) engineer, you can now use SageMaker Clarify to evaluate, compare, and select FMs in minutes based on metrics such as accuracy, robustness, creativity, factual knowledge, bias, and toxicity. This new capability adds to SageMaker Clarify's existing ability to detect bias in ML data and models and explain model predictions.

The new capability provides both automatic and human-in-the-loop evaluations for large language models (LLMs) anywhere, including LLMs available in SageMaker JumpStart as well as models trained and hosted outside of AWS. This removes the heavy lifting of finding the right model evaluation tools and integrating them into your development environment. It also simplifies the complexity of trying to adapt academic benchmarks to your generative artificial intelligence (AI) use case.

Evaluate FMs with SageMaker Clarify
With SageMaker Clarify, you now have a single place to evaluate and compare any LLM based on predefined criteria during model selection and throughout the model customization workflow. In addition to automatic evaluation, you can also use the human-in-the-loop capabilities to set up human evaluations for more subjective criteria, such as helpfulness, creative intent, and style, by using your own workforce or a managed workforce from SageMaker Ground Truth.

To get started with model evaluations, you can use curated prompt datasets that are purpose-built for common LLM tasks, including open-ended text generation, text summarization, question answering (Q&A), and classification. You can also extend the model evaluation with your own custom prompt datasets and metrics for your specific use case. Human-in-the-loop evaluations can be used for any task and evaluation metric. After each evaluation job, you receive an evaluation report that summarizes the results in natural language and includes visualizations and examples. You can download all metrics and reports and also integrate model evaluations into SageMaker MLOps workflows.

In SageMaker Studio, you can find Model evaluation under Jobs in the left menu. You can also choose Evaluate directly from the model details page of any LLM in SageMaker JumpStart.

Evaluate foundation models with Amazon SageMaker Clarify

Choose Evaluate a model to set up the evaluation job. The UI wizard guides you through the selection of automatic or human evaluation, model(s), relevant tasks, metrics, prompt datasets, and review teams.

Evaluate foundation models with Amazon SageMaker Clarify

Once the model evaluation job is complete, you can view the results in the evaluation report.

Evaluate foundation models with Amazon SageMaker Clarify

In addition to the UI, you can also start with example Jupyter notebooks that walk you through step-by-step instructions on how to programmatically run model evaluation in SageMaker.

Evaluate models anywhere with the FMEval open source library
To run model evaluation anywhere, including models trained and hosted outside of AWS, use the FMEval open source library. The following example demonstrates how to use the library to evaluate a custom model by extending the ModelRunner class.

For this demo, I choose GPT-2 from the Hugging Face model hub and define a custom HFModelConfig and HuggingFaceCausalLLMModelRunner class that works with causal decoder-only models from the Hugging Face model hub such as GPT-2. The example is also available in the FMEval GitHub repo.

!pip install fmeval

# ModelRunners invoke FMs
from amazon_fmeval.model_runners.model_runner import ModelRunner

# Additional imports for the custom model
import warnings
from dataclasses import dataclass
from typing import Tuple, Optional
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@dataclass
class HFModelConfig:
    model_name: str
    max_new_tokens: int
    normalize_probabilities: bool = False
    seed: int = 0
    remove_prompt_from_generated_text: bool = True

class HuggingFaceCausalLLMModelRunner(ModelRunner):
    def __init__(self, model_config: HFModelConfig):
        self.config = model_config
        self.model = AutoModelForCausalLM.from_pretrained(self.config.model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(self.config.model_name)

    def predict(self, prompt: str) -> Tuple[Optional[str], Optional[float]]:
        input_ids = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        generations = self.model.generate(
            **input_ids,
            max_new_tokens=self.config.max_new_tokens,
            pad_token_id=self.tokenizer.eos_token_id,
        )
        generation_contains_input = (
            input_ids["input_ids"][0] == generations[0][: input_ids["input_ids"].shape[1]]
        ).all()
        if self.config.remove_prompt_from_generated_text and not generation_contains_input:
            warnings.warn(
                "Your model does not return the prompt as part of its generations. "
                "`remove_prompt_from_generated_text` does nothing."
            )
        if self.config.remove_prompt_from_generated_text and generation_contains_input:
            output = self.tokenizer.batch_decode(generations[:, input_ids["input_ids"].shape[1] :])[0]
        else:
            output = self.tokenizer.batch_decode(generations, skip_special_tokens=True)[0]

        with torch.inference_mode():
            input_ids = self.tokenizer(self.tokenizer.bos_token + prompt, return_tensors="pt")["input_ids"]
            model_output = self.model(input_ids, labels=input_ids)
            # Use the negative language-modeling loss as a log-probability score for the prompt
            probability = -model_output[0].item()

        return output, probability

Next, create an instance of HFModelConfig and HuggingFaceCausalLLMModelRunner with the model information.

hf_config = HFModelConfig(model_name="gpt2", max_new_tokens=32)
model = HuggingFaceCausalLLMModelRunner(model_config=hf_config)

Then, select and configure the evaluation algorithm.

# Let's evaluate the FM for factual knowledge
from amazon_fmeval.fmeval import get_eval_algorithm
from amazon_fmeval.eval_algorithms.factual_knowledge import FactualKnowledgeConfig

eval_algorithm_config = FactualKnowledgeConfig("<OR>")
eval_algorithm = get_eval_algorithm("factual_knowledge", eval_algorithm_config)

Let's first test with one sample. The evaluation score is the percentage of factually correct responses.

model_output = model.predict("London is the capital of")[0]
print(model_output)

eval_algorithm.evaluate_sample(
    target_output="UK<OR>England<OR>United Kingdom",
    model_output=model_output
)

the UK, and the UK is the largest producer of food in the world.

The UK is the world's largest producer of food in the world.
[EvalScore(name="factual_knowledge", value=1)]

Although it's not a perfect response, it includes "UK."
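For context, the factual knowledge check is essentially an inclusion test: the target string is split on the configured <OR> delimiter, and the sample scores 1 if any of the resulting targets appears in the generated text. The snippet below is a rough sketch of that idea, not the library's internal implementation:

# Rough sketch of the scoring idea (not FMEval's internal code): split the target
# on the "<OR>" delimiter and check whether any target appears in the model output.
targets = "UK<OR>England<OR>United Kingdom".split("<OR>")
score = int(any(target.lower() in model_output.lower() for target in targets))
print(score)  # 1, because the generated text contains "UK"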

Next, you can evaluate the FM using built-in datasets or define your own custom dataset. If you want to use a custom evaluation dataset, create an instance of DataConfig:

# Import paths may differ slightly depending on the library version
from amazon_fmeval.data_loaders.data_config import DataConfig
from amazon_fmeval.constants import MIME_TYPE_JSONLINES

config = DataConfig(
    dataset_name="my_custom_dataset",
    dataset_uri="dataset.jsonl",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer",
)
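The dataset referenced by dataset_uri is a JSON Lines file, and the field names in each record need to match model_input_location and target_output_location. As an illustration (the questions and answers below are made up), dataset.jsonl could be created like this:

import json

# Hypothetical records; the field names match the DataConfig above ("question"/"answer")
records = [
    {"question": "London is the capital of", "answer": "UK<OR>England<OR>United Kingdom"},
    {"question": "Paris is the capital of", "answer": "France"},
]

with open("dataset.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

With the dataset and configuration in place, run the evaluation across the full dataset: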

eval_output = eval_algorithm.evaluate(
    model=model,
    dataset_config=config,
    prompt_template="$feature",  # $feature is replaced by the input value in the dataset
    save=True
)

The evaluation returns a combined evaluation score across the dataset, and the detailed results for each model input are saved to a local output path.
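To see the aggregate score and where the per-record results were written, you can inspect the returned evaluation output. A minimal sketch, assuming evaluate returns a list of result objects exposing the dataset-level scores and the output path (attribute names may vary with the library version):

# Print the aggregate score(s) for each evaluated dataset and the local path
# where the detailed per-record results were saved (save=True above)
for result in eval_output:
    print(result.eval_name, result.dataset_scores, result.output_path)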

Join the preview
FM evaluation with Amazon SageMaker Clarify is available today in public preview in AWS Regions US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland). The FMEval open source library is available on GitHub. To learn more, visit Amazon SageMaker Clarify.

Get started
Log in to the AWS Management Console and start evaluating your FMs with SageMaker Clarify today!

– Antje
