Finetuning Claude 3 Haiku on Bedrock
In this notebook, we'll walk you through the process of finetuning Claude 3 Haiku on Amazon Bedrock.
What You'll Need
- An AWS account with access to Bedrock
- A dataset (or you can use the sample dataset provided here)
- A service role capable of accessing the S3 bucket where you save your training data
Install Dependencies
!pip install boto3
import boto3
Prep a Dataset
Your dataset for Bedrock finetuning needs to be a JSONL file (i.e. a file with a JSON object on each line).
Each line in the JSONL file should be a JSON object with the following structure:
{
  "system": "<optional_system_message>",
  "messages": [
    {"role": "user", "content": "user message"},
    {"role": "assistant", "content": "assistant response"},
    ...
  ]
}
- The system field is optional.
- There must be at least two messages.
- The first message must be from the "user".
- The last message must be from the "assistant".
- User and assistant messages must alternate.
- No extraneous keys are allowed.
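If you're building your own dataset, a quick sanity check before uploading can save a failed job. The helper below is an illustrative sketch that enforces the rules above (validate_record is our own name, not part of any Bedrock SDK):
def validate_record(record):
    # Illustrative check of the dataset rules above; not part of any Bedrock SDK.
    assert set(record.keys()) <= {"system", "messages"}, "no extraneous keys are allowed"
    messages = record.get("messages", [])
    assert len(messages) >= 2, "there must be at least two messages"
    assert messages[0]["role"] == "user", "the first message must be from the user"
    assert messages[-1]["role"] == "assistant", "the last message must be from the assistant"
    for i, message in enumerate(messages):
        expected_role = "user" if i % 2 == 0 else "assistant"
        assert message["role"] == expected_role, "user and assistant messages must alternate"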
Sample Dataset - JSON Mode
We've included a sample dataset that teaches a model to respond to all questions with JSON. Here's what that dataset looks like:
import json
sample_dataset = []
dataset_path = "datasets/json_mode_dataset.jsonl"
with open(dataset_path) as f:
    for line in f:
        sample_dataset.append(json.loads(line))
print(json.dumps(sample_dataset[0], indent=2))
Upload your dataset to S3
Your dataset for finetuning should be available on S3; for this demo, we'll write the sample dataset to an S3 bucket you control.
bucket_name = "YOUR_BUCKET_NAME"
s3_path = "json_mode_dataset.jsonl"
s3 = boto3.client("s3")
s3.upload_file(dataset_path, bucket_name, s3_path)
Launch Bedrock Finetuning Job
Now that you have your dataset ready, you can launch a finetuning job using boto3. First we'll configure a few parameters for the job:
# Configuration
job_name = "anthropic-finetuning-cookbook-training"
custom_model_name = "anthropic_finetuning_cookbook"
role = "YOUR_AWS_SERVICE_ROLE_ARN"
output_path = f"s3://{bucket_name}/finetuning_example_results/"
base_model_id = (
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0:200k"
)
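# Not sure which base model ARN to use? Models that support finetuning can be listed
# directly (an illustrative lookup; FINE_TUNING is a documented filter value):
finetunable = boto3.client("bedrock").list_foundation_models(byCustomizationType="FINE_TUNING")
for summary in finetunable["modelSummaries"]:
    print(summary["modelArn"])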
# Hyperparameters
epoch_count = 5
batch_size = 4
learning_rate_multiplier = 1.0
Then we can launch the job with boto3:
bedrock = boto3.client(service_name="bedrock")
bedrock_runtime = boto3.client(service_name="bedrock-runtime")
bedrock.create_model_customization_job(
    customizationType="FINE_TUNING",
    jobName=job_name,
    customModelName=custom_model_name,
    roleArn=role,
    baseModelIdentifier=base_model_id,
    hyperParameters={
        "epochCount": f"{epoch_count}",
        "batchSize": f"{batch_size}",
        "learningRateMultiplier": f"{learning_rate_multiplier}",
    },
    trainingDataConfig={"s3Uri": f"s3://{bucket_name}/{s3_path}"},
    outputDataConfig={"s3Uri": output_path},
)
You can use this to check the status of your job while it's training:
# Check for the job status
status = bedrock.get_model_customization_job(jobIdentifier=job_name)["status"]
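If you'd rather have the notebook wait for training to finish, a simple polling loop works too (an illustrative sketch; the five-minute sleep interval is an arbitrary choice):
import time

# Poll the customization job until it leaves the InProgress state.
while True:
    status = bedrock.get_model_customization_job(jobIdentifier=job_name)["status"]
    print(f"Job status: {status}")
    if status != "InProgress":
        break
    time.sleep(300)  # check again in five minutes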
Use your finetuned model!
To use your finetuned model, you'll need to host it using Provisioned Throughput in Amazon Bedrock. Once your model is ready with Provisioned Throughput, you can invoke it via the Bedrock API.
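You can purchase Provisioned Throughput in the Bedrock console, or from boto3 as sketched below (assumptions: your customization job has completed, the provisioned model name and modelUnits=1 are placeholders, and no-commitment Provisioned Throughput is billed while it exists):
# Illustrative sketch: create no-commitment Provisioned Throughput for the custom model.
custom_model_arn = bedrock.get_custom_model(modelIdentifier=custom_model_name)["modelArn"]
provisioning = bedrock.create_provisioned_model_throughput(
    provisionedModelName="anthropic-finetuning-cookbook-pt",  # placeholder name
    modelId=custom_model_arn,
    modelUnits=1,  # placeholder capacity
)
print(provisioning["provisionedModelArn"])  # paste this ARN into the next cell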
provisioned_throughput_arn = "YOUR_PROVISIONED_THROUGHPUT_ARN"
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
body = json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "system": "JSON Mode: Enabled",
        "messages": [
            {
                "role": "user",
                "content": [{"type": "text", "text": "What is a large language model?"}],
            }
        ],
    }
)
response = bedrock_runtime.invoke_model(modelId=provisioned_throughput_arn, body=body)
body = json.loads(response["body"].read().decode("utf-8"))
print(body["content"][0]["text"])
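Because the sample dataset trains the model to answer in JSON, you can also parse the reply directly (assuming the finetuned model does return valid JSON):
# Parse the model's text reply as JSON (assumes the reply is valid JSON).
answer = json.loads(body["content"][0]["text"])
print(answer)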