Get in touch
Email me at gorka@iand.dev
This script automates the process of preparing data for finetuning OpenAI models, specifically GPT-3.5 and Babbage. It also provides utilities to validate the data, transform it into the required JSONL format, and estimate the cost of the finetuning process.
Requirements: pyfiglet, openai, tiktoken, python-dotenv, clint, plus the standard-library modules argparse, json, re, os, sys, and time.
To install the required libraries:
pip install pyfiglet openai tiktoken python-dotenv argparse clint
or
pip install -r requirements.txt
python ftup.py [-k <API_KEY>] -m <MODEL_NAME> -f <INPUT_FILE> [-s <SUFFIX>] [-e <EPOCHS>]
Arguments:
-k, --key: Optional. OpenAI API key. If not passed, the script falls back to the OPENAI_API_KEY environment variable.
-m, --model: Required. Model to use. Options: gpt for gpt-3.5-turbo-0613 or bab for babbage-002.
-f, --file: Required. Input data file (JSONL format).
-s, --suffix: Optional. Suffix for your finetuned model, e.g. 'my-suffix-title-v-1'.
-e, --epoch: Optional. Number of epochs for training. Default is 3.
Store your API key in a .env file in the format:
OPENAI_API_KEY=your_api_key_here
The script loads this key by default if -k / --key is not passed as an argument.
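As a sketch of the expected input data (assuming the standard OpenAI finetuning formats: chat-style records for gpt-3.5-turbo-0613 and prompt/completion pairs for babbage-002; the file name and contents below are illustrative), one JSONL line per training sample might look like:

```python
import json

# Hypothetical example records -- one JSONL line per training sample.
# gpt-3.5-turbo-0613 expects a chat-style record with a "messages" list.
gpt_record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]
}

# babbage-002 expects a completion-style record instead.
babbage_record = {
    "prompt": "What is the capital of France?",
    "completion": " Paris.",
}

# Each record is serialized as one line of the .jsonl file.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(gpt_record) + "\n")

# Round-trip check: every line must parse back as valid JSON.
with open("train.jsonl") as f:
    for line in f:
        json.loads(line)
```

A file whose lines all parse like this should pass the script's JSONL checks before upload.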
check_key(key): Validates the format of the OpenAI API key.
check_model(model): Validates the model name.
check_jsonl_file(file): Checks that the provided file has a valid JSONL name and that it exists.
create_update_jsonl_file(model, file): Checks that the JSONL has the correct format and uploads the file to OpenAI.
update_ft_job(file_id_name, model, suffix, epoch): Creates or updates the finetuning job on OpenAI.
check_jsonl_gpt35(file): Validates the format for GPT-3.5 training.
check_jsonl_babbage(file): Validates the format for Babbage-002 training.
cost_gpt(file, epochs): Estimates the cost of the finetuning process.
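The cost estimate can be sketched roughly as follows. This is a minimal sketch, not the script's actual cost_gpt implementation: the per-token price is an illustrative placeholder (check OpenAI's current pricing), and a crude whitespace split stands in for tiktoken's real tokenizer.

```python
import json

# Illustrative placeholder price, NOT current OpenAI pricing.
PRICE_PER_1K_TOKENS = 0.008

def rough_token_count(text):
    # Crude stand-in for tiktoken: count whitespace-separated words.
    return len(text.split())

def estimate_cost(jsonl_path, epochs=3):
    """Sum tokens over every message in every training record,
    then multiply by epochs and the per-1K-token price."""
    total_tokens = 0
    with open(jsonl_path) as f:
        for line in f:
            record = json.loads(line)
            for msg in record.get("messages", []):
                total_tokens += rough_token_count(msg["content"])
    return total_tokens * epochs * PRICE_PER_1K_TOKENS / 1000

# Example: one record with four tokens, trained for the default 3 epochs.
with open("sample.jsonl", "w") as f:
    f.write(json.dumps(
        {"messages": [{"role": "user", "content": "one two three four"}]}
    ) + "\n")
print(estimate_cost("sample.jsonl"))
```

The real script uses tiktoken for an exact token count, so its estimate will differ from this word-count approximation.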