Testing Different OpenAI Models
This post uses simple code examples to quickly test several of OpenAI's model capabilities, including:
🎨 image generation (DALL·E), 🗣️ voice interaction (TTS/Whisper), and 💬 text generation (GPT).
A note on the parameter explanations: each capability (image generation, for example) is served by more than one model, and the parameters described in each section are those of the model the section's code actually calls. The same applies to the other capabilities.
Vector embeddings
encoding_format
The format to return the embeddings in. Can be either `float` or `base64`.
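The `encoding_format` parameter can be exercised with a small stdlib-only sketch. The endpoint is the public `/v1/embeddings` route; the model name (`text-embedding-3-small`) and helper names are my own assumptions, so substitute whatever your proxy exposes.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"

def build_embedding_payload(text, model="text-embedding-3-small",
                            encoding_format="float"):
    """Build the JSON body; encoding_format must be "float" or "base64"."""
    if encoding_format not in ("float", "base64"):
        raise ValueError("encoding_format must be 'float' or 'base64'")
    return {"model": model, "input": text, "encoding_format": encoding_format}

def get_embedding(text, api_key, **kwargs):
    """POST the payload and return the first embedding vector."""
    req = urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(build_embedding_payload(text, **kwargs)).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

# Usage (requires a valid key; not executed here):
# vec = get_embedding("hello world", os.environ["OPENAI_API_KEY"])
```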
Video generation
The OpenAI proxy used here does not yet provide Sora, so video generation is not tested.
Image generation
Common parameter explanations
n
The number of images to generate. Must be between 1 and 10. For `dall-e-3`, only `n=1` is supported.
size
The size of the generated images. Must be one of `1024x1024`, `1536x1024` (landscape), `1024x1536` (portrait), or `auto` (default value) for `gpt-image-1`; one of `256x256`, `512x512`, or `1024x1024` for `dall-e-2`; and one of `1024x1024`, `1792x1024`, or `1024x1792` for `dall-e-3`.
style
The style of the generated images. This parameter is only supported for `dall-e-3`. Must be one of `vivid` or `natural`. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images.
quality
The quality of the image that will be generated. `auto` (default value) will automatically select the best quality for the given model. `high`, `medium`, and `low` are supported for `gpt-image-1`. `hd` and `standard` are supported for `dall-e-3`. `standard` is the only option for `dall-e-2`.
response_format
The format in which generated images with `dall-e-2` and `dall-e-3` are returned. Must be one of `url` or `b64_json`. URLs are only valid for 60 minutes after the image has been generated. This parameter isn't supported for `gpt-image-1`, which will always return base64-encoded images.
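The parameters above can be sketched as a stdlib-only request for `dall-e-3`, with the validation mirroring the rules just listed. Helper names are mine; the endpoint is the public `/v1/images/generations` route.

```python
import json
import urllib.request

IMAGES_URL = "https://api.openai.com/v1/images/generations"
DALLE3_SIZES = ("1024x1024", "1792x1024", "1024x1792")

def build_dalle3_payload(prompt, size="1024x1024", style="vivid",
                         quality="standard", response_format="url"):
    """Validate the dall-e-3 parameters and build the request body."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"size must be one of {DALLE3_SIZES}")
    if style not in ("vivid", "natural"):
        raise ValueError("style must be 'vivid' or 'natural'")
    if quality not in ("standard", "hd"):
        raise ValueError("quality must be 'standard' or 'hd'")
    if response_format not in ("url", "b64_json"):
        raise ValueError("response_format must be 'url' or 'b64_json'")
    return {"model": "dall-e-3", "prompt": prompt,
            "n": 1,  # dall-e-3 only supports n=1
            "size": size, "style": style, "quality": quality,
            "response_format": response_format}

def generate_image(prompt, api_key, **kwargs):
    """POST the payload and return the image URL (or base64 string)."""
    req = urllib.request.Request(
        IMAGES_URL,
        data=json.dumps(build_dalle3_payload(prompt, **kwargs)).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"][0]
        return data.get("url") or data.get("b64_json")

# Usage (requires a valid key; not executed here):
# url = generate_image("a watercolor fox", os.environ["OPENAI_API_KEY"])
```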
Text to speech
voice
The voice to use when generating the audio. Supported voices are `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`, and `verse`.
response_format
The format of the generated audio. Supported formats are `mp3`, `opus`, `aac`, `flac`, `wav`, and `pcm`.
speed
The speed of the generated audio. Select a value from `0.25` to `4.0`. `1.0` is the default.
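A minimal sketch of the speech endpoint with only the standard library. The model name (`gpt-4o-mini-tts`) and helper names are assumptions — swap in whichever TTS model your proxy exposes; the voice/format/speed checks mirror the parameter notes above.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"
VOICES = ("alloy", "ash", "ballad", "coral", "echo", "fable",
          "onyx", "nova", "sage", "shimmer", "verse")
FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")

def build_tts_payload(text, voice="alloy", response_format="mp3",
                      speed=1.0, model="gpt-4o-mini-tts"):
    """Validate the TTS parameters and build the request body."""
    if voice not in VOICES:
        raise ValueError(f"voice must be one of {VOICES}")
    if response_format not in FORMATS:
        raise ValueError(f"response_format must be one of {FORMATS}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": model, "input": text, "voice": voice,
            "response_format": response_format, "speed": speed}

def synthesize(text, api_key, out_path="speech.mp3", **kwargs):
    """POST the payload and write the raw audio bytes to out_path."""
    req = urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(build_tts_payload(text, **kwargs)).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())

# Usage (requires a valid key; not executed here):
# synthesize("你好,世界", os.environ["OPENAI_API_KEY"], speed=1.25)
```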
Speech to text
| Dimension | Transcriptions | Translations |
|---|---|---|
| Goal | Transcribe audio into text in its original language | Transcribe and translate audio into English text |
| Output language | The original language (e.g. Chinese → Chinese, English → English) | Always English |
| Supported models | whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe | Currently only whisper-1 |
| Output formats | JSON by default; optionally text, srt, verbose_json, vtt | json, text, srt, verbose_json, vtt |
| Timestamps / word-level alignment | Segment- or word-level timestamps via `timestamp_granularities=["word"/"segment"]` (requires verbose_json) | Timestamp parameters not supported |
| Prompting | Available (the newer transcription models support context/style/terminology prompts) | Limited (whisper-1 only, with limited prompt capability) |
| File types and size | mp3, mp4, mpeg, mpga, m4a, wav, webm; ≤ 25 MB per file | Same as Transcriptions |
| Typical use cases | Subtitle generation, meeting minutes, archiving, text output that preserves the source language | Unifying multilingual content into English text; English search/analysis pipelines |
Only Translations is demonstrated below.
The parameters and code that follow apply to Translations only.
prompt
An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.
response_format
The format of the output, in one of these options: `json`, `text`, `srt`, `verbose_json`, or `vtt`.
temperature
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
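A hedged sketch of a Translations call using only the standard library. The audio file goes up as multipart/form-data; helper names are mine, and the parameter checks follow the notes above (`whisper-1` only).

```python
import io
import os
import urllib.request
import uuid

TRANSLATIONS_URL = "https://api.openai.com/v1/audio/translations"
RESPONSE_FORMATS = ("json", "text", "srt", "verbose_json", "vtt")

def build_translation_fields(response_format="json", temperature=0.0,
                             prompt=None):
    """Validate the Translations parameters and build the form fields."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {RESPONSE_FORMATS}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    fields = {"model": "whisper-1",  # the only model Translations supports
              "response_format": response_format,
              "temperature": str(temperature)}
    if prompt is not None:
        fields["prompt"] = prompt  # should be written in English
    return fields

def translate_audio(path, api_key, **kwargs):
    """Upload the audio file (≤ 25 MB) and return the raw response text."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    for name, value in build_translation_fields(**kwargs).items():
        body.write((f"--{boundary}\r\n"
                    f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                    f"{value}\r\n").encode("utf-8"))
    body.write((f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="file"; '
                f'filename="{os.path.basename(path)}"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8"))
    with open(path, "rb") as f:
        body.write(f.read())
    body.write(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    req = urllib.request.Request(
        TRANSLATIONS_URL, data=body.getvalue(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Usage (requires a valid key and an audio file; not executed here):
# print(translate_audio("meeting.mp3", os.environ["OPENAI_API_KEY"],
#                       response_format="text"))
```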