Testing Different OpenAI Models

This article uses short code examples to quickly test several of OpenAI's model capabilities, including:

🎨 image generation (DALL·E), 🗣️ voice interaction (TTS/Whisper), 💬 text generation (GPT), and more.

A note on the parameter explanations: each capability is backed by more than one model, so the parameters documented in each section are those of the model that the section's code actually calls. The same applies to every other capability covered here.

Vector embeddings

  • encoding_format

    The format to return the embeddings in. Can be either float or base64. Defaults to float.

from dotenv import load_dotenv
from openai import OpenAI
import os

# load_dotenv() loads environment variables from a .env file
load_dotenv()

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
client = OpenAI(
    base_url='https://api.openai-proxy.org/v1',
    api_key=OPENAI_API_KEY
)

response = client.embeddings.create(
    input="Your text string goes here",
    model="text-embedding-3-small"
)

print(response.data[0].embedding)
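
To see encoding_format in action, here is a minimal sketch (reusing the client above) that requests base64 output and decodes it back into floats. The decode assumes the payload is a packed little-endian float32 array, which is how the official Python SDK itself interprets base64 embeddings:

import base64
import numpy as np

# Request the embedding as a base64 string instead of a JSON float
# array; this shrinks the response payload considerably.
response = client.embeddings.create(
    input="Your text string goes here",
    model="text-embedding-3-small",
    encoding_format="base64"
)

# Decode the base64 payload back into a float vector.
# Assumption: the bytes are a packed little-endian float32 array.
raw_bytes = base64.b64decode(response.data[0].embedding)
embedding = np.frombuffer(raw_bytes, dtype="<f4")
print(embedding.shape)  # e.g. (1536,) for text-embedding-3-small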

Video generation

The OpenAI proxy used here does not yet offer Sora, so video generation is skipped for now.

Image generation

Common parameters explained

  • n

    The number of images to generate. Must be between 1 and 10. For dall-e-3, only n=1 is supported.

  • size

    The size of the generated images. Must be one of 1024x1024, 1536x1024 (landscape), 1024x1536 (portrait), or auto (default value) for gpt-image-1, one of 256x256, 512x512, or 1024x1024 for dall-e-2, and one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3.

  • style

    The style of the generated images. This parameter is only supported for dall-e-3. Must be one of vivid or natural. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images.

  • quality

    • auto (default value) will automatically select the best quality for the given model.

    • high, medium and low are supported for gpt-image-1.

    • hd and standard are supported for dall-e-3.

    • standard is the only option for dall-e-2.

  • response_format

    The format in which generated images with dall-e-2 and dall-e-3 are returned. Must be one of url or b64_json. URLs are only valid for 60 minutes after the image has been generated. This parameter isn’t supported for gpt-image-1 which will always return base64-encoded images.

from dotenv import load_dotenv
from openai import OpenAI
import os
import base64

# load_dotenv() loads environment variables from a .env file
load_dotenv()

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
client = OpenAI(
    base_url='https://api.openai-proxy.org/v1',
    api_key=OPENAI_API_KEY
)

prompt = """
A children's book drawing of a veterinarian using a stethoscope to
listen to the heartbeat of a baby otter.
"""

result = client.images.generate(
    model="dall-e-2",
    prompt=prompt,
    response_format='b64_json'
)

# Grab the first image from the API response as a Base64-encoded string
# (a text representation of the image)
image_base64 = result.data[0].b64_json
# Decode the Base64 string back into raw binary data (the image bytes),
# ready to be saved to a file or processed further
image_bytes = base64.b64decode(image_base64)

# Write the image to a local file
with open("otter.png", "wb") as f:
    f.write(image_bytes)
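
As a counterpart, the sketch below swaps in dall-e-3 and exercises its specific parameters from the list above (size, style, quality, and a URL response). It reuses the client and prompt already defined; treat it as an untested sketch rather than a verified call against the proxy:

# A dall-e-3 variant of the call above (untested sketch)
result = client.images.generate(
    model="dall-e-3",
    prompt=prompt,
    n=1,                    # dall-e-3 only supports n=1
    size="1792x1024",       # landscape; see the size options above
    style="natural",        # less hyper-real than the default "vivid"
    quality="hd",           # dall-e-3 accepts "hd" and "standard"
    response_format="url"   # the URL expires 60 minutes after generation
)

print(result.data[0].url)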

Text to speech

  • voice

    The voice to use when generating the audio. Supported voices are alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, and verse. Previews of the voices are available in the Text to speech guide.

  • response_format

    The format to return the audio in. Supported formats are mp3, opus, aac, flac, wav, and pcm.

  • speed

    The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.

from dotenv import load_dotenv
from openai import OpenAI
from pathlib import Path
import os

# load_dotenv() loads environment variables from a .env file
load_dotenv()

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
client = OpenAI(
    base_url='https://api.openai-proxy.org/v1',
    api_key=OPENAI_API_KEY
)

speech_file_path = Path(__file__).parent / "speech.mp3"

# Stream the generated audio straight into a local file
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Today is a wonderful day to build something people love!",
    instructions="Speak in a cheerful and positive tone.",
) as response:
    response.stream_to_file(speech_file_path)
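
The remaining parameters can be folded into the same call. The sketch below (reusing the client above) picks a different voice, asks for WAV output, and raises the speed to 1.25x; whether speed is honored by gpt-4o-mini-tts may depend on the API version, so treat that part as an assumption:

# Variant: different voice, WAV output, 1.25x speed (untested sketch)
wav_file_path = Path(__file__).parent / "speech.wav"

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="onyx",
    input="Today is a wonderful day to build something people love!",
    response_format="wav",  # mp3, opus, aac, flac, wav, or pcm
    speed=1.25,             # 0.25 to 4.0; 1.0 is the default
) as response:
    response.stream_to_file(wav_file_path)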

Speech to text

| Dimension | Transcriptions | Translations |
| --- | --- | --- |
| Goal | Recognize audio as text in its original language | Recognize audio and translate it into English text |
| Output language | The original language (e.g. Chinese→Chinese, English→English) | Always English |
| Supported models | whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe | Currently whisper-1 only |
| Output formats | json (default); text, srt, verbose_json, vtt optional | json (default); text, srt, verbose_json, vtt optional |
| Timestamps and word-level alignment | Segment/word timestamps via timestamp_granularities=["word"/"segment"] (requires verbose_json) | Timestamp parameters not supported |
| Prompting | Supported (the newer transcribe models accept context/style/terminology prompts) | Limited (whisper-1 only; the prompt must be in English and its influence is limited) |
| File types and size | mp3, mp4, mpeg, mpga, m4a, wav, webm; ≤ 25 MB per file | Same as Transcriptions |
| Typical scenarios | Subtitle generation, meeting minutes, archiving, language-preserving transcripts | Unifying multilingual content into English text; English search/analysis pipelines |
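
For reference, the Transcriptions column of the table maps onto a call like the following minimal sketch: whisper-1 with verbose_json output and word-level timestamps. It assumes the same client setup and test file as the Translations example below:

# Transcriptions sketch: original-language text plus word timestamps
audio_file = open("voice/test1.mp3", "rb")

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="verbose_json",   # required for timestamps
    timestamp_granularities=["word"]  # or ["segment"]
)

print(transcription.text)   # full transcript in the original language
print(transcription.words)  # word-level timestamps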

Only Translations is demonstrated below; both the parameter notes and the code apply to Translations only.

  • prompt

    An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.

  • response_format

    The format of the output, in one of these options: json, text, srt, verbose_json, or vtt.

  • temperature

    The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

from dotenv import load_dotenv
from openai import OpenAI
import os

# load_dotenv() loads environment variables from a .env file
load_dotenv()

OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')
client = OpenAI(
    base_url='https://api.openai-proxy.org/v1',
    api_key=OPENAI_API_KEY
)

audio_file = open("voice/test1.mp3", "rb")

translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file
)

print(translation.text)
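
Finally, a variant that exercises the Translations parameters documented above: an English prompt to steer style, SRT subtitle output, and a low temperature for more deterministic text. The prompt string here is just an illustrative placeholder:

# Translations variant with prompt, response_format, and temperature
audio_file = open("voice/test1.mp3", "rb")  # reopen the file

translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file,
    prompt="A casual conversation about technology.",  # must be in English
    response_format="srt",  # json, text, srt, verbose_json, or vtt
    temperature=0.2         # lower = more focused and deterministic
)

# With response_format="srt" the SDK returns the subtitles as a string
print(translation)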