Generate Images With DALL·E 2 and the OpenAI API

 1 year ago
source link: https://realpython.com/generate-images-with-dalle-openai-api/

Complete the Setup Requirements

If you’ve seen what DALL·E can do and you’re eager to make its functionality part of your Python applications, then you’re in the right spot! In this first section, you’ll quickly walk through what you need to do to get started using DALL·E’s image creation capabilities in your own code.

Install the OpenAI Python Library

Confirm that you’re running Python version 3.7.1 or higher, create and activate a virtual environment, and install the OpenAI Python library:

PS> python --version
Python 3.11.0
PS> python -m venv venv
PS> .\venv\Scripts\activate
(venv) PS> python -m pip install openai

The openai package gives you access to the full OpenAI API. In this tutorial, you’ll focus on the Image class, which you can use to interact with DALL·E to create and edit images from text prompts.

Get Your OpenAI API Key

You need an API key to make successful API calls. Sign up for the OpenAI API and create a new API key by clicking on the dropdown menu on your profile and selecting View API keys:

API key page in the OpenAI web UI profile window

On this page, you can manage your API keys, which allow you to access the service that OpenAI offers through their API. You can create and delete secret keys.

Click on Create new secret key to create a new API key, and copy the value shown in the pop-up window:

Pop up window displaying the generated secret API key

Always keep this key secret! Copy the value of this key so you can later use it in your project. You’ll only see the key value once.

Save Your API Key as an Environment Variable

A quick way to save your API key and make it available to your Python scripts is to save it as an environment variable. In Windows PowerShell, you can set it like this:

(venv) PS> $ENV:OPENAI_API_KEY = "<your-key-value-here>"

With this command, you make the API key accessible under the environment variable OPENAI_API_KEY in your current terminal session. Keep in mind that you’ll lose it if you close your terminal.

You could name your variable however you like, but if you use OPENAI_API_KEY, which is the name suggested by the OpenAI documentation, then you’ll be able to use the provided code examples without needing to do any additional setup.
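Because the environment variable only lives in your current terminal session, a script can fail in confusing ways when the key is missing. The following is a minimal sketch of an early check. The helper name `require_api_key` and its error message are illustrative assumptions, not part of the openai library:

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error.

    Hypothetical helper for illustration only. `env` defaults to the real
    process environment but can be any mapping, which makes it easy to try out.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Set it in this terminal session first."
        )
    return key


# Example with an explicit mapping instead of the real environment:
demo_key = require_api_key({"OPENAI_API_KEY": "sk-placeholder"})
```

In a real script, you would call `require_api_key()` without arguments so that it reads from your actual environment.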

Understand Pricing for DALL·E and Other OpenAI API Products

OpenAI attributes API usage to your account through your unique key, so make sure to keep your API key private. The company calculates the price of requests to the Images API on a per-image basis that depends on the resolution of the output image:

Resolution Price per image
256×256 $0.016
512×512 $0.018
1024×1024 $0.020

If you signed up with OpenAI’s API recently, then you’ll benefit from the free trial that allows you to use $18 of free credits within your first three months. That allows you to generate a lot of images if you’re just here to explore!

However, keep in mind that it’s a single free trial budget across all OpenAI API services, so you might not want to spend it all on creating stunning images. Also note that you can’t use the credits from the DALL·E web interface for API calls.

Note: OpenAI’s API services are changing rapidly. You should check their web page for up-to-date information about pricing and offers.
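To get a feel for what these numbers mean, you can do the arithmetic in a few lines of Python. This is only a rough sketch: the prices are copied from the table above and may be out of date, and the function names `estimate_cost` and `images_within_budget` are made up for this example:

```python
from decimal import Decimal

# Prices per image copied from the table above. They may be out of date.
PRICE_PER_IMAGE = {
    "256x256": Decimal("0.016"),
    "512x512": Decimal("0.018"),
    "1024x1024": Decimal("0.020"),
}


def estimate_cost(size, n=1):
    """Return the estimated price in dollars for n images of the given size."""
    return PRICE_PER_IMAGE[size] * n


def images_within_budget(size, budget):
    """Return how many whole images of the given size fit into the budget."""
    return int(Decimal(str(budget)) // PRICE_PER_IMAGE[size])


# How far would an $18 budget go at the largest resolution?
largest_count = images_within_budget("1024x1024", 18)
```

Using `Decimal` instead of floats avoids rounding surprises when you multiply small prices by large image counts.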

With the pricing and logistics out of the way, and your API key safely stored, you’re now ready to create some images from text prompts.

Create an Image From a Text Prompt With OpenAI’s DALL·E

Start by confirming that you’re set up and ready to go by using the openai library through its command-line interface:

(venv) $ openai api image.create -p "a vaporwave computer"

This command will send a request to OpenAI’s Images API and create one image from the text prompt "a vaporwave computer". As a result, you’ll receive a JSON response that contains a URL that points to your freshly created image:

{
  "created": 1668073562,
  "data": [
    {
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/private/org
      ⮑ -QANMxYn3BsMeuAbRT8X3iiu3/user-xSuQTJ0IIVj3dHM4DPymXTg4/img-5GqtVx
      ⮑ L86Retwi282RbE8HzA.png?st=2022-11-10T08%3A46%3A02Z&se=2022-11-10T1
      ⮑ 0%3A46%3A02Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&sk
      ⮑ oid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-
      ⮑ a814-9c849652bcb3&skt=2022-11-09T14%3A20%3A19Z&ske=2022-11-10T14%3
      ⮑ A20%3A19Z&sks=b&skv=2021-08-06&sig=yorbHuIy/qHhWvGPmJrZ8apJptorzpI
      ⮑ 0/62VH2lmhcg%3D"
    }
  ]
}

Click your URL or copy and paste it into your browser to view the image. Here’s the image that DALL·E dreamt up for my request:

A computer from the 90s with a plant growing out of it in vaporwave style colors

'a vaporwave computer'

Your image will look different. That’s because the diffusion model creates each of these images only when you submit the request.

Note: The URL with your generated image is only valid for one hour, so make sure to save the image to your computer if you like it and want to keep it around.
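If you’d rather not save the image by hand, a short helper can download it while the URL is still valid. The sketch below uses only the standard library. The name `save_image` and the thirty-second timeout are assumptions made for this example:

```python
from pathlib import Path
from urllib.request import urlopen


def save_image(url, destination):
    """Download the bytes behind `url` and write them to `destination`.

    Hypothetical helper: it creates the parent folder if needed and returns
    the path that it wrote to.
    """
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=30) as response:
        destination.write_bytes(response.read())
    return destination
```

You would call it with the URL from the JSON response, for example `save_image(url, "images/vaporwave.png")`, where the file name is up to you.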

The API also follows the same content policy as the web interface. If you send text prompts that conflict with the content policy, you won’t receive a result, and you might get blocked after repeated violations.

Now that you’ve confirmed that everything is set up correctly and you got a glimpse of what you can do with the OpenAI Images API, you’ll next learn how to integrate it into a Python script.

Call the API From a Python Script

It’s great that you can create an image from the command-line interface (CLI), but it’d be even better to incorporate this functionality into your Python applications. There’s a lot of exciting stuff you could build!

Note: The Images API is in public beta. This means that the API will still evolve, might change significantly, and might therefore not be ideal for building production applications. It also currently enforces a rate limit of ten images per minute and twenty-five images per five minutes.
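One way to stay under the per-minute limit mentioned in the note is to throttle your calls on the client side. The class below is a minimal sliding-window sketch. `SlidingWindowLimiter` and its parameters are hypothetical names, not part of the openai library, and a single instance only covers one of the two limits:

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the class can be tried without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def _drop_expired(self, now):
        # Forget calls that have left the sliding window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            self._drop_expired(now)
            if len(self.calls) < self.max_calls:
                break
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
        self.calls.append(now)


# Ten images per minute, as stated in the note above.
limiter = SlidingWindowLimiter(max_calls=10, period=60)
```

You would call `limiter.wait()` right before each request that generates one image. To also respect the five-minute limit, you could create a second instance with `max_calls=25` and `period=300` and wait on both.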

Open your favorite code editor and write a script that you’ll use to create an image from a text prompt just like you did using the command-line before:

 1  # create.py
 2
 3  import os
 4
 5  import openai
 6
 7  PROMPT = "An eco-friendly computer from the 90s in the style of vaporwave"
 8
 9  openai.api_key = os.getenv("OPENAI_API_KEY")
10
11  response = openai.Image.create(
12      prompt=PROMPT,
13      n=1,
14      size="256x256",
15  )
16
17  print(response["data"][0]["url"])

Just like before, this code sends an authenticated request to the API that generates a single image based on the text in PROMPT. Note that this code adds some tweaks that’ll help you to build more functionality into the script:

  • Line 7 defines the text prompt as a constant. For more specific results, you added more text to better describe the image that you want to get. Additionally, putting this text into a constant at the top of your script allows you to quickly refactor your code to collect the text from user input instead, because its value is quicker to find and edit.

  • Line 9 gets your API key from the environment variable that you saved it to earlier. Because you’ve named the environment variable OPENAI_API_KEY, you don’t even need this line of code. The openai library automatically accesses the API key value from your environment as long as you stuck to the suggested name. With this line of code, you could also load it from a differently named environment variable.

  • Line 11 calls .create() directly on the openai.Image class. The next couple of lines contain some of the parameters that you can pass to the method.

  • Line 12 passes the value of PROMPT to the fittingly named prompt parameter. With that, you give DALL·E the text that it’ll use to create the image. Note that you also passed a text prompt when you called the API from the command-line interface.

  • Line 13 is a parameter that you haven’t used before. It passes the integer 1 to the parameter n. This parameter lets you define how many new images you want to create with the prompt. The value of n needs to be between one and ten and defaults to 1.

  • Line 14 shows you another new parameter that you haven’t used when calling the API from your CLI. With size, you can define the dimensions of the image that DALL·E should generate. The argument needs to be a string—either "256x256", "512x512", or "1024x1024". Each string represents the dimensions in pixels of the image that you’ll receive. It defaults to the largest possible setting, 1024x1024.

Finally, you also want to get the URL so that you can look at the generated image online. For this, you step through the JSON response to the "url" key in line 17 and print its value to your terminal.
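The constraints on n and size described above can be checked before you spend money on a request. The following sketch validates the arguments locally. The function `validate_image_request` is a hypothetical helper for this example, and the allowed values are taken from the descriptions in the list above:

```python
ALLOWED_SIZES = ("256x256", "512x512", "1024x1024")


def validate_image_request(prompt, n=1, size="1024x1024"):
    """Check the arguments locally and return them as keyword arguments."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    # bool is a subclass of int, so rule it out explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= 10:
        raise ValueError("n must be an integer between 1 and 10")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_SIZES}")
    return {"prompt": prompt, "n": n, "size": size}


params = validate_image_request("a vaporwave computer", n=1, size="256x256")
```

Because the function returns a dictionary, you could unpack the result into the API call with `openai.Image.create(**params)`.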

When you run this script, you’ll get output that’s similar to before, but now you won’t see the whole JSON response, only the URL:

(venv) $ python create.py
https://oaidalleapiprodscus.blob.core.windows.net/private/org-QANMxYn3BsMe
⮑ uAbRT8X3iiu3/user-xSuQTJ0IIVj3dHM4DPymXTg4/img-4AMS4wJJLFsu6ClQmGDppAeV
⮑ .png?st=2022-11-10T12%3A22%3A46Z&se=2022-11-10T14%3A22%3A46Z&sp=r&sv=20
⮑ 21-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-
⮑ 684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2022-11-10T
⮑ 10%3A55%3A29Z&ske=2022-11-11T10%3A55%3A29Z&sks=b&skv=2021-08-06&sig=xJW
⮑ imMiA1/nGmFMYKUTsJq7G1u4xSL652r/MrzTH0Nk%3D

Click the link or paste it in your browser to view the generated image. Your image will again look different, but you should see an image that resembles the prompt that you used in PROMPT:

A vaporwave style computer from the 90s that is green with a plant next to it

'An eco-friendly computer from the 90s in the style of vaporwave'

You may notice that this image is much smaller than the one you created with the CLI call. That’s because you asked the API for a 256x256 pixel image through the size parameter. Smaller images are less expensive, so you just saved some money! As a successful saver, maybe you’d like to save something else: your image data.

Save the Image Data to a File

While it’s great that you’re creating images from text using Python, DALL·E, and the OpenAI API, the responses are currently quite fleeting. If you want to continue to work with the generated image within your Python script, it’s probably better to skip the URL and access the image data directly instead:

 1  # create.py
 2
 3  import os
 4
 5  import openai
 6
 7  PROMPT = "An eco-friendly computer from the 90s in the style of vaporwave"
 8
 9  openai.api_key = os.getenv("OPENAI_API_KEY")
10
11  response = openai.Image.create(
12      prompt=PROMPT,
13      n=1,
14      size="256x256",
15      response_format="b64_json",
16  )
17
18  print(response["data"][0]["b64_json"][:50])

The API allows you to switch the response format from a URL to the Base64-encoded image data. In line 15, you set the value of response_format to "b64_json". The default value of this parameter is "url", which is why you’ve received URLs in the JSON responses up to now.

While the JSON response that you get after applying this change looks similar to before, the dictionary key to access the image data is now "b64_json" instead of "url". You applied this change in the call to print() on line 18 and limited the output to the first fifty characters.

If you run the script with these settings, then you’ll get the actual data of the generated image. But don’t run the script yet, because the image data will be lost immediately after the script runs, and you’ll never get to see the image!

To avoid losing the one perfect image that got away, you can store the JSON responses in a file instead of printing them to the terminal:

 1  # create.py
 2
 3  import json
 4  import os
 5  from pathlib import Path
 6
 7  import openai
 8
 9  PROMPT = "An eco-friendly computer from the 90s in the style of vaporwave"
10  DATA_DIR = Path.cwd() / "responses"
11
12  DATA_DIR.mkdir(exist_ok=True)
13
14  openai.api_key = os.getenv("OPENAI_API_KEY")
15
16  response = openai.Image.create(
17      prompt=PROMPT,
18      n=1,
19      size="256x256",
20      response_format="b64_json",
21  )
22
23  file_name = DATA_DIR / f"{PROMPT[:5]}-{response['created']}.json"
24
25  with open(file_name, mode="w", encoding="utf-8") as file:
26      json.dump(response, file)

With a few additional lines of code, you’ve added file handling to your Python script using pathlib and json:

  • Lines 10 and 12 define and create a data directory called "responses/" that’ll hold the API responses as JSON files.

  • Line 23 defines a variable for the file path where you want to save the data. You use the beginning of the prompt and the timestamp from the JSON response to create a unique file name.

  • Lines 25 and 26 create a new JSON file in the data directory and write the API response as JSON to that file.

With these additions, you can now run your script and generate images, and the image data will stick around in a dedicated file within your data directory.
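The naming scheme uses the first five characters of the prompt as they are, which could cause trouble if a prompt starts with characters that aren’t allowed in file names, such as a slash. As a hedged variation, the sketch below replaces such characters with underscores. The helper `build_file_name` and its replacement rule are assumptions for this example:

```python
import re
from pathlib import Path


def build_file_name(prompt, created, data_dir=Path("responses")):
    """Combine a sanitized prompt prefix and the timestamp into a JSON path."""
    # Keep letters, digits, spaces, underscores, and hyphens. Replace the rest.
    prefix = re.sub(r"[^A-Za-z0-9 _-]", "_", prompt[:5])
    return Path(data_dir) / f"{prefix}-{created}.json"


example = build_file_name(
    "An eco-friendly computer from the 90s in the style of vaporwave",
    1667994848,
)
```

Keep in mind that two prompts with the same first five characters that are created within the same second would still end up with the same file name.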

Did you run the script and inspect the generated JSON file? Looks like gibberish, doesn’t it? So where’s that beautiful image that you know with certainty is the best image ever created by DALL·E?

It’s right there, only it’s currently represented as Base64-encoded bits, which doesn’t make for a great viewing experience if you’re a human. In the next section, you’ll learn how you can convert Base64-encoded image data into a PNG file that you can look at.

Decode a Base64 JSON Response

You just saved a PNG image as a Base64-encoded string in a JSON file. That’s great because it means that your image won’t get lost in the ether of the Internet after one hour, like it does if you keep generating URLs with your API calls.

However, now you can’t look at your image—unless you learn how to decode the data. Fortunately, this doesn’t require a lot of code in Python, so go ahead and create a new script file to accomplish this conversion:

 1  # convert.py
 2
 3  import json
 4  from base64 import b64decode
 5  from pathlib import Path
 6
 7  DATA_DIR = Path.cwd() / "responses"
 8  JSON_FILE = DATA_DIR / "An ec-1667994848.json"
 9  IMAGE_DIR = Path.cwd() / "images" / JSON_FILE.stem
10
11  IMAGE_DIR.mkdir(parents=True, exist_ok=True)
12
13  with open(JSON_FILE, mode="r", encoding="utf-8") as file:
14      response = json.load(file)
15
16  for index, image_dict in enumerate(response["data"]):
17      image_data = b64decode(image_dict["b64_json"])
18      image_file = IMAGE_DIR / f"{JSON_FILE.stem}-{index}.png"
19      with open(image_file, mode="wb") as png:
20          png.write(image_data)

The script convert.py will read a JSON file with the filename that you defined in JSON_FILE. Remember that you’ll need to adapt the value of JSON_FILE to match the filename of your JSON file, which will be different.

The script then fetches the Base64-encoded string from the JSON data, decodes it, and saves the resulting image data as a PNG file in a directory. Python will even create that directory for you, if necessary.

Note that this script will also work if you’re fetching more than one image at a time. The for loop will decode each image and save it as a new file.

Note: You can generate JSON files with Base64-encoded data of multiple images by running create.py after passing a value higher than 1 to the n parameter.

Most of the code in this script is about reading and writing files from and into the correct folders. The true star of the code snippet is b64decode(). You import the function in line 4 and put it to work in line 17. It decodes the Base64-encoded string so that you can save the actual image data as a PNG file. Your computer will then be able to recognize it as a PNG image and know how to display it to you.
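If you want some reassurance that the decoded bytes really are a PNG image before you write them to disk, you can check for the eight-byte signature that every PNG file starts with. The following sketch does a round trip with stand-in bytes instead of a real image. The function name `decode_png` is made up for this example:

```python
from base64 import b64decode, b64encode

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of every PNG file


def decode_png(b64_string):
    """Decode a Base64 string and confirm that the result looks like a PNG."""
    data = b64decode(b64_string)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return data


# Round trip with stand-in bytes rather than a real image:
fake_image = PNG_SIGNATURE + b"stand-in payload"
encoded = b64encode(fake_image).decode("ascii")
decoded = decode_png(encoded)
```

In convert.py, you could use such a check in place of the plain call to b64decode() if you’d like the script to fail loudly on unexpected data.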

After running the script, you can head into the newly created folder structure and open the PNG file to finally see the ideal generated image that you’ve been waiting for so long:

An eco-friendly computer from the 90s in the style of vaporwave

'An eco-friendly computer from the 90s in the style of vaporwave'

Is it everything you’ve ever hoped for? If so, then rejoice! However, if the image you got looks kind of like what you’re looking for but not quite, then you can make another call to the API where you pass your image as input and create a couple of variations of it.

