const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Return the base64 encoding of an image, given a local path or an http(s) URL.
async function toB64(imgPath) {
  if (/^https?:\/\//.test(imgPath)) {
    // Remote image: download the raw bytes first.
    const res = await axios.get(imgPath, { responseType: 'arraybuffer' });
    return Buffer.from(res.data).toString('base64');
  }
  // Local image: read the file from disk.
  const data = fs.readFileSync(path.resolve(imgPath));
  return Buffer.from(data).toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/sd1.5-img2img";

(async function() {
  try {
    // Build the request body; the input image is sent as a base64 string.
    const data = {
      "image": await toB64('https://www.segmind.com/sd-img2img-input.jpeg'),
      "samples": 1,
      "prompt": "A fantasy landscape, trending on artstation, mystical sky",
      "negative_prompt": "nude, disfigured, blurry",
      "scheduler": "DDIM",
      "num_inference_steps": 25,
      "guidance_scale": 10.5,
      "strength": 0.75,
      "seed": 98877465625,
      "img_width": 512,
      "img_height": 512,
      "base64": false
    };

    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network failures, so fall back to the message.
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
image: Input image.
samples: Number of samples to generate.
min : 1,
max : 4
prompt: Prompt to render.
negative_prompt: Prompts to exclude, e.g. 'bad anatomy, bad hands, missing fingers'.
scheduler: Type of scheduler.
Allowed values:
num_inference_steps: Number of denoising steps.
min : 20,
max : 100
guidance_scale: Scale for classifier-free guidance.
min : 0.1,
max : 25
strength: How much to transform the reference image.
min : 0.1,
max : 1
seed: Seed for image generation.
img_width: Image resolution.
Allowed values:
img_height: Image resolution.
Allowed values:
base64: Base64 encoding of the output image.
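The numeric ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `validateParams` and `LIMITS` are hypothetical names, and the second value of each documented pair is assumed to be the maximum.

```javascript
// Documented bounds for the numeric parameters (assumption: the second value
// of each pair in the parameter list is the maximum).
const LIMITS = {
  samples: [1, 4],
  num_inference_steps: [20, 100],
  guidance_scale: [0.1, 25],
  strength: [0.1, 1],
};

// Hypothetical helper: returns a list of problems found in the request body.
// An empty list means every numeric field that is present is within range.
function validateParams(params) {
  const problems = [];
  for (const [name, [min, max]] of Object.entries(LIMITS)) {
    const value = params[name];
    if (value === undefined) continue; // omitted fields are not checked here
    if (typeof value !== 'number' || Number.isNaN(value) || value < min || value > max) {
      problems.push(`${name} must be a number between ${min} and ${max}, got ${value}`);
    }
  }
  return problems;
}

// The values from the sample request pass; out-of-range values are reported.
console.log(validateParams({ samples: 1, num_inference_steps: 25, guidance_scale: 10.5, strength: 0.75 }));
console.log(validateParams({ samples: 8, strength: 0 }));
```

Checking locally avoids spending a round trip on a request the server would likely reject; the server remains the authority on what is accepted.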
To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
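As a rough sketch of how the header could be read with axios, which exposes response header names in lower case: the helper name `remainingCredits` and the warning threshold in the usage comment are illustrative assumptions, and the header value is assumed to be a plain number.

```javascript
// Hypothetical helper: extract the remaining credit count from response headers.
// Returns null when the header is missing or is not a number.
function remainingCredits(headers) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined || raw === '') return null;
  const credits = Number(raw);
  return Number.isNaN(credits) ? null : credits;
}

// Usage with an axios response (threshold of 10 is an arbitrary example):
//   const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
//   const credits = remainingCredits(response.headers);
//   if (credits !== null && credits < 10) console.warn('Low on credits:', credits);

// Demonstration with a plain object standing in for response.headers.
console.log(remainingCredits({ 'x-remaining-credits': '42' }));
```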
Stable Diffusion 1.5 Img2Img, released in 2022, is a deep learning model for image generation. It is primarily designed to generate detailed images from text descriptions, and it can also perform inpainting, outpainting, and image-to-image translation guided by a text prompt. In image-to-image mode the model takes a text prompt, an existing image, and a strength value between 0.0 and 1.0, and outputs a new image that combines elements of the original image with the text prompt.
The architecture of Stable Diffusion 1.5 Img2Img combines an autoencoder with a diffusion model trained in the autoencoder's latent space. The encoder transforms images into latent representations with a relative downsampling factor of 8. Text prompts are encoded by a CLIP ViT-L/14 text encoder, and the non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention. The training loss is a reconstruction objective between the noise added to the latent and the noise predicted by the UNet. The strength value controls how much noise is added to the latent of the input image before denoising: higher values produce more variation but potentially less semantic consistency with the input image.
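Two of the quantities above can be illustrated with simple arithmetic. The sketch below derives the latent size from the downsampling factor of 8 and estimates how many denoising steps actually run for a given strength. The step calculation is an assumption: it follows the convention used by common open-source img2img pipelines (steps run = floor(steps × strength)), and the hosted API may behave differently. Both function names are hypothetical.

```javascript
const DOWNSAMPLING_FACTOR = 8; // relative downsampling factor of the autoencoder

// Spatial size of the latent for a given image size.
function latentSize(width, height) {
  return {
    width: width / DOWNSAMPLING_FACTOR,
    height: height / DOWNSAMPLING_FACTOR,
  };
}

// Assumed convention: the input latent is noised to the level matching
// `strength`, and only the remaining part of the schedule is denoised.
function denoisingStepsRun(numInferenceSteps, strength) {
  return Math.min(Math.floor(numInferenceSteps * strength), numInferenceSteps);
}

console.log(latentSize(512, 512));        // { width: 64, height: 64 }
console.log(denoisingStepsRun(25, 0.75)); // 18
console.log(denoisingStepsRun(25, 1));    // 25
```

Under this convention a strength of 1 runs the full schedule and keeps almost nothing of the input image, while a low strength runs only a few steps and stays close to it.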
Stable Diffusion 1.5 Img2Img lends itself to several applications beyond generating images from text. Because it adds noise to the original image before regenerating it, it can be used for data anonymization and data augmentation, where the visual features of image data are altered. It can also be used for image upscaling, adding detail to an image while increasing its resolution. Stable Diffusion has additionally been explored as a tool for image compression, although it is currently worse than JPEG and WebP at preserving small text and faces.
Data Anonymization: Adding noise to original images to protect sensitive information.
Data Augmentation: Altering and enhancing image data for machine learning tasks.
Image Upscaling: Increasing the resolution of an image and potentially adding more detail.
Image Compression: Compressing images while preserving their essential features.
Image-to-Image Translations: Generating new images based on a given prompt and an existing image.
The model is licensed under the CreativeML OpenRAIL-M license, a form of Responsible AI License (RAIL). The license prohibits certain use cases, including crime, libel, harassment, doxing, exploiting minors, giving medical advice, automatically creating legal obligations, producing legal evidence, and discriminating against or harming individuals or groups based on social behavior, personal characteristics, or legally protected categories. Users retain the rights to their generated output images and are free to use them commercially.