• Free for commercial use! Stability AI releases Stable Diffusion 3.5 Medium, a lightweight AI image generation model


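    Stability AI has released the new Stable Diffusion 3.5 Medium model. Aimed at a broad audience, this text-to-image model is free to use, including commercially (subject to the terms of the LICENSE file shipped with the model), and it balances output quality against accessibility on ordinary hardware.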

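    Built on the Multimodal Diffusion Transformer (MMDiT-X) architecture, the model has a compact 2.5 billion parameters, which lowers the hardware barrier for ordinary users. It needs only 9.9 GB of VRAM and therefore runs on most consumer GPUs.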
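
    As a rough sanity check on these numbers, the memory needed for the weights alone can be estimated from the parameter count. The sketch below is only back-of-the-envelope arithmetic: the helper name `weight_memory_gib` is hypothetical, and the estimate ignores activations and every component other than the 2.5 billion parameters quoted above, so it is a lower bound rather than a reproduction of the 9.9 GB figure.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


PARAMS = 2.5e9  # parameter count quoted in the article

# Compare full precision with the half-precision format used later in the article.
for name, nbytes in [("fp32", 4), ("bf16", 2)]:
    print(f"{name}: {weight_memory_gib(PARAMS, nbytes):.2f} GiB")
```

    Halving the bytes per parameter halves the weight footprint, which is why the loading examples in this article request `torch.bfloat16`.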

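    On the technical side, the model integrates three pretrained text encoders and uses QK normalization to improve training stability. Notably, dual attention blocks in the first 12 transformer layers improve image quality, typography, and the understanding of complex prompts.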
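
    To illustrate why QK normalization helps stability, here is a minimal pure-Python sketch: the query and key are RMS-normalized before the dot product, so the attention logit no longer grows with the magnitude of the activations. This is a simplification under stated assumptions (no learnable scale, a single query/key pair, hypothetical helper names `rms_norm` and `attention_logit`), not the model's actual implementation.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Divide a vector by its root-mean-square (no learnable scale in this sketch)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def attention_logit(q, k, qk_norm=True):
    """Scaled dot-product logit for one query/key pair, optionally with QK normalization."""
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.5, 1.0]
big_q = [1000 * x for x in q]  # same direction, much larger magnitude

# Without normalization the logit scales with the activation magnitude.
print(attention_logit(q, k, qk_norm=False), attention_logit(big_q, k, qk_norm=False))
# With normalization the two logits are essentially identical.
print(attention_logit(q, k), attention_logit(big_q, k))
```

    Bounded logits keep the softmax from saturating when activations drift during training, which is the stability benefit the paragraph above refers to.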

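    Training combined synthetic data with filtered public data and used a mixed strategy of progressively increasing resolution, which supports both the diversity and the quality of the generated images. Compared with other medium-sized models, it shows clear advantages in image quality and generation speed.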

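    Users should keep a few caveats in mind: overly long prompts may produce artifacts at the image edges; Skip Layer Guidance (SLG) sampling is recommended to improve structural coherence; and because the training data distribution differs from that of other models (for example, the Large model in the same family), the same prompt may not produce similar results across models.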
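
    As a rough illustration of Skip Layer Guidance, the sketch below combines three noise predictions for one sampling step: unconditional, conditional, and conditional with some transformer layers skipped. The combination rule and the names `slg_combine` and `slg_scale` are assumptions based on how SLG is commonly described, not a confirmed description of Stability AI's implementation, and the `slg_scale` default is illustrative only.

```python
def slg_combine(uncond, cond, cond_skip, guidance_scale=4.5, slg_scale=2.5):
    """Classifier-free guidance plus an extra push away from the skipped-layer prediction."""
    return [
        u + guidance_scale * (c - u) + slg_scale * (c - s)
        for u, c, s in zip(uncond, cond, cond_skip)
    ]


# Toy two-element "noise predictions" standing in for full latent tensors.
uncond = [0.0, 0.2]
cond = [1.0, 0.4]
cond_skip = [0.8, 0.4]  # conditional prediction with some layers skipped

print(slg_combine(uncond, cond, cond_skip))
# If skipping layers changes nothing, the rule reduces to plain classifier-free guidance.
print(slg_combine(uncond, cond, cond))
```

    The extra term only acts where the skipped-layer prediction differs from the full one, which is the intuition behind using it to reinforce structure.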

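    The release gives individual creators and startups a convenient AI creation tool and reflects Stability AI's commitment to making AI technology widely accessible. Whether used for art or for educational development, it brings AI-assisted creation to a broader range of users.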

    Model download: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium

    Architecture

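    Stable Diffusion 3.5 Medium is an improved Multimodal Diffusion Transformer (MMDiT-X) text-to-image model that performs better in image quality, typography, complex prompt understanding, and resource efficiency.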

    ├── text_encoders/  
    │   ├── README.md
    │   ├── clip_g.safetensors
    │   ├── clip_l.safetensors
    │   ├── t5xxl_fp16.safetensors
    │   └── t5xxl_fp8_e4m3fn.safetensors
    │
    ├── README.md
    ├── LICENSE
    ├── sd3.5_medium.safetensors
    ├── SD3.5M_example_workflow.json
    ├── SD3.5M_SLG_example_workflow.json
    ├── SD3.5L_plus_SD3.5M_upscaling_example_workflow.json
    └── sd3_medium_demo.jpg
    
    File structure below is for diffusers integration:
    ├── scheduler/
    ├── text_encoder/
    ├── text_encoder_2/
    ├── text_encoder_3/
    ├── tokenizer/
    ├── tokenizer_2/
    ├── tokenizer_3/
    ├── transformer/
    ├── vae/
    └── model_index.json
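
    Before loading a locally downloaded copy, it can help to confirm that the diffusers layout listed above is complete. The following is a minimal sketch using only the standard library; the helper name `missing_diffusers_entries` and the local path are hypothetical, and the expected entries are taken directly from the listing above.

```python
from pathlib import Path

# Sub-folders and files from the diffusers layout shown in the article.
EXPECTED_DIRS = [
    "scheduler", "text_encoder", "text_encoder_2", "text_encoder_3",
    "tokenizer", "tokenizer_2", "tokenizer_3", "transformer", "vae",
]
EXPECTED_FILES = ["model_index.json"]


def missing_diffusers_entries(root):
    """Return the expected folders/files that are absent under `root`."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Hypothetical local path; replace with wherever the snapshot was downloaded.
    print(missing_diffusers_entries("./stable-diffusion-3.5-medium"))
```

    An empty list means every entry from the listing is present; anything else names what still needs to be downloaded.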
    

    Diffusers

    pip install -U diffusers
    
    import torch
    from diffusers import StableDiffusion3Pipeline
    
    # Load the pipeline in bfloat16 to halve weight memory relative to float32
    pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16)
    pipe = pipe.to("cuda")  # requires a CUDA-capable GPU
    
    # Generate one image from a text prompt and take the first result
    image = pipe(
        "A capybara holding a sign that reads Hello World",
        num_inference_steps=40,
        guidance_scale=4.5,
    ).images[0]
    image.save("capybara.png")
    
    

    Quantizing the model with diffusers

    Reduce VRAM usage so the model fits on GPUs with little 🤏 VRAM.

    pip install bitsandbytes
    
    from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
    from diffusers import StableDiffusion3Pipeline
    import torch
    
    model_id = "stabilityai/stable-diffusion-3.5-medium"
    
    # 4-bit NF4 quantization config; computation still runs in bfloat16
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
    # Load only the transformer, quantizing its weights on the way in
    model_nf4 = SD3Transformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=nf4_config,
        torch_dtype=torch.bfloat16
    )
    
    # Build the full pipeline around the quantized transformer
    pipeline = StableDiffusion3Pipeline.from_pretrained(
        model_id, 
        transformer=model_nf4,
        torch_dtype=torch.bfloat16
    )
    # Keep sub-models on the CPU and move each to the GPU only while it runs
    pipeline.enable_model_cpu_offload()
    
    prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree.  As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
    
    image = pipeline(
        prompt=prompt,
        num_inference_steps=40,
        guidance_scale=4.5,
        max_sequence_length=512,
    ).images[0]
    image.save("whimsical.png")
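
    To give a feel for what 4-bit quantization does to the weights, here is a minimal pure-Python sketch of block-wise quantization. Note the swap: it uses simple uniform absmax quantization, not the NF4 scheme selected in the config above (NF4 places its 16 levels non-uniformly), and the function names `quantize_block` and `dequantize_block` are hypothetical rather than part of bitsandbytes.

```python
def quantize_block(values, bits=4):
    """Map each value in a block to an integer code in [0, 2**bits - 1]."""
    levels = 2**bits - 1  # 15 steps between the 16 available codes
    scale = max(abs(v) for v in values) or 1.0  # per-block absmax scale
    codes = [round((v / scale + 1) / 2 * levels) for v in values]
    return codes, scale


def dequantize_block(codes, scale, bits=4):
    """Recover approximate values from the integer codes and the block scale."""
    levels = 2**bits - 1
    return [(c / levels * 2 - 1) * scale for c in codes]


weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.02, 0.64]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)    # sixteen possible codes, so each fits in 4 bits
print(max_err)  # bounded by half a quantization step, i.e. scale / 15
```

    Each weight is stored as a 4-bit code plus one shared scale per block, which is where the memory saving comes from; the cost is the small reconstruction error printed above.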
    
    
  • Original article: https://blog.csdn.net/weixin_41446370/article/details/143363240