Deep Currents 04.04.25

Welcome to the second instalment of Deep Currents, a monthly curated list of breakthroughs, product updates, and helpful articles from the rapidly evolving world of generative AI. These are the stories from the past few weeks that stood out as particularly relevant to me, a professional UX designer trying to stay on top of the field. Hopefully, this transmission will help you keep your head above water too.

Okay, let's dive into this month’s currents…

Image Generation

After a relatively slow winter, this spring is shaping up to be a big season for AI image generators, with several significant product launches and updates in just the past few weeks.

  • First out of the gate was an entirely new entrant in the image generation sector, Reve, based in Palo Alto, California. They came out of nowhere with a model that scored top marks on third-party benchmarks. Highlights of the new model include exceptional realism and the ability to render text accurately. It’s also very competitively priced, and all images are private by default. This is just their first release, so it will be interesting to see where they go next.
  • Alas, Reve didn’t stay at the top of the charts for long, because OpenAI launched a major upgrade to its multi-modal GPT-4o model, with an integrated image generator that replaces the venerable DALL-E 3. For several days the world was obsessed with using it to create Studio Ghibli versions of memes and vacation pics. The image quality is exceptional, but it’s really slow and only generates one image at a time, so it’s more of a utilitarian tool and less capable for creative exploration.
  • I felt a bit sorry for the team at Ideogram, who after several weeks of buildup launched their v3.0 model the day after the OpenAI announcement. I don’t think many people noticed. But on the bright side, at least they didn’t have to worry about their servers being overloaded and the GPUs melting.
  • Finally, after months of anticipation and repeated delays, Midjourney launched v7 Alpha last night. As the name suggests it’s not feature-complete, but they didn’t want to delay the launch any longer. Instead, the company hopes to push updates every week or so over the next couple of months as they continue to optimize the model and polish new features. The first new feature to launch is Draft mode, which lets you generate ideas quickly. Omni-reference is the next priority, which will allow for consistent characters and objects across images.

Voice and Audio Generation

  • ElevenLabs launched a very cool new feature called Actor Mode for their Studio audio editor. Actor Mode lets you use your own voice to guide the delivery of their AI voices, controlling the pacing, intonation, and emphasis of the generated audio.
  • Stability AI, best known for its image generator Stable Diffusion, released Stable Audio 2.0 which can generate music tracks up to three minutes long from text descriptions. It also supports audio-to-audio generation, sound effects generation, and a feature called “Style Transfer” that allows you to modify newly generated or uploaded audio to customize the output’s theme. As a feel-good bonus, this new model was trained exclusively on licensed data from the AudioSparx music library which respects artist opt-out requests.
  • OpenAI launched a new voice called Monday. The new character has a black swirling circle (instead of blue) and is described as “Whatever”. If you’ve always wished your ChatGPT character was less peppy and more morose, this one is for you. The voice launched on April Fools’ Day, but they claim it’s not a joke. Either way it’s not very funny, but that seems to be the point.

Video Generation

  • Runway launched their Gen-4 series of AI video models, with the ability to create consistent characters, locations, and objects across camera angles, scenes, and lighting conditions. They also improved prompt adherence and the model’s understanding of physics for more realistic motion and object behaviours. With this new release it’s finally possible to create entire narrative films with consistent elements using AI.

Frontier Models

The big labs continue to put out better models every few weeks, each trying to outdo the others.

  • Without much fanfare, Google launched Gemini 2.5 Pro for Gemini Advanced subscribers. I’ve tried it and was impressed by its ability to generate a comprehensive report with minimal prompting.
  • OpenAI released an update to GPT-4o a couple of days after they launched the new image generation feature. Supposedly it’s better at handling complex prompts, intuition, and creativity, and it uses fewer emojis. 🤔

Open Source Models

Last month also saw a flurry of open source LLM (large language model) releases, from big names and relative newcomers alike. These models aren't nearly as powerful or capable as the frontier models (like ChatGPT, Gemini, and Claude), but they can be downloaded, modified, and used for free, assuming you’ve got access to the hardware needed to run them (there’s a minimal sketch of running one locally after the list below).

  • Reka Flash 3 is a new lightweight open-source (open-weights) reasoning model that’s comparable to OpenAI’s o1-mini and Qwen’s QwQ-32B. It’s billed as a general-purpose model that’s good at everything from casual chat to coding.
  • Ai2, one of the only makers of fully open source LLMs, released OLMo 2 32B, a truly open source model (meaning the training data, code, and model weights are all available for download). They claim this accessible model matches GPT-3.5 and GPT-4o-mini performance.
  • In the same week, Google released Gemma 3, a somewhat open source model that they claim is so efficient it can run on a single GPU or TPU while beating much larger models on performance benchmarks. Impressively, it supports 35 languages out of the box (pre-trained to support 140), it’s multi-modal (it can analyze images, text, and short videos), and it supports function calling for agentic tasks.
  • Finally, obviously not wanting to feel left out, OpenAI has formally announced that they too will launch an open source model at some point “in the coming months”, but first they want to know what you’ll use it for so they’re asking the public for input.
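
For the curious, here’s a minimal sketch of what running an open-weights model locally can look like, using the Hugging Face transformers library. The model ID below is a placeholder assumption rather than a recommendation; substitute whichever open model your hardware (and its licence) allows you to run.

```python
# Minimal sketch: running an open-weights LLM locally with Hugging Face transformers.
# Requires: pip install transformers accelerate torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-2-2b-it",  # placeholder ID — swap in any open-weights model you can run
    device_map="auto",             # let the library place the model on your available GPU/CPU
)

output = generator(
    "Explain, in two sentences, why a designer might run an LLM locally.",
    max_new_tokens=120,
)
print(output[0]["generated_text"])
```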

Prompting Tips

A comprehensive study on prompting techniques was released last month with some interesting insights. Here are the key takeaways:

  • Asking a model the same question 100 times will often generate different answers.
  • Sometimes being polite helps, sometimes it doesn’t.
  • Sometimes being authoritative helps, sometimes it doesn’t.
  • Providing the LLM with instructions on how to format the response generally improves answers.
  • Depending on the context, any given request requires a different structure and style of prompt to get the best answer. There is no magic formula.
  • The only guaranteed approach to getting the best possible output is to ask several times in different ways and then compare the answers (there’s a minimal sketch of this after the list).
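
To make that last point concrete, here’s a minimal sketch of the “ask several ways and compare” approach, using the OpenAI Python SDK. The model name, question, and prompt variations are all illustrative assumptions; adapt them to whatever model and task you’re working with.

```python
# Minimal sketch: ask the same question several ways, then compare the answers.
# Requires: pip install openai, and an OPENAI_API_KEY in your environment.
from openai import OpenAI

client = OpenAI()

question = "What are the main accessibility concerns for voice interfaces?"

# The same question phrased three ways (plain, polite, authoritative),
# each with an explicit format instruction, which generally improves answers.
variations = [
    f"{question} Answer as a bulleted list.",
    f"Please could you help me with this? {question} Answer as a bulleted list.",
    f"You are an expert accessibility consultant. {question} Answer as a bulleted list.",
]

answers = []
for prompt in variations:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name — swap for the model you actually use
        messages=[{"role": "user", "content": prompt}],
    )
    answers.append(response.choices[0].message.content)

# Compare the answers side by side and keep the best parts of each.
for prompt, answer in zip(variations, answers):
    print(f"--- Prompt: {prompt}\n{answer}\n")
```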

Designing for AI

While many designers are now using AI as part of their workflow, some designers are having to figure out how to design AI interfaces. An article on the UX Collective blog takes an interesting look at three companies’ efforts to incorporate AI-based components and patterns into their design systems.

That’s all for this month! Let me know what’s resonated with you lately, either in the comments, or send me an email. Thanks!