I started my dive into AI in 2008, writing a Boid / crowd system for my thesis at art college, the School of Visual Arts.
It was an insane particle script + 3D animation cycles in Maya, haha.
Then I did Boid movement, navigation, & obstacle detection in animated films for 5 years at Blue Sky Studios, using Houdini.
I dove into Style-Transfer AI & Long Short-Term Memory (LSTM) training in 2019-2020,
Like making a Node.js server (website) understand my voice & auto-Google-search for me.
Since then, I've been developing different multi-media AI structures in my spare time.
In 2015 I decided I'd cram a machine learning AI into a single-board computer, a Jetson TK1, by the end of 2026.
Something that could write down what I say,
Use vision to understand that an object simply went out of frame,
Yet "knows" that if it looks over, the object is still there; 'Attention'.
At the end of 2023, this evolved into a deep learning AI crammed into, likely, a Jetson Nano.
As something to infer what I mean, from what I say,
Or give a "thought" on what it saw or heard in the world around it.
'Machine Learning' is AI that can learn basic patterns.
'Deep Learning' is Machine Learning,
But uses neural networks to form patterns of patterns.
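If that sounds abstract, here's a tiny PyTorch sketch of the difference (purely illustrative, not code from any of my projects):

```python
import torch.nn as nn

# "Machine Learning": a single linear layer can learn one level of pattern,
# like a straight-line fit from inputs to outputs.
shallow = nn.Linear(16, 1)

# "Deep Learning": stacking layers with nonlinearities between them lets
# later layers build patterns out of the earlier layers' patterns.
deep = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),   # first-level patterns
    nn.Linear(32, 32), nn.ReLU(),   # patterns of those patterns
    nn.Linear(32, 1),
)
```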
Realistically, I'd just be happy to make something that can understand what I say and give a semi-coherent response without an internet connection.
As of May 24th, 2025, I've started on the core of the AI,
But I'm still testing different structures' abilities to adapt to stimuli.
... It really seems like any network could work for most things, but some are better than others per task.
As you could guess,
All the recent AI hullabaloo (2019-...)
Has been quite serendipitous for my creation!
ESRGAN Image Upresser!
This was a fun one for me, I've been using ESRGANs for a while now,
And wanted to build a GAN to better understand how they work.
ESRGANs are a type of Generative Adversarial Network (GAN),
An 'Enhanced Super Resolution GAN' to be specific.
They are used to upscale images, making them larger and clearer.
Like in FBI shows where they enhance the security footage,
Enhance..... Enhance! .... ENHANCE!
So I built an ESRGAN, more specifically a 'Real-ESRGAN',
Which is a more advanced version of the original ESRGAN.
This always seemed like magic to me,
Figuring out the associations between pixels in an image,
And then using those associations to create a larger, clearer image.
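For the curious, here's a rough PyTorch sketch of the kind of generator these networks use: dense blocks where every convolution sees all the feature maps before it, plus PixelShuffle layers that rearrange channels into a 4x larger image. This is a simplified stand-in, not the actual Real-ESRGAN architecture (the real RRDB blocks are nested deeper), and all the names here are mine:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each conv sees all previous feature maps (the 'associations between pixels')."""
    def __init__(self, ch=64, growth=32):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch + i * growth, growth, 3, padding=1) for i in range(4)
        )
        self.fuse = nn.Conv2d(ch + 4 * growth, ch, 3, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        return x + 0.2 * self.fuse(torch.cat(feats, dim=1))  # scaled residual

class ToyUpresser(nn.Module):
    """Low-res RGB in, 4x upscaled RGB out."""
    def __init__(self, ch=64, n_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.body = nn.Sequential(*(DenseBlock(ch) for _ in range(n_blocks)))
        self.up = nn.Sequential(  # each PixelShuffle(2) trades 4x channels for 2x size
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2),
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2),
        )
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        f = self.head(x)
        return self.tail(self.up(f + self.body(f)))
```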
In this video, you'll see four images: the Input Noise, the Low-Resolution Image, the Upresser Output, and the Original Image, along with the 'Training Loss' and 'Discriminator Loss' graphs.
The graph shows how well the GAN is learning to generate realistic images.
The training is being done by a Generator AI and a Discriminator AI.
The Generator creates images, and the Discriminator checks if they look like the original images.
As the training progresses, the Generator gets better at creating realistic images;
You can see how good the Upresser's output looks after just a few epochs.
But, it takes a lot of training before it has a good understanding of the images.
Once the Generator has a reasonable understanding of the images,
And the Discriminator has a good understanding of what a real image looks like,
The two AIs begin to work together, becoming 'balanced' in their understanding.
This is what happens at the end of the video here.
The two converge on an 'understanding' of the image,
And the Generator starts to create images that look even closer to the original image.
The biggest aspect of a GAN is the 'adversarial' part,
The Generator and Discriminator are constantly trying to outsmart each other.
The Generator tries to create images that look like the original images,
And the Discriminator tries to figure out if the images are real or fake.
As they train, they get better and better at their tasks.
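Here's roughly what one adversarial training step looks like in PyTorch. The names (gen, disc, opt_g, opt_d) are stand-ins, not lifted from my actual training script:

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, opt_g, opt_d, low_res, high_res):
    # --- Discriminator: push real crops toward "real", generated crops toward "fake" ---
    fake = gen(low_res).detach()  # detach: don't backprop into the Generator here
    real_logits, fake_logits = disc(high_res), disc(fake)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- Generator: try to fool the Discriminator, while staying close to the original ---
    fake = gen(low_res)
    fool_logits = disc(fake)
    g_loss = (F.binary_cross_entropy_with_logits(fool_logits, torch.ones_like(fool_logits))
              + F.l1_loss(fake, high_res))  # pixel loss keeps the output near the real image
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Those two returned numbers are the kind of values the loss graphs track.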
What's not shown here?
I implemented a 'memory supported' training method.
Should confidence change too drastically for too many epochs,
Or should loss increase too much too quickly,
The training will adapt by 'remembering' the last 'good direction' of changing pixels.
Allowing the model to maintain a sense of continuity,
Even if it loses confidence in itself.
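A simplified sketch of the idea; the thresholds and blend factor here are illustrative, not my exact values:

```python
import torch

class GradientMemory:
    """Remembers an average of gradients from 'good' epochs and blends it
    back in when loss spikes. Call after_backward() between loss.backward()
    and optimizer.step()."""
    def __init__(self, model, decay=0.9, spike_ratio=1.5):
        self.decay = decay              # how quickly old directions fade
        self.spike_ratio = spike_ratio  # how big a loss jump counts as a spike
        self.best_loss = float("inf")
        self.memory = {n: torch.zeros_like(p) for n, p in model.named_parameters()}

    def after_backward(self, model, loss):
        if loss < self.best_loss * self.spike_ratio:
            # Stable or improving: fold the current gradients into the memory.
            for n, p in model.named_parameters():
                if p.grad is not None:
                    self.memory[n].mul_(self.decay).add_(p.grad, alpha=1 - self.decay)
            self.best_loss = min(self.best_loss, loss)
        else:
            # Loss spiked: nudge the gradients back toward the last good direction.
            for n, p in model.named_parameters():
                if p.grad is not None:
                    p.grad.mul_(0.5).add_(self.memory[n], alpha=0.5)
```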
As a result, the Generator seemed to learn faster and more smoothly.
If this is causing a negative effect, I've yet to see it.
More testing is needed with larger datasets and more complex images.
I only implemented this 'memory support' for the Generator,
As the Discriminator is more of a 'check' and doesn't need to remember past states.
Who knows, perhaps if I grow this AI further, I may need to implement a memory for the Discriminator as well.
But it seems to be working so far!