[Omni Aggregation Networks for Lightweight Image Super-Resolution (OmniSR)](https://github.com/Francis0625/Omni-SR)

While lightweight ViT frameworks have made tremendous progress in image super-resolution, their uni-dimensional self-attention modeling and homogeneous aggregation scheme prevent the effective receptive field (ERF) from covering more comprehensive interactions across both spatial and channel dimensions. To tackle these drawbacks, this work proposes two enhanced components under a new Omni-SR architecture. First, an Omni Self-Attention (OSA) block is proposed based on the dense interaction principle; it simultaneously models pixel interactions across both spatial and channel dimensions, mining the potential correlations across the omni-axis (i.e., spatial and channel). Coupled with mainstream window partitioning strategies, OSA can achieve superior performance with compelling computational budgets. Second, a multi-scale interaction scheme is proposed to mitigate sub-optimal ERF (i.e., premature saturation) in shallow models, which facilitates local propagation and meso-/global-scale interactions, rendering an omni-scale aggregation building block. Extensive experiments demonstrate that Omni-SR achieves record-high performance on lightweight super-resolution benchmarks (e.g., 26.95dB@Urban100 ×4 with only 792K parameters). Our code is available at https://github.com/Francis0625/Omni-SR
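The 26.95 dB figure quoted above is a PSNR score. As a rough illustration of the metric, here is a minimal sketch of how PSNR is usually computed for 8-bit pixel values. The function name `psnr` and the flat-list image representation are assumptions made for this example, and the benchmark protocol behind the quoted number (colour channel, border cropping) is not reproduced here.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy values, not taken from any benchmark.
ground_truth = [52, 55, 61, 59]
upscaled = [50, 56, 60, 62]
print(f"{psnr(ground_truth, upscaled):.2f} dB")
```

Higher is better; because the scale is logarithmic, a difference of a few tenths of a dB between lightweight models is usually treated as meaningful.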

OmniSR 2x DF2K

OmniSR 2x DIV2K

[Swift-SRGAN - Rethinking Super-Resolution for real-time inference](https://github.com/Koushik0901/Swift-SRGAN)

In recent years, there have been several advancements in the task of image super-resolution using state-of-the-art Deep Learning-based architectures. Many previously published super-resolution techniques require high-end, top-of-the-line Graphics Processing Units (GPUs) to perform image super-resolution. With the increasing advancements in Deep Learning approaches, neural networks have become more and more compute hungry. We took a step back and focused on creating a real-time efficient solution. We present an architecture that is faster and smaller in terms of its memory footprint. The proposed architecture uses Depth-wise Separable Convolutions to extract features, and it performs on par with other super-resolution GANs (Generative Adversarial Networks) while maintaining real-time inference and a low memory footprint. Real-time super-resolution enables streaming high-resolution media content even under poor bandwidth conditions. While maintaining an efficient trade-off between accuracy and latency, we are able to produce a model with comparable performance that is one-eighth (1/8) the size of super-resolution GANs and computes 74 times faster than super-resolution GANs.
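To give a feel for why depth-wise separable convolutions shrink a model, the sketch below compares weight counts for a single layer. The layer sizes (64 channels, 3x3 kernel) are hypothetical and biases are ignored; this is not a reproduction of the Swift-SRGAN architecture or of the size figures in the abstract.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise

c_in, c_out, k = 64, 64, 3  # hypothetical layer sizes
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(std / sep, 2))
```

For these sizes the separable layer needs 4,672 weights instead of 36,864. The saving grows with the number of output channels and the kernel size.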

NOTE: The author used the wrong file extensions for the models *on GitHub*. You will download a `.pth.tar` file. This is not actually a TAR file. Change the file extension to just `.pth` and the model will work.
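If you prefer to script the rename, a minimal sketch using only the standard library could look like this. The function name `fix_checkpoint_extension` and the example filename are made up for illustration.

```python
from pathlib import Path

def fix_checkpoint_extension(path):
    """Rename 'name.pth.tar' to 'name.pth' and return the resulting path."""
    path = Path(path)
    if not path.name.endswith(".pth.tar"):
        return path  # nothing to do
    target = path.with_name(path.name[: -len(".tar")])
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    path.rename(target)
    return target

# Example call (hypothetical filename):
# fix_checkpoint_extension("downloads/swift_srgan_2x.pth.tar")
```

The file contents are left untouched; only the name changes.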

Swift-SRGAN 2x

[Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) aims at developing Practical Algorithms for General Image/Video Restoration.

We extend the powerful ESRGAN to a practical restoration application (namely, Real-ESRGAN), which is trained with purely synthetic data.

RealESRGAN_x2Plus

DAT_2_X3

OmniSR 3x DF2K

OmniSR 3x DIV2K

[SwinIR: Image Restoration Using Swin Transformer](https://github.com/JingyunLiang/SwinIR)


Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14~0.45dB, while the total number of parameters can be reduced by up to 67%.

SwinIR-M-x3 (classicalSR-DF2K-s64w8)

SwinIR-M-x3 (classicalSR-DIV2K-s64w8)

SwinIR-M-x3 (lightweightSR-DIV2K-s64w8)

Nice to release a model again! This one is intended for realistic imagery, and works especially well on faces, hair, and nature shots. It should only be used on somewhat clear shots, without a lot of grain. I trained this model on SPAN, which, as of the time of release, you'll need chaiNNer-nightly for. I aimed for a softer, more natural look for this model with as few artifacts as possible.

In addition to the Normal model, I've included a "soft" model. The Soft model is... softer. Basically it was an earlier version of the model with a more limited dataset. It produces more natural output on games or rendered content, but suffers a bit more with realistic stuff.

Note: In shots with DOF (depth of field) or bokeh, unfortunately there will be artifacts. 

Compatibility: You'll have to use the latest chaiNNer-nightly to use this model

ClearRealityV1

Official Paper pretrain model.

See https://github.com/XPixelGroup/HAT for more details.

HAT-L_SRx4_ImageNet-pretrain

Official Paper model.

See https://github.com/XPixelGroup/HAT for more details.

HAT-S_SRx4

[Link to Github Release](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_atd)

# 4xNomos2_hq_atd  
Scale: 4  
Architecture: [ATD](https://github.com/LabShuHangGU/Adaptive-Token-Dictionary)  
Architecture Option: [atd](https://github.com/muslll/neosr/blob/dc4e3742132bae2c2aa8e8d16de3a9fcec6b1a74/neosr/archs/atd_arch.py#L891)  

Author: Philip Hofmann  
License: CC-BY-4.0  
Purpose: Upscaler  
Subject: Photography  
Input Type: Images  
Release Date: 05.09.2024  

Dataset: [nomosv2](https://github.com/muslll/neosr/?tab=readme-ov-file#-datasets)  
Dataset Size: 6000  
OTF (on the fly augmentations): No  
Pretrained Model: 003_ATD_SRx4_finetune  
Iterations: 180'000  
Batch Size: 2  
Patch Size: 48  
Norm: true  

Description:   
An atd 4x upscaling model, similar to the [4xNomos2_hq_dat2](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_dat2) or [4xNomos2_hq_mosr](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_mosr) models, trained on and intended for non-degraded input to give good quality output.

4xNomos2_hq_atd

[Link to Github Release](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_dat2)

# 4xNomos2_hq_dat2   
Scale: 4   
Architecture: [DAT](https://github.com/zhengchen1999/dat)   
Architecture Option: [dat2](https://github.com/muslll/neosr/blob/5fba7f162d36052010169e6517dec3b406c569ab/neosr/archs/dat_arch.py#L1111)   

Author: Philip Hofmann   
License: CC-BY-4.0   
Purpose: Upscaler   
Subject: Photography   
Input Type: Images   
Release Date: 29.08.2024   

Dataset: [nomosv2](https://github.com/muslll/neosr/?tab=readme-ov-file#-datasets)   
Dataset Size: 6000   
OTF (on the fly augmentations): No   
Pretrained Model: DAT_2_x4   
Iterations: 140'000   
Batch Size: 4   
Patch Size: 48   

Description:    
A dat2 4x upscaling model, similar to the [4xNomos2_hq_mosr](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_mosr) model, trained on and intended for non-degraded input to give good quality output.    

I scored 7 validation outputs of each of the 21 checkpoints (10k-210k) of this model training with 68 metrics.    
[The metric scores can be found in this google sheet](https://docs.google.com/spreadsheets/d/1NL-by7WvZyDMHj5XN8UeDALVSSwH70IKvwV65ATWqrA/edit?usp=sharing).   
The corresponding image files for this scoring can be [found here](https://drive.google.com/file/d/1ZTp9fBMeawftNqzg4RN9_zIvHtul5jVc/view?usp=sharing)     
Screenshot of the google sheet:     
![|100](https://i.slow.pics/VZJTsrUv.webp)

Release checkpoint has been selected by looking at the scores, manually inspecting, and then getting responses on discord to this quick visual test, A B or C, which denote different checkpoints: https://slow.pics/c/8Akzj6rR   

Checkpoint B is the one released here, but you can also try out [Checkpoint A](https://github.com/Phhofm/models/releases/download/4xNomos2_hq_dat2/4xNomos2_hq_dat2_150000.pth) or [Checkpoint C](https://github.com/Phhofm/models/releases/download/4xNomos2_hq_dat2/4xNomos2_hq_dat2_10000.pth) if you like them better.

## Model Showcase:
[Slowpics](https://slow.pics/c/yuue9WpF)

4xNomos2_hq_dat2

[Link to Github Release](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_drct-l)

# 4xNomos2_hq_drct-l  
Scale: 4  
Architecture: [DRCT](https://github.com/ming053l/DRCT)  
Architecture Option: [drct_l](https://github.com/muslll/neosr/blob/dc4e3742132bae2c2aa8e8d16de3a9fcec6b1a74/neosr/archs/drct_arch.py#L937)  

Author: Philip Hofmann  
License: CC-BY-4.0  
Purpose: Upscaler  
Subject: Photography  
Input Type: Images  
Release Date: 08.09.2024  

Dataset: [nomosv2](https://github.com/muslll/neosr/?tab=readme-ov-file#-datasets)  
Dataset Size: 6000  
OTF (on the fly augmentations): No  
Pretrained Model: DRCT-L_X4  
Iterations: 200'000  
Batch Size: 2  
Patch Size: 64  

Description: 
A drct-l 4x upscaling model, similar to the [4xNomos2_hq_atd](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_atd), [4xNomos2_hq_dat2](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_dat2) and [4xNomos2_hq_mosr](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_mosr) models, trained on and intended for non-degraded input to give good quality output.   

Model Showcase:
[Slowpics](https://slow.pics/c/1mODuuUS)

4xNomos2_hq_drct-l

[Link to Github Release](https://github.com/Phhofm/models/releases/tag/4xNomos2_hq_mosr)

# 4xNomos2_hq_mosr  
Scale: 4  
Architecture: [MoSR](https://github.com/umzi2/MoSR)  
Architecture Option: [mosr](https://github.com/umzi2/MoSR/blob/95c5bf73cca014493fe952c2fbc0bdbe593da08f/neosr/archs/mosr_arch.py#L117)  

Author: Philip Hofmann  
License: CC-BY-4.0  
Purpose: Upscaler  
Subject: Photography  
Input Type: Images  
Release Date: 25.08.2024  

Dataset: [nomosv2](https://github.com/muslll/neosr/?tab=readme-ov-file#-datasets)  
Dataset Size: 6000  
OTF (on the fly augmentations): No  
Pretrained Model: [4xmssim_mosr_pretrain](https://github.com/Phhofm/models/releases/tag/4xmssim_mosr_pretrain)  
Iterations: 190'000  
Batch Size: 6  
Patch Size: 64  

Description:   
A 4x [MoSR](https://github.com/umzi2/MoSR) upscaling model, meant for non-degraded input, since this model was trained on non-degraded input to give good quality output.   

If your input is degraded, use a 1x degrade model first. So for example if your input is a .jpg file, you could use a 1x dejpg model first.  
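The ordering described above (restore at 1x first, then upscale at 4x) can be sketched as a simple chain of callables. Everything here is a placeholder: `placeholder_dejpg_1x` and `placeholder_upscale_4x` are stand-ins that only mimic the output sizes of real models, and `run_chain` is a hypothetical helper, not part of any upscaling tool.

```python
def placeholder_dejpg_1x(image):
    # Stand-in for a 1x restoration model: output has the same size as the input.
    return [row[:] for row in image]

def placeholder_upscale_4x(image):
    # Stand-in for the 4x model: nearest-neighbour repeat, purely illustrative.
    return [[px for px in row for _ in range(4)] for row in image for _ in range(4)]

def run_chain(image, models):
    """Apply each model in order, feeding the output of one into the next."""
    for model in models:
        image = model(image)
    return image

tiny = [[1, 2, 3], [4, 5, 6]]  # 2 rows x 3 columns
result = run_chain(tiny, [placeholder_dejpg_1x, placeholder_upscale_4x])
print(len(result), len(result[0]))
```

Running the restoration step first means the 4x model sees input closer to the non-degraded images it was trained on, instead of enlarging compression artifacts.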

Model Showcase:  [Slowpics](https://slow.pics/c/cqGJb0gT)

4xNomos2_hq_mosr

[Link to Github Release](https://github.com/Phhofm/models/releases/4xNomos8k_atd_jpg)

Name: 4xNomos8k_atd_jpg   
License: CC BY 4.0   
Author: Philip Hofmann   
Network: [ATD](https://github.com/LabShuHangGU/Adaptive-Token-Dictionary)   
Scale: 4   
Release Date: 22.03.2024   
Purpose: 4x photo upscaler, handles jpg compression   
Iterations: 240'000   
epoch: 152   
batch_size: 6, 3   
HR_size: 128, 192   
Dataset: nomos8k   
Number of train images: 8492   
OTF Training: Yes   
Pretrained_Model_G: 003_ATD_SRx4_finetune   

Description:
4x photo upscaler which handles jpg compression. This model will preserve noise. Trained on the very recently released (~2 weeks ago) Adaptive-Token-Dictionary network.   

Training details: 
AdamW optimizer with U-Net SN discriminator and BFloat16.
Degraded with otf jpg compression down to 40, re-compression down to 40, together with resizes and the blur kernels.  
Losses: PixelLoss using CHC (Clipped Huber with Cosine Similarity Loss), PerceptualLoss using Huber, GANLoss, [LDL](https://github.com/csjliang/LDL) using Huber, YCbCr Color Loss (bt601) and Luma Loss (CIE XYZ) on [neosr](https://github.com/muslll/neosr).

7 Examples:
[Slowpics](https://slow.pics/s/uwnoI435)

4xNomos8k_atd_jpg

OmniSR 4x DF2K

OmniSR 4x DIV2K

[Structure-Preserving Super Resolution with Gradient Guidance](https://github.com/Maclory/SPSR)

Structures matter in single image super resolution (SISR). Recent studies benefiting from generative adversarial network (GAN) have promoted the development of SISR by recovering photo-realistic images. However, there are always undesired structural distortions in the recovered images. In this paper, we propose a structure-preserving super resolution method to alleviate the above issue while maintaining the merits of GAN-based methods to generate perceptual-pleasant details. Specifically, we exploit gradient maps of images to guide the recovery in two aspects. On the one hand, we restore high-resolution gradient maps by a gradient branch to provide additional structure priors for the SR process. On the other hand, we propose a gradient loss which imposes a second-order restriction on the super-resolved images. Along with the previous image-space loss functions, the gradient-space objectives help generative networks concentrate more on geometric structures. Moreover, our method is model-agnostic, which can be potentially used for off-the-shelf SR networks. Experimental results show that we achieve the best PI and LPIPS performance and meanwhile comparable PSNR and SSIM compared with state-of-the-art perceptual-driven SR methods. Visual results demonstrate our superiority in restoring structures while generating natural SR images.
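As a rough illustration of what a gradient map is, the sketch below computes a per-pixel gradient magnitude with central differences on a tiny grayscale image. The choice of central differences and zero padding at the borders is an assumption made for this example; the exact operator used by the authors should be taken from the paper or repository.

```python
import math

def gradient_magnitude(image):
    """Per-pixel gradient magnitude via central differences (zero padding at borders)."""
    h, w = len(image), len(image[0])

    def px(y, x):
        # Pixels outside the image are treated as 0.
        return image[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    return [
        [
            math.hypot(px(y, x + 1) - px(y, x - 1), px(y + 1, x) - px(y - 1, x))
            for x in range(w)
        ]
        for y in range(h)
    ]

# A vertical edge: dark on the left, bright on the right.
edge = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
for row in gradient_magnitude(edge):
    print(row)
```

The map is large where intensity changes quickly (edges) and near zero in flat regions, which is why it can act as a structural prior. Note that zero padding also produces responses along the image border.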

4x SPSR

Swift-SRGAN 4x

RealESRGAN_x4Plus

Real-World Super-Resolution via Kernel Estimation and Noise Injection

Our solution is the winner of CVPR NTIRE 2020 Challenge on Real-World Super-Resolution in both tracks.

Recent state-of-the-art super-resolution methods have achieved impressive performance on ideal datasets regardless of blur and noise. However, these methods always fail in real-world image super-resolution, since most of them adopt simple bicubic downsampling from high-quality images to construct Low-Resolution (LR) and High-Resolution (HR) pairs for training which may lose track of frequency-related details. To address this issue, we focus on designing a novel degradation framework for real-world images by estimating various blur kernels as well as real noise distributions. Based on our novel degradation framework, we can acquire LR images sharing a common domain with real-world images. Then, we propose a real-world super-resolution model aiming at better perception. Extensive experiments on synthetic noise data and real-world images demonstrate that our method outperforms the state-of-the-art methods, resulting in lower noise and better visual quality. In addition, our method is the winner of NTIRE 2020 Challenge on both tracks of Real-World Super-Resolution, which significantly outperforms other competitors by large margins.
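A degradation framework of this kind is commonly written as blur, then downsampling, then additive noise. The notation below is the standard formulation used in super-resolution literature and is given here as an assumption about the form, not as a quotation from the paper:

$$
I_{LR} = (I_{HR} \otimes k)\downarrow_{s} + n
$$

Here $I_{HR}$ is the high-resolution image, $k$ is a blur kernel, $\otimes$ denotes convolution, $\downarrow_{s}$ is downsampling by scale factor $s$, and $n$ is noise. Plain bicubic downsampling corresponds to one fixed kernel with $n = 0$, whereas the approach described above estimates $k$ and $n$ from real images.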

For corrupted images with processing noise.

RealSR DF2K

Real-World Super-Resolution via Kernel Estimation and Noise Injection

Our solution is the winner of CVPR NTIRE 2020 Challenge on Real-World Super-Resolution in both tracks.

Recent state-of-the-art super-resolution methods have achieved impressive performance on ideal datasets regardless of blur and noise. However, these methods always fail in real-world image super-resolution, since most of them adopt simple bicubic downsampling from high-quality images to construct Low-Resolution (LR) and High-Resolution (HR) pairs for training which may lose track of frequency-related details. To address this issue, we focus on designing a novel degradation framework for real-world images by estimating various blur kernels as well as real noise distributions. Based on our novel degradation framework, we can acquire LR images sharing a common domain with real-world images. Then, we propose a real-world super-resolution model aiming at better perception. Extensive experiments on synthetic noise data and real-world images demonstrate that our method outperforms the state-of-the-art methods, resulting in lower noise and better visual quality. In addition, our method is the winner of NTIRE 2020 Challenge on both tracks of Real-World Super-Resolution, which significantly outperforms other competitors by large margins.

For real images taken by cell phone camera.

Architecture: OmniSR  
Scale: 3x  
Size: 64nf5nr  
Color Mode: RGB  
License: Apache-2.0  
Date: 2023-04-19  
Dataset: DF2K  
Training epochs: 920  
Training batch size: 64  

OmniSR 3x DF2K
