
2xLiveActionV1_SPAN
A 2x SPAN model for live-action film and digital video. The main goal is to fix or reduce common video quality problems while maintaining fidelity. The existing video-focused models I tried all denoise or cause colour shifts, so I decided to train my own.
The model is trained on images degraded with compression (JPEG, MPEG-4 ASP, H.264, VP9, H.265), chroma subsampling, blurriness from multiple rounds of scaling, uneven horizontal and vertical resolution, oversharpening halos, and jaggies from bad deinterlacing. The training data also includes onscreen text. The model is not trained to remove noise at all, so it preserves details in the source well. To prevent colour/brightness shifts, I used the consistency loss in neosr. I had to modify the consistency loss to use a stronger blur so that it doesn't interfere with halo removal.
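To illustrate why the blur strength matters for halos, here is a simplified sketch that compares two signals after blurring them. It is not neosr's consistency loss: the 1-D box blur, the L1 distance, and the names `box_blur` and `blurred_l1` are all assumptions made for illustration. A sharpening halo is an overshoot paired with an undershoot, so a wide enough blur largely averages it out, while a uniform brightness shift survives any blur.

```python
def box_blur(signal, radius):
    """Box-blur a 1-D list of floats, repeating the edge samples as padding."""
    n = len(signal)
    blurred = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def blurred_l1(a, b, radius):
    """Mean absolute difference between the blurred versions of a and b."""
    blurred_a = box_blur(a, radius)
    blurred_b = box_blur(b, radius)
    return sum(abs(x - y) for x, y in zip(blurred_a, blurred_b)) / len(a)


# Hypothetical test signals: a clean step edge, the same edge with a
# sharpening halo (undershoot before, overshoot after), and a brightness shift.
without_halo = [0.0] * 16 + [1.0] * 16
with_halo = list(without_halo)
for i in (14, 15):
    with_halo[i] -= 0.2
for i in (16, 17):
    with_halo[i] += 0.2
shifted = [v + 0.05 for v in without_halo]

halo_weak_blur = blurred_l1(with_halo, without_halo, radius=1)
halo_strong_blur = blurred_l1(with_halo, without_halo, radius=4)
shift_strong_blur = blurred_l1(shifted, without_halo, radius=4)

print(halo_weak_blur, halo_strong_blur, shift_strong_blur)
```

In this toy case the halo difference shrinks to about a third when the radius goes from 1 to 4, while the 0.05 brightness shift is still measured in full. That is the behaviour wanted from a loss meant to catch colour/brightness shifts without reacting to halos.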
Limitations:
- The model has limited ability to see details through heavy grain, but light to moderate grain is fine.
- The model still does not handle bad deinterlacing perfectly, especially if the source has been vertically resized. Fixing bad deinterlacing is not the main goal, so it is what it is. Sources that are line-doubled throughout should be descaled back to half height first for best results.
- The model sometimes oversharpens a little. This is probably because the training data has some oversharpened images.
- This model generally cannot handle VHS degradation.
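For the line-doubled case mentioned above, a minimal sketch of going back to half height might look like the following. It assumes the simplest situation, where every row is an exact (or near-exact) copy of the row above it. The function names and the list-of-rows frame format are hypothetical. Real sources often use interpolated lines or have compression noise between the pairs, in which case a proper descaling tool is the better choice.

```python
def is_line_doubled(frame, tolerance=0):
    """Return True if each even row is repeated by the odd row below it."""
    if len(frame) % 2 != 0:
        return False
    return all(
        abs(a - b) <= tolerance
        for top, bottom in zip(frame[0::2], frame[1::2])
        for a, b in zip(top, bottom)
    )


def halve_height(frame):
    """Keep one row from each repeated pair, giving a half-height frame."""
    return [list(row) for row in frame[0::2]]


# Hypothetical 4-row frame in which every row appears twice.
doubled = [
    [10, 20, 30],
    [10, 20, 30],
    [40, 50, 60],
    [40, 50, 60],
]

if is_line_doubled(doubled):
    restored = halve_height(doubled)
    print(restored)
```

The `tolerance` argument is there because compressed video rarely keeps the repeated rows bit-identical. A suitable value depends on the source.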
More comparisons: https://slow.pics/c/DtDN7gaq
The training config and the image degradation scripts used to create the training data can be found at https://github.com/jcj83429/upscaling/tree/9332e7d5b07747ff347e5abdc43f8144364de9f7/2xLiveActionV1_SPAN
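As a rough idea of what one such degradation does, here is a small sketch of 4:2:0-style chroma subsampling on a single chroma plane. It is not taken from the linked scripts. The function names, the 2x2 averaging, and the nearest-neighbour upsampling are assumptions chosen to keep the example short.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (even width and height assumed)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def upsample_nearest(plane):
    """Repeat every sample twice horizontally and vertically."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Hypothetical chroma plane with a sharp colour edge between columns 0 and 1.
chroma = [
    [0, 100, 50, 50],
    [0, 100, 50, 50],
]

degraded = upsample_nearest(subsample_420(chroma))
print(degraded)
```

After the round trip the sharp 0/100 edge has been averaged into a flat value. That loss of chroma detail is what a model trained on this degradation has to learn to undo.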
Property | Value |
---|---|
Architecture | SPAN |
Scale | 2x |
Size | 48nf |
Color Mode | |
License | CC-BY-NC-SA-4.0 (permits private use, distribution, and modifications; requires credit, the same license, and stating changes; no liability or warranty) |
Date | 2025-05-19 |
Dataset | nomosv2 |
Dataset size | 36000 |
Training iterations | 490000 |
Training epochs | 271 |
Training batch size | 20 |
Training HR size | 128 |
Training OTF | No |