Documentation

Datasets

A collection of dataset sources.

DIV2K
- Free | 800 files
- 800 HQ real-world images
SYNLA
- Free | 2,448 files
- This dataset is designed to simulate complex line art.
Quixel Megascans
- Limited Free | 9,890 files
- Discover a world of unbounded creativity. Explore a massive asset library, and Quixel’s powerful tools, plus free in-depth tutorials and resources.
Adobe Stock
- $29.99+/Month | 300,000,000 files
- Stock photos, royalty-free images, graphics, vectors & videos
Pexels
- Free
- Free stock photos you can use everywhere. ✓ Free for commercial use ✓ No attribution required
Poliigon
- $16+/Month | 3,069 files
CC0 Textures
- Free | 1,914 files
- Textures with normal maps, displacement maps and others in the form of JPGs. Most seem to be 8K. The name of the website comes from the license.
Textures.com
- Limited Free | 134,872 files
- Textures for 3D, graphic design and Photoshop. 15 free downloads every day!
texturehaven
- Free | 133 files
- 100% Free High Quality Textures for Everyone
Gustave Doré's 1866 Bible Illustrations
- Free | 241 files
- High-Res Scans of Gustave Doré's 1866 Bible Illustrations
texturelib
- Limited Free | 6,605 files
- Library of quality high resolution textures. Free for personal and commercial use.
Vimeo-90k
- Free | 89,800 files
- This dataset consists of 89,800 video clips downloaded from vimeo.com, covering a large variety of scenes and actions. It is designed for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution.
Triplet (for temporal frame interpolation)
- Free | 73,171 files
- The triplet dataset consists of 73,171 3-frame sequences with a fixed resolution of 448 x 256, extracted from 15K selected video clips from Vimeo-90k. This dataset is designed for temporal frame interpolation.
Septuplets
- Free | 91,701 files
- The septuplet dataset consists of 91,701 7-frame sequences with a fixed resolution of 448 x 256, extracted from 39K selected video clips from Vimeo-90k. This dataset is designed for video denoising, deblocking, and super-resolution.
falcoon300
- Free | 1,233 files
- LyonHrt: As has been mentioned, here are the almost complete works of falcoon, as used in the falcoon300 model. This is a selection of 1,233 images from the original source. I should add that there are some scantily clad women, so NSFW.
Danbooru2018
- Free | 3,330,000 files
- Danbooru2018 is a large-scale anime image database with 3.33m+ images annotated with 99.7m+ tags; it can be useful for machine learning purposes such as image recognition and generation.
Flickr1024
- Free | 2,048 files
- Flickr1024 is a large stereo dataset that consists of 1024 high-quality image pairs and covers diverse scenarios. This dataset can be employed for stereo image super-resolution (SR).
Flickr2K
- Free
- Huge dataset that is being used to train a lot of models.
Caltech Game Covers
- Free | 11,400 files
- Found a dataset of game covers, but there's a ton of duplicates. If I can figure out how to parse the text files and remove the dupes, I'll upload the trimmed down version.
outdoor scene training
- Free | 8,137 files
- The outdoor scene training dataset is huge: the first file alone has 2,187 pictures (8,137 total).
Nomos2k
- Free | 2,536 files
- A dataset made by musl on the Game Upscale Discord server. Description: A dataset containing 2,536 images of 2000px. I hand-selected it from multiple sources, based on the following criteria: high signal-to-noise ratio (low noise), diverse, sharp (no motion blur, shallow DOF is OK), and containing mixed and complex textures/shapes that cover most of the image.
- Raw images were processed in RawTherapee using prebayer deconvolution, AMaZE and the AP1 color space.
- Sources: Adobe-MIT-5k, RAISE, FFHQ, DIV2K, DIV8k, Flickr2k, Rawsamples, SignatureEdits, Hasselblad raw samples and Unsplash.
- KernelGAN was trained using DLIP on all images, with scale 4x and up to 5k iterations instead of 3k. Hopefully this increases the accuracy of the kernels. All files are provided in the "kernelgan" folder. Note: in order to use it with traiNNer, you have to give the dataroot_kernels path, along with enabling realistic under the resizing presets.
- I've also made my selected noise patches available. They were extracted from multiple images "in the wild", with unknown degradation: Noise Patches.
- I encourage everyone to give it a try and, if possible, mirror the dataset. For now it is available on MEGA, but I plan to mirror it on other services. Sample Video.
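
The Caltech Game Covers entry above mentions a large number of duplicates. As a rough sketch of one way to trim them, assuming the images have already been extracted to a local folder, the Python below groups files by SHA-256 content hash and keeps the first file of each group. The function names and the folder name in the usage note are hypothetical, and this only catches byte-identical copies; resized or re-encoded duplicates would need a perceptual comparison instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-identical."""
    groups: dict[str, list[Path]] = defaultdict(list)
    # Sorting makes the "first file kept" choice deterministic.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def remove_duplicates(root: Path, dry_run: bool = True) -> list[Path]:
    """Keep the first file of each duplicate group and return the rest.

    With dry_run=True (the default) nothing is deleted.
    """
    extras = [p for group in find_duplicates(root) for p in group[1:]]
    if not dry_run:
        for p in extras:
            p.unlink()
    return extras
```

Calling `remove_duplicates(Path("game_covers"))` on a hypothetical extraction folder lists the redundant files without touching them; pass `dry_run=False` only after checking that list.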

Other sources lists

If you have some time, consider adding them to this list here.

Community datasets

Our community regularly creates and shares custom datasets. You can find them in the dataset-release channel on our Discord server.


2023-10-05