-- Welcome! -- Synthetic Data and Generative AI -- Community of Learning and Practice

Scroll down for more fun links and to learn how to create your own synthetic data.

MovieLens 1B Synthetic Dataset

MovieLens 1B is a synthetic dataset expanded from the 20 million real-world ratings in ML-20M and distributed in support of MLPerf. Note that the data are distributed as .npz files, which you must read using Python and NumPy.
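As a rough illustration of reading an .npz file with NumPy, here is a minimal sketch. The file name `demo_shard.npz`, the helper `summarize_npz`, and the tiny stand-in array are all hypothetical, made up so the example runs without the real download. The actual file names and array keys in MovieLens 1B may differ, so inspect them before relying on any particular layout.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive."""
    summary = {}
    # np.load on an .npz file returns a lazy, dict-like NpzFile object.
    # Using it as a context manager closes the underlying zip file afterwards.
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


# Hypothetical stand-in file (NOT the real MovieLens 1B data) so the sketch
# is runnable on its own.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_shard.npz")
demo_ratings = np.array([[0, 10, 4.0], [1, 11, 3.5]], dtype=np.float32)
np.savez(demo_path, demo_ratings)  # positional arrays are stored as "arr_0", ...

demo_summary = summarize_npz(demo_path)
print(demo_summary)
```

To try this on a downloaded file, pass its path to `summarize_npz` in place of `demo_path`. Listing the stored names and shapes first is a cheap way to learn the layout before loading large arrays into memory.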

The code for the expansion algorithm is available here: https://github.com/mlperf/training/tree/master/data_generation

To create the dataset above, we ran the algorithm (using commit 1c6ae725a81d15437a2b2df05cac0673fde5c3a4) as described in the README under the section “Running instructions for the recommendation benchmark”.

Permalink: https://grouplens.org/datasets/movielens/movielens-1b/

GANs Surge!

Generative Adversarial Networks (GANs) are an essential tool for generating synthetic data. Thanks, Mr. Goodfellow!

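They can generate additional samples to help check the validity of clinical trials, augment network logs to help track cybersecurity threats, and support fraud detection to protect your money. They also belong to the same generative AI family as the ever-popular ChatGPT, although ChatGPT itself is built on a transformer language model rather than a GAN!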

GAN Applications by Level of User:

Everyday User

Pay for Synthetic Data

Get Synthetic Datasets and Practice Datasets

Create Your Own GAN