Role overview
Generative adversarial networks (GANs) and probabilistic latent diffusion models have recently proven effective for data generation and style transfer [1,2]. The main tasks of this internship are:
1) To learn a surrogate representation of one printer-scanner pair using the existing large L3iTextCopies dataset [3].
2) To experiment with different GAN architectures and probabilistic diffusion approaches to identify the best method for our task.
3) To compare the pseudo-synthetic samples with real printed documents using commonly used metrics such as Pearson correlation, mean squared error (MSE) and Fréchet Inception Distance (FID) between the datasets [4].
4) To evaluate the possibility of fine-tuning the proposed models for unseen printer-scanner pairs.
5) To create a public synthetic dataset of printed documents and, if possible, to publish the results at an international conference or in a scientific journal.
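
As an illustration of the comparison metrics named in task 3, the following is a minimal sketch in Python with NumPy, not the evaluation code of the project. The function names are hypothetical. The Pearson correlation and MSE operate on two images of the same shape. The Fréchet distance operates on two sets of feature vectors and only becomes FID when those features are extracted with an Inception network, a step that is assumed to happen elsewhere and is not shown here.

```python
import numpy as np


def pearson_correlation(a, b):
    """Pearson correlation between two same-shaped images (flattened)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a_c = a - a.mean()
    b_c = b - b.mean()
    denom = np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum())
    if denom == 0.0:
        # Undefined when one of the images is constant.
        return float("nan")
    return float((a_c * b_c).sum() / denom)


def mean_squared_error(a, b):
    """Mean squared pixel difference between two same-shaped images."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features), n_features >= 2.
    Assumption: for FID these would be Inception features computed elsewhere.
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory;
    # numerical noise is removed by taking the real part and clipping at 0.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = ((mu_a - mu_b) ** 2).sum()
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

In such a setup, the first two metrics would be computed per pair of aligned real and generated scans, while the Fréchet distance compares the two datasets as a whole.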
01/04/2026