From lines to evocative fashion sketches using pix2pix
pix2pix is a machine learning model based on conditional adversarial networks. It performs general-purpose image-to-image translation and was originally introduced by Phillip Isola and his team. pix2pix works with a training set that contains pairs of related images (an image of type “A” and an image of type “B”), and it learns how to convert “A” into “B”, or vice versa.
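For readers who want the underlying math, the objective from the original pix2pix paper combines a conditional GAN loss with an L1 reconstruction term (here x is the input image, y the target image, z a noise vector, G the generator, and D the discriminator):

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)]
  + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big]

G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```

The L1 term pushes the generator toward outputs that are close to the target on average, while the adversarial term pushes it toward outputs that look realistic.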
For example, a black & white image (type A) can be converted to its color version (type B) by training the pix2pix network on a set of corresponding color and black & white pairs, so that it learns the pixel-to-pixel relationship between full-color images and their black & white counterparts. Once trained on such a dataset, the network can then be used to color arbitrary black & white images.
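The nice thing about this setup is that the training pairs are essentially free to produce: every color image yields its own black & white input. A toy sketch of that pairing step, using pixel lists and the standard luminance formula (the function names here are illustrative, not part of the pix2pix codebase):

```python
# Toy illustration of building a colorization training pair: each color
# image ("B") yields its grayscale counterpart ("A") automatically.

def to_grayscale(rgb_pixels):
    """Convert a list of (R, G, B) pixels to single-channel luminance values."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in rgb_pixels]

def make_training_pair(color_image):
    """Return an (input A, target B) pair for a colorization dataset."""
    return to_grayscale(color_image), color_image

color = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # red, green, blue pixels
gray, target = make_training_pair(color)
print(gray)  # [76, 150, 29]
```

In practice this conversion would be done with an image library over whole photos, but the principle is the same: the target image generates its own input.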
Below is an example of the colorization model applied to a black & white image from the test set to generate a colored version of it. For comparison, the original color image (called the “target”) is shown in the middle, so you can judge how well the network reconstructs it.
Using this generic image transformation capability of pix2pix, I wanted to explore the concept of human-machine collaboration further by applying pix2pix to the fashion design process. In fashion, one of the most traditional ways of visualizing an idea is to sketch. Through quick sketches, designers communicate their ideas to the patternmaker, client and other designers; as such, sketching is both an ideation tool and a communication tool. Oftentimes, the sketch is not a complete, accurate depiction of the idea, since the idea itself is not final and is still taking shape. So what the sketcher tries to convey is the essence of the idea: the silhouette, the drape of materials, the color and texture, and so on. In this sense, the visual effect of the sketch is more evocative than technically accurate.
The generic nature of the pix2pix model means that as long as there is plenty of data, it can learn the relationship between any two types of images. I decided to use this capability to create colored fashion sketches from minimal line drawings. To do this, I created a small dataset of fashion images showing the full body in different postures, and traced over these images to create minimal line drawings. This produced two sets of data: one set of full-color fashion images and another set of minimal line drawings. The original full-color images all came from NET-A-PORTER.com, an online retailer of designer clothing.
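Before training, each line drawing and its corresponding photo have to be combined into a single paired sample, since pix2pix’s “aligned” dataset format expects the A and B images concatenated side by side. A minimal sketch of that step, using nested lists as stand-in images (the function name is illustrative, not from the pix2pix codebase):

```python
# Sketch of the side-by-side pairing step used to build an aligned
# A|B dataset. Real datasets would do this with an image library over
# actual files; tiny nested lists keep the idea visible.

def concat_horizontal(img_a, img_b):
    """Join two equal-height images row by row into one A|B image."""
    assert len(img_a) == len(img_b), "images must have the same height"
    return [row_a + row_b for row_a, row_b in zip(img_a, img_b)]

line_drawing = [[0, 1], [1, 0]]  # 2x2 stand-in for a sketch (type A)
color_photo  = [[5, 6], [7, 8]]  # 2x2 stand-in for a photo (type B)

paired = concat_horizontal(line_drawing, color_photo)
print(paired)  # [[0, 1, 5, 6], [1, 0, 7, 8]]
```

Each such combined image becomes one training sample, with the model learning to map the left half (the line drawing) to the right half (the colored image).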
Then I trained Phillip Isola's pix2pix model to learn the pixel-to-pixel relationship between the two datasets. I first trained the model for 100 epochs, then trained it again from scratch for 1,000 epochs to compare how the amount of training affects the results.
After 100 epochs
After 1000 epochs
It appears that for the kind of evocative sketch we are after (see the example fashion sketch below), 100 epochs works a little better: the 1,000-epoch model tries to guess specific sewing details (rather unsuccessfully in some cases), which creates distortions in the output image and reduces its evocative quality.
As Aaron Hill noted in the midterm presentation feedback, this prototype highlights both the potential and the limitations of using pix2pix to create colored fashion sketches.