
My goal is to create simple geometric line drawings in pure black and white. I do not need gray tones. Something like this (example of training image):

[example training image: a simple geometric line drawing in pure black and white]

But the GAN I am using produces gray-tone images. For example, here is some detail from a generated image.

[detail from a generated image, showing gray tones]

I used this PyTorch-based Vanilla GAN as the base for what I am trying to do. I suspect my GAN is doing far too much work calculating all those floats. I'm pretty sure it is normalized to use numbers between -1 and 1 inside the network? I have read it is a bad idea to try using 0 and 1 because of problems with the tanh activation layer. So any other ideas?
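For reference, the [-1, 1] normalization I mean would look something like this on the input side (a sketch assuming torchvision transforms, so the images match the Tanh output range):

from torchvision import transforms

# Sketch: map grayscale training images into [-1, 1] to match nn.Tanh()
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),                 # pixel values [0, 255] -> [0.0, 1.0]
    transforms.Normalize((0.5,), (0.5,)),  # [0.0, 1.0] -> [-1.0, 1.0]
])

Here is the code for my discriminator and generator.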

import torch
import torch.nn as nn

image_size = 248
batch_size = 10
n_noise = 100
class Discriminator(nn.Module):
    """
        Simple Discriminator w/ MLP
    """
    def __init__(self, input_size=image_size ** 2, num_classes=1):
        super(Discriminator, self).__init__()
        self.layer = nn.Sequential(
            nn.Linear(input_size, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, num_classes),
            nn.Sigmoid(),
        )

    def forward(self, x):
        y_ = x.view(x.size(0), -1)
        y_ = self.layer(y_)
        return y_

Generator:

class Generator(nn.Module):
    """
        Simple Generator w/ MLP
    """
    def __init__(self, input_size=n_noise, num_classes=image_size ** 2):
        super(Generator, self).__init__()
        self.layer = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 256),
            nn.BatchNorm1d(256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.BatchNorm1d(512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1024),
            nn.BatchNorm1d(1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, num_classes),
            nn.Tanh()
        )

    def forward(self, x):
        y_ = self.layer(x)
        y_ = y_.view(x.size(0), 1, image_size, image_size)
        return y_

What I have so far pretty much consumes all the memory I have available, so simplifying it and/or speeding it up would both be a plus. My input images are 248px by 248px. If I go any smaller than that, they are no longer useful, so they are quite a bit larger than the MNIST digits (28x28) the original GAN was built for. I am also quite new to all of this, so any other suggestions are also appreciated.
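For a sense of scale on the memory issue, I suspect most of it goes into the big fully connected layers; a quick count for the final generator layer alone (based on the code above):

# nn.Linear(1024, 248 * 248) in the Generator above
weights = 1024 * 248 * 248   # 62,980,096 weights
biases = 248 * 248           #     61,504 biases
print(weights + biases)      # 63,041,600 parameters in this single layer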

EDIT: What I have tried so far. I tried making the final output of the Generator black and white by forcing the output to be binary (-1 or 1) using this class:

class Binary(nn.Module):
    def __init__(self):
        super(Binary, self).__init__()

    def forward(self, x):
        # Hard binarization: map every value to -1 or +1.
        x2 = x.clone()
        x2 = x2.sign()
        x2[x2 == 0] = -1.
        return x2

And then I replaced nn.Tanh() with Binary(). It did generate black and white images, but no matter how many epochs I ran, the output still looked random. Using grayscale and nn.Tanh() I do at least see good results.

Todd Chaffee
  • Hi, your output is continuous because your final layer is a Tanh. Please check https://pytorch.org/docs/stable/nn.html#tanh for a reference of what that looks like. Normally in these types of applications you put in a final convolution of x,y,1. – S van Balen Apr 01 '20 at 20:57
  • I had already tried that. See the edit for what I did. I didn't add that at first because I wanted to see what others would come up with. – Todd Chaffee Apr 01 '20 at 23:32

1 Answer


The output is going to be "continuous" if you don't treat it as a "classification" problem.

You can follow a few approaches here.

These would be the simplest approaches, although you will probably need some post-processing to fill holes in the lines (a minimal sketch of both follows below):

  1. Define a threshold to make the image binary;
  2. Redefine the last layer to output an image with 2 channels and build your final image by taking, for each pixel, the index of the channel with the maximum value (as if it were a classification problem).
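A minimal sketch of both options; fake, noise, generator and logits are placeholder names, with fake assumed to be the Tanh output of shape (batch, 1, H, W) and logits a 2-channel output of shape (batch, 2, H, W):

import torch

# Option 1: threshold the continuous Tanh output. Do this only when saving or
# inspecting samples; training keeps the continuous output so gradients flow.
fake = generator(noise)               # placeholder call, output in [-1, 1]
bw = (fake > 0).float() * 2 - 1       # every pixel becomes -1 or +1

# Option 2: give the last layer 2 channels per pixel and treat each pixel as a
# 2-class classification; the argmax over the channel dimension is the image.
# 'logits' is a placeholder for that 2-channel generator output.
bw2 = logits.argmax(dim=1)            # shape (batch, H, W), values in {0, 1}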
Alternatively, you can redefine the problem itself.

PS: Every time I mention "line" from now on, I actually mean line segment.

Given that your images are a set of geometric figures, which can all be broken down into line segments, and that any segment can be described by a set of 4 numbers (a pair of points), you can use an RNN to generate the set of segments, which can then easily be drawn back into an image afterwards (a sketch of that drawing step follows the list below).

To do so you need to rewrite your training set:

  • Define a line width for your images;
  • Based on that line width, set the number of possible line orientations (for example, with a width of one pixel you can only draw lines in 4 orientations: 0, 45, 90 and 135 degrees);
  • Use that to find every point that defines a segment (start and end) by finding the points where the orientation changes;
  • To turn the set of points into a set of segments, check whether there is a black pixel along that orientation within the interval.
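Once you have the segments, each described by 4 numbers, drawing them back into a pure black-and-white image is straightforward; a minimal sketch (using Pillow here as an example, any drawing library works):

from PIL import Image, ImageDraw

def draw_segments(segments, size=248, line_width=1):
    # Render (x1, y1, x2, y2) segments as black lines on a white 1-bit image.
    img = Image.new("1", (size, size), 1)   # mode "1" = 1-bit pixels, white background
    draw = ImageDraw.Draw(img)
    for x1, y1, x2, y2 in segments:
        draw.line((x1, y1, x2, y2), fill=0, width=line_width)
    return img

# Example: a square built from 4 segments
square = [(20, 20, 200, 20), (200, 20, 200, 200),
          (200, 200, 20, 200), (20, 200, 20, 20)]
draw_segments(square).save("square.png")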

Binary Layer Issue

Your binary layer implements a Heaviside step function, which will kill the gradient since its derivative is 0 for all values other than x=0. At x=0 the derivative is actually not defined (the limit goes to infinity).

Thus your weights can't update, so every forward pass just filters the data through random weights, causing random-looking output.
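You can see the dead gradient directly in PyTorch (a quick check using torch.sign, which is essentially what the Binary layer above does):

import torch

x = torch.randn(5, requires_grad=True)
y = torch.sign(x)        # hard binarization, like the Binary layer
y.sum().backward()
print(x.grad)            # tensor([0., 0., 0., 0., 0.]) -- nothing for the optimizer to use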

Check this answer to understand more about why we don't use the Heaviside step function as an activation function in neural networks: https://stats.stackexchange.com/questions/271701/why-is-step-function-not-used-in-activation-functions-in-machine-learning/318772

Pedro Henrique Monforte
  • Thanks for your answer. I tried your first approach already. Can you please take a look at my Binary class in my question? For some reason when I do that, the NN stops learning. Every epoch just produces what looks like noise. – Todd Chaffee Apr 02 '20 at 12:47
  • Todd, that first approach must be done only to test performance in the validation; if you train using a step function (thresholding) you won't be able to backpropagate properly... Train using the tanh but test/validate using the binary layer – Pedro Henrique Monforte Apr 02 '20 at 18:46
  • Can you point me to some articles or other resources that explain why the back propagation won't work with a black & white (i.e -1, 1) output? Maybe I need to build a different type of NN. But the idea that an NN couldn't work on binary input / output doesn't seem right. – Todd Chaffee Apr 02 '20 at 21:27
  • The problem is that a Heaviside step function (binarization) will kill the gradient, since its derivative is 0 for all values other than x=0, where it is not defined. Check this https://stats.stackexchange.com/questions/271701/why-is-step-function-not-used-in-activation-functions-in-machine-learning/318772 – Pedro Henrique Monforte Apr 02 '20 at 21:39
  • The lightbulb goes on! Now I finally understand why I'm getting random output and no learning. I'm going to go with grayscale for training, but reduce to b&w for the outputted progress. – Todd Chaffee Apr 02 '20 at 21:43
  • Yep, the random output is because you are filtering with random filters (the initial kernels), which will always be updated with 0 (or in the worst case, with NaN) – Pedro Henrique Monforte Apr 02 '20 at 21:50
  • So how would you go about building an NN for a collection of binary data? Plenty of data is binary, so it seems like it would be useful and that someone would have figured out how to do back propagation in a different way to accommodate this? Or no? – Todd Chaffee Apr 02 '20 at 21:53
  • This is getting a bit long for the comment section. But usually we deal with binary data by post-processing or by treating it as a classification problem. Let's move this to a chat. – Pedro Henrique Monforte Apr 02 '20 at 21:56
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/106257/discussion-between-pedro-henrique-monforte-and-todd-chaffee). – Pedro Henrique Monforte Apr 02 '20 at 21:56
  • Could you edit your answer to include the point about the problem of a Heaviside step function (binarization), along with the link? That was the key point for me. – Todd Chaffee Apr 07 '20 at 19:56