16

I want to build a CNN model in Keras that can be fed images of different sizes. From other questions I understand how to define such a model, e.g. with `Input(shape=(None, None, 3))`. However, I'm not sure how to prepare the input/output datasets. Concretely, I want to combine a dataset of (100, 100) images with one of (240, 360) images, but I don't know how to merge these datasets.

Green Falcon
kainamanama

6 Answers

11

Conventionally, when dealing with images of different sizes in a CNN (which happens very often in real-world problems), we resize all images to the size of the smallest image using an image-manipulation library (OpenCV, PIL, etc.), or sometimes pad the smaller images up to the desired size. Resizing is simpler and is used most often.

As Media mentions in another answer, it is not possible to directly use images of different sizes: when you define a CNN architecture, you plan the layers around the input size, so without a fixed input shape you cannot define the model's architecture. It is therefore necessary to convert all your images to the same size, as in the sketch below.
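
A minimal sketch of the resizing approach, assuming Pillow and NumPy are available; the path list and target size below are illustrative, not part of the original answer:

import numpy as np
from PIL import Image

TARGET_SIZE = (100, 100)  # e.g. resize everything down to the smaller dataset's size

def load_and_resize(paths, target_size=TARGET_SIZE):
    """Load images of arbitrary sizes and resize them to one common size."""
    images = []
    for path in paths:
        img = Image.open(path).convert('RGB')          # ensure 3 channels
        img = img.resize(target_size, Image.BILINEAR)  # lose detail rather than invent it
        images.append(np.asarray(img, dtype=np.float32) / 255.0)
    return np.stack(images)                            # shape: (N, 100, 100, 3)

# combine both datasets into a single training array, e.g.:
# X = load_and_resize(paths_100x100 + paths_240x360)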

  • 2
    Actually, we don't resize images to the size of the smallest one, but to the size of the CNN's input! (Also, you can change the sort order of the answers on this site, so there is no guarantee that Media's answer is always above yours!) – Jérémy Blain Oct 31 '18 at 14:07
  • 2
    Thanks @JérémyBlain. I think when we build a CNN architecture based on our dataset, we resize the images to the smallest size in the dataset. But when we already have a CNN architecture defined, then, as you said, we resize the images to the CNN's input size. So the size depends on whether we already have a CNN or are building one for this particular dataset. Please correct me if I am wrong. – Amruth Lakkavaram Nov 01 '18 at 03:57
    I think you're right :) I don't really know whether images are resized to the smallest size in practice, but I think it's the best way to do it (it is better to lose some information than to fabricate it, with possible conflicts or artifacts!) – Jérémy Blain Nov 01 '18 at 13:36
    I don't agree with those statements: in theory you can define a CNN without taking the input size into account. The weights and biases are tied to the shapes of the filter kernels, not to the image shape. Indeed, you can use the same CNN for a 255x255 and for a 1024x1024 image, can't you? What we can't do with most APIs is use the same network for different image sizes at the same time. The thing is that, in practical implementations, it is an arduous task to handle variable-sized data (allocating memory on the GPU, transferring data between CPU and GPU). – ignatius Nov 08 '18 at 12:10
  • With fully convolutional networks, image sizes can vary batch to batch with no problem. – Alexander Soare Nov 16 '20 at 19:46
4

There is a way to include both image sizes: preprocess your images so that they are resized to the same dimensions.

Some freely available code that shows this:

# imports assumed for this snippet (standard Keras 2.x locations)
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

img_width, img_height = 150, 150

train_data_dir = '/yourdir/train'
validation_data_dir = '/yourdir/validation'
nb_train_samples = 2000      # placeholder: set to the number of training images
nb_validation_samples = 800  # placeholder: set to the number of validation images
epochs = 50
batch_size = 16

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)



model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])


train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

This uses the Keras image-flow API for on-the-fly data augmentation, and the data generators at the bottom of the code will resize your images to whatever dimensions you specify at the top.
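
For completeness, a sketch of how these generators would typically be fed to the model (Keras 2.x `fit_generator`; the sample counts are the placeholders defined at the top):

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)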

  • 1
    This is very interesting! How does it deal with images that have different proportions? For example, if you specify 150 x 150, how does it adjust an image that is 150 x 300; does the final image become shrunk out of proportion? – Ruthger Righart Feb 03 '21 at 16:05
2

There is a `concatenate` function in Keras (see the docs for the `Concatenate` layer and the `concatenate` function).
Also see this paper. Its application can be seen here and here.

This method can be used to build a model with multiple input branches that accept different image sizes, as in the sketch below.
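
A sketch of that idea with the Keras functional API: one fixed-size input branch per image size, merged with a `Concatenate` layer after flattening. The branch architecture and layer sizes here are illustrative assumptions, not taken from the linked paper:

from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Concatenate
from keras.models import Model

input_a = Input(shape=(100, 100, 3))   # branch for the (100, 100) dataset
input_b = Input(shape=(240, 360, 3))   # branch for the (240, 360) dataset

def branch(x):
    # a small convolutional branch; each input size gets its own weights
    x = Conv2D(32, (3, 3), activation='relu')(x)
    x = MaxPooling2D((2, 2))(x)
    return Flatten()(x)

merged = Concatenate()([branch(input_a), branch(input_b)])
output = Dense(1, activation='sigmoid')(merged)

model = Model(inputs=[input_a, input_b], outputs=output)
model.compile(loss='binary_crossentropy', optimizer='rmsprop')

Note that such a model expects one image of each size per sample, so it suits paired inputs rather than a single mixed-size dataset.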

Zephyr
rnso
1

One way is to pad the images during training: Keras expects all tensors in a batch to be the same size, but at inference time a single image can be of any size. So, while training, you can pad your 100 x 100 images so that their dimensions after padding become 240 x 360.

You can have a look at this tutorial.
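
A minimal sketch of that padding step in NumPy; the centering choice and zero fill are assumptions, not prescribed by the answer:

import numpy as np

def pad_to(img, target_h=240, target_w=360):
    """Zero-pad an (H, W, 3) image up to (target_h, target_w, 3), centered."""
    h, w, _ = img.shape
    top = (target_h - h) // 2
    left = (target_w - w) // 2
    return np.pad(img,
                  ((top, target_h - h - top),
                   (left, target_w - w - left),
                   (0, 0)),
                  mode='constant')

small = np.random.rand(100, 100, 3).astype(np.float32)
print(pad_to(small).shape)  # (240, 360, 3)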

Ethan
user117206
0

At least as far as I know, you can't, and the reason is clear: a neural network tries to find weights that minimize a cost function, and the number of those weights is fixed by the architecture. When you specify an input shape, the shapes of the rest of the network's weights depend on it, so you can't change the input size of the network afterwards. In other words, you can't feed a convolutional network inputs of different sizes. The typical solution in such situations is to resize the input.
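
A quick illustration of the point (my sketch, not the answer's code): a Flatten -> Dense head ties the number of weights to the input size, while a fixed-size pooling step, here global average pooling standing in for something like the Spatial Pyramid Pooling mentioned in the comments below, does not:

from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from keras.models import Model

# Convolution weights depend only on the kernel shape, so an undefined
# spatial size is accepted here:
x_in = Input(shape=(None, None, 3))
x = Conv2D(16, (3, 3), activation='relu')(x_in)

# A Flatten -> Dense head would fail: the flattened length, and hence the
# Dense weight matrix, would depend on the image size.
# x = Flatten()(x)  # breaks with Input(shape=(None, None, 3))

# Pooling the spatial dimensions away restores a fixed-length vector:
out = Dense(1, activation='sigmoid')(GlobalAveragePooling2D()(x))
model = Model(x_in, out)  # accepts images of any size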

Green Falcon
  • No. A convolutional network takes a fixed-size kernel and gives an output whose size depends on the size of the input. This means that you can use multiple input sizes; it will just give you multiple output sizes. Something like a Spatial Pyramid Pooling layer could then give you a fixed-size final result. – MegaTom Mar 27 '19 at 22:06
  • dear @MegaTom, the point is this: suppose you have a conv network with SAME padding, and suppose your data is $32 \times 32$. If the network is applied to an input of that size, the output will be a single value. Now suppose you increase the number of pixels in the input: the number of outputs increases too. The rest of the network connects these outputs to dense layers, and those connections need weights. A single value needs a single weight; a larger input means the connection between the convolutional layers and the dense layers needs a different number of weights. This is why you *cannot* feed inputs of different sizes to CNNs that have conv-to-dense connections. – Green Falcon Mar 27 '19 at 22:18
  • @Media, what you are saying makes sense. But then why have so many people upvoted [this answer](https://stackoverflow.com/a/41092113/6013016) supporting the possibility of different-sized inputs? – Scott Sep 25 '20 at 02:54
  • @Sherzod Fully convolutional networks and CNNs are different. In a CNN, you typically have dense layers after the convolutional layers. Read the comment above where I explain this. – Green Falcon Sep 25 '20 at 05:04
  • @Royi Are you familiar with autoencoders? – Green Falcon Oct 03 '20 at 13:44
  • @Media, I am, but why does it matter? A fully convolutional network (namely, only convolution layers and element-wise activations) should support variable-sized input. Isn't that what the OP is after? – Royi Oct 03 '20 at 14:20
  • @Royi have you read the second comment below my answer? I've explained it there. – Green Falcon Oct 03 '20 at 20:07
  • Media, I think, as @MegaTom pointed out, that for an FCNN you can have a net with no predefined input size; any size is supported through the convolutions. So I don't get your remark to me. – Royi Oct 04 '20 at 05:17
  • @Royi Yes, as I've mentioned, what I'm referring to is a plain CNN. I stated that directly, but I don't know which part is unclear. FCNNs are not plain CNNs: usual CNNs are followed by dense layers. – Green Falcon Oct 04 '20 at 17:18
0

There are some ways to deal with it, but they do not solve the problem well: black-pixel padding, special values for NaN, resizing, or a separate mask layer that marks where the information in the picture is. Most likely none of these work particularly well; otherwise, image datasets would routinely contain images of different sizes. Separate mask layers are used in one of the currently best image-recognition networks (SENet, Hu et al., winner of ImageNet 2017), but there the masking is used for zooming into the picture, not for handling different image sizes.
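
A sketch of the pad-plus-mask idea described above, under the simple assumption that the mask is appended as a fourth channel (this is not the SENet mechanism, just one way to realize a mask layer):

import numpy as np

def pad_with_mask(img, target_h=240, target_w=360):
    """Zero-pad to a common size and append a 0/1 mask channel marking
    where the real pixels are."""
    h, w, _ = img.shape
    padded = np.zeros((target_h, target_w, 4), dtype=np.float32)
    padded[:h, :w, :3] = img   # image in the top-left corner
    padded[:h, :w, 3] = 1.0    # mask: 1 = real pixel, 0 = padding
    return padded

x = pad_with_mask(np.random.rand(100, 100, 3))
print(x.shape)  # (240, 360, 4)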

keiv.fly