How is the output of a maxpool layer window size=1x2 and stride=2 calculated?

Question

I'm looking at the architecture proposed in the following paper: Baoguang Shi et al, An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition.

In the proposed architecture of the model, a MaxPooling Window:1 × 2, s:2 layer is mentioned. I'm not sure what the size of the output of this layer would be.

If I have an input of size (32 x 8), then the output would be:

(32-1)/2 + 1 = 16.5, <- this part doesn't make sense to me

(8-2)/2 + 1 = 4

*ignoring depth and batch size here
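
For reference, the arithmetic above can be sketched in a few lines, assuming no padding and the common convention that the division is floored (this is, for example, the default behaviour of PyTorch's `MaxPool2d`). The function name `pool_out` is hypothetical and used only for illustration.

```python
def pool_out(size, window, stride, padding=0):
    """Output length along one axis.

    Assumes the division is floored (a common framework default),
    so a fractional result such as 16.5 is rounded down.
    """
    return (size + 2 * padding - window) // stride + 1


print(pool_out(32, 1, 2))  # 16: (32 - 1) // 2 + 1
print(pool_out(8, 2, 2))   # 4:  (8 - 2) // 2 + 1
```

Under that assumption the 16.5 becomes 16, because the last window position that would start past the input is simply dropped.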

Check https://stackoverflow.com/q/37674306/10899915 – Uday, Oct 17 '19 at 03:36

Fortune Seeker · Answer 1 · 2020-07-24T09:05:33.860

According to the paper, "s" may refer to the stride along the rows only, while the stride along the columns equals 1.
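
A minimal sketch of how the two readings differ, assuming no padding and floored division. The helper `pool_out_2d` is hypothetical, and which axis gets the stride of 1 is an assumption made for illustration; the paper's notation does not settle it.

```python
def pool_out_2d(size, window, stride, padding=(0, 0)):
    """Output shape of a 2-D pooling layer, computed one axis at a time.

    Assumes floored division and no dilation.
    """
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(size, window, stride, padding)
    )


# Reading 1: the stride of 2 applies to both axes.
print(pool_out_2d((32, 8), (1, 2), (2, 2)))  # (16, 4)

# Reading 2 (assumed): stride 2 only along the axis where the window is 2,
# and stride 1 along the axis where the window is 1.
print(pool_out_2d((32, 8), (1, 2), (1, 2)))  # (32, 4)
```

With the second reading, the axis of length 32 is left unchanged and only the axis of length 8 is halved.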

edited Jul 24 '20 at 09:05

answered Jul 24 '20 at 09:00
