Computer vision is a subfield of computer science that deals with analyzing and understanding images. This includes detecting objects such as faces in images and segmenting images.
Questions tagged [computer-vision]
625 questions
45
votes
3 answers
What does the notation mAP@[.5:.95] mean?
For detection, a common way to determine if one object proposal was right is Intersection over Union (IoU, IU). This takes the set $A$ of proposed object pixels and the set of true object pixels $B$ and calculates:
$$IoU(A, B) = \frac{A \cap B}{A…
Martin Thoma
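As a minimal sketch of the IoU idea in the excerpt above, the snippet below treats $A$ and $B$ as Python sets of pixel coordinates and uses the standard definition of IoU (size of the intersection divided by size of the union). The function name `iou`, the example pixel sets, and the choice to return 0.0 for two empty sets are assumptions made for illustration and do not come from the question.

```python
def iou(proposed, truth):
    """Intersection over Union of two sets of (row, col) pixel coordinates."""
    union = proposed | truth
    if not union:
        # Assumption: two empty sets are treated as having no overlap.
        return 0.0
    return len(proposed & truth) / len(union)


# Hypothetical 2x2 proposal and a true object shifted one column to the right.
A = {(0, 0), (0, 1), (1, 0), (1, 1)}  # proposed object pixels
B = {(0, 1), (1, 1), (0, 2), (1, 2)}  # true object pixels

score = iou(A, B)  # 2 shared pixels out of 6 pixels in the union
print(f"IoU = {score:.3f}")
```

A detection is then usually counted as correct when this score exceeds a chosen threshold.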
41
votes
2 answers
How to calculate mAP for the detection task for the PASCAL VOC Challenge?
How to calculate the mAP (mean Average Precision) for the detection task for the PASCAL VOC leaderboards?
It says there, on page 11:
Average Precision (AP). For the VOC2007 challenge, the interpolated
average precision (Salton and Mcgill 1986) was…
Alex
28
votes
3 answers
Why do convolutions always use odd numbers as filter size?
If we look at 90-99% of the papers published using a CNN (ConvNet), the vast majority of them use odd filter sizes, most commonly {1, 3, 5, 7}.
This can lead to a problem: with these filter sizes, usually the…
Jonathan DEKHTIAR
27
votes
2 answers
What is the difference between semantic segmentation, object detection and instance segmentation?
I'm fairly new to computer vision and I've read an explanation in a Medium post; however, it still isn't clear to me how they truly differ.
Guilherme Marques
23
votes
4 answers
What is the difference between Inception v2 and Inception v3?
The paper Going deeper with convolutions describes GoogLeNet, which contains the original Inception modules:
The change to Inception v2 was that they replaced the 5x5 convolutions with two successive 3x3 convolutions and applied pooling:
What is the…
Martin Thoma
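The replacement of one 5x5 convolution by two successive 3x3 convolutions, mentioned in the excerpt above, can be checked with a little arithmetic. The sketch below is illustrative only: it assumes stride-1, undilated convolutions with the same number of input and output channels and no bias terms, and the helper names and the channel count of 64 are hypothetical.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, undilated convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def conv_weights(kernel_size, channels):
    """Weights of one k x k convolution with `channels` inputs and outputs, no bias."""
    return kernel_size * kernel_size * channels * channels


channels = 64  # hypothetical layer width, chosen only for illustration

single_5x5 = conv_weights(5, channels)       # 25 * C^2 weights
stacked_3x3 = 2 * conv_weights(3, channels)  # 18 * C^2 weights

print(receptive_field([5]), receptive_field([3, 3]))  # both cover a 5x5 window
print(single_5x5, stacked_3x3)
```

Under these assumptions the two stacked 3x3 layers see the same 5x5 input window while using 18/25 of the weights, with an extra nonlinearity in between.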
14
votes
3 answers
What is the difference between Dilated Convolution and Deconvolution?
These two convolution operations are very common in deep learning right now.
I read about the dilated convolutional layer in this paper: WaveNet: A Generative Model for Raw Audio
and deconvolution is in this paper: Fully Convolutional Networks for…
Shamane Siriwardhana
13
votes
1 answer
What is the difference between a fully connected layer and a bilinear layer in a CNN?
What is the difference between fully connected layers and bilinear layers in deep learning?
N.IT
12
votes
2 answers
Class token in ViT and BERT
I'm trying to understand the architecture of the ViT Paper, and noticed they use a CLASS token like in BERT.
To the best of my understanding this token is used to gather knowledge of the entire class, and is then solely used to predict the class of…
Shir
12
votes
1 answer
Optimizer for a convolutional neural network
What is the best optimizer for a convolutional neural network (CNN)?
Can I use RMSProp for CNNs or only for RNNs?
Noran
12
votes
5 answers
Unsupervised image segmentation
I am trying to implement an algorithm that, given an image with several objects on a plane table, outputs a segmentation mask for each object. Unlike with CNNs, the objective here is to detect objects in an unfamiliar environment.…
MuhsinFatih
11
votes
1 answer
Data preprocessing: Should we normalise images pixel-wise?
Let me present a toy example and my reasoning about image normalisation:
Suppose we have a CNN architecture to classify NxN grayscale images in two categories. Pixel values range from 0 (black) to 255 (white).
Class 0:
Images that…
lucasrodesg
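To make the term in the question above concrete, here is a rough sketch of one common reading of "pixel-wise" normalisation: each pixel position is standardised with its own mean and standard deviation computed across the dataset, in contrast to simply scaling every value from the 0 to 255 range into 0 to 1. The function names, the `eps` constant, and the tiny two-image dataset are assumptions for illustration; the question's own toy example is truncated and is not reproduced here.

```python
import math


def normalise_pixelwise(images, eps=1e-8):
    """Standardise each pixel position with its mean and std over the dataset."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(n)]
    for r in range(height):
        for c in range(width):
            values = [img[r][c] for img in images]
            mean = sum(values) / n
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
            for i, v in enumerate(values):
                # eps avoids division by zero for pixels that never vary
                out[i][r][c] = (v - mean) / (std + eps)
    return out


def scale_globally(images):
    """Map every grayscale value from [0, 255] to [0, 1] with one shared factor."""
    return [[[v / 255.0 for v in row] for row in img] for img in images]


# Hypothetical dataset: two 1x2 grayscale images.
dataset = [[[0, 255]], [[255, 255]]]
pixelwise = normalise_pixelwise(dataset)
scaled = scale_globally(dataset)
print(pixelwise)
print(scaled)
```

Note that a pixel with the same value in every image (the second pixel here) becomes 0 under pixel-wise standardisation, so its absolute brightness is no longer visible to the network, whereas global scaling preserves it.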
10
votes
2 answers
Are there studies which examine dropout vs other regularizations?
Are there any published papers which show differences between regularization methods for neural networks, preferably on different domains (or at least different datasets)?
I am asking because I currently have the feeling that most people seem to use…
Martin Thoma
10
votes
1 answer
What is a fractionally-strided convolution layer?
In the paper Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs, Section 3.4 says:
Since, the aim of this work is to estimate high-resolution and
high-quality density maps, F-CNN is constructed using a set of
…
Haha TTpro
10
votes
2 answers
How can I detect blocks of text in scanned document images?
ORIGINAL IMAGE:
GOAL:
I want to separate texts into individual paragraphs by placing bounding boxes over them (as shown above).
I tried to do this with a traditional computer vision approach using OpenCV.
I plotted character-level bounding…
DGS
10
votes
2 answers
How can I detect if an image was photoshopped?
I would like to check whether JPG files were manipulated to change their content.
What I consider NOT photoshopped:
Cropping
Rotating
(Scaling)
Image resolution
Automatic changes smartphones might make
What I consider photoshopping:
Adding a new…
Martin Thoma