Statistical method to find the value which preserves the most information inside "most" of data points. (resize images to a common height)

Question

I have a dataset of around 88K images, and I found some interesting properties of their dimensions.

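import numpy as np
import scipy.stats

# width and height are arrays holding the pixel dimensions of every image
print(np.median(width), np.mean(width), scipy.stats.mode(width))
print(np.median(height), np.mean(height), scipy.stats.mode(height))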

>>
1280.0 1266.8129869839922 ModeResult(mode=array([1280]), count=array([84584]))
377.0 438.3157888861602 ModeResult(mode=array([125]), count=array([3113]))

So I am resizing all of the images to a width of 1280. The median and mode of the width are both 1280 and the mean (about 1267) is close to it, so most images will need little or no scaling up or down in that dimension.

What I want to know is what to do about the height. In other words, to which height should I resize my images so that I preserve the most information? In my opinion, scaling down is better than scaling up.
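A minimal sketch of how that trade-off could be quantified: for a candidate target height, count the share of images that would have to be upscaled, downscaled, or left unchanged. The helper name `scaling_shares` and the small sample array are hypothetical stand-ins; the real `height` array would go in their place.

```python
import numpy as np

def scaling_shares(heights, target):
    """Return the shares of images that are (upscaled, downscaled, unchanged)
    when every image is resized to the given target height."""
    heights = np.asarray(heights)
    up = np.mean(heights < target)     # shorter than target: must be upscaled
    down = np.mean(heights > target)   # taller than target: must be downscaled
    same = np.mean(heights == target)  # already at the target height
    return float(up), float(down), float(same)

# Hypothetical sample data, only for illustration
demo_heights = np.array([100, 125, 125, 300, 377, 500, 900, 1200])
up, down, same = scaling_shares(demo_heights, target=377)
print(f"upscaled={up:.3f} downscaled={down:.3f} unchanged={same:.3f}")
```

Running this over a few candidate heights shows how quickly the upscaled share grows as the target height increases.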

for q in [0.35, 0.55, 0.75, 0.95]:
    print(np.quantile(height, q))
>> 274.0
414.0
562.0
1057.0

Is there a statistical method for finding an appropriate height, or range of heights?
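One possible way to frame this, offered as a sketch under stated assumptions rather than an established recipe: if upscaling by one pixel is assumed to cost `cost_up` and downscaling by one pixel `cost_down` (both hypothetical weights, with cost linear in the number of pixels), then the target height that minimizes the average cost is the quantile at level `cost_down / (cost_up + cost_down)`. This is the standard result for the asymmetric absolute (pinball) loss. The function names and the synthetic skewed data below are illustrative only.

```python
import numpy as np

def target_height(heights, cost_up=3.0, cost_down=1.0):
    """Quantile level and target height minimizing the assumed asymmetric
    linear cost (cost_up per upscaled pixel, cost_down per downscaled pixel)."""
    heights = np.asarray(heights, dtype=float)
    q = cost_down / (cost_up + cost_down)
    return q, float(np.quantile(heights, q))

def expected_cost(heights, target, cost_up=3.0, cost_down=1.0):
    """Average assumed cost of resizing every image to the target height."""
    heights = np.asarray(heights, dtype=float)
    up = np.clip(target - heights, 0, None)    # pixels added by upscaling
    down = np.clip(heights - target, 0, None)  # pixels removed by downscaling
    return float(np.mean(cost_up * up + cost_down * down))

# Synthetic positively skewed heights, standing in for the real data
rng = np.random.default_rng(0)
demo_heights = rng.lognormal(mean=6.0, sigma=0.6, size=10_000)

q, h = target_height(demo_heights)
print(f"quantile level={q:.2f}, target height={h:.1f}")
print(f"average cost at target: {expected_cost(demo_heights, h):.1f}")
```

With `cost_up=3` and `cost_down=1` the quantile level is 0.25, so penalizing upscaling more heavily pushes the target height toward the lower end of the distribution. The choice of weights is a judgment call, not something the data determines.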

My height data is positively skewed and looks like:

The black line is scipy.stats.norm.

While it isn't what you asked, you might consider seam carving for your resizing; in my opinion it is a very fun exercise. See this [YouTube video](https://www.youtube.com/watch?v=6NcIJXTlugc), the [SIGGRAPH paper](https://www.win.tue.nl/~wstahw/edu/2IV05/seamcarving.pdf), and this [tutorial](https://www.geeksforgeeks.org/image-resizing-using-seam-carving-using-opencv-in-python/). The stats you gave do not describe the blue line or the bars, but something else, perhaps the black line. For the approach I wanted to use they are not really that useful. - EngrStudent, Jan 17 '21 at 00:37
