
There are 4 datasets (all in CSV format), and each has a uniqueID column by which every record can be identified. The image and text datasets are dense (they need to be converted to ndarray).

Can someone suggest how to use all 4 of these datasets together to build a regression model?

This is how the datasets look:

Metadata - some input features and the target variable (views)

uniqueID  ad_blocked  embed  duration  language  hour  views
1         True        True   68        3         10    244
2         False       True   90        1         15    63
3         True        False  195       3         7     350

Vectorized title data - one entire row represents a title

uniqueID  title_1    title_2    title_3
1         -0.977637  -0.543310   0.079403
2          0.041873   0.644655  -0.406487
3          0.503560  -0.085412   0.841144

Vectorized descriptions data - one entire row represents a description

uniqueID  title_1    title_2    title_3
1         -0.052256  -0.016036   0.079403
2          0.000106   0.356706  -0.025788
3          0.015774  -0.085412   0.712229

Thumbnail pixel data - one entire row represents an image

uniqueID  image_1    image_2    image_3
1         -0.484456  -0.543310   0.032915
2          0.666147   0.644655  -0.005733
3          0.035018  -0.011111   0.841144
desertnaut
Mathew
  • What is the column count in each dataset, and the total row count? – 10xAI Mar 21 '21 at 14:48
  • Thanks for responding. Metadata has 3000 rows and 7 columns, Vectorized title data has 3000 rows and 50 columns, Vectorized descriptions data has 3000 rows and 50 columns, Thumbnail pixel data has 3000 rows and 4000 columns. – Mathew Mar 21 '21 at 20:32

1 Answer


If we avoid a neural network:

  • Reduce the dimensionality of the image data, e.g. with PCA (assuming there is no translational variance among the images).
  • You can try the same for the other two datasets, which have 50 features each.
  • Then try standard ML models on the combined features.
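
The steps above could be sketched roughly as follows. This is only a minimal illustration under several assumptions: the four tables are generated synthetically here (with real data you would load each file with `pd.read_csv`), the prefixes `t_`, `d_`, `i_`, the choice of 10 PCA components per block, and the random forest are all placeholders, and `language` is passed through as a plain number even though it may really be categorical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the four CSV files (assumed sizes, not the real ones).
rng = np.random.default_rng(0)
n_rows = 200
ids = np.arange(1, n_rows + 1)

meta = pd.DataFrame({
    "uniqueID": ids,
    "ad_blocked": rng.integers(0, 2, n_rows).astype(bool),
    "embed": rng.integers(0, 2, n_rows).astype(bool),
    "duration": rng.integers(10, 300, n_rows),
    "language": rng.integers(1, 5, n_rows),
    "hour": rng.integers(0, 24, n_rows),
    "views": rng.integers(0, 1000, n_rows),
})


def dense_block(prefix, n_cols):
    """Build a fake dense table with a uniqueID column plus n_cols features."""
    cols = [f"{prefix}_{i}" for i in range(1, n_cols + 1)]
    df = pd.DataFrame(rng.normal(size=(n_rows, n_cols)), columns=cols)
    df.insert(0, "uniqueID", ids)
    return df


title = dense_block("title", 49)
desc = dense_block("title", 49)   # same column names as the title table, as in the question
image = dense_block("image", 400)

# Align all tables on uniqueID; the prefixes keep column names unique.
data = pd.concat(
    [
        meta.set_index("uniqueID"),
        title.set_index("uniqueID").add_prefix("t_"),
        desc.set_index("uniqueID").add_prefix("d_"),
        image.set_index("uniqueID").add_prefix("i_"),
    ],
    axis=1,
    join="inner",
)

y = data.pop("views")
X = data.astype(float)

title_cols = [c for c in X.columns if c.startswith("t_")]
desc_cols = [c for c in X.columns if c.startswith("d_")]
image_cols = [c for c in X.columns if c.startswith("i_")]
meta_cols = [c for c in X.columns if not c.startswith(("t_", "d_", "i_"))]


def reduce_block(n_components=10):
    """Scale a block of columns, then project it onto its top principal components."""
    return Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=n_components))])


features = ColumnTransformer([
    ("meta", "passthrough", meta_cols),
    ("title", reduce_block(), title_cols),
    ("desc", reduce_block(), desc_cols),
    ("image", reduce_block(), image_cols),
])

pipeline = Pipeline([
    ("features", features),
    ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
pipeline.fit(X_train, y_train)
preds = pipeline.predict(X_test)
reduced = pipeline.named_steps["features"].transform(X_test)
print("R^2 on held-out rows:", pipeline.score(X_test, y_test))
```

Because PCA sits inside the pipeline, it is fitted on the training rows only, so the held-out score is not contaminated by the test set. The number of components per block is something to tune, for example with cross-validation.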

If that doesn't work, we can try a CNN for the image data.
In that case we will need a mixed architecture that combines a simple neural network (for the tabular and text features) with a CNN (for the images).

10xAI