
I keep trying to run a new set of data through my KNN classifier, but I keep receiving the message:

 ValueError: query data dimension must match training data dimension

I then used:

x_new = pd.read_csv('NewFeaturePractice.csv', names=attributes)
x_new = x_new.values.reshape(52, 84)

(which are the dimensions of the training data), but then received:

ValueError: cannot reshape array of size 672 into shape (52,84)

The second data set doesn't have the same number of rows as the first, so even if I reshaped the array to the training shape I would have several empty spaces. How can I run the code so that, no matter the size of my new data set, I can still get results?

  • I don't know the exact problem but check the *columns*, not the rows: there's probably something different between the columns (features) used in the training data and the ones used in the test data. – Erwan Jun 20 '20 at 12:57

1 Answer


It seems you are making two mistakes:
The reshape dimensions do not match the array size, e.g. a 10-element array cannot be reshaped to 3x3 (10 != 9).
The dimensions of the training and test data differ, e.g. training on 5 features and testing on 4 features.

What I suggest:
You know your feature count; call it N.
Then reshape with reshape(-1, N). NumPy will infer the first dimension automatically.

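import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
N = 2  # number of features (columns)
x.reshape(-1, N)  # result has shape (5, 2); the row count is inferred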
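
Applied to the numbers in the question, the same idea might look like the sketch below. The zero-filled array is only a stand-in for the 672 values read from the new CSV, and the feature count of 84 is an assumption taken from the (52, 84) training shape. Since 672 / 84 = 8, the new data would come out as 8 rows.

```python
import numpy as np

n_features = 84  # assumed: column count of the (52, 84) training data

# Placeholder for the 672 values loaded from the new CSV file
flat = np.zeros(672)

# -1 asks NumPy to infer the row count from the array size
x_new = flat.reshape(-1, n_features)
print(x_new.shape)  # (8, 84)
```

If the array size is not a multiple of the feature count, reshape raises a ValueError, which usually points to a column mismatch in the file rather than a row mismatch.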
10xAI
  • Would I do this even though both the new data set and the training data set have the same number of features? The only difference is the number of samples/rows in the new and training data sets. – LeeAnn Capistran Jun 22 '20 at 21:26
  • Yes, the feature count is assigned to N. – 10xAI Jun 23 '20 at 06:51
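
To illustrate the point discussed in these comments, here is a minimal sketch, assuming scikit-learn's KNeighborsClassifier is the classifier in use. The data is random and purely illustrative, and the names X_train, y_train, and X_new are hypothetical. It suggests that the number of rows at prediction time may differ from training as long as the number of columns matches.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_features = 84  # assumed to match the training data in the question

# Illustrative training set: 52 samples, 84 features, two made-up classes
X_train = rng.normal(size=(52, n_features))
y_train = rng.integers(0, 2, size=52)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# New data with a different number of rows but the same 84 columns
X_new = rng.normal(size=(8, n_features))
pred = knn.predict(X_new)
print(pred.shape)  # (8,)
```

One prediction is returned per row of X_new, so only the column count has to agree with the training data.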