
I have created a training dataframe (referred to here as Traindata; in the code the variable is data8) as follows:

import pandas as pd
dataFile='/content/drive/Colab Notebooks/.../Normal_Anomalous_8Digits.csv'
data8=pd.read_csv(dataFile)

Traindata looks like the following. Here, Output is the variable to be predicted; it is not included in the test data.

         Col1        Col2     Output
0      0.001655   0.464986      1
1      0.943110   0.902166      0
2      0.071235   0.674283      1
...      ...        ...         ..
1007   0.698048   0.058458      1
1008   0.289333   0.702763      1
1009 rows × 3 columns

The model is then trained with the following commands:

from pgmpy.models import BayesianModel, BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
model = BayesianNetwork([('Col1', 'Output'), ('Col2', 'Output')])
model.fit(data8, estimator=MaximumLikelihoodEstimator)
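A possible factor here, stated as an assumption rather than a confirmed diagnosis: pgmpy's BayesianNetwork with MaximumLikelihoodEstimator learns tabular CPDs over discrete states, so each distinct float in Col1 and Col2 would be treated as its own state, and test values never seen in training would have no matching state. A minimal sketch of binning both dataframes with the same edges before fitting and predicting is given here; the synthetic data, the `discretize` helper, and the choice of five equal-width bins on [0, 1] are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the training and test frames (assumption).
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "Col1": rng.random(100),
    "Col2": rng.random(100),
    "Output": rng.integers(0, 2, 100),
})
test = pd.DataFrame({"Col1": rng.random(20), "Col2": rng.random(20)})

# Shared bin edges: five equal-width bins, assuming values lie in [0, 1].
edges = np.linspace(0.0, 1.0, 6)


def discretize(frame, columns, bin_edges):
    """Return a copy of frame with the given columns replaced by bin codes."""
    out = frame.copy()
    for col in columns:
        out[col] = pd.cut(out[col], bins=bin_edges, labels=False,
                          include_lowest=True)
    return out


# The same edges are applied to both frames so the states match.
train_binned = discretize(train, ["Col1", "Col2"], edges)
test_binned = discretize(test, ["Col1", "Col2"], edges)
print(test_binned.head())
```

The binned frames could then be passed to model.fit and model.predict in place of the raw ones, so that every state seen at prediction time also exists in the fitted model.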

I created the test dataframe dataP as follows:

import pickle
import numpy as np
import pandas as pd

Col1=pickle.load(open('/content/drive/Colab Notebooks/.../col1.pickle', 'rb'))
Col2=pickle.load(open('/content/drive/ColabNotebooks/.../col2.pickle', 'rb'))
# Col1 has 184 rows while Col2 has 179 rows, so Col1 is truncated to 179 rows to give both columns the same length.
dataT=np.array([Col1[:179], Col2]).T
dataP=pd.DataFrame(dataT, columns=['Col1','Col2'])
dataP.reset_index(drop=True, inplace=True)
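The length alignment described above can also be written without hard-coding 179. This is only a sketch: `col1` and `col2` below are hypothetical stand-ins for the unpickled arrays, filled with placeholder values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the unpickled arrays (184 and 179 values).
col1 = np.linspace(0.0, 1.0, 184)
col2 = np.linspace(0.0, 1.0, 179)

# Trim both arrays to the length of the shorter one.
n = min(len(col1), len(col2))
dataP = pd.DataFrame({"Col1": col1[:n], "Col2": col2[:n]})

print(dataP.shape)
print(dataP.index)
```

A dataframe built this way already has a default integer RangeIndex, so the extra reset_index call does not change anything.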

The resulting dataframe dataP looks like this:

       Col1     Col2
0    0.832946  0.583372
1    0.783141  0.583948
2    0.745327  0.587644
3    0.762367  0.585629
4    0.783265  0.590721
..        ...       ...
174  0.686461  0.578358
175  0.689001  0.583951
176  0.683956  0.577511
177  0.687347  0.584231
178  0.695827  0.578313

[179 rows x 2 columns]

I then passed this dataframe to my model for prediction:

Y=model.predict(dataP)

This raises the following IndexError:

IndexError                      Traceback (most recent call last)
<ipython-input-16-489f2f25f1bc> in <module>()
 ----> 2 Y=model.predict(dataP)
5 frames
/usr/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

 IndexError: only integers, slices (`:`), ellipsis (`...`), 
 numpy.newaxis (`None`) and integer or boolean arrays are valid indices
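For reference, the message in this traceback is the one NumPy raises when an array is indexed with something other than an integer, a slice, or an integer or boolean array, so it does not necessarily refer to the dataframe's index. A minimal reproduction, independent of pgmpy and using a hypothetical array `values`:

```python
import numpy as np

values = np.array([10, 20, 30])
message = ""

try:
    values[1.0]  # a float used as a position instead of an integer
except IndexError as err:
    message = str(err)

print(message)
```

This suggests looking at what is being used as a position inside an array during prediction (for example a float value) rather than at dataP.index itself.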

I then checked the index of the dataframe:

print(dataP.index)

OUTPUT

RangeIndex(start=0, stop=179, step=1)

Then I checked the datatype of the index:

dataP.index.is_numeric()
dataP.index.is_integer()

Both calls return:

True
True
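The same check can also be written with `pandas.api.types.is_integer_dtype`, and it may help to look at the column dtypes alongside the index. A small sketch, using a hypothetical stand-in `frame` with the same layout as dataP:

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Hypothetical stand-in with the same layout as dataP.
frame = pd.DataFrame({
    "Col1": [0.832946, 0.783141],
    "Col2": [0.583372, 0.583948],
})

# The default RangeIndex is integer-typed ...
index_is_integer = is_integer_dtype(frame.index)
# ... while the feature columns themselves hold floats.
columns_are_float = all(is_float_dtype(frame[c]) for c in frame.columns)

print(index_is_integer, columns_are_float)
```

So the row index can be integer-typed while the values in Col1 and Col2 are still floats.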

If the index of dataP consists of integers, why is this error raised? Any guidance would be appreciated.

Regards,

TariqS
  • Have you tried resetting the index using [`reset_index`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html)? – Oxbowerce Mar 28 '22 at 15:15
  • Yes, I have tried that. The same error is still raised. – TariqS Mar 28 '22 at 15:24
  • Which library and learning method do you use, is it sklearn? Do you pass the features directly as a pandas dataframe? The problem is likely about how `dataP` is created, but the code doesn't show it. – Erwan Mar 28 '22 at 17:38
  • In dataP, I have combined two separate arrays, col1 and col2, and then converted the result to a dataframe. The model is created with BayesianNetwork. I have edited the question above; kindly review it. – TariqS Mar 29 '22 at 05:30
