I have two classes, model and impute, where impute inherits from model. I define a method mode_impute inside impute and want to call it from within the same class. How can I do that? I tried the following:

class impute(model):
    
    def __init__(self):
        super().__init__()
        pass
    
    def mode_impute(self):
        mode_val = self.df6[self.var].value_counts().index[0]
        self.df6[self.var].fillna(mode_val, inplace = True)
        
    for i in ['MasVnrType', 'BsmtQual', 'BsmtFinType1', 'GarageType', 'GarageFinish']:
        self.mode_impute(self.x, i)

The above code gives me the error NameError: name 'self' is not defined

EDIT 1:

I applied the changes as suggested in the comments:

class impute(model):
    
    def __init__(self):
        
        super().__init__()        
        for i in ['MasVnrType', 'BsmtQual', 'BsmtFinType1', 'GarageType', 'GarageFinish']:
            self.mode_impute(self.x, i)
        
    def mode_impute(self):
        mode_val = self.df6[self.var].value_counts().index[0]
        self.df6[self.var].fillna(mode_val, inplace = True)

m = impute()

The last line where I create an instance of the class gives me the error AttributeError: 'impute' object has no attribute 'x'

PS: I have just started learning OOP in Python, so kindly explain your answer in a simple, easy-to-understand way. Thank you!

EDIT 2: Here is the model class:

class model:
    
    def __init__(self):
        pass
    
    # LOAD THE DATA
    def load_data(self, file_name = 'train1.csv'):
        
        self.df = pd.read_csv(file_name, index_col = 0)
        self.df1= self.df.copy(deep = True)          
        print(self.df1.info())
        self.desc = self.df1.describe()
        self.nan = self.df1.isnull().sum()
        
        return self.df1, self.desc, self.nan
     
    # CLEAN THE DATA
    def remove_whitespace(self):

        whitespace_list = ['MSZoning', 'Exterior1st', 'Exterior2nd']
        for p in whitespace_list:
            self.df1[p] = self.df1[p].str.replace(' ', '')

    # FEATURE ENGINEERING
    def new_feature(self):
        self.df1['Age'] = (self.df1['YrSold'] - self.df1['YearBuilt']) + (self.df1['MoSold']/12)
        self.df1['Age'] = round(self.df1['Age'], 2)
        
        self.df1['FAR'] = (self.df1['1stFlrSF'] + self.df1['2ndFlrSF']) / self.df1['LotArea']
        self.df1['FAR'] = round(self.df1['FAR'], 2)
        
        self.df1['Remod'] = np.where(self.df1['YearRemodAdd'] == self.df1['YearBuilt'], 0, 1)
        

    # REMOVE REDUNDANT FEATURES
    def remove_features(self):
        nan_list = ['Alley', 'YrSold', 'PoolQC', 'MiscFeature', 'MiscVal', 'GarageYrBlt', 'YearBuilt', 'MoSold', 
                    '1stFlrSF', '2ndFlrSF', 'LotArea', 'YearRemodAdd', 'Street', 'Utilities', 'LandSlope', 
                    'Condition2', 'RoofMatl', 'Heating', 'GarageCond']
        self.new_df = self.df1.drop(nan_list, axis = 1)
    

    # SEPARATE X AND Y
    def x_y(self):
        self.x = self.new_df.drop(['SalePrice'], axis = 1)
        self.y = np.log(self.new_df['SalePrice'])
spectre

2 Answers


Using self.mode_impute is indeed the correct way to call the method inside the class. The issue is that your call is not inside a method: the for loop sits directly in the class body, where self does not exist. Moving the loop into a method (e.g. __init__) should resolve the NameError, because self is defined within a method (it is passed as the first argument).
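A minimal sketch of this point, using hypothetical toy classes (Broken and Imputer) rather than the asker's real code. It first reproduces the NameError from a call placed in the class body, then shows the same loop working inside __init__. The method here also declares the column name as a parameter, since a method defined as mode_impute(self) cannot receive extra arguments.

```python
# Calling through `self` in the class body fails: `self` only exists inside methods.
try:
    class Broken:
        def hello(self):
            return "hi"

        self.hello()  # executed while the class is being defined
except NameError as exc:
    error_message = str(exc)

print(error_message)


class Imputer:
    """Toy stand-in for the impute class (hypothetical, no real data involved)."""

    def __init__(self, columns):
        self.filled = []
        # The loop lives inside a method, so `self` is defined here.
        for col in columns:
            self.mode_impute(col)

    def mode_impute(self, col):
        # Only records the column name; real imputation logic would go here.
        self.filled.append(col)


imp = Imputer(["MasVnrType", "BsmtQual"])
print(imp.filled)
```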

Oxbowerce
  • See my updated question – spectre Dec 06 '21 at 04:56
  • `self.x` refers to an attribute `x` that has never been set on your object. That is why you get the error saying the `impute` object (i.e. an instance of your class) has no attribute called `x`. – Oxbowerce Dec 06 '21 at 08:05
  • But I am inheriting the class (`model`) which contains the variable `x` into `impute`. So I should be able to access the `x` variable! – spectre Dec 06 '21 at 08:25
  • Can you share the definition of the `model` class? – Oxbowerce Dec 06 '21 at 08:29
  • Updated the question – spectre Dec 06 '21 at 08:51
  • Although `x` is assigned in your `model` class, that only happens when the `model.x_y` method is called. When you initialize the `impute` class, you only call the `__init__` method of the `model` class, so `model.x_y` is never called. As a result, `self.x` is not defined. – Oxbowerce Dec 06 '21 at 08:56
  • Any possible solution you have in mind? – spectre Dec 07 '21 at 05:49
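One possible way around the problem described in the comments above, sketched with hypothetical toy classes (Base and Child) instead of the real model and impute. An attribute assigned inside a method exists only after that method has run, so the child class has to call that method before it reads self.x.

```python
class Base:
    def __init__(self):
        pass  # nothing is assigned here, mirroring model.__init__

    def x_y(self):
        # self.x comes into existence only when this method runs
        self.x = [1, 2, 3]


class Child(Base):
    def __init__(self, prepare=True):
        super().__init__()
        if prepare:
            self.x_y()  # run the inherited method so that self.x is set


without_x = Child(prepare=False)
with_x = Child()

print(hasattr(without_x, "x"))  # False
print(with_x.x)                 # [1, 2, 3]
```

In the question's model class, x_y itself reads self.new_df, which is created by remove_features, which in turn needs the self.df1 created by load_data. The whole chain would therefore have to run before self.x can be used.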

You can move the for loop that calls the mode_impute() method either into your __init__() constructor or outside the impute class. However, I don't see the dataset df6 defined anywhere.

So I would redesign things a bit. Create a fit_transform(X) method in your impute class. It takes a data frame X from the user, saves it on the class instance, and then fills in the missing values with the for loop that calls mode_impute().
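A rough sketch of what such a redesign might look like. The class name ModeImputer, the columns argument, and the sample data are all made up for illustration, so treat this as one possible shape rather than a definitive implementation.

```python
import numpy as np
import pandas as pd


class ModeImputer:
    """Hypothetical reworking of the impute class; not the asker's code."""

    def __init__(self, columns):
        self.columns = columns

    def mode_impute(self, col):
        # value_counts() skips NaN and sorts by frequency, so index[0] is the mode
        mode_val = self.df[col].value_counts().index[0]
        # Assign the result back instead of using inplace=True on a column selection
        self.df[col] = self.df[col].fillna(mode_val)

    def fit_transform(self, X):
        self.df = X.copy()  # work on a copy so the caller's frame is untouched
        for col in self.columns:
            self.mode_impute(col)
        return self.df


# Made-up data for illustration only
X = pd.DataFrame({
    "GarageType": ["Attchd", np.nan, "Attchd", "Detchd"],
    "BsmtQual": ["Gd", "TA", np.nan, "TA"],
})
result = ModeImputer(["GarageType", "BsmtQual"]).fit_transform(X)
print(result)
```

Passing the data frame in explicitly means the imputer no longer depends on attributes such as self.x or self.df6 having been set somewhere else first.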

eliangius
  • See my updated question – spectre Dec 06 '21 at 04:56
  • Regarding your second point about creating a `fit_transform` method, can you provide links to articles, blogs, or videos that do this? I am a newbie to OOP. – spectre Dec 07 '21 at 06:03
  • The question is too programming-specific for the Data Science community, but the fit/transform design comes from the popular [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) library. I would strongly suggest checking whether you can use its components to avoid reinventing the wheel. Your `model` class resembles their `Pipeline`, and your `impute` class resembles their `SimpleImputer`. – eliangius Dec 07 '21 at 14:18
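As an illustration of the last comment, here is a short sketch of mode imputation with scikit-learn's `SimpleImputer` using `strategy="most_frequent"`. The sample data is made up, and the code assumes scikit-learn is installed alongside pandas.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up data for illustration only
X = pd.DataFrame({
    "GarageType": ["Attchd", np.nan, "Attchd", "Detchd"],
    "BsmtQual": ["Gd", "TA", np.nan, "TA"],
})

# "most_frequent" replaces missing values with each column's mode
imputer = SimpleImputer(strategy="most_frequent")

# fit_transform returns a NumPy array here, so wrap it back into a DataFrame
filled = pd.DataFrame(imputer.fit_transform(X), columns=X.columns, index=X.index)
print(filled)
```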