Share

Find-S Algorithm Implementation in Python

Find-S algorithm tends to find out the most specific hypothesis which is consistent with given training data.

Let’s say we are given some training data X. Data instances in X are labelled as positive ( or yes, 1) and negative ( or no, 0).

Consistent Hypothesis

A hypothesis h(x) is said to be consistent with given training data

( \forall x \in X ) [ h(x) = t(x) ]

here, t(x) is the target value or the label.

Let’s understand above statement with the help of an example. We have following data

Example Sky Humidity EnjoySport
1 Sunny Normal Yes
2 Rainy High No
3 Sunny High Yes

Along with this data we have two hypothesis which we would like to check whether they are consistent with training data or not. Our hypothesis are

H1 : ( Sunny \wedge ? )

 H2 : ( Sunny \wedge High )

here ? represent any value for Humidity attribute

Hypothesis H1

First instance is having values Sunny and Normal. Our hypothesis H1 classifies an instance as positive ( yes ) if instance has first value Sunny ( it accepts any value for second attribute). So H1 will classifies first instance as positive ( yes) which is given as positive (yes).

Every instance will be processed in similar manner. Below table showing results of it. Hypothesis H1 has classified each instance correctly. Hence, H1 is consistent with training data.

Example Sky Humidity EnjoySport H1
1 Sunny Normal Yes Yes
2 Rainy High No No
3 Sunny High Yes Yes

Result : H1 is consistent.

Hypothesis H2

Hypothesis H2 classifies each instance as positive if that instance has values Sunny and High. If we look at the first instance, it has values Sunny and Normal. Hypotheis H2 will classify first instance as Negative or No whereas it is given as Yes. So this hypothesis is not consistent with given training data.

Example Sky Humidity EnjoySport H2
1 Sunny Normal Yes No
2 Rainy High No No
3 Sunny High Yes Yes

Result : H2 is not consistent.

Find-S Algorithm Implementation

Training Data

Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes

Python Implementation

# Implementation Find-S
# Step-1  : Initiliaze h with most specific hypothesis in H
#           h will reject every instance

h=['phi','phi','phi','phi','phi','phi']   # 'phi' indicate no value is accepted. 'any' indicates every value is accepted.


# Training Data
Data=[['Sunny','Warm','Normal','Strong','Warm','Same','Yes'],
      ['Sunny','Warm','High','Strong','Warm','Same','Yes'],
      ['Sunny','Warm','Normal','Strong','Warm','Same','No'],
      ['Sunny','Warm','High','Strong','Cool','Change','Yes']
     ]

# Step-2 : Iterate over training data and replace constraints in hypothesis with more general constraints


'''
Function: isConsistent(hypothesis,data)
        This function check whether given hypothesis is consitent with given data instance.
        It compare each attribute's value in hypothesis with respective attribute's value in data.
'''
def isConsistent(h,d):
    # Check number of attribute is hypothesis is one less than number of attribute in data. 
    # Since one attribute in data is class attribute which is not considered in hypothesis.
    
    if len(h)!=len(d)-1:
        print('Number of attributes are not same in hypothesis.')
        return False
    else:
        
        # variable 'matched keeps number of attributes which are consistent in hypothesis.
        
        matched=0        
        
        # Iterate over each attribute in hypothesis
        
        for i in range(len(h)):
            
            # Check if attribute in hypothesis is equal to repsective attribute's value in data instance or 
            # it has 'any' value.
            
            if ( (h[i]==d[i]) | (h[i]=='any') ):
                
                # if condition is satisfied then increase matched
                
                matched=matched+1
        
        # Return true if for all attribute's value in data, hypothesis is consistent.
        
        if matched==len(h):
            return True
        else:
            return False

        
'''
Function: makeConsistent(hypothesis,data)
        This function change hypothesis to make it consistent with given data instance.
'''
def makeConsistent(h,d):
    
    # Iterate over each attribute in hypothesis
    
    for i in range(len(h)):
        
        # if ith attribute in hypothesis reject each value. 'phi' indicates that each value is rejected.
        
        if((h[i] == 'phi')):
            
            # Replace ith value in hypothesis with data instance's ith attribute value.
            h[i]=d[i]
            
            
            # if hypothesis ith value is not 'phi' and it is also not equal to ith value in data instance.
            
        elif(h[i]!=d[i]):
            
            # Replace ith value in hypothesis with 'any'. 'any' accept each value for that attribute.
            h[i]='any'
    
    # Return updated hypothesis
    return h




print('Begin : Hypothesis :',h)
print('==========================================')
# Iterate over each data instance in given training data

for d in Data:
    
    # Consider only positive instance ( instance with 'Yes' class)
    
    if d[len(d)-1]=='Yes':
        
        # Check whether hypothesis is consistent with current data instance
        
        if ( isConsistent(h,d)):
            
            # Print hypothesis
            
            print ("Hypothesis :",d)
        else:
            
            # If hypothesis is not consistent then make it consistent with current data instance
            
            h=makeConsistent(h,d)
    
        
        # Print current data instance and updated hypothesis
        print ('Training data         :',d)
        print ('Updated Hypothesis    :',h)
        print()
        print('--------------------------------')
    
print('==========================================')
print('End: Hypothesis :',h)

Output :

# Begin : Hypothesis : ['phi', 'phi', 'phi', 'phi', 'phi', 'phi']

Training data : ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes'] Updated Hypothesis : ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']

* * *

Training data : ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes'] Updated Hypothesis : ['Sunny', 'Warm', 'any', 'Strong', 'Warm', 'Same']

* * *

Training data : ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes'] Updated Hypothesis : ['Sunny', 'Warm', 'any', 'Strong', 'any', 'any']

--------------------------------

End: Hypothesis : ['Sunny', 'Warm', 'any', 'Strong', 'any', 'any']

  • Divya

    hello,
    thank you for the code. but can u please help me to know what is the modification that i need to do if i import the data set from a csv file.
    converting Data variable defined in the program to list and array is not working.
    please do help me out