Simple NLP - Template Matching

In this tutorial we will explore template matching, a simple NLP technique, to perform natural language understanding and a basic form of parsing.

As was the case with the last tutorial, let's set up our environment.

This time we are going to do something a little different: we will use regular expressions to parse natural language and extract the logical information from it. Unlike the approach in the previous tutorial, this one requires writing out a pattern for each kind of question, so a system like this can take somewhat longer to develop and deploy.

If you don't know what regular expressions are, take a look here. In general, regular expressions are a series of characters that are used for pattern matching in text.
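
As a quick illustration of the idea, here is a minimal sketch using Python's built-in re module. The pattern is one of the templates used later in this tutorial; the two test sentences are made up for the example.

```python
import re

# "how", then any characters, then "cold", then any characters, then "kitchen"
pattern = re.compile(r"how.*cold.*kitchen.*")

# match() only succeeds if the pattern fits starting at the first character
print(pattern.match("how cold is it in the kitchen") is not None)  # True
print(pattern.match("is the kitchen cold") is not None)            # False
```

The ".*" pieces mean "any run of characters, possibly empty", which is what lets one template cover many different phrasings.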

Now, let's define some regular expression templates.

In [7]:
#Here we define the regular expression templates
EXPRESSIONS = [("what.*temp.*kitchen.*", ["kitchen", "temperature"]), 
               ("how.*hot.*kitchen.*",   ["kitchen", "temperature"]), 
               ("how.*cold.*kitchen.*",  ["kitchen", "temperature"]),
               ("what.*temp.*bath.*",    ["bathroom", "temperature"]),
               ("how.*hot.*bath.*",      ["bathroom", "temperature"]),
               ("how.*cold.*bath.*",     ["bathroom", "temperature"]),
               ("what.*temp.*living.*",  ["livingroom", "temperature"]),
               ("how.*hot.*living.*",    ["livingroom", "temperature"]),
               ("how.*cold.*living.*",   ["livingroom", "temperature"]),
               ("what.*temp.*family.*",  ["livingroom", "temperature"]),
               ("how.*hot.*family.*",    ["livingroom", "temperature"]),
               ("how.*cold.*family.*",   ["livingroom", "temperature"]),
               ("what.*temp.*bed.*",     ["bedroom", "temperature"]),
               ("how.*hot.*bed.*",       ["bedroom", "temperature"]),
               ("how.*cold.*bed.*",      ["bedroom", "temperature"]),
               ("what.*temp.*dining.*",  ["diningroom", "temperature"]),
               ("how.*hot.*dining.*",    ["diningroom", "temperature"]),
               ("how.*cold.*dining.*",   ["diningroom", "temperature"])]

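#Here we define some indexes to keep track of the format of the expressions data-structure
LOGICAL_FORM_ROOM      = 0 #Position of the room name in each logical form
LOGICAL_FORM_ATTRIBUTE = 1 #Position of the attribute name in each logical form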
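
The table of templates is very repetitive: every room keyword is paired with the same three question stems. One possible way to cut down on the typing, shown here only as a sketch, is to generate the table. The names QUESTION_STEMS, ROOM_KEYWORDS and build_expressions are made up for this example and are not part of the tutorial code.

```python
# Hypothetical helper data: the question stems and room keywords used in the table
QUESTION_STEMS = ["what.*temp", "how.*hot", "how.*cold"]
ROOM_KEYWORDS = [("kitchen", "kitchen"), ("bath", "bathroom"),
                 ("living", "livingroom"), ("family", "livingroom"),
                 ("bed", "bedroom"), ("dining", "diningroom")]

def build_expressions():
    # Pair every room keyword with every question stem
    expressions = []
    for keyword, room in ROOM_KEYWORDS:
        for stem in QUESTION_STEMS:
            expressions.append((stem + ".*" + keyword + ".*", [room, "temperature"]))
    return expressions

GENERATED = build_expressions()
print(len(GENERATED))  # 18
print(GENERATED[0])    # ('what.*temp.*kitchen.*', ['kitchen', 'temperature'])
```

Adding a new room then only takes one new entry in ROOM_KEYWORDS instead of three new templates.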

Again, let's get some input for our system, in the same format as the last tutorial.

In [8]:
inputString = "How cold is it in the kitchen?" # The question that we want to answer
systemInput = {"question": inputString,
               "history" : []}                 # A simple data structure for holding the input data

Next, we are going to write a function to clean the input, much as we did in the last tutorial. This time, however, we aren't going to split the string into words; we are just going to clean the original text.

In [9]:
def cleanText(inputString):
    #First we convert the input to lower case
    loweredInput = inputString.lower() 

    #Then we remove all the characters that are not alphanumeric, or spaces
    cleanedInput = ""
    for character in loweredInput:                   #For every character in the question that has been converted to lower case
        if(character.isalnum() or character == " "):     #Check to see if it is an alpha numeric character (A-Z, a-z, 0-9) or a space and if it is...
            cleanedInput += character                        #Then we add it to the cleaned input string, building it up character by character
        else:                                            #If it isn't alpha numeric, or a space
            pass                                             #Then we ignore it because we no longer need to keep track of it

    #This is what our input question looks like now...
    print("cleanedInput:", cleanedInput)
    #Finally, return the cleaned input
    return cleanedInput
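
Since we are already using regular expressions, the same cleaning step can be sketched in one line with re.sub. The function name clean_text_regex is made up for this example. Note one difference: isalnum() also accepts non-English letters such as "é", while the character class below keeps only a-z, 0-9 and spaces.

```python
import re

def clean_text_regex(input_string):
    # Lower-case first, then delete every character that is not a-z, 0-9 or a space
    return re.sub(r"[^a-z0-9 ]", "", input_string.lower())

print(clean_text_regex("How cold is it in the kitchen?"))  # how cold is it in the kitchen
```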

Now that we have our cleaned text, all we need to do is see whether it matches any of the templates that we have written. If it matches, then we (usually) know what the user is talking about; if it doesn't, we can ask the user to rephrase their question.

Let's write a function to perform this task.

In [10]:
#We start by importing the python library for regular expressions
import re #You can find support for regular expressions in most modern programming languages

def extractLogicalForm(inputString):
    #Next we are going to look through all the patterns that we have until we find one that matches a template
    extractedLogicalForm = None

    #For every regex pattern that we have
    for regex, logicalForm in EXPRESSIONS:
        compiledRegex = re.compile(regex)          #compile the pattern (convert it from a string to something python understands)

        result = compiledRegex.match(inputString)  #Check to see if the regular expression matches the question we have asked  

        if result is not None:                     #If there is a match...
            print("We found a match!")
            print("\tRegex:", regex)
            print("\tLogical Form:", logicalForm)
            return logicalForm         #Return the logical form so we are able to use it later

    print("We didn't find a match") #If we didn't find a match, say so
    return None
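
Two details of the function above are worth a closer look: it compiles every pattern again on each call, and match() only succeeds when the pattern fits from the very first character of the input. The sketch below shows one alternative, compiling once and using search(), which looks for the pattern anywhere in the text. SAMPLE_EXPRESSIONS and extract_with_search are names made up for this example.

```python
import re

# A two-entry stand-in for the tutorial's EXPRESSIONS table
SAMPLE_EXPRESSIONS = [("how.*cold.*kitchen.*", ["kitchen", "temperature"]),
                      ("how.*hot.*bath.*",     ["bathroom", "temperature"])]

# Compile each pattern once, up front, instead of on every call
COMPILED = [(re.compile(regex), form) for regex, form in SAMPLE_EXPRESSIONS]

def extract_with_search(cleaned_text):
    for compiled, form in COMPILED:
        # search() scans the whole string; match() would only test from the start
        if compiled.search(cleaned_text) is not None:
            return form
    return None

question = "tell me how cold is it in the kitchen"
print(extract_with_search(question))                 # ['kitchen', 'temperature']
print(re.match("how.*cold.*kitchen.*", question))    # None
```

Whether search() is the better choice depends on the system: it accepts more phrasings, but it also makes accidental matches more likely.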

Next we are going to define a dummy function that returns a fake value when asked for the temperature of a specific room. This is the same dummy function from the previous tutorial. Again, you can replace its contents with whatever is needed to actually look up the temperature.

In [11]:
def getAttributeValue(room, attribute):
    if (room != None and attribute != None):
        if(room == "livingroom" and attribute == "temperature"):
            return 72
        elif(room == "bathroom" and attribute == "temperature"):
            return 73
        elif(room == "kitchen" and attribute == "temperature"):
            return 81
        elif(room == "bedroom" and attribute == "temperature"):
            return 68
        elif(room == "diningroom" and attribute == "temperature"):
            return 79
        #None of the known room/attribute pairs matched, so report what we received
        raise Exception("There was an error parsing the sentence, got: " + str(room) + ", " + str(attribute))
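
As the number of rooms grows, the chain of if/elif checks gets long. A possible alternative, sketched here with the same fake values, is a dictionary keyed by (room, attribute). The names FAKE_READINGS and get_attribute_value_from_table are made up for this example.

```python
# The same fake readings as the dummy function, keyed by (room, attribute)
FAKE_READINGS = {
    ("livingroom", "temperature"): 72,
    ("bathroom",   "temperature"): 73,
    ("kitchen",    "temperature"): 81,
    ("bedroom",    "temperature"): 68,
    ("diningroom", "temperature"): 79,
}

def get_attribute_value_from_table(room, attribute):
    key = (room, attribute)
    if key not in FAKE_READINGS:
        # Unknown pairs (including None values) are reported as an error
        raise ValueError("Unknown room/attribute pair: " + str(room) + ", " + str(attribute))
    return FAKE_READINGS[key]

print(get_attribute_value_from_table("kitchen", "temperature"))  # 81
```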

Now that we have everything, we can plug the input into our functions and get the answer that we want.

In [12]:
rawInputString = systemInput["question"]
print("Got input:", rawInputString)

cleanedInput = cleanText(rawInputString)
extractedLogicalForm = extractLogicalForm(cleanedInput)

targetAttribute = extractedLogicalForm[LOGICAL_FORM_ATTRIBUTE]
targetRoom      = extractedLogicalForm[LOGICAL_FORM_ROOM]

print("The", targetAttribute, "in the", targetRoom, "is", getAttributeValue(targetRoom, targetAttribute))
Got input: How cold is it in the kitchen?
cleanedInput: how cold is it in the kitchen
We found a match!
	Regex: how.*cold.*kitchen.*
	Logical Form: ['kitchen', 'temperature']
The temperature in the kitchen is 81
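
The cell above assumes that a template matched. When nothing matches, the extraction step returns None and indexing into it fails, so a real system would want to check for that first and ask the user to rephrase. Here is a minimal sketch of such a check; describe_result and the stand-in lookup function are made up for this example.

```python
def describe_result(logical_form, lookup):
    # logical_form is a [room, attribute] list, or None when no template matched
    if logical_form is None:
        return "Sorry, I didn't understand that. Could you rephrase your question?"
    room, attribute = logical_form
    return "The " + attribute + " in the " + room + " is " + str(lookup(room, attribute))

# A stand-in lookup that always answers 81, just for the demonstration
fake_lookup = lambda room, attribute: 81

print(describe_result(["kitchen", "temperature"], fake_lookup))
# The temperature in the kitchen is 81
print(describe_result(None, fake_lookup))
# Sorry, I didn't understand that. Could you rephrase your question?
```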


You have just written a simple, template-based NLP system.

But, as was the case with the last tutorial, you should be able to see how this system can fail. While it is a bit looser and can handle more varied queries, you still need to enumerate every query you might receive beforehand and write a regular expression for each one. This can be daunting in a real-world system. Again, spelling matters greatly: if a word is misspelled, the system can't handle it.
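
To make these failures concrete, here is a small sketch that runs one of the templates against a correct question, a misspelled one, and one that uses a synonym. The test sentences are made up for this example.

```python
import re

template = re.compile(r"how.*cold.*kitchen.*")

print(template.match("how cold is it in the kitchen") is not None)    # True
print(template.match("how cold is it in the kichen") is not None)     # False, "kitchen" is misspelled
print(template.match("how chilly is it in the kitchen") is not None)  # False, no template mentions "chilly"
```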

How would you handle an input such as "How hot is it in the place where I typically cook?"