These are the experiments for the Neural Conversation Model, carried out during Spring 2016 at CMU. The GitHub repo (private for now) is maintained here.

This is the model used for the experiments. The datasets have been developed by me.

Idea

The problem at hand is to model dialogs so that they appear human-like. When the system communicates with a person through text, it learns about their knowledge and interests. Using Memory Networks built from LSTMs, a modified form of RNNs, on the predefined toy tasks T ∈ {“where is”, “what is in”}, we can build a conversational agent that accumulates information over the course of a dialogue. This is a step towards a cognitive agent that is able to converse with a person.

Data

The data is in the form of a dialogue between the user and the system, and the system answers a question based on the context of all the previous information provided to it.

Consider a use case with several buildings, each of which houses restaurants serving different food items. To keep things simple while still creating sufficient training data, consider a scenario with ten buildings, ten restaurants and ten dishes.

Buildings: BUILDING1, BUILDING2, BUILDING3, BUILDING4, BUILDING5, BUILDING6, BUILDING7, BUILDING8, BUILDING9, BUILDING10
Restaurants: REST1, REST2, REST3, REST4, REST5, REST6, REST7, REST8, REST9, REST10
Foods: FOOD1, FOOD2, FOOD3, FOOD4, FOOD5, FOOD6, FOOD7, FOOD8, FOOD9, FOOD10

Creating the Dataset

The dataset was created from different possible combinations of the above entities, but only one-to-one mappings between them were considered. Apart from the sentences that contain the answer, noise statements were also added to the dialogue to simulate a real-life scenario. The experiments cover Wh questions (e.g. “Where is”, “What is in”) and Yes/No questions.
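A minimal sketch of how one such story could be generated. It assumes that one "noise statement" means one extra building together with its restaurant and dish, which matches the examples in this post. The function name `make_wh_story` and the lists `BUILDINGS`, `RESTAURANTS` and `FOODS` are illustrative, not the generator actually used for the experiments.

```python
import random

BUILDINGS = [f"BUILDING{i}" for i in range(1, 11)]
RESTAURANTS = [f"REST{i}" for i in range(1, 11)]
FOODS = [f"FOOD{i}" for i in range(1, 11)]


def make_wh_story(num_noise, rng):
    """Build one single-turn Wh story with `num_noise` noise buildings.

    Returns (lines, answer). Sampling without replacement keeps the
    building -> restaurant -> food mapping one-to-one.
    """
    k = num_noise + 1  # the user's building plus the noise buildings
    buildings = rng.sample(BUILDINGS, k)
    restaurants = rng.sample(RESTAURANTS, k)
    foods = rng.sample(FOODS, k)

    has_facts = [f"{b} has {r}" for b, r in zip(buildings, restaurants)]
    serves_facts = [f"{r} serves {f}" for r, f in zip(restaurants, foods)]

    # Index 0 is the user's building, so its restaurant is the answer.
    answer = restaurants[0]

    # Shuffle so the relevant fact is not always in the same position.
    rng.shuffle(has_facts)
    rng.shuffle(serves_facts)

    lines = [f"I am at {buildings[0]}"] + has_facts + serves_facts
    lines.append("Which is the closest restaurant?")
    return lines, answer


story, answer = make_wh_story(num_noise=2, rng=random.Random(0))
for number, line in enumerate(story, start=1):
    print(number, line)
print(answer)
```

With `num_noise=2` this produces eight numbered lines (one location, three "has" facts, three "serves" facts, one question), the same shape as the two-noise Wh example in this post.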

Types of Dataset

The following types of datasets were synthetically generated to carry out various experiments:

  • Randomized dataset with one, two and three noise statements, single turn dialogue and a Wh question.
  • Randomized dataset with one, two, three, four, five and six noise statements, two turn dialogue and Wh questions.
  • Randomized dataset with one, two and three noise statements, single turn dialogue and a Y/N question.
  • Randomized dataset with one, two, three, four, five and six noise statements, two turn dialogue and Y/N questions.
  • Randomized dataset with one, two and three noise statements, two turn dialogue and a combination of Wh and Y/N questions.

Examples of the above datasets are:

Randomized dataset with two noise statements, single turn dialogue and a Wh question
  
1 I am at BUILDING10
2 BUILDING10 has REST9
3 BUILDING1 has REST7
4 BUILDING5 has REST8
5 REST7 serves FOOD1
6 REST9 serves FOOD3
7 REST8 serves FOOD5
8 Which is the closest restaurant?
 
REST9

Randomized dataset with one noise statement, single turn dialogue and a Y/N question

1 I am at BUILDING4
2 BUILDING9 has REST2
3 BUILDING4 has REST5
4 REST2 serves FOOD2
5 REST5 serves FOOD7
6 Is REST2 closest to my current location?    

No

Randomized dataset with two noise statements, single turn dialogue and a Y/N question

1 I am at BUILDING3
2 BUILDING3 has REST2
3 BUILDING9 has REST4
4 BUILDING10 has REST9
5 REST9 serves FOOD6
6 REST4 serves FOOD3
7 REST2 serves FOOD8
8 Is REST2 closest to my current location?    

Yes

Randomized dataset with three noise statements, two turn dialogue and Wh questions

1 I am at BUILDING4
2 BUILDING8 has REST10
3 BUILDING10 has REST4
4 BUILDING4 has REST1
5 BUILDING6 has REST6
6 REST10 serves FOOD10
7 REST4 serves FOOD6
8 REST1 serves FOOD3
9 REST6 serves FOOD2
10 Which is the closest restaurant?

REST1

11 REST1 is closest to my current location
12 What does it serve?

FOOD3

Randomized dataset with three noise statements, two turn dialogue and Y/N questions

1 I am at BUILDING2
2 BUILDING1 has REST2
3 BUILDING3 has REST4
4 BUILDING9 has REST1
5 BUILDING2 has REST10
6 REST10 serves FOOD6
7 REST4 serves FOOD9
8 REST1 serves FOOD10
9 REST2 serves FOOD5
10 Is REST10 closest to my current location?

Yes

11 REST10 is closest to my current location
12 Does it serve FOOD6?

Yes 

Randomized dataset with one noise statement, two turn dialogue and a combination of Wh and Y/N questions

1 I am at BUILDING8
2 BUILDING6 has REST1
3 BUILDING8 has REST4
4 REST4 serves FOOD2
5 REST1 serves FOOD10
6 Is REST1 closest to my current location?

No

7 REST4 is closest to my current location
8 What does it serve?

FOOD2

Randomized dataset with one noise statement, single turn dialogue and a Wh question
  
1 I am at BUILDING8
2 BUILDING9 has REST9
3 BUILDING8 has REST6
4 REST6 serves FOOD3
5 REST9 serves FOOD10
6 Which is the closest restaurant?

REST6

Training

The memory network is trained on a set of training stories. Each story lists a set of factual statements, a question, its answer, and the supporting factual statements that help answer that question. Using these as inputs, the network learns how to answer questions.

Consider the single conversation story:

1 I am at BUILDING6
2 BUILDING6 has REST9
3 BUILDING1 has REST2
4 BUILDING4 has REST10
5 which is the closest restaurant? REST9	 1 2

where:

Statements 1-4 represent facts. Statement 5 consists of a question, its answer (REST9) and the two supporting statements (1 and 2) that helped derive that answer.
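A small sketch of how a story in this format could be read into facts and question records. The helper name `parse_story` is hypothetical, and the parser assumes that every question line contains a "?" followed by the answer and the supporting statement numbers, separated by whitespace or tabs.

```python
def parse_story(lines):
    """Split numbered story lines into facts and question records."""
    facts = {}      # statement number -> list of tokens
    questions = []  # one dict per question line
    for raw in lines:
        number, text = raw.strip().split(" ", 1)
        if "?" in text:
            # Everything before "?" is the question; after it come the
            # answer and the numbers of the supporting statements.
            question, rest = text.split("?", 1)
            answer, *supports = rest.split()
            questions.append({
                "question": question.split(),
                "answer": answer,
                "supports": [int(s) for s in supports],
            })
        else:
            facts[int(number)] = text.split()
    return facts, questions


story = [
    "1 I am at BUILDING6",
    "2 BUILDING6 has REST9",
    "3 BUILDING1 has REST2",
    "4 BUILDING4 has REST10",
    "5 which is the closest restaurant?\tREST9\t1 2",
]
facts, questions = parse_story(story)
print(questions[0]["answer"], questions[0]["supports"])  # REST9 [1, 2]
```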

Consider the double conversation story:

1 I am at BUILDING6
2 BUILDING6 has REST9
3 BUILDING1 has REST2
4 BUILDING4 has REST10
5 which is the closest restaurant?	REST9	 1 2
6 REST9 serves FOOD3
7 REST2 serves FOOD9
8 REST10 serves FOOD7
9 what does it serve?	FOOD3	1 2 6

The context for both the questions “which is the closest restaurant” and “what does it serve” is the same:

[I, am, at, BUILDING6], [BUILDING6, has, REST9], [BUILDING1, has, REST2], [BUILDING4, has, REST10], [REST9, serves, FOOD3], [REST2, serves, FOOD9], [REST10, serves, FOOD7]
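Before such a context can be fed to a network, the tokens are usually mapped to integer ids and padded to a common length. The sketch below shows one common way to do that. It is an assumption about the preprocessing, not a description of the code used here, and the names `word_to_id` and `encode` are illustrative.

```python
context = [
    ["I", "am", "at", "BUILDING6"],
    ["BUILDING6", "has", "REST9"],
    ["BUILDING1", "has", "REST2"],
    ["BUILDING4", "has", "REST10"],
    ["REST9", "serves", "FOOD3"],
    ["REST2", "serves", "FOOD9"],
    ["REST10", "serves", "FOOD7"],
]
questions = [
    ["which", "is", "the", "closest", "restaurant"],
    ["what", "does", "it", "serve"],
]

# Build the vocabulary; id 0 is reserved for padding (an assumption).
vocab = sorted({word for sentence in context + questions for word in sentence})
word_to_id = {word: i for i, word in enumerate(vocab, start=1)}

max_len = max(len(sentence) for sentence in context + questions)


def encode(sentence, length):
    """Map tokens to ids and right-pad with 0 up to `length`."""
    ids = [word_to_id[word] for word in sentence]
    return ids + [0] * (length - len(ids))


# Both questions share the same encoded context.
encoded_context = [encode(sentence, max_len) for sentence in context]
encoded_questions = [encode(question, max_len) for question in questions]
print(len(encoded_context), max_len)
```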

Experiments

Here are a few screenshots of the final web application, taken while running it on a local host server.

Example 1

The question presented is “Does it serve food1”. To completely answer this question, the memory network must:

  1. Understand what “it” refers to
  2. Relate the entity “food1” to another entity iteratively in different networks (denoted by Mem 1, Mem 2 and Mem 3)

The different steps entailed in calculating the final answer are as follows:

  1. Based on all the sentences present in each compartment of each memory network, first find the most recent sentence, i.e. the one at the last time stamp. This gives the entity that “it” refers to.
     a. The solution that the memnets find for this step is “rest8”, with a confidence value of 0.38 calculated by Mem 2.
     b. For the memnet, the question now becomes “Does rest8 serve food1?”
  2. Next, the memnet needs to find the truth value of the question, which can be either yes or no, based on the confidence value of each.
     a. Mem 3 finds that “rest8 serves food8” with a confidence value of 0.37.
     b. Mem 1 finds that “rest2 serves food1” with a confidence value of 0.81.
     c. These three statements coupled together form a transitive relation, which outputs “no”, the final result that we see.

Note that the sum of the confidence values for all the episodic memories in one memory is approximately equal to 1. This indicates the confidence of that memory in finding the current entity relation.
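One way to read this: if the confidence values are softmax attention weights over the stored sentences, as in end-to-end memory networks, they sum to 1 by construction. The sketch below illustrates this with made-up match scores. That the web application computes its confidences this way is an assumption, not something stated above.

```python
import math


def softmax(scores):
    """Turn raw match scores into weights that are positive and sum to 1."""
    top = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical match scores between a question and five stored sentences.
scores = [2.1, 0.3, -1.0, 0.8, 1.5]
confidences = softmax(scores)

print([round(c, 2) for c in confidences])
print(round(sum(confidences), 6))
```

The sentence with the highest score gets the largest share of the weight, and a value close to 1.00 means that one sentence dominates the memory.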

Example 2

The question presented is “Which is the closest restaurant”. To completely answer this question, the memory network must:

  1. Understand what entity “which” refers to
  2. Relate the entity “restaurant” to another entity iteratively in different networks (denoted by Mem 1, Mem 2 and Mem 3)

Note that the memory network knows “restaurant” is an entity based on the training data, and it is able to figure out how “closest” relates to “restaurant”. The network may not know what these entities actually are in the real world.

The different steps entailed in calculating the final answer are as follows:

  1. Based on all the sentences present in each compartment of each memory network, first find the sentence which gives the current location of the user. This gives the entity that is related to “closest”.
     a. The solution that the memnets find for this step is “building10”, with a confidence value of 1.00 calculated by Mem 2.
     b. For the memnet, the question now becomes “Which is the closest restaurant to building10?”
  2. Next, the memnet needs to find the answer to the question, which must be a word such as “rest10”, based on the confidence value of each. As mentioned above, the relation of “Restaurant” to “Rest” is learnt from the training data.
     a. Mem 3 finds that “building10 has rest10” with a confidence value of 0.98.
     b. These three statements coupled together form a transitive relation, which outputs the answer “rest10”, the final result that we see.
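The transitive chains in both examples can be written down as plain symbolic lookups. The sketch below is a hand-written reference for checking expected answers, not the memory network itself. The function names `closest_restaurant` and `serves` are illustrative, and the facts are taken from the three-noise, two turn Wh example in the Data section.

```python
def closest_restaurant(facts):
    """Follow 'I am at B' -> 'B has R' and return R (None if the chain breaks)."""
    location = None
    building_to_rest = {}
    for tokens in facts:
        if tokens[:3] == ["I", "am", "at"]:
            location = tokens[3]
        elif len(tokens) == 3 and tokens[1] == "has":
            building_to_rest[tokens[0]] = tokens[2]
    return building_to_rest.get(location)


def serves(facts, restaurant):
    """Return the food served by `restaurant`, or None if no fact says so."""
    for tokens in facts:
        if len(tokens) == 3 and tokens[0] == restaurant and tokens[1] == "serves":
            return tokens[2]
    return None


facts = [sentence.split() for sentence in [
    "I am at BUILDING4",
    "BUILDING8 has REST10",
    "BUILDING10 has REST4",
    "BUILDING4 has REST1",
    "BUILDING6 has REST6",
    "REST10 serves FOOD10",
    "REST4 serves FOOD6",
    "REST1 serves FOOD3",
    "REST6 serves FOOD2",
]]

rest = closest_restaurant(facts)   # turn 1: "Which is the closest restaurant?"
food = serves(facts, rest)         # turn 2: "What does it serve?"
print(rest, food)                  # REST1 FOOD3

# A Y/N question reuses the same chain and compares the result.
print("Yes" if serves(facts, rest) == "FOOD6" else "No")  # No
```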

Results

  1. The system performs poorly when the number of irrelevant support sentences in the story is too high. These statements end up adding noise to our model. This could be solved by judging the relevance of each statement with a classifier. Another possibility is to increase the number of hops, or the history limit up to which the memory network can remember the context.
  2. A question whose answer doesn’t lie in the context/story is difficult to handle. This could possibly be alleviated by adding more support statements related to the context.



Published

01 May 2016
