Revolutionizing Personal Assistants Through Understanding Actionable Requests in Human-to-Human Interactions

Intelligent Personal Assistant Apps

Intelligent Personal Assistant Applications (IPAAs) are growing in use and becoming essential parts of many people’s lives. IPAAs are designed to help humans with day-to-day tasks, queries, and actions such as initiating a phone call or setting a task reminder.

Most of the popular personal assistant applications today are designed to interact with humans by carrying out commands or answering queries made by a user. The user may convey those commands or queries using natural language speech or text. This type of interaction is referred to as Human-to-Machine (H2M) interaction. A significant step towards integrating IPAAs further into human lives is enriching them with the ability to understand interpersonal conversation. A new type of IPAA aims to help fulfill user requests that are conveyed in Human-to-Human (H2H) interactions. These requests often originate from other people interacting with the user (e.g., a spouse, team member, or friend), and are transferred through textual communications such as SMS or IM.

A typical example of a human-to-human request is to pick someone up from a specific location at a specific time: “Don’t forget to pick up Noah from school at 4 PM.” In this case, the IPAA’s task is to detect the semantic elements of the request: who to pick up, from which location, and at what time. Finally, the IPAA needs to create a reminder that prompts the user to act. Another example could be a request to make a phone call depending on a given condition: “Call me when you leave work, please.” Here, the IPAA’s task is to detect who to call and when, and create a corresponding reminder.
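The detected semantic elements can be thought of as a structured record derived from the free-text message. The sketch below illustrates that idea; the field names and the `ParsedRequest` type are purely illustrative and are not the schema actually used by the system or by midu’s API.

```python
# Illustrative representation of a parsed H2H request.
# The type and field names are hypothetical, not the actual system schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedRequest:
    predicate: str                  # the action verb, e.g. "pick up"
    direct_object: Optional[str]    # who or what the action applies to
    location: Optional[str]         # where, if stated
    time: Optional[str]             # when, if stated

# "Don't forget to pick up Noah from school at 4 PM."
request = ParsedRequest(
    predicate="pick up",
    direct_object="Noah",
    location="school",
    time="4 PM",
)
```

A record like this is what a downstream application would resolve into concrete places, contacts, and times before creating a reminder.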

The Intel AI Lab team, in cooperation with Intel Labs, has developed a model for detecting the semantic elements of human-to-human requests. This model is based on the “intent_extraction” model published as part of Intel’s NLP Architect open source library. The output of this model is handled by the midu application, a personal time management reference app that was developed by Intel’s wearable software team but is no longer commercially available. Midu receives these elements from the H2H request detection model through a dedicated API and further resolves them to the actual places, events, activities, etc. These, in turn, are used in the application: they are added to the user’s timeline and triggered as contextual reminders, in accordance with their semantic meaning as extracted and resolved through this process. Figures 1 and 2 show the H2H request comprehension process, expressed in the user experience as a timeline entry and reminder.

Figure 1: midu application used to analyze human-to-human textual messages, detect and understand the request and its semantic elements, and create a timeline reminder.

Figure 2 – Analyzing an outgoing message containing a bring request. Semantic resolution and contextual triggering are performed by Intel’s midu technology.

The Challenges of Understanding Human-to-Human Interactions

The fundamental challenge of IPAAs is to perform semantic element resolution.
Although no off-the-shelf system is 100% accurate, this challenge is widely addressed by H2M systems, and H2H systems share it. Our work builds on top of H2M industry knowledge and practices in resolving semantic elements. However, understanding H2H requests raises three additional challenges:

  1. The first challenge is to convert the informal language of text messages to formal language. Text messages may include informal language such as acronyms, abbreviations and misspellings, as in the following informal text message:

    Plz pick John up B4 U arrive

  2. The second challenge is filtering out bot messages. In some text messaging systems, bots send automatic messages containing advertisements or reminders. Those messages may be falsely detected as requests from other humans, and need to be filtered out.
  3. The third challenge is detecting whether the message is indeed a request to perform an action. Since the vast majority of H2H messages are not requests to perform actions, the potential for falsely detecting action requests is high. Note that existing H2M systems currently bypass this challenge by requiring the user to add a “wakeup word” before the command or request. The “wakeup word” is usually the system’s name, as in: “OK Google, please call John.” Future H2M systems may want to omit the “wakeup word” requirement, in which case they will face the same challenge as H2H systems in trying to detect whether the message is indeed a request or command. An effective approach to overcoming this challenge is to break it down into sub-challenges. Table 1 lists these sub-challenges along with textual examples:
Sub-challenge | Example | Expected outcome
Past tense | “I just picked John up from school” | No need for action.
Question | “Will you send the material?” | No need for action – just to answer a yes/no question.
Negation | “Please don’t send the material yet” | The request is to not perform an action.
Condition | “Please pick John up if it rains” | The pickup request depends on the ability to identify the given condition, for example, by extracting the weather forecast for a given location at a given time using a weather API.
Semantic | “Don’t forget to bring your thoughts” | “Thoughts” are not tangible, therefore there is no need for action.

Table 1 – H2H request-detection challenges
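To make the sub-challenges in Table 1 concrete, here is a deliberately simplified, keyword-based filter. The patterns below are illustrative stand-ins; the actual system uses trained models, not these heuristics, and the semantic sub-challenge (intangible objects) is not handled here at all.

```python
import re

def is_actionable(message: str) -> bool:
    """Rough keyword heuristics mirroring Table 1; illustrative only."""
    text = message.lower().strip()
    # Question: a request phrased as a question needs an answer, not an action.
    if text.endswith("?"):
        return False
    # Negation: "don't send ..." asks NOT to perform the action.
    if re.search(r"\b(don't|do not)\s+(send|call|bring|pick)", text):
        return False
    # Past tense: a report of a completed action is not a request.
    if re.search(r"\b(picked|sent|called|brought)\b", text):
        return False
    return True

print(is_actionable("will you send the material?"))        # question -> False
print(is_actionable("I just picked John up from school"))  # past tense -> False
print(is_actionable("please pick John up if it rains"))    # request -> True
```

Note that the conditional example still passes the filter: it is a valid request, but fulfilling it requires resolving the condition (e.g., via a weather API), which this sketch does not attempt.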

System Architecture and Method

The developed system is designed to overcome the H2H semantic comprehension challenges. The system comprises three modules, each containing one or more blocks. Figure 3 illustrates the system’s architecture.

Figure 3 – System Architecture.

The following is a description of the system’s modules and their functionality:

  1. H2H preprocessing module: The text messages that are the input to this module undergo:

    • Text normalization for converting the informal language of text messages to formal language.
    • Bot filtering to filter out bot messages.

    The text normalization component is based on supervised Neural Machine Translation (NMT) in which the training data comprises pairs of informal text messages and their corresponding formal text messages. The inference stage inputs an informal text message and outputs the predicted formal form of this message.

  2. Semantic elements detection module (also called slot classification): This module’s goal is to detect the main semantic elements of a request: subject, direct object, indirect object, time, and location. The module extracts part-of-speech tags, word embeddings, and character embeddings as input to a deep bidirectional LSTM (BiLSTM) neural network classifier.
  3. H2H validation module: This post-processing module is designed to handle the main challenges of H2H comprehension: it verifies that the message is indeed a request to perform an action. The module includes components for validating the tense of the request, as well as for validating that the request is neither negated nor conditional and that it is not a question. In addition, the module includes a component for verifying that the request is semantically valid. This component uses Multi-Layer Perceptron (MLP) based Word Sense Disambiguation (WSD) to detect the meanings of the extracted semantic elements.
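As a toy illustration of what the text normalization step in module 1 produces, here is a dictionary-based stand-in. The actual component is a trained NMT model, not a lookup table, and the mapping below is an invented sample vocabulary.

```python
# Toy lookup-based normalizer standing in for the trained NMT model.
# The slang-to-formal mapping is illustrative, not the system's vocabulary.
SLANG = {
    "plz": "please",
    "pls": "please",
    "b4": "before",
    "u": "you",
    "thx": "thanks",
}

def normalize(message: str) -> str:
    """Replace known informal tokens with their formal forms."""
    return " ".join(SLANG.get(tok.lower(), tok) for tok in message.split())

print(normalize("Plz pick John up B4 U arrive"))
# -> "please pick John up before you arrive"
```

A learned model generalizes far beyond such a fixed list (handling misspellings and unseen abbreviations), which is why the system uses NMT rather than a dictionary.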

Testing Dataset

To test the system, the Intel AI Lab team assembled a dataset of 500 human-to-human messages: 385 of the 500 messages include requests to perform actions, whereas the remaining 115 do not. The messages were manually generated and tagged for the purpose of creating the dataset. This dataset can be downloaded from the NLP Architect library, an NLP library we introduced in May 2018. Note that this is a testing dataset; to train a system, a separate training dataset should be assembled. Table 2 describes the dataset and its tagging.

Tag name | Tag description
Request type | The type of request (e.g., send, update, submit)
Message | The textual message
Message direction | Incoming or outgoing message
Valid request | Whether the message contains a request to perform an action (true) or not (false)
Subject head | The head of the subject phrase of the sentence
Subject NP | The noun phrase of the subject of the sentence
Direct object head | The head of the direct object phrase of the sentence
Direct object NP | The noun phrase of the direct object of the sentence
Indirect object head | The head of the indirect object phrase of the sentence
Indirect object NP | The noun phrase of the indirect object of the sentence

Table 2 – The dataset description
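A single tagged record following the schema in Table 2 might look like the sketch below. The values are an invented example, not an actual row from the dataset, and the key names are our own rendering of the tag names.

```python
# Hypothetical example of one tagged dataset record (not a real row).
record = {
    "request_type": "pickup",
    "message": "Don't forget to pick up Noah from school at 4 PM.",
    "message_direction": "incoming",
    "valid_request": True,
    "subject_head": "you",        # implicit subject of the imperative
    "subject_np": "you",
    "direct_object_head": "Noah",
    "direct_object_np": "Noah",
    "indirect_object_head": None,  # no indirect object in this message
    "indirect_object_np": None,
}
```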

Experiments

The system’s evaluation with the above dataset included two sets of tests. The first test aimed to measure the quality of handling the challenges described in Table 1, that is, to what extent the model detects messages that include requests to perform actions and filters out messages that do not. Table 3 shows the request detection evaluation test results.

Request Detection Evaluation
Precision | Recall | F1 score
89.7% | 68.7% | 77.8%

Table 3 – Request Detection Evaluation Results

We see that 89.7% of the messages that the system classified as including requests did in fact include requests to perform an action. This high precision rate is achieved mainly by the system’s ability to detect and filter out messages that would otherwise be false positives.
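The F1 score in Table 3 is simply the harmonic mean of precision and recall, which can be verified directly:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision 89.7% and recall 68.7% from Table 3:
print(round(100 * f1_score(0.897, 0.687), 1))  # -> 77.8
```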

The second test aimed to measure the quality of the semantic elements detection (i.e., the slot classification task). The system is configured to detect three main semantic elements: subjects, direct objects and indirect objects. For each of those elements the system extracts the head of the element. Table 4 shows the evaluation results of semantic elements detection in requests.

Request Semantic Elements Detection Evaluation
Semantic element | Precision [%] | Recall [%] | F1 score [%]
Subject head | 86.9 | 61.6 | 72.1
Direct object head | 88.9 | 60.5 | 72.0
Indirect object head | 73.8 | 53.4 | 61.9
Average | 83.2 | 58.5 | 68.7

Table 4 – Request Semantic Elements Detection Evaluation Results

Future Work

In this project, the request was extracted from a single message that included the request predicate, i.e., call, bring, send, etc. For future work, we plan to add the ability to extract a request from the full context of the H2H conversation. This will enable the extraction of semantic elements that are related to the request but are mentioned in other messages during the conversation. For example, a message to pick someone up from a specific location may be followed by a later message stating the pickup request time.

Conclusions

A large step towards integrating IPAAs more fully into human everyday lives is enabling them to understand natural human-to-human language. In this work, we focused on understanding human-to-human requests. Understanding such requests raises various natural language processing challenges. We showed that by mapping the challenges and designing a dedicated model for each, it is possible to achieve high precision in detecting requests. This enables the incorporation of human-to-human request comprehension algorithms in next-generation IPAAs, resulting in the autonomous creation of timeline reminders.

Notices and Disclaimers