Intelligent Personal Assistant Applications (IPAAs) are growing in use and becoming an essential part of many people’s lives. IPAAs are designed to help people with day-to-day tasks, queries, and actions such as initiating a phone call or setting a task reminder.
Most popular personal assistant applications today interact with humans by carrying out commands or answering queries made by a user, who may convey them as natural language speech or text. This type of interaction is referred to as Human to Machine (H2M) interaction. A significant step toward integrating IPAAs further into people’s lives is enabling them to understand interpersonal conversation. A new type of IPAA aims to help fulfill user requests that are conveyed in Human to Human (H2H) interactions. These requests often come from other people interacting with the user (e.g., a spouse, team member, or friend) and are delivered through textual communication such as SMS or IM.
A typical example of a human-to-human request is to pick someone up from a specific location at a specific time: “Don’t forget to pick up Noah from school at 4 PM.” Here, the IPAA must detect the semantic elements of the request, such as whom to pick up, from which location, and at what time. It must then create a reminder that prompts the user to act. Another example is a request to make a phone call when a given condition is met: “Call me when you leave work, please.” Here, the IPAA must detect who to call and when, and create a corresponding reminder.
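This is a minimal sketch of how the detected elements of the two example requests might be held and turned into reminder text. The `H2HRequest` schema, its field names, and the `reminder_text` helper are hypothetical; they are not the midu API or the NLP Architect output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class H2HRequest:
    """Semantic elements of one H2H request (hypothetical schema for illustration)."""
    predicate: str                   # requested action, e.g. "pick up" or "call"
    obj: Optional[str] = None        # who/what the action applies to
    location: Optional[str] = None   # where the action takes place
    time: Optional[str] = None       # explicit time, e.g. "4 PM"
    condition: Optional[str] = None  # triggering condition, e.g. "when you leave work"


def reminder_text(req: H2HRequest) -> str:
    """Build a reminder string from whichever elements were detected."""
    parts = [req.predicate.capitalize()]
    if req.obj:
        parts.append(req.obj)
    if req.location:
        parts.append(f"at {req.location}")
    if req.time:
        parts.append(f"at {req.time}")
    if req.condition:
        parts.append(req.condition)
    return " ".join(parts)


if __name__ == "__main__":
    pickup = H2HRequest("pick up", obj="Noah", location="school", time="4 PM")
    # "me" in the incoming message must be resolved to the sender; "Dana" is a hypothetical name.
    call = H2HRequest("call", obj="Dana", condition="when you leave work")
    print(reminder_text(pickup))
    print(reminder_text(call))
```

Optional fields let the same structure cover both time-based and condition-based requests. A real assistant would also need to resolve "school" and "work" to actual places, which this sketch does not attempt.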
The Intel AI Lab team, in cooperation with Intel Labs, has developed a model for detecting the semantic elements of human-to-human requests. The model is based on the “intent_extraction” model published as part of Intel’s open source NLP Architect library. Its output is handled by midu, a personal time management reference app developed by Intel’s wearable software team that is no longer commercially available. Midu receives the detected elements through a dedicated API and further resolves them to actual places, events, activities, and so on. The application then adds them to the user’s timeline and triggers them as contextual reminders, according to their semantic “meaning” as extracted and resolved through this process. Figures 1 and 2 show the H2H request comprehension process as it appears in the user experience: a timeline entry and a reminder.
The fundamental challenge for IPAAs is semantic element resolution.
No off-the-shelf system resolves semantic elements with 100% accuracy, but H2M systems address this challenge extensively, and H2H systems share it. Our work builds on H2M industry knowledge and practices for resolving semantic elements. However, understanding H2H requests raises three additional challenges, whose sub-challenges are listed in Table 1:
| Sub-challenge | Example | Expected outcome |
|---|---|---|
| Past tense | “I just picked John up from school” | No action needed. |
| Question | “will you send the material?” | No action needed; the user only needs to answer a yes/no question. |
| Negation | “please don’t send the material yet” | The request is to not perform an action. |
| Condition | “please pick John up if it rains” | The pickup request depends on the system’s ability to evaluate the given condition, for example by retrieving the weather forecast for the given location and time from a weather API. |
| Semantic | “don’t forget to bring your thoughts” | “Thoughts” are not tangible, so no action is needed. |
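The sub-challenges above show why spotting a request verb is not enough. The sketch below tries to separate them with hand-written keyword cues. It is a deliberately naive baseline for illustration, not the approach described in this post, which uses dedicated models. The cue lists and label names are assumptions.

```python
import re

# Hypothetical cue lists for illustration only; not taken from the described system.
PAST_CUES = {"just", "already", "yesterday", "earlier"}
CONDITION_CUES = {"if", "when", "unless", "once"}
NEGATION = re.compile(r"\b(don't|do not|never)\b")
# "Don't forget ..." reinforces a request instead of negating it, so strip it before
# looking for negation.
REMINDER_IDIOM = re.compile(r"\b(don't|do not) forget\b")


def classify(message: str) -> str:
    """Return the first Table 1 sub-challenge whose keyword cue matches, else "request"."""
    text = message.lower().replace("\u2019", "'")  # normalize curly apostrophes
    tokens = set(re.findall(r"[a-z']+", text))
    if text.rstrip().endswith("?"):
        return "question"
    if tokens & PAST_CUES:
        return "past_tense"
    if NEGATION.search(REMINDER_IDIOM.sub("", text)):
        return "negation"
    if tokens & CONDITION_CUES:
        return "conditional_request"
    return "request"


if __name__ == "__main__":
    for msg in ["I just picked John up from school",
                "will you send the material?",
                "please don't send the material yet",
                "please pick John up if it rains",
                "Don\u2019t forget to pick up Noah from school at 4 PM."]:
        print(f"{classify(msg):20} {msg}")
```

Rules like these are brittle. "just call me when you land" is labeled `past_tense` because of "just". The semantic case ("bring your thoughts") falls through to `request`, because no keyword can tell that "thoughts" are intangible. This is the motivation for learned models.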
The developed system is designed to overcome the H2H semantic comprehension challenges. The system includes 3 modules, with each module containing one or more blocks. Figure 3 illustrates the system’s architecture.
The following is a description of the system’s modules and their functionality:
The text normalization component is based on supervised Neural Machine Translation (NMT), where the training data consists of pairs of informal text messages and their corresponding formal versions. At inference time, the component takes an informal text message as input and outputs its predicted formal form.
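Before a supervised NMT model can be trained, the informal/formal pairs must be turned into aligned source/target id sequences. The sketch below covers only that data-preparation step; the encoder-decoder model itself is out of scope. Whitespace tokenization, the special symbols, and the example pair are assumptions, not details of the described component.

```python
from typing import Dict, List, Tuple

PAD, BOS, EOS, UNK = "<pad>", "<s>", "</s>", "<unk>"


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Map each whitespace token to an id; ids 0-3 are reserved for special symbols."""
    vocab = {PAD: 0, BOS: 1, EOS: 2, UNK: 3}
    for sentence in sentences:
        for tok in sentence.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Wrap the token ids in BOS/EOS; tokens missing from the vocabulary map to UNK."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence.lower().split()]
    return [vocab[BOS]] + ids + [vocab[EOS]]


def make_parallel_data(
    pairs: List[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, int], List[Tuple[List[int], List[int]]]]:
    """Turn (informal, formal) pairs into source/target id sequences for seq2seq training."""
    src_vocab = build_vocab([informal for informal, _ in pairs])
    tgt_vocab = build_vocab([formal for _, formal in pairs])
    data = [(encode(i, src_vocab), encode(f, tgt_vocab)) for i, f in pairs]
    return src_vocab, tgt_vocab, data


if __name__ == "__main__":
    # Hypothetical training pair: informal source, formal target.
    src_vocab, tgt_vocab, data = make_parallel_data([("pls call me l8r", "please call me later")])
    print(data)
```

Separate source and target vocabularies reflect that informal text ("l8r", "pls") and formal text use largely different token inventories. A shared vocabulary is an equally valid choice.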
To test the system, the Intel AI Lab team assembled a dataset of 500 human-to-human messages: 385 include requests to perform actions and 115 do not. The messages were written and tagged manually for this dataset. The dataset can be downloaded from NLP Architect, an NLP library we introduced in May 2018. Note that this is a test dataset only; training a system requires assembling a separate training dataset. Table 2 describes the dataset and its tagging.
| Tag name | Tag description |
|---|---|
| Request type | The type of request (e.g., send, update, submit) |
| Message | The text of the message |
| Message direction | Whether the message is incoming or outgoing |
| Valid request | Whether the message contains a request to perform an action (true) or not (false) |
| Subject head | The head of the sentence’s subject phrase |
| Subject NP | The noun phrase of the sentence’s subject |
| Direct object head | The head of the sentence’s direct object phrase |
| Direct object NP | The noun phrase of the sentence’s direct object |
| Indirect object head | The head of the sentence’s indirect object phrase |
| Indirect object NP | The noun phrase of the sentence’s indirect object |
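This sketch shows one way to represent a tagged message in code. It assumes the dataset is a CSV file with one column per tag in Table 2. The snake_case column names and the `true`/`false` spelling are assumptions; check the actual file layout in NLP Architect before relying on them.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaggedMessage:
    """One row of the test set (hypothetical field names mirroring the Table 2 tags)."""
    request_type: str
    message: str
    direction: str                    # "incoming" or "outgoing"
    valid_request: bool
    subject_head: Optional[str]
    subject_np: Optional[str]
    direct_object_head: Optional[str]
    direct_object_np: Optional[str]
    indirect_object_head: Optional[str]
    indirect_object_np: Optional[str]


OPTIONAL_TAGS = ["subject_head", "subject_np", "direct_object_head",
                 "direct_object_np", "indirect_object_head", "indirect_object_np"]


def load_messages(csv_text: str) -> List[TaggedMessage]:
    """Parse CSV text with one column per tag; missing or empty cells become None."""
    rows = []
    for r in csv.DictReader(io.StringIO(csv_text)):
        optional = {k: ((r.get(k) or "").strip() or None) for k in OPTIONAL_TAGS}
        rows.append(TaggedMessage(
            request_type=r["request_type"].strip(),
            message=r["message"],
            direction=r["direction"].strip().lower(),
            valid_request=r["valid_request"].strip().lower() == "true",
            **optional,
        ))
    return rows


def count_requests(messages: List[TaggedMessage]) -> Tuple[int, int]:
    """Return (messages with a valid request, messages without one)."""
    valid = sum(1 for m in messages if m.valid_request)
    return valid, len(messages) - valid
```

If the file matches this assumed layout, `count_requests` on the full test set should return `(385, 115)`, per the dataset description above.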
The system was evaluated on this dataset with two tests. The first measured how well the system handles the challenges described in Table 1, that is, to what extent it detects messages that include requests to perform actions and filters out messages that do not. Table 3 shows the request detection results.
Table 3: Request detection evaluation
The results show that 89.7% of the messages the system classified as containing a request did in fact contain a request to perform an action. This high precision comes mainly from the system’s ability to detect and filter out messages that would otherwise be false positives.
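The 89.7% figure is a precision: true positives divided by all messages the system flagged as requests. The sketch below computes precision, recall, and F1 for binary request detection. It uses the standard definitions, not the team's evaluation script, and the toy labels are illustrative rather than the evaluation data.

```python
from typing import Sequence, Tuple


def detection_scores(gold: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for binary 'contains a request' labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)  # flagged, but no real request
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)  # real request that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # A trivial detector that flags every message: with the dataset's 385 requests
    # and 115 non-requests, recall is perfect but precision is only 385 / 500.
    gold = [True] * 385 + [False] * 115
    print(detection_scores(gold, [True] * 500))
```

The trivial "flag everything" detector shows why filtering matters. Its precision equals the share of requests in the data (0.77 here). Every non-request message that is correctly filtered out removes a false positive and pushes precision above that baseline.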
The second test aimed to measure the quality of the semantic elements detection (i.e., the slot classification task). The system is configured to detect three main semantic elements: subjects, direct objects and indirect objects. For each of those elements the system extracts the head of the element. Table 4 shows the evaluation results of semantic elements detection in requests.
Table 4: Request semantic elements detection evaluation

| Semantic element | Precision (%) | Recall (%) | F1 score (%) |
|---|---|---|---|
| Direct object head | 88.9 | 60.5 | 72.0 |
| Indirect object head | 73.8 | 53.4 | 61.9 |
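The F1 column is the harmonic mean of the precision and recall columns, F1 = 2PR / (P + R). The sketch below recomputes it from the values in Table 4 as a consistency check. The function is the standard formula, not code from the evaluated system.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; inputs and output share the same units."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall values (in %) copied from Table 4.
    rows = [("Direct object head", 88.9, 60.5, 72.0),
            ("Indirect object head", 73.8, 53.4, 61.9)]
    for name, p, r, reported in rows:
        print(f"{name}: recomputed F1 = {f1_from_pr(p, r):.1f}, reported {reported}")
```

The recomputed values are 72.0 and 62.0. The 0.1 gap in the indirect object row is within what rounding precision and recall to one decimal can produce. The harmonic mean also explains why both F1 scores sit much closer to the recall values than to the precision values.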
In this project, each request was extracted from a single message that included the request predicate (e.g., call, bring, send). In future work, we plan to extract requests from the full context of the H2H conversation. This will enable the extraction of semantic elements that relate to the request but are mentioned in other messages in the conversation. For example, a message asking the user to pick someone up from a specific location may be followed by a later message stating the pickup time.
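The future-work scenario above can be pictured as filling empty slots in one message's request frame with values found in later messages. This is a minimal sketch under several assumptions. Each message has already been turned into a slot dictionary, all frames are known to belong to the same request, and the first non-empty value for a slot wins. The slot names are hypothetical. Deciding which messages belong together, and whether a later value is a correction, are the hard parts and are not handled here.

```python
from typing import Dict, List, Optional

Frame = Dict[str, Optional[str]]  # slot name -> detected value (None if not found)


def merge_frames(frames: List[Frame]) -> Frame:
    """Combine per-message frames, filling slots still empty with values from later messages."""
    merged: Frame = {}
    for frame in frames:
        for slot, value in frame.items():
            if value is not None and merged.get(slot) is None:
                merged[slot] = value
    return merged


if __name__ == "__main__":
    first = {"predicate": "pick up", "object": "Noah", "location": "school", "time": None}
    later = {"time": "4 PM"}  # a later message that only states the pickup time
    print(merge_frames([first, later]))
```

A "latest value wins" policy would be the natural alternative if later messages tend to revise earlier details ("actually, make it 5").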
A major step toward integrating IPAAs more fully into people’s everyday lives is enabling them to understand natural human-to-human language. In this work, we focused on understanding human-to-human requests, which raises various natural language processing challenges. We showed that by mapping these challenges and designing a dedicated model for each, it is possible to achieve high precision in detecting requests. This makes it possible to incorporate human-to-human request comprehension algorithms into next-generation IPAAs, which can then create timeline reminders autonomously.
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.