DMR: A Semantic Robotic Control Language

Sebastian Andraos, co(l)Labo

DMR is a semantic robot-control language that attempts to change our relationship with machines and create true human-robot collaboration through intuitive interfacing. To this end, DMR is demonstrated in the DMR Interface, an Android app which accepts semantic vocal commands and contains a GUI for feedback and verification. This app is combined with a robot-mounted 3D camera to enable robotic interaction with the surroundings and to compensate for unpredictable environments. This combination of tools gives users access to adaptive automation, whereby a robot is no longer given explicit instructions but is instead given a job to do, and adapts its movements to complete it despite slight changes to the goal or environment. The major advantages of this system lie in the vagueness of the instructions given and in the constant feedback on task accomplishment, approaching the manner in which we subconsciously control our bodies or would guide another person to achieve a goal.

Keywords: Linguistics, Real-time control, Human-machine interaction, Robotics

Fabrication - Robots - Volume 2 - eCAADe 33 | 261

Architecture is currently stagnating somewhere in between the digital and physical worlds. Using our daily design tools we can fashion almost anything we want out of thin air, while the majority of our construction techniques have remained relatively unchanged for the past hundred years; presentation techniques are advancing with virtual reality and game engines, whilst we still produce two-dimensional drawings and details for building sites; and we still value the production of a physical prototype much more highly than we do an algorithm or virtual model. There are obvious reasons for which this state of affairs has arisen: we are still relatively digitally illiterate, especially in the upper echelons; it takes time, and therefore money, to adapt to new techniques and technologies; and we have a workforce of skilled labourers that can reliably build standard architectural elements, although it should be noted that in certain countries, like the UK and Australia, that is becoming less of a constant than it may previously have been.

A potential solution to bridge this gap may be approaching in the form of architectural robotics. From heavy-duty industrial robots to drones, all robots are actors in the physical world controlled entirely by the same 1s and 0s that inhabit the digital one. This paper will focus predominantly on the construction side of architecture, given that the design phase has already begun to be upgraded through its digitisation.

Industrial robots are precise, strong, fast and robust, have existed in some shape or form since the 1950s and are the de facto technology in manufacturing anything from mobile phones to cars, but the construction industry still hasn't made this shift. Granted, there are more and more schools and researchers investigating robots in construction, but there are still three key problems that need to be addressed before we can start bringing robots onto building sites: communication, correction and adaptability. We still communicate with robots through code which, by any definition other than that of computer science, is designed for secrecy rather than legibility, and which is typically written a priori and stored within the robot's controller for repeated use rather than modification. Correction and adaptability are similar domains but differ in that correction is taken to be the ability to avoid unforeseeable errors within a given action, whilst adaptability is taken to be the modification of an action based on external changes to the robot's environment. On a building site we inevitably have a changing environment and, although the building site of the future may be increasingly factory-like, there will always be situations where we can't guarantee calibration with absolute certainty, or where we would rather our manipulator adapted to changes, making for a much simpler control system that is effectively given goals or tasks rather than explicit instructions. It is clear that current robotic technologies are no match for their human counterparts in unpredictable scenarios, but this paper suggests a single solution to these problems through improved human-robot interaction: addressing communication through semantic instructions, correction through real-time control and adaptability through a highly modular system architecture that allows easy sensor integration. This solution is a language called DMR: Dear Mr. Robot.

DMR

DMR is a context-sensitive language heavily derived from English which ties in with a semantic parser to enable pseudo-natural language input. DMR is designed to enable us to control robots in a similar way to that in which we subconsciously control our limbs or would teach/guide another person to achieve a task. To that end it retains a high level of relativity, from the declaration of motion with directions and distances to the addition of conditionals such as "until", and easily ties in with sensors to effectively give a robotic manipulator the sense of touch or sight.

LANGUAGE STRUCTURE

(See Figure 1 for a summary of the structure)

Figure 1: Complete structure of the DMR language, showing the hierarchical nature of the language

Keywords

Keywords are the base of the DMR language and are hard-coded into the algorithms used to interpret DMR. They are all based on existing English words but have

been manipulated for use in the context of a robotic control language, including the addition of new parts of speech. DMR keyword selection is based on ordinary English definitions (the definition that is most commonly associated with the word) and on semantic familiarity. Keywords retain an unambiguous English definition, taken from Princeton's WordNet semantic dictionary, but have newly assigned parts of speech, which are crucial to the processing of the language. In order to understand a given instruction, a keyword's typology is often more important than the actual word itself, e.g. knowing that we have a distance to move is more important than knowing the exact distance until we actually have to execute the action. As in natural languages, the parts of speech designed for DMR denote the potential usage of a word in a more complex structure (in DMR, a command), its combinatory potential and its interchangeability.

In the DMR Reference Manual keywords are presented as follows:

Left, Direction
WordNet Sense Key: adj1/20
Definition: being or located on or directed toward the side of the body to the west when facing north
Pronunciation: Brit. /lɛft/
Use: Describes the direction to the left-hand side of the robot or end-effector, from its point of view. On a world plane this would typically correlate to the negative x-axis. Opposite of Right.
Example: Move left 25 cm (Displaces the end effector by 25 cm to the robot's left)

Commands
(Figure 2)

Commands, the next hierarchical level in DMR, are a set of six robotic actions identified to cover the vast majority of use cases: movement, reorientation, tool actions, delays, stop/start and conditionals. Although this is clearly a somewhat reductive list, it still, as will be demonstrated, allows a great deal of user freedom whilst avoiding redundancy and over-complication.

Figure 2: Example Command structures showing all the possible input and the alignment of a DMR phrase within this structure

Each command definition contains a list of necessary parts of speech that must be filled for the command to be valid, and also a list of optional parts that may be included to better control the action. These lists are in constant flux during the parsing process because certain command types are overloaded (they can perform a variety of roles according to their input), so certain inputs open or block the possibility of adding other information, e.g. if one were to define a movement by the angles of the robot's joints, it would be redundant to also define a distance or direction.

A simple command declaration in DMR could resemble:
Move left 25
Or
Activate gripper

These can be extended to contain additional information, such as speeds, reference systems and units:

Left 25 cm relative to the gripper, at 0.5 m/s

Tasks
(Figure 3)

Once generated, commands can already perform all the actions afforded to a robot via DMR, but defining a robot's actions command by command is slow and repetitive, so DMR adds tasks to the top of the linguistic hierarchy. Tasks can almost be considered containers for commands or other tasks, but they also control their combination and sequencing. Tasks enable the user to name sequences of one or more commands and are at the heart of the extensibility of DMR.

Figure 3: Example Task build-up, from single Commands to multiple, including sequencing and joining, through to variables and conditionals.

A single task can be as simple as one move command and a name, for example a move that sets all the robot's axes back to 0 and is named "reset", but it can become increasingly abstract and arborescent with the inclusion of sub-tasks, variables and conditionals.

Simple tasks could be set up as follows:
Reset: Set axes to 0 0 0 0 0 0
Align: Left 25 cm then rotate down 90°
But can be advanced with variables:
Mill /Alpha (numeric): Left alpha cm then forwards 1. Right alpha cm then forwards 1
And conditionals:
Touch: Reset, rotate down 90° and move forward 20 cm then move down 1 mm until the force reads 1

As defined by Wahl and Thomas, the goal of task-based robotics is to "relieve the from (...) coding every tiny motion/action" (Wahl and Thomas 2002), and tasks can be combined with sensors, either external or connected to the robot, to create seemingly complex adaptive motions. One of the initial tasks implemented through DMR is referred to as "pass". It takes two variables and contains a multitude of sub-tasks with their own variables and conditionals, but allows the user to simply call "pass me a brick" and have the robot search for, approach and pick up a brick, then search for and approach the user's hand before aligning itself appropriately and releasing said brick. This could obviously be extended to placing the brick on top of or alongside another brick, repeated until we have what we may call a wall. That said, the structure of DMR isn't intended for long repeated actions. DMR is designed to simplify real-time communication with a robotic manipulator and to approach a state of cobotics (collaborative robotics), which enables us to have almost natural interactions with these manipulators, either for assistance in tasks we would find tedious, strenuous or dangerous, or to relieve us of increasingly complex programming to achieve tasks that we as humans have spent millions of years evolving to do incredibly intuitively, like recognising whether a stack is stable.

Passing a user a brick is a menial job for a robot that can move faster than a human, with more precision and whilst carrying much heavier loads, but it serves to demonstrate the semantic nature of the interactions. In a similar vein we could easily foresee robots passing pallets of bricks to users in locations that they could never have reached with such a load, performing high-speed wiring or installing precise façade elements.
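The reference-manual entry for Left, given under Keywords, can be read as a small data record: a word, its DMR part of speech and, for directions, an axis on the world plane. The sketch below is a minimal illustration under stated assumptions, not the DMR implementation: `Keyword`, `LEXICON`, `MM_PER_UNIT` and `displacement_mm` are hypothetical names, and only the mapping of left to the negative x-axis comes from the entry itself; the other axes are guesses. It works through the entry's own example, "Move left 25 cm".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vector = Tuple[float, float, float]


@dataclass(frozen=True)
class Keyword:
    """Hypothetical lexicon record: the word, its DMR part of speech and,
    for directions only, a unit vector on the world plane."""
    word: str
    part_of_speech: str
    world_axis: Optional[Vector] = None


# Assumed lexicon. Only "left" -> negative x-axis is taken from the
# reference-manual entry; the remaining axes are illustrative guesses.
LEXICON = {
    "left": Keyword("left", "direction", (-1.0, 0.0, 0.0)),
    "right": Keyword("right", "direction", (1.0, 0.0, 0.0)),
    "up": Keyword("up", "direction", (0.0, 0.0, 1.0)),
    "down": Keyword("down", "direction", (0.0, 0.0, -1.0)),
    "mm": Keyword("mm", "length_unit"),
    "cm": Keyword("cm", "length_unit"),
    "m": Keyword("m", "length_unit"),
}

MM_PER_UNIT = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def displacement_mm(direction: str, amount: float, unit: str) -> Vector:
    """Turn e.g. 'left 25 cm' into a world-plane displacement in millimetres.

    The part of speech is checked before the word itself is used, mirroring
    the idea that a keyword's type matters more than its exact value."""
    d, u = LEXICON[direction], LEXICON[unit]
    if d.part_of_speech != "direction" or u.part_of_speech != "length_unit":
        raise ValueError(f"expected a direction and a length unit, got {d.word!r}, {u.word!r}")
    scale = amount * MM_PER_UNIT[u.word]
    x, y, z = d.world_axis
    return (scale * x, scale * y, scale * z)


print(displacement_mm("left", 25, "cm"))  # (-250.0, 0.0, 0.0)
```

Keeping units out of the direction record means a misheard unit changes only the scale factor, never the axis.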

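The command and task levels described above can be sketched in the same way. This is a minimal illustration in Python, not DMR's own code: `COMMAND_FORMS`, `matching_form`, `Command` and `Task` are hypothetical names, only three of the six command types are modelled, the part-of-speech labels are assumptions, and conditionals such as "until" are left out. It shows an overloaded movement command, where supplying joint angles blocks a distance or direction, and a task that nests the Reset task and binds the variable of the Mill example.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional, Tuple

# Hypothetical slot table for three of the six DMR command types. Each form
# pairs the required parts of speech with the optional ones. "movement" is
# overloaded: direction + distance, or joint angles, but not both.
COMMAND_FORMS = {
    "movement": [
        ({"direction", "distance"}, {"speed", "reference"}),
        ({"joint_angles"}, {"speed"}),
    ],
    "reorientation": [({"direction", "angle"}, {"speed"})],
    "tool": [({"tool", "tool_action"}, set())],
}


def matching_form(kind: str, supplied: set) -> Optional[Tuple[set, set]]:
    """Return the first form the supplied parts satisfy, or None.

    Every required part must be present and nothing outside the form may be
    supplied, so extra information blocks a form instead of being ignored."""
    for required, optional in COMMAND_FORMS[kind]:
        if required <= supplied <= required | optional:
            return required, optional
    return None


@dataclass
class Command:
    kind: str
    parts: dict  # part of speech -> value, or the name of a task variable


@dataclass
class Task:
    name: str
    steps: list = field(default_factory=list)  # Commands and/or sub-Tasks

    def expand(self, **bindings) -> Iterator[Command]:
        """Flatten sub-tasks in order, substituting bound variable names."""
        for step in self.steps:
            if isinstance(step, Task):
                yield from step.expand(**bindings)
            else:
                parts = {pos: bindings.get(value, value) if isinstance(value, str) else value
                         for pos, value in step.parts.items()}
                yield Command(step.kind, parts)


reset = Task("reset", [Command("movement", {"joint_angles": (0, 0, 0, 0, 0, 0)})])
mill = Task("mill", [  # Mill /Alpha: left alpha, forwards 1, right alpha, forwards 1
    Command("movement", {"direction": "left", "distance": "alpha"}),
    Command("movement", {"direction": "forwards", "distance": 1}),
    Command("movement", {"direction": "right", "distance": "alpha"}),
    Command("movement", {"direction": "forwards", "distance": 1}),
])
job = Task("job", [reset, mill])

commands = list(job.expand(alpha=25))
for c in commands:
    print(c.kind, c.parts, matching_form(c.kind, set(c.parts)) is not None)
```

Expanding a task returns fresh commands, so the stored task keeps its variable names and can be called again with a different value.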
PARSING SEMANTIC INPUT

The disadvantages of semantic languages are most prevalent during the parsing phase of any implementation. Languages with formal grammars, including most computational languages, can be broken down into a set of production rules stating that a symbol on the left can become the one on the right when generating valid sentences, or that right can become left when parsing. Noam Chomsky's hierarchy is a solid basis to demonstrate this problem and the reasons for which semantics creates difficulties. In the Chomsky hierarchy of formal grammars there are four tiers, referred to, in order of increasing complexity, as Type-3, Type-2, Type-1 and Type-0. Without going into the language theory side of the hierarchy, each level denotes a certain format that the production rules must follow and the necessary means of computation. Type-3 (regular) grammars, as the simplest, can be recognised by a finite state automaton and require that the left-hand side be a single symbol and the right be at most a pair of symbols, of which at least one must be terminal, i.e. not the left-hand side of any other production rule. Type-2 grammars are the basis of most computer languages and are context-free, that is to say the left-hand side of every production rule is a single symbol, so a rule can be applied regardless of the symbols surrounding it; such languages can therefore be recognised by a non-deterministic pushdown automaton.

DMR has a Type-1, context-sensitive grammar, meaning that the left-hand sides of the production rules may have multiple symbols, i.e. a rule may only be applicable to a symbol if that symbol is preceded or followed by another. This conditional grammar greatly increases the computational complexity involved in parsing the language: recognition requires a linear-bounded automaton (a Turing machine whose tape is limited to the length of the input) rather than a pushdown automaton, and, depending on the number of symbols in the language, it is unlikely that the rules remain scriptable in their entirety. This linguistic complexity means that standard parsing strategies are no longer directly applicable, and therefore the DMR parser has a somewhat unusual setup to balance speed and robustness.

(See Figure 4 for a graphical representation of the following algorithm)

Assuming we begin with a phrase composed exclusively of DMR keywords, the first step is the breakdown of this phrase into syntactically separated parts which define sequential actions and can be treated as separate entities. Subsequently each keyword is replaced by its part of speech and is not reintroduced until the final stage of the parsing because, as previously noted, commands are predicated on types of values rather than the values themselves. Once the parts of speech have been introduced they are condensed as much as possible, e.g. a number and a rotational unit are merged to become an angle, and we generate commands for all unambiguous parts of the phrase, e.g. a distance can only be part of a movement command, therefore it generates a movement command. With these unambiguous commands in place we assign them limits within which they can search for extra information: the first command in a phrase is granted access to anything before it, the last to anything after it, and those in the middle to the parts of the phrase between the preceding and following commands. This search range is then used to fill the existing commands with as much information as possible until, as is very likely, we come across an element of ambiguity.

The preceding parse phase is designed to speed the process up in cases where there is little or no ambiguity, but once an ambiguous word is found, such as a second direction for a move command, the algorithm is locally modified: it first ensures that no existing command with access to this word can use it, and then generates a new command for it. Once this new command is in place it is given rights in the same way as the unambiguous commands were previously, and it attempts to fill itself with information. This is done recursively until we have created a maximum number of valid commands from the original phrase.

The only case not set out above is one where two commands have access to a word that they can both use, a frequent occurrence with numbers as they are applicable to almost all command types. In these circumstances we enter a command validation phase where we weigh up the benefit to each command of integrating this value. The primary condition is the validity of the leftmost command: semantically speaking, once a value has been assigned to a subsequent command no other information will be added, so if the leftmost command would be invalid without this information, the value is associated with the left-hand command. In more complex scenarios we prioritise certain contextual factors and the information already included in the surrounding commands, such that the information is placed in its most probable location.

Figure 4: A simplified version of the DMR semantic parsing algorithm, following the breakdown of a valid yet ambiguous DMR phrase through all the necessary steps to produce machine-executable commands.

INTERFACE

To make DMR a fully intuitive language it has been built into an Android application, referred to as the DMRI (DMR Interface), which, as well as allowing users to graphically set up commands and tasks, program end-effectors and connect wirelessly to a robot (currently only ABB robots are supported), supports vocal input through Google's speech recognition service. The interface supports two vocal input modes, offline and online, in which a user's instructions are converted into DMR executable commands and tasks. In offline mode the user can dictate commands, program tasks and modify these graphically before saving them for future execution in online mode or, shortly, exporting them to ABB's RAPID language for direct execution on a robotic manipulator. In online mode the user has direct, real-time control of the robot and can therefore fully collaborate, as well as receive audio feedback from command validations or other systems that require user alerts.

There are clearly safety issues inherent in this sort of control but, as collaborative robotics becomes a more widely acknowledged field among manufacturers, many of these are currently being addressed in new manipulators' hardware (YuMi, Baxter, etc.).
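In both modes a recognised phrase has to pass through the parser described under PARSING SEMANTIC INPUT before anything reaches the robot. The following is a minimal sketch of that parser's first stages in Python, under loud assumptions: the lexicon and part-of-speech labels are hypothetical, the search range is one reading of the rule (the first command sees everything up to the next command, the last everything after the previous one), and the recursive creation of new commands for ambiguous words and the weighted validation phase are not implemented; unclaimed words are simply returned as leftovers. It illustrates the described steps and is not the DMR parser itself.

```python
import re
from typing import List, Tuple

Token = Tuple[str, object]  # (part of speech, value)

# Hypothetical lexicon; the real DMR keyword set is much larger.
LEXICON = {
    "left": "direction", "right": "direction", "up": "direction",
    "down": "direction", "forwards": "direction",
    "mm": "length_unit", "cm": "length_unit", "m": "length_unit",
    "°": "rotation_unit", "degrees": "rotation_unit",
    "move": "verb", "rotate": "verb",
}
IMPLIES = {"distance": "movement", "angle": "reorientation"}  # unambiguous parts
ACCEPTS = {"movement": {"direction", "distance"},
           "reorientation": {"direction", "angle"}}
USABLE = set().union(*ACCEPTS.values())


def tag(segment: str) -> List[Token]:
    """Replace every keyword by its part of speech, keeping the value aside."""
    tokens = []
    for word in segment.split():
        if re.fullmatch(r"\d+(?:\.\d+)?", word):
            tokens.append(("number", float(word)))
        elif word in LEXICON:
            tokens.append((LEXICON[word], word))
    return tokens


def condense(tokens: List[Token]) -> List[Token]:
    """Merge a number followed by a unit into a distance or an angle."""
    out, i = [], 0
    while i < len(tokens):
        pos, value = tokens[i]
        unit = tokens[i + 1] if i + 1 < len(tokens) else (None, None)
        if pos == "number" and unit[0] == "length_unit":
            out.append(("distance", (value, unit[1])))
            i += 2
        elif pos == "number" and unit[0] == "rotation_unit":
            out.append(("angle", value))
            i += 2
        else:
            out.append((pos, value))
            i += 1
    return out


def fill(tokens: List[Token]):
    """Create one command per unambiguous token, then fill each command,
    leftmost first, from the tokens in its search range."""
    anchors = [(i, IMPLIES[pos]) for i, (pos, _) in enumerate(tokens) if pos in IMPLIES]
    claimed, commands = set(), []
    for n, (_, kind) in enumerate(anchors):
        lo = anchors[n - 1][0] + 1 if n > 0 else 0
        hi = anchors[n + 1][0] if n + 1 < len(anchors) else len(tokens)
        parts = {}
        for j in range(lo, hi):
            pos, value = tokens[j]
            # Leftmost command wins shared words; a second word of a type the
            # command already holds (e.g. a second direction) stays unclaimed.
            if j not in claimed and pos in ACCEPTS[kind] and pos not in parts:
                parts[pos] = value
                claimed.add(j)
        commands.append((kind, parts))
    leftovers = [t for j, t in enumerate(tokens) if j not in claimed and t[0] in USABLE]
    return commands, leftovers


def parse(phrase: str):
    """Split on 'then' into sequential parts and parse each part separately."""
    commands, leftovers = [], []
    for segment in phrase.lower().split(" then "):
        found, rest = fill(condense(tag(segment)))
        commands += found
        leftovers += rest
    return commands, leftovers


print(parse("Move left 25 cm then rotate down 90 °"))
```

A phrase such as "left 25 cm right" leaves the second direction in the leftovers; in the full algorithm that word would trigger the creation of a new command.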

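For simple relative movements, the planned export to RAPID mentioned under INTERFACE could amount to emitting one motion instruction per DMR movement. The sketch below is a minimal illustration assuming Python and a movement already resolved into a displacement in millimetres. `speed_data`, `rapid_move` and the robtarget name `pHere` are hypothetical; `MoveL`, `Offs`, `RelTool`, predefined speed data such as `v500`, the `fine` zone and `tool0` are standard RAPID. Mapping "relative to the gripper" to `RelTool` (tool frame) and other movements to `Offs` (work object frame) is an assumption about how DMR's reference systems would translate, not a description of the DMRI exporter.

```python
# Predefined RAPID speeddata TCP speeds in mm/s (v5 ... v7000).
PREDEFINED_SPEEDS = [5, 10, 20, 30, 40, 50, 60, 80, 100, 150, 200, 300, 400,
                     500, 600, 800, 1000, 1500, 2000, 2500, 3000, 4000, 5000,
                     6000, 7000]


def speed_data(m_per_s: float) -> str:
    """Pick the fastest predefined speeddata that does not exceed the request."""
    mm_per_s = round(m_per_s * 1000, 6)  # rounding avoids 0.1 * 1000 float noise
    allowed = [v for v in PREDEFINED_SPEEDS if v <= mm_per_s]
    if not allowed:
        raise ValueError(f"{m_per_s} m/s is below the slowest predefined speed")
    return f"v{allowed[-1]}"


def rapid_move(dx: float, dy: float, dz: float, m_per_s: float,
               relative_to_tool: bool = False, target: str = "pHere") -> str:
    """Emit one RAPID linear move displaced from a stored robtarget.

    `target` names a hypothetical robtarget holding the current position;
    RelTool displaces in the tool frame, Offs in the work object frame."""
    func = "RelTool" if relative_to_tool else "Offs"
    offsets = ", ".join(f"{v:.1f}" for v in (dx, dy, dz))
    return f"MoveL {func}({target}, {offsets}), {speed_data(m_per_s)}, fine, tool0;"


# "Left 25 cm relative to the gripper, at 0.5 m/s", taking the gripper's
# left as its negative x-axis (an assumption).
print(rapid_move(-250, 0, 0, 0.5, relative_to_tool=True))
```

Rounding down to a predefined speed means an exported program never moves faster than the user asked, which fits the safety concerns raised above.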
That said, the most critical issue specifically related to the execution of a manipulator via DMR is that of miscomprehension. Whereas with most means of programming almost all errors are user-generated, with DMR through vocal input there is a possibility that the words spoken are not those fed to the DMR parser. A relatively small mistake in the recognition, e.g. metres instead of millimetres, can have drastic consequences in the execution, and therefore the DMRI includes a system of command validation, beyond that included in the parser, to ensure that any possibly misheard commands are caught. This system is still somewhat underdeveloped but currently works through a two-stage process: the first stage ensures that the command is actually possible, and the second compares this action with those preceding it so that any gross disparities are flagged before execution. For example, if the last few commands have been dealing with single-figure millimetres and the one currently being checked is suddenly in double figures and metres, this is flagged as a drastic jump and an option to revert to the previous units is offered. An extension to this system would ideally include integrated feedback and a comprehensive vision system, including on-board and external cameras, to ensure that the motion is feasible and to modify the trajectory of the robot during execution so that it remains safe.

VISION INTEGRATION

To test the DMR Interface in a more complete architecture, and its capacity to connect to external sensors, a small vision server was developed based on PCL (Point Cloud Library) and a developer preview of Intel's RealSense F200 infra-red camera. This relatively simple setup is mounted on the robot's end effector and effectively allows the robot to see objects and pass their relative positions and orientations back to the controller when requested. This allows the user to add object-based variables to their tasks and ensures that the robot is capable of interacting with them. The vision server implemented here is capable of seeing certain forms, such as spheres, planes, boxes and hands, but is also capable of measuring these objects and, for the time being, making crude assumptions about the object type based on such observations, e.g. a 7 cm x 10 cm x 21 cm box, through these eyes, is synonymous with a brick. This purely geometric approach to object vision also allows other properties, such as colour, to be attached to these objects, allowing us to request a blue brick rather than just any brick. For the time being this interaction is an approach, attack and tool activation to pick up the object, but the same principles could easily be applied to scenarios such as viewing a lever and activating it.

The advantage of using DMR for an activity such as object retrieval is that, due to the relativity of its movement, we never have to deal with absolute coordinates and the robot is able to adapt to any motion of the target object. Indeed, the system is set up so that even if the targeted object disappears, the robot will automatically identify an appropriate replacement target.

Figure 5: Sequential views from the robot-mounted camera showing object identification, object selection and hand recognition for delivery of the chosen object.

CONCLUSION

This paper proposes a novel means of real-time human-robot interaction through an intermediary language and demonstrates the effectiveness of such a solution with examples of its implementation. DMR is shown to reduce the three problems identified with current robotic systems (communication, correction and adaptability) by enabling semantic input, real-time control and an expandable system architecture.

The most pressing issues with this system are those of safety and latency. As previously discussed, certain safety measures are in place to minimise the risk of collisions or unintended movements, but these need to be developed further to ensure safe and robust use of this system by untrained users. The issue of latency is likely to be resolved in the immediate future and stems from the vocal recognition portion of the calculation. If we break down the calculation time of a typical phrase, speech is in the 1-3 s range, vocal recognition, both online and offline, is of the order of 1 s, converting natural language to DMR takes between 0.1 and 0.4 s according to the length and complexity of the phrase, and DMR parsing takes 0.05-0.1 s. Advancements in AI are going to enable much more responsive vocal recognition systems and natural language parsing, but what is noteworthy here is the time taken by the actual speech, which is clearly the most time-consuming part of the process. The latency is currently perceptible but will naturally be reduced as the different elements of this process are individually developed.

Although the focus here has been on construction, the intuitive extensibility of DMR gives it the potential to expand into a multitude of other domains. Programming tasks with such ease could push DMR into customisable domestic robots, where making coffee in a new environment could be programmed in a matter of seconds by a lay-person, while the collaborative aspect of DMR could benefit artists or craftsmen by moving and orienting an object in three dimensions in front of the user, intuitively, instantaneously and hands-free, to facilitate work on all sides of an object from a comfortable position.

As DMR is a language, not a piece of software, it is hoped that once a trial version of the current software has been released to beta testers, developers will be able to integrate DMR into other systems, from drones to hoovers, growing an ecosystem of similarly controlled machines. The project remains under active development with a specific focus on safety and robustness, and will be joined, in the very near future, by other hardware and software development to combat issues of on-site robotics such as inter-machine communication, ambient noise and multimodal human-machine interfaces.

REFERENCES

Chomsky, N 1957 (2nd Ed. 2002), Syntactic Structures, De Gruyter Mouton
Feringa, J 2012, 'Implicit Fabrication, Fabrication Beyond Craft: The Potential of Turing Completeness in Construction', Synthetic Digital Ecologies: Proceedings of the 32nd Annual Conference of the Association for Computer Aided Design in Architecture, California College of the Arts, pp. 383-390
Mihalcea, R and Radev, D 2006, 'Graph-Based Algorithms For Natural Language Processing and Information Retrieval', Proceedings of the Human Language Technology Conference of the NAACL, New York City
Searle, J 1980, 'Minds, brains and programs', Behavioural and Brain Sciences, 3, pp. 417-424
Wahl, FM and Thomas, U 2002, 'Robot Programming - From Simple Moves to Complex Robot Tasks', Proceedings of First International Colloquium "Collaborative Research Centre 562 – Robotic Systems for Modelling and Assembly", Braunschweig, pp. 249-259
Watt, DA and Findlay, W 2004, Design Concepts, John Wiley & Sons, Chichester
